Hack The Garden 05/2026 Wrap Up
- Date: 04.05.2026 – 08.05.2026
- Location: Schlosshof Freizeitheim, Schelklingen
- Organizer: x-cellent
- Topics: hackathon#72
- Review Meeting Summary: https://gardener.cloud/community/review-meetings/2026-reviews/#_2026-05-13-hack-the-garden-wrap-up


Complete the ManagedSeedSet Implementation (#52)
Tracking: hackathon#52
Problem Statement
The ManagedSeedSet proposal has never been fully implemented. The existing implementation supports creating Shoots and ManagedSeeds but not updating them. A complete implementation would simplify operations and lay the groundwork for more automated setups such as a "seed autoscaler".
Outcome
After implementing and evaluating a WIP controller (branch: timebertt/gardener:managedseedset), the team concluded that completing and maintaining `ManagedSeedSet` is not worth the effort, and proposed to drop it from Gardener entirely. The key reasons:
- Keeping `gardenlet`/`Seed` config in sync across all set members is either impossible or very hard to do reliably: gardenlet version propagation via the parent gardenlet bypasses any controller trying to keep members uniform.
- `ManagedSeed` autoscaling is infrequent in practice; production landscapes prefer declarative, manual seed management.
- Staging feature rollouts across a `ManagedSeedSet` using partitions remains difficult due to config drift.
- The feature is partially implemented with no known production users, and full implementation is not considered worth the maintenance cost.
Next Steps
If no one from the community strongly disagrees with this decision, the API will be removed in the near future.
Improve Debuggability of Failed Node Joins (#68)
Tracking: hackathon#68
Problem Statement
Users have limited visibility into node join failures. The `gardener-node-agent` bootstrap process uses `StandardOutput=journal+console` in its systemd unit file, but this only covers the download and initial configuration file writing phase. Once the agent attempts to connect to the cluster's API server, logs are no longer written to the console, leaving operators with no choice but to SSH into the node to diagnose failures.
Achievements
Both proposed solutions were implemented: the gardener-node-agent now performs a connection test during bootstrap, and fatal errors after bootstrapping are also written to the node's console log.
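The actual implementation lives in `gardener-node-agent`; the following is only a minimal Go sketch of the overall pattern, with a hypothetical endpoint and helper names (`probeAPIServer`, `logToConsole`):

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"os"
	"time"
)

// probeAPIServer performs a simple connectivity test against the shoot API
// server, similar in spirit to the bootstrap connection test. The endpoint
// and TLS settings are illustrative only.
func probeAPIServer(url string) error {
	client := &http.Client{
		Timeout: 10 * time.Second,
		// InsecureSkipVerify is used here only because this probe checks
		// reachability, not authenticity; the real agent verifies the CA.
		Transport: &http.Transport{TLSClientConfig: &tls.Config{InsecureSkipVerify: true}},
	}
	resp, err := client.Get(url + "/healthz")
	if err != nil {
		return fmt.Errorf("API server not reachable: %w", err)
	}
	defer resp.Body.Close()
	return nil
}

// logToConsole writes a fatal error to the node's console device so that
// operators can see it without SSH access (hypothetical helper).
func logToConsole(msg string) {
	if console, err := os.OpenFile("/dev/console", os.O_WRONLY, 0); err == nil {
		defer console.Close()
		fmt.Fprintln(console, msg)
	}
}

func main() {
	if err := probeAPIServer("https://api.example.shoot.internal"); err != nil {
		logToConsole("gardener-node-agent bootstrap failed: " + err.Error())
		os.Exit(1)
	}
}
```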
Code & Pull Requests
Add Support for Virtual Garden to ACL Extension (#47)
Tracking: hackathon#47
Problem Statement
With gardener/gardener#14420 it became possible to handle client IP addresses for the virtual garden cluster. The ACL extension should be extended to support IP allowlisting for this scenario as well.
Achievements
- Implemented a basic extension integration for the virtual garden scenario.
- Identified that the virtual garden API server domain needs to be exposed in the `Garden` status for easy retrieval by the extension; a WIP change was opened: hown3d/gardener:garden-advertised-addresses.
Next Steps
- Reconcile the extension when a new `ManagedSeed` is added in order to update the `EnvoyFilter`.
Code & Pull Requests
Replace OpenVPN with WireGuard (#70)
Tracking: hackathon#70
Problem Statement
The Gardener VPN implementation between control plane and data plane uses OpenVPN. This topic, continued from the June 2025 hackathon, aims to complete the replacement with WireGuard, using `wpex` (a WireGuard multiplexer sidecar) to route connections from a single Istio ingress to the correct `vpn-seed-server` per shoot.
Achievements
- Multiple shoots can now be created with the WireGuard-based setup.
- Istio Ingress can be scaled to more than one replica.
- The `wpex` sidecar correctly multiplexes incoming connections: the first packet is forwarded to all configured `vpn-seed-server`s, each of which either answers or silently ignores it based on the public key (see the sketch after this list).
- `NetworkPolicy`s are managed with appropriate labels on `istio-ingress` and `vpn-seed-server`.
- A related bug was found and fixed along the way.
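`wpex`'s real implementation is more involved (and keeps relaying return traffic for the lifetime of a session); the following Go sketch, with illustrative addresses and no continuous return-path relaying, only shows the fan-out idea: broadcast an unknown peer's first packet to all backends and pin whichever backend answers.

```go
package main

import (
	"log"
	"net"
	"sync"
)

var (
	mu     sync.Mutex
	pinned = map[string]net.Conn{} // client address -> pinned backend connection
)

func main() {
	// Frontend socket that the Istio ingress forwards WireGuard traffic to.
	front, err := net.ListenUDP("udp", &net.UDPAddr{Port: 51820})
	if err != nil {
		log.Fatal(err)
	}
	backends := []string{"10.0.0.1:51820", "10.0.0.2:51820"} // illustrative vpn-seed-servers

	buf := make([]byte, 65535)
	for {
		n, client, err := front.ReadFromUDP(buf)
		if err != nil {
			log.Fatal(err)
		}
		mu.Lock()
		conn, ok := pinned[client.String()]
		mu.Unlock()
		if ok {
			conn.Write(buf[:n]) // session already pinned: just forward
			continue
		}
		// Unknown peer: fan the handshake initiation out to every backend.
		// Only the vpn-seed-server owning the matching public key answers;
		// the others silently drop the packet.
		pkt := append([]byte(nil), buf[:n]...)
		for _, b := range backends {
			backend, err := net.Dial("udp", b)
			if err != nil {
				continue
			}
			go func(backend net.Conn, client *net.UDPAddr) {
				backend.Write(pkt)
				resp := make([]byte, 65535)
				m, err := backend.Read(resp)
				if err != nil {
					return
				}
				// First responder wins: pin it and relay its answer.
				// (A real relay would also keep streaming backend->client.)
				mu.Lock()
				if _, exists := pinned[client.String()]; !exists {
					pinned[client.String()] = backend
				}
				mu.Unlock()
				front.WriteToUDP(resp[:m], client)
			}(backend, client)
		}
	}
}
```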
Code & Pull Requests
- Branch: metal-stack/gardener:wireguard-3
- Branch: metal-stack/vpn2:wireguard-2
- Branch: majst01/wpex:go-1.26
- Fix related VPN bug β gardener/gardener#14776
Make Internal Domain Optional/Mutable (#53)
Tracking: hackathon#53
Problem Statement
Every Gardener environment requires a DNS zone for managing shoot "internal domains". Some setups (e.g., those without customer-configured external domains, or with private DNS requirements) would benefit from making the internal domain optional per shoot, mutable after creation, or changeable across all shoots on a seed.
Achievements
Three use cases were scoped and designed during the hackathon:
- Make internal domain optional per shoot: a shoot can be created without an internal domain using a new `Shoot.spec.dns.internalDomain.enabled` field. Disabling the internal domain of an existing shoot requires triggering a CA rotation in the same request, coordinated across node rollout and `DNSRecord` lifecycle (see the sketch below).
- Change the external domain of a shoot: switching external domains is modelled as a two-phase CA rotation; the new domain is added to certificates and DNS records in the preparing phase, and the old domain is removed in the completing phase.
- Change the internal domain of all shoots on a seed: the seed spec would carry a list of internal domains, and a CA rotation initiated by the shoot owner migrates the shoot to the new domain.
A WIP branch implementing the first use case was developed: timebertt/gardener:internal-domain-optional.
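The API shape is still under design; the following is a rough Go sketch of how the first use case's field could look on the `Shoot` spec. The `InternalDomain` struct and its placement are assumptions derived from the field path above, not merged API:

```go
package core

// DNS holds information about the DNS settings of the Shoot.
// Only the hypothetical new InternalDomain section is spelled out here.
type DNS struct {
	// Domain is the external domain of the Shoot (existing field).
	Domain *string `json:"domain,omitempty"`
	// InternalDomain configures the shoot's internal domain handling
	// (hypothetical new section from the hackathon proposal).
	InternalDomain *InternalDomain `json:"internalDomain,omitempty"`
}

// InternalDomain allows disabling the internal domain per shoot.
type InternalDomain struct {
	// Enabled controls whether an internal domain is maintained for this
	// shoot. Disabling it on an existing shoot must be combined with a CA
	// rotation in the same request.
	Enabled bool `json:"enabled"`
}
```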
Next Steps
- The STACKIT team plans to open an enhancement proposal covering all three use cases in detail.
[GEP-28] Experiment with shoot/shoot Controller in Self-Hosted Shoot Clusters (#45)
TIP: You can find out more about Self-Hosted Shoot Clusters in GEP-28.
Tracking: hackathon#45
Problem Statement
Almost all gardenlet controllers are deployed during `gardenadm connect`, but the critical shoot/shoot controller is not yet included. Without it, many required components are not deployed to self-hosted shoot clusters. The challenge is how to reuse the existing hosted Shoot flow tasks without duplicating maintenance.
Achievements
- Analyzed and implemented the necessary changes to run the `gardenadm init` flow inside `gardenlet`'s `Shoot` controller.
- Augmented the `flow` package to support `TaskGroup`s for reusable flow tasks (see the sketch after this list).
- Evaluated an approach to harmonize the `gardenadm init`/`bootstrap` and `shoot/shoot` flows.
- Moved self-hosted-shoot-specific logic from `GardenadmBotanist` to `Botanist` and extracted shared flow tasks.
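The `TaskGroup` support only exists on the WIP branch, so its exact API is unknown; the following self-contained Go sketch merely illustrates the grouping idea. It deliberately does not use the real `flow` package, and all names are hypothetical:

```go
package main

import (
	"context"
	"fmt"
)

// TaskFn mirrors the shape of a task function in Gardener's flow package.
type TaskFn func(ctx context.Context) error

type Task struct {
	Name string
	Fn   TaskFn
}

// TaskGroup bundles related tasks so they can be reused across flows, e.g.
// by both the gardenadm init flow and gardenlet's shoot/shoot flow. This is
// a self-contained sketch of the idea, not the real flow package API.
type TaskGroup struct {
	Name  string
	Tasks []Task
}

// Run executes the group's tasks sequentially; the real flow package would
// wire them into a dependency graph instead.
func (g TaskGroup) Run(ctx context.Context) error {
	for _, t := range g.Tasks {
		if err := t.Fn(ctx); err != nil {
			return fmt.Errorf("task %q in group %q failed: %w", t.Name, g.Name, err)
		}
	}
	return nil
}

func main() {
	// A shared group that both flows could include (task names illustrative).
	controlPlane := TaskGroup{
		Name: "control-plane",
		Tasks: []Task{
			{Name: "deploy-etcd", Fn: func(ctx context.Context) error { return nil }},
			{Name: "deploy-kube-apiserver", Fn: func(ctx context.Context) error { return nil }},
		},
	}
	if err := controlPlane.Run(context.Background()); err != nil {
		fmt.Println(err)
	}
}
```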
Code & Pull Requests
[GEP-28] Implement Public CA Bundle Discovery Mechanism (#15)
Tracking: hackathon#15
Problem Statement
When running `gardenadm join` or `gardenadm connect`, the CA bundle of the cluster must currently be provided via command line arguments. GEP-28 describes a kubeadm-inspired discovery mechanism using "discovery tokens" that allows securely obtaining the CA bundle without pre-sharing it.
Achievements
- `gardenadm init` now publishes `kube-public/cluster-info` (a kubeconfig with the cluster CA and endpoint, no credentials), signed by the `bootstrap-signer` controller via bootstrap tokens.
- A path-scoped `AuthenticationConfiguration` is injected on the shoot's kube-apiserver, enabling anonymous access only on the cluster-info endpoint.
- `gardenadm join`/`connect` gained a new `--discovery-token-ca-cert-hash sha256:<64-hex>` flag; the `Discover` flow performs an insecure fetch, verifies the JWS signature using the bootstrap token secret, verifies that the CA matches the supplied SPKI pin (see the sketch after this list), then re-fetches over verified TLS.
- `gardenadm token create --print-join-command` now emits `--discovery-token-ca-cert-hash` instead of the full base64-encoded CA bundle.
- On the `gardener-operator` side, a new `Garden.Spec.VirtualCluster.Kubernetes.KubeAPIServer.EnableBootstrapDiscovery` field was added; when enabled, the same cluster-info ConfigMap and RBAC setup is published on the virtual garden cluster.
- Added unit tests and an e2e test (unmanaged infra scenario) asserting that `gardenadm join` rejects a bad pin.
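The pin format follows kubeadm's `--discovery-token-ca-cert-hash` convention: a SHA-256 over the CA certificate's DER-encoded SubjectPublicKeyInfo. A minimal Go sketch of the verification step (the function name is illustrative, not the actual gardenadm code):

```go
package main

import (
	"crypto/sha256"
	"crypto/x509"
	"encoding/hex"
	"encoding/pem"
	"fmt"
	"strings"
)

// verifyCACertHash checks a PEM-encoded CA certificate against a pin of the
// form "sha256:<64-hex>". Like kubeadm, the hash is computed over the
// certificate's DER-encoded SubjectPublicKeyInfo, not the whole certificate.
func verifyCACertHash(caPEM []byte, pin string) error {
	block, _ := pem.Decode(caPEM)
	if block == nil {
		return fmt.Errorf("no PEM block found in CA bundle")
	}
	cert, err := x509.ParseCertificate(block.Bytes)
	if err != nil {
		return err
	}
	sum := sha256.Sum256(cert.RawSubjectPublicKeyInfo)
	got := "sha256:" + hex.EncodeToString(sum[:])
	if !strings.EqualFold(got, pin) {
		return fmt.Errorf("CA cert hash mismatch: got %s, want %s", got, pin)
	}
	return nil
}
```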
Next Steps
- Ensure the `AuthenticationConfiguration` for the virtual garden uses the same merging logic as for shoots.
- Run an end-to-end smoke test: `gardenadm connect` against a real `Garden` with `EnableBootstrapDiscovery: true`.
- Restructure the WIP commits and open PRs against gardener/gardener.
Code & Pull Requests
[GEP-28] SelfHostedShootExposure in Cilium Extension (#46)
Tracking: hackathon#46
Problem Statement
GEP-36 introduced the SelfHostedShootExposure API for exposing the API server of self-hosted shoots. This topic investigates implementing that API in the Cilium extension, using Cilium LB IPAM to allocate virtual IPs and advertise them via L2 Announcements or BGP.
Achievements
- Identified a connectivity issue: when the API server rolls over to manage itself, Cilium loses its connection to the default Kubernetes service cluster IP.
- Resolved the issue by configuring Cilium to connect to the API server via localhost, using a `CiliumNodeConfig` resource targeting only control plane nodes.
- Confirmed that after reconnecting, Cilium reconciles `NetworkPolicy`s correctly.
- Implemented the extension API using L2 Announcements.
Next Steps
- Consider other exposure possibilities with Cilium, like BGP.
[GEP-28] Support Joining Control Plane Nodes in Managed Infrastructure (#54)
Tracking: hackathon#54
Problem Statement
`gardenadm join` was extended to support joining control plane nodes by generating node-specific etcd certificates and writing them to the OperatingSystemConfig (OSC). This works in the unmanaged infrastructure scenario, but in managed infrastructure (orchestrated via machine-controller-manager), several conceptual issues arise: node-specific etcd assets cannot be provisioned before the node's IP is known, requiring dynamic OSC updates after node creation and coordination between multiple controllers.
Achievements
- Identified the fundamental issues with the current OSC-based approach in managed infra: dynamic OSC contents are complicated to orchestrate, etcd members are rolled additional times, and control plane nodes cannot become ready when `gardenlet` is unavailable.
- Designed a target architecture where the OSC contains no node-specific or dynamic content: CA secrets (including private keys) are pushed via OSC, and `backup-restore` generates node-specific certificates on startup (see the sketch after this list).
- `etcd-druid`/`backup-restore` dynamically manage the etcd member list without needing `Etcd.spec.externallyManagedMemberAddresses`.
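As an illustration of the second bullet, here is a hedged Go sketch of how `backup-restore` could mint a node-specific etcd certificate from CA material shipped via the OSC; the function name, SANs, key type, and validity period are all assumptions, as the design is still WIP:

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"math/big"
	"net"
	"time"
)

// issueNodeCert sketches how backup-restore could generate a node-specific
// etcd certificate on startup from the CA shipped in the OSC. Only the node
// IP is node-specific, which is exactly the part unknown before creation.
func issueNodeCert(caCert *x509.Certificate, caKey *ecdsa.PrivateKey, nodeIP net.IP) ([]byte, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	template := &x509.Certificate{
		SerialNumber: big.NewInt(time.Now().UnixNano()),
		Subject:      pkix.Name{CommonName: "etcd-" + nodeIP.String()},
		NotBefore:    time.Now().Add(-5 * time.Minute),
		NotAfter:     time.Now().Add(365 * 24 * time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature | x509.KeyUsageKeyEncipherment,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth, x509.ExtKeyUsageClientAuth},
		// The node-specific SAN, only known after the machine exists.
		IPAddresses: []net.IP{nodeIP},
	}
	der, err := x509.CreateCertificate(rand.Reader, template, caCert, &key.PublicKey, caKey)
	if err != nil {
		return nil, nil, err
	}
	return der, key, nil
}
```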
Next Steps
- Implement the target picture: `backup-restore` generates node-specific certificates from CA secrets in the OSC on startup.
- Adapt CA rotation: the preparing phase pushes the new CA bundle to all nodes via OSC and waits; the completing phase removes the old CA.
- Remove `Etcd.spec.externallyManagedMemberAddresses` for self-hosted shoots once etcd-druid/backup-restore handle member management dynamically.
Code & Pull Requests
[GEP-28] Run Garden and Seed in Self-Hosted Shoot Cluster on Managed Infrastructure (#55)
Tracking: hackathon#55
Problem Statement
Running `gardener-operator` and a Seed inside a self-hosted shoot cluster was already demonstrated for the unmanaged infrastructure scenario during the March 2026 hackathon. This topic extends that work to the managed infrastructure scenario, where nodes are provisioned via machine-controller-manager.
Achievements
- Continued from rfranzke/gardener:gep28/promote-shs-as-seed and got a `Shoot` fully created in the managed infra local setup.
- Added `make ginkind-up SCENARIO=full` to automate all required steps to reach a healthy seed.
- Adjusted validations in `provider-local`: allow a `kubeconfig` secret in the provider secret; validate that a shoot uses the seed's `podCIDR` as its `nodeCIDR`.
- Worked around overlay-fs nesting limitations by switching to native snapshotting for double-nested machine pods.
- Adjusted `NetworkPolicy`s to allow traffic to registries.
- As a side-track, explored running local provider "machines" directly as Docker containers to avoid nesting issues and enable control plane migrations without machine recreation; a first successful local control plane migration without workload downtime was completed.
Next Steps
- Merge prerequisite gardener/gardener#14747, then clean up and open a PR for the managed infra scenario.
- Propose the `machine-provider-local:docker-mode` change separately.
Code & Pull Requests
- Branch: oliver-goetz/gardener:hack/operator-seed-managed-infra
- Branch: oliver-goetz/gardener:hack/provider-local-docker
Allow Admins to Easily Use a Viewer Kubeconfig by Default (#71)
Tracking: hackathon#71
Problem Statement
When using gardenctl, admins always receive an admin kubeconfig when connecting to shoots or seeds. There is no way to default to a viewer kubeconfig, which would reduce the blast radius of accidental writes while browsing clusters.
Achievements
- Implemented a new `defaultKubeconfigAccessLevel` configuration field per garden in the `gardenctl` config, supporting `admin`, `viewer`, and `auto` values for both `shoots` and `managedSeeds` (a sketch of the shape follows this list).
- Added convenience support: `gardenctl config set-garden` can modify the access level, and the symlinked kubeconfig is automatically refreshed when the current garden's access level changes.
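For illustration, a rough Go sketch of what the per-garden configuration shape could look like; only the field and value names come from the bullets above, while the surrounding struct and the semantics of `auto` are assumptions:

```go
package config

// AccessLevel enumerates the supported kubeconfig access levels.
type AccessLevel string

const (
	AccessLevelAdmin  AccessLevel = "admin"
	AccessLevelViewer AccessLevel = "viewer"
	// AccessLevelAuto defers the choice to gardenctl (exact semantics are
	// defined by the implementation; assumed here).
	AccessLevelAuto AccessLevel = "auto"
)

// Garden sketches a per-garden gardenctl configuration entry; only the
// hypothetical new field is spelled out.
type Garden struct {
	Identity string `yaml:"identity"`
	// DefaultKubeconfigAccessLevel controls which kubeconfig is requested
	// by default, separately for shoots and managed seeds.
	DefaultKubeconfigAccessLevel *KubeconfigAccessLevels `yaml:"defaultKubeconfigAccessLevel,omitempty"`
}

type KubeconfigAccessLevels struct {
	Shoots       AccessLevel `yaml:"shoots,omitempty"`
	ManagedSeeds AccessLevel `yaml:"managedSeeds,omitempty"`
}
```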
Code & Pull Requests
Stage confineSpecUpdateRollout Changes in Annotation (#64)
Tracking: hackathon#64
Problem Statement
With `confineSpecUpdateRollout` enabled, spec changes to a Shoot are written directly to `.spec` but only reconciled during the next maintenance window. This means `.spec` reflects a desired future state rather than what is currently running, which is confusing for users and tooling alike.
Achievements
- Demonstrated an approach where an admission plugin blocks direct user updates to `.spec` when `confineSpecUpdateRollout` is set, staging the desired changes in a ConfigMap instead (see the sketch after this list).
- At the start of the maintenance window, the staged state is applied to `.spec`, and reconciliation proceeds as usual.
- This keeps `.spec` a faithful representation of the currently running state, while making staged changes inspectable and cancellable.
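A heavily simplified Go sketch of the admission decision described above, using reduced stand-in types rather than the actual plugin or Shoot API:

```go
package admission

import (
	"fmt"
	"reflect"
)

// Shoot is a drastically reduced stand-in for the real Shoot type.
type Shoot struct {
	ConfineSpecUpdateRollout bool
	Spec                     map[string]any
}

// ValidateUpdate sketches the plugin's core decision: while
// confineSpecUpdateRollout is active, direct spec changes are rejected and
// callers are pointed to the staging mechanism (a ConfigMap) instead.
func ValidateUpdate(old, new *Shoot, inMaintenanceWindow bool) error {
	if !new.ConfineSpecUpdateRollout || inMaintenanceWindow {
		return nil // direct updates allowed
	}
	if !reflect.DeepEqual(old.Spec, new.Spec) {
		return fmt.Errorf("spec updates are confined to the maintenance window; stage the change in the designated ConfigMap instead")
	}
	return nil
}
```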
GardenState Resource for Automated Garden Cluster Disaster Recovery (#44)
Tracking: hackathon#44
Problem Statement
Recovering a destroyed Garden cluster today is fully manual: operators must know which secrets to back up (CA certs, etcd encryption keys, ServiceAccount signing keys), where to find them, and how to restore the Garden resource's status. There is no standardised, machine-readable snapshot, making disaster recovery error-prone.
Achievements
- After considering a new CRD, the team settled on encoding the `GardenState` as a Kubernetes `Secret` (label `operator.gardener.cloud/purpose: garden-state`) in the runtime cluster's `garden` namespace, containing a JSON-encoded snapshot. Choosing a Secret over a CRD means it is available before CRDs are installed, trivially extractable with kubectl, and easy to back up externally.
- The `GardenState` type (modelled after `ShootState`) captures: `persist=true`-labelled secrets (CA certs, etcd encryption key, SA signing key), `Garden` metadata and spec, extension state from `DNSRecord`/`BackupEntry`/`Extension` CRDs, and objects referenced by extension state. The Garden UID is persisted explicitly to preserve the etcd backup bucket name.
- A bootstrapper was added to `gardener-operator` that detects a `garden-state` secret without a `Garden` resource on startup and initiates the restore: it restores secrets, recreates the `Garden` resource, and patches its status type to `Restore` (see the sketch after this list).
- After successful `Garden` reconciliation, the `garden-state` secret is updated.
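For illustration, a rough Go sketch of the bootstrapper's startup check using controller-runtime's client; the label and namespace come from the bullets above, while the function shape and the `gardenExists` callback are assumptions to keep the sketch self-contained:

```go
package bootstrapper

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// NeedsRestore sketches the startup check: a garden-state secret exists in
// the runtime cluster, but no Garden resource does. The Garden lookup is
// abstracted behind a callback here.
func NeedsRestore(ctx context.Context, c client.Client, gardenExists func(context.Context) (bool, error)) (*corev1.Secret, bool, error) {
	secrets := &corev1.SecretList{}
	if err := c.List(ctx, secrets,
		client.InNamespace("garden"),
		client.MatchingLabels{"operator.gardener.cloud/purpose": "garden-state"},
	); err != nil {
		return nil, false, err
	}
	if len(secrets.Items) == 0 {
		return nil, false, nil // nothing to restore from
	}
	exists, err := gardenExists(ctx)
	if err != nil || exists {
		return nil, false, err // Garden present: normal startup
	}
	// Restore path: recreate the Garden from the snapshot and patch its
	// status type to Restore (not shown here).
	return &secrets.Items[0], true, nil
}
```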
Next Steps
- Draft a GEP and present it in a TSC meeting.
- Implement and wire up e2e tests.
- Open a PR against gardener/gardener.
Code & Pull Requests
Separately Encrypt etcd Backups (#69)
Tracking: hackathon#69
Problem Statement
etcd backups for all shoots on a seed are stored in a shared bucket. In case of a control plane compromise, a shoot could read the backups of other shoots on the same seed. A shoot-specific encryption key for each backup would limit the blast radius to the compromised shoot's own data.
Achievements
- Implemented a prototype across three repositories: gardener generates a new shoot-specific encryption secret (persisted in `ShootState`), etcd-druid deploys `etcd-backup-restore` with an `EncryptionConfig` file (analogous to kube-apiserver encryption at rest), and `etcd-backup-restore` implements an AES-GCM encryption package with full key rotation support (see the sketch after this list).
- Key rotation works via an encrypted keyring file stored alongside the backups: the keyring contains all keys used over time and is always re-encrypted with the latest key on startup.
- The encryption package is wired into the backup and restore flows.
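The prototype's actual encryption package lives in the branches below; as a generic illustration of the AES-GCM primitive it builds on (not the prototype's on-disk format or keyring handling), consider this Go sketch:

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
)

// seal encrypts a backup chunk with AES-GCM; the random nonce is prepended
// to the ciphertext so that decryption only needs the key.
func seal(key, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(key) // key must be 16, 24, or 32 bytes
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// open reverses seal: it splits off the nonce and authenticates/decrypts.
func open(key, sealed []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, fmt.Errorf("ciphertext too short")
	}
	nonce, ciphertext := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ciphertext, nil)
}
```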
Next Steps
- More extensive testing.
- Hide the feature behind a feature gate.
- Support additional encryption provider types and provider type migration.
Code & Pull Requests
- Branch: metal-stack/gardener:etcd-backup-encryption
- Branch: Gerrit91/etcd-druid:etcd-backup-encryption
- Branch: RadaBDimitrova/etcd-backup-restore:add-backup-encryption
Reduce Secret Watch Pressure by Splitting ManagedResource Data (#61)
Tracking: hackathon#61
Problem Statement
`gardener-resource-manager` stores all rendered manifests in Secret objects referenced by `ManagedResource.spec.secretRefs`, even when those manifests contain no sensitive data. This causes elevated memory usage in the informer cache and unnecessary etcd watch pressure on seeds.
Achievements
Initial measurements on production-like seed clusters confirmed the problem:
| Cluster | Shoots | ManagedResources | Secrets | In-memory size |
|---|---|---|---|---|
| 1 | 193 | 11,753 | 12,453 | 204.90 MB |
| 2 | 272 | 15,605 | 16,374 | 281.22 MB |
On average, ManagedResource secret data accounts for more than half of all secret data on seed clusters, roughly 1 MB per shoot control plane.
A sample implementation was developed introducing a new `ManagedResourceData` CRD for non-sensitive manifests. The `ManagedResource` spec was extended with `dataRefs` alongside the existing `secretRefs`. The split is determined at write time: Secret-kind manifests stay in Secret objects; everything else moves to `ManagedResourceData`. In a local test with a single shoot, the vast majority of manifests (38 out of 51 in the shoot control plane) moved out of Secrets.
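A minimal Go sketch of that write-time classification; the real implementation in `gardener-resource-manager` is richer, and `splitManifests` is an illustrative name:

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

// splitManifests sketches the write-time split: manifests of kind Secret
// keep living in Secret objects (they are sensitive), everything else is
// destined for the new ManagedResourceData object.
func splitManifests(manifests []*unstructured.Unstructured) (secrets, data []*unstructured.Unstructured) {
	for _, m := range manifests {
		if m.GetKind() == "Secret" && m.GroupVersionKind().Group == "" {
			secrets = append(secrets, m)
			continue
		}
		data = append(data, m)
	}
	return secrets, data
}

func main() {
	cm := &unstructured.Unstructured{}
	cm.SetKind("ConfigMap")
	cm.SetAPIVersion("v1")
	secret := &unstructured.Unstructured{}
	secret.SetKind("Secret")
	secret.SetAPIVersion("v1")

	s, d := splitManifests([]*unstructured.Unstructured{cm, secret})
	fmt.Printf("secrets: %d, data: %d\n", len(s), len(d)) // secrets: 1, data: 1
}
```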
Next Steps
- Decide on the `ManagedResourceData` schema (spec/status fields vs. flat key/value).
- Replace the fragile key-name-based classification with an explicit interface so callers declare intent.
- Identify which controllers still watch `Secret`s and measure whether the feature remains worthwhile.
- Design the garbage collection strategy for `ManagedResourceData` (currently mutable, unlike the immutable hashed Secrets).
Code & Pull Requests