
Hack The Garden 05/2026 Wrap Up ​

🌱 Complete the ManagedSeedSet Implementation (#52) ​

Tracking: hackathon#52

Problem Statement ​

The ManagedSeedSet proposal has never been fully implemented. The existing implementation supports creating Shoots and ManagedSeeds but not updating them. A complete implementation would simplify operations and lay the groundwork for more automated setups such as a "seed autoscaler".

Outcome ​

After implementing and evaluating a WIP controller (branch: timebertt/gardener:managedseedset), the team concluded that completing and maintaining ManagedSeedSet is not worth the effort, and proposed to drop it from Gardener entirely. The key reasons:

  • Keeping gardenlet/Seed config in sync across all set members is either impossible or very hard to do reliably: gardenlet version propagation via the parent gardenlet bypasses any controller trying to keep members uniform.
  • ManagedSeed autoscaling is infrequent in practice; production landscapes prefer declarative, manual seed management.
  • Staging feature rollouts across a ManagedSeedSet using partitions remains difficult due to config drift.
  • The feature is partially implemented with no known production users, and full implementation is not considered worth the maintenance cost.

Next steps ​

If no one from the community strongly disagrees with this decision, we will go ahead and remove the API in the near future.

πŸ” Improve Debugability of Failed Node Joins (#68) ​

Tracking: hackathon#68

Problem Statement ​

Users have limited visibility into node join failures. The gardener-node-agent bootstrap process uses StandardOutput=journal+console in its systemd unit file, but this only covers the download and initial configuration file writing phase. Once the agent attempts to connect to the cluster's API server, logs are no longer written to the console, leaving operators with no choice but to SSH into the node to diagnose failures.

Achievements ​

Both proposed solutions were implemented: the gardener-node-agent now performs a connection test during bootstrap, and fatal errors after bootstrapping are also written to the node's console log.
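
As a rough illustration of the first part (the connection test), here is a hypothetical Go sketch, not the actual gardener-node-agent code, that probes the API server during bootstrap and mirrors a fatal error to the node's console; the endpoint and messages are placeholders.

```go
// Hypothetical sketch: probe the kube-apiserver during bootstrap and mirror a
// fatal error to /dev/console so operators can diagnose join failures without
// SSHing into the node. Addresses and messages are placeholders.
package main

import (
	"crypto/tls"
	"fmt"
	"net"
	"os"
	"time"
)

// writeToConsole best-effort writes a message to the node's console device.
func writeToConsole(msg string) {
	if f, err := os.OpenFile("/dev/console", os.O_WRONLY, 0); err == nil {
		defer f.Close()
		fmt.Fprintln(f, msg)
	}
}

// checkAPIServer performs a TLS handshake against the cluster's API server to
// surface connectivity problems early. Certificate verification is skipped
// here because this sketch only tests reachability.
func checkAPIServer(addr string) error {
	dialer := &net.Dialer{Timeout: 10 * time.Second}
	conn, err := tls.DialWithDialer(dialer, "tcp", addr, &tls.Config{InsecureSkipVerify: true})
	if err != nil {
		return err
	}
	return conn.Close()
}

func main() {
	addr := "api.example.internal:443" // placeholder API server endpoint
	if err := checkAPIServer(addr); err != nil {
		msg := fmt.Sprintf("bootstrap connection test failed: %v", err)
		writeToConsole(msg)
		fmt.Fprintln(os.Stderr, msg)
		os.Exit(1)
	}
	fmt.Println("API server reachable")
}
```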

Code & Pull Requests ​

πŸ”’ Add Support for Virtual Garden to ACL Extension (#47) ​

Tracking: hackathon#47

Problem Statement ​

With gardener/gardener#14420 it became possible to handle client IP addresses for the virtual garden cluster. The ACL extension should be extended to support IP allowlisting for this scenario as well.

Achievements ​

  • Implemented a basic extension integration for the virtual garden scenario.
  • Identified that the virtual garden API server domain needs to be exposed in the Garden status for easy retrieval by the extension; a WIP change was opened: hown3d/gardener:garden-advertised-addresses.

Next Steps ​

  • Reconcile the extension when a new ManagedSeed is added to update the EnvoyFilter.

Code & Pull Requests ​

πŸ›‘οΈ Replace OpenVPN with WireGuard (#70) ​

Tracking: hackathon#70

Problem Statement ​

The Gardener VPN implementation between control plane and data plane uses OpenVPN. This topic, continued from the June 2025 hackathon, aims to complete the replacement with WireGuard using wpex (a WireGuard multiplexer sidecar) to route connections from a single Istio ingress to the correct vpn-seed-server per shoot.

Achievements ​

  • Multiple shoots can now be created with the WireGuard-based setup.
  • Istio Ingress can be scaled to more than one replica.
  • wpex sidecar correctly multiplexes incoming connections: the first packet is forwarded to all configured vpn-seed-servers, each of which either answers or silently ignores it based on the public key (see the sketch after this list).
  • NetworkPolicys are managed with appropriate labels on istio-ingress and vpn-seed-server.
  • A related bug was found and fixed along the way.
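
The multiplexing behaviour can be pictured with a heavily simplified sketch. This is not the wpex code: backend addresses, the port, and the session handling are placeholders, and a real relay needs more care (timeouts, session expiry, concurrent writes).

```go
// Simplified fan-out relay: the first packet from an unknown peer is sent to
// every configured backend; the backend that answers (i.e. whose WireGuard
// public key matches) claims the session, and its replies are relayed back.
package main

import (
	"log"
	"net"
	"sync"
)

var (
	backends = []string{"10.0.0.10:8132", "10.0.0.11:8132"} // placeholder vpn-seed-servers
	mu       sync.Mutex
	sessions = map[string]*net.UDPConn{} // peer address -> claiming backend connection
)

func main() {
	laddr, _ := net.ResolveUDPAddr("udp", ":8132")
	ln, err := net.ListenUDP("udp", laddr)
	if err != nil {
		log.Fatal(err)
	}
	buf := make([]byte, 65535)
	for {
		n, peer, err := ln.ReadFromUDP(buf)
		if err != nil {
			continue
		}
		pkt := append([]byte(nil), buf[:n]...)
		mu.Lock()
		conn, known := sessions[peer.String()]
		mu.Unlock()
		if known {
			conn.Write(pkt) // established session: forward only to the claiming backend
			continue
		}
		for _, b := range backends { // unknown peer: fan out the first packet to everyone
			go tryBackend(ln, peer, b, pkt)
		}
	}
}

// tryBackend forwards the initial packet to one backend; if it replies, the
// backend claims the peer and its subsequent replies are relayed back.
func tryBackend(ln *net.UDPConn, peer *net.UDPAddr, backend string, pkt []byte) {
	raddr, err := net.ResolveUDPAddr("udp", backend)
	if err != nil {
		return
	}
	conn, err := net.DialUDP("udp", nil, raddr)
	if err != nil {
		return
	}
	conn.Write(pkt)
	reply := make([]byte, 65535)
	n, err := conn.Read(reply) // WireGuard silently drops handshakes for unknown keys
	if err != nil {
		conn.Close()
		return
	}
	mu.Lock()
	if _, exists := sessions[peer.String()]; exists {
		mu.Unlock()
		conn.Close()
		return
	}
	sessions[peer.String()] = conn
	mu.Unlock()
	for {
		ln.WriteToUDP(reply[:n], peer)
		if n, err = conn.Read(reply); err != nil {
			return
		}
	}
}
```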

Code & Pull Requests ​

🌐 Make Internal Domain Optional/Mutable (#53) ​

Tracking: hackathon#53

Problem Statement ​

Every Gardener environment requires a DNS zone for managing shoot "internal domains". Some setups (e.g., those without customer-configured external domains, or with private DNS requirements) would benefit from making the internal domain optional per shoot, mutable after creation, or changeable across all shoots on a seed.

Achievements ​

Three use cases were scoped and designed during the hackathon:

  1. Make internal domain optional per shoot: a shoot can be created without an internal domain using a new Shoot.spec.dns.internalDomain.enabled field (sketched below). Disabling the internal domain of an existing shoot requires triggering a CA rotation in the same request, coordinated across node rollout and DNSRecord lifecycle.
  2. Change the external domain of a shoot: switching external domains is modelled as a two-phase CA rotation; the new domain is added to certificates and DNS records in the preparing phase, and the old domain is removed in the completing phase.
  3. Change the internal domain of all shoots on a seed: the seed spec would carry a list of internal domains; a CA rotation initiated by the shoot owner migrates the shoot to the new domain.

A WIP branch implementing the first use case was developed: timebertt/gardener:internal-domain-optional.
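
To make the first use case concrete, here is a hypothetical shape of the new API field; the type and field names are illustrative and may differ from the WIP branch.

```go
// Illustrative API sketch for an optional internal domain per shoot; this is
// not the final Gardener API, only a reading aid for the idea above.
package core

// DNS mirrors the relevant part of the Shoot specification.
type DNS struct {
	// Domain is the shoot's external domain (unchanged by this proposal).
	Domain *string `json:"domain,omitempty"`
	// InternalDomain configures handling of the shoot's internal domain.
	InternalDomain *InternalDomain `json:"internalDomain,omitempty"`
}

// InternalDomain allows disabling the internal domain for a single shoot.
type InternalDomain struct {
	// Enabled controls whether an internal domain DNSRecord is maintained.
	// Disabling it on an existing shoot requires a CA rotation in the same
	// request, coordinated with the node rollout and DNSRecord lifecycle.
	Enabled *bool `json:"enabled,omitempty"`
}
```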

Next Steps ​

  • The STACKIT team plans to open an enhancement proposal covering all three use cases in detail.

🌿 [GEP-28] Experiment with shoot/shoot Controller in Self-Hosted Shoot Clusters (#45) ​

TIP

You can find out more about Self-Hosted Shoot Clusters in GEP-28.

Tracking: hackathon#45

Problem Statement ​

Almost all gardenlet controllers are deployed during gardenadm connect, but the critical shoot/shoot controller is not yet included. Without it, many required components are not deployed to self-hosted shoot clusters. The challenge is how to reuse the existing hosted Shoot flow tasks without duplicating maintenance.

Achievements ​

  • Analyzed and implemented the necessary changes to run the gardenadm init flow inside gardenlet's Shoot controller.
  • Augmented the flow package to support TaskGroups for reusable flow tasks.
  • Evaluated an approach to harmonize gardenadm init/bootstrap and shoot/shoot flows.
  • Moved self-hosted shoot specific logic from GardenadmBotanist to Botanist and extracted shared flow tasks.

Code & Pull Requests ​

πŸ”‘ [GEP-28] Implement Public CA Bundle Discovery Mechanism (#15) ​

Tracking: hackathon#15

Problem Statement ​

When running gardenadm join or gardenadm connect, the CA bundle of the cluster must currently be provided via command line arguments. GEP-28 describes a kubeadm-inspired discovery mechanism using "discovery tokens" that allows securely obtaining the CA bundle without pre-sharing it.

Achievements ​

  • gardenadm init now publishes kube-public/cluster-info (a kubeconfig with cluster CA and endpoint, no credentials) signed by the bootstrap-signer controller via bootstrap tokens.
  • A path-scoped AuthenticationConfiguration is injected on the shoot's kube-apiserver, enabling anonymous access only on the cluster-info endpoint.
  • gardenadm join/connect gained a new --discovery-token-ca-cert-hash sha256:<64-hex> flag; the Discover flow performs an insecure fetch, verifies the JWS signature using the bootstrap token secret, verifies the CA matches the supplied SPKI pin, then re-fetches over verified TLS (a sketch of the pin computation follows this list).
  • gardenadm token create --print-join-command now emits --discovery-token-ca-cert-hash instead of the full base64-encoded CA bundle.
  • On the gardener-operator side, a new Garden.Spec.VirtualCluster.Kubernetes.KubeAPIServer.EnableBootstrapDiscovery field was added; when enabled, the same cluster-info ConfigMap and RBAC setup is published on the virtual garden cluster.
  • Added unit tests and an e2e test (unmanaged infra scenario) asserting that gardenadm join rejects a bad pin.
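
For readers unfamiliar with the kubeadm-style pin, the small standalone program below shows how a sha256:<64-hex> value of the kind passed to --discovery-token-ca-cert-hash can be computed from a CA certificate by hashing its SubjectPublicKeyInfo; it is an illustration, not code from the hackathon branch.

```go
// Compute a kubeadm-style SPKI pin ("sha256:<hex>") from a PEM-encoded CA
// certificate passed as the first command-line argument.
package main

import (
	"crypto/sha256"
	"crypto/x509"
	"encoding/hex"
	"encoding/pem"
	"fmt"
	"os"
)

func main() {
	pemBytes, err := os.ReadFile(os.Args[1]) // path to the CA certificate (PEM)
	if err != nil {
		panic(err)
	}
	block, _ := pem.Decode(pemBytes)
	if block == nil {
		panic("no PEM block found")
	}
	cert, err := x509.ParseCertificate(block.Bytes)
	if err != nil {
		panic(err)
	}
	// The pin covers the DER-encoded SubjectPublicKeyInfo, not the whole cert.
	sum := sha256.Sum256(cert.RawSubjectPublicKeyInfo)
	fmt.Printf("sha256:%s\n", hex.EncodeToString(sum[:]))
}
```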

Next Steps ​

  • Ensure AuthenticationConfiguration for the virtual garden uses the same merging logic as for shoots.
  • Run an end-to-end smoke test: gardenadm connect against a real Garden with EnableBootstrapDiscovery: true.
  • Restructure WIP commits and open PRs against gardener/gardener.

Code & Pull Requests ​

🐝 [GEP-28] SelfHostedShootExposure in Cilium Extension (#46) ​

Tracking: hackathon#46

Problem Statement ​

GEP-36 introduced the SelfHostedShootExposure API for exposing the API server of self-hosted shoots. This topic investigates implementing that API in the Cilium extension using Cilium LB IPAM to allocate virtual IPs and advertise them via L2 Announcements or BGP.

Achievements ​

  • Identified a connectivity issue: when the API server rolls over to manage itself, Cilium loses its connection to the default Kubernetes service cluster IP.
  • Resolved the issue by configuring Cilium to connect to the API server via localhost using a CiliumNodeConfig resource targeting only control plane nodes.
  • Confirmed that after reconnecting, Cilium reconciles NetworkPolicys correctly.
  • Implemented the extension API using L2 Announcements.

Next Steps ​

  • Consider other exposure possibilities with Cilium, such as BGP.

🀝 [GEP-28] Support Joining Control Plane Nodes in Managed Infrastructure (#54) ​

Tracking: hackathon#54

Problem Statement ​

gardenadm join was extended to support joining control plane nodes by generating node-specific etcd certificates and writing them to the OperatingSystemConfig (OSC). This works in the unmanaged infrastructure scenario, but in managed infrastructure (orchestrated via machine-controller-manager), several conceptual issues arise: node-specific etcd assets cannot be provisioned before the node's IP is known, requiring dynamic OSC updates after node creation and coordination between multiple controllers.

Achievements ​

  • Identified the fundamental issues with the current OSC-based approach in managed infra: dynamic OSC contents are complicated to orchestrate, etcd members are rolled additional times, and control plane nodes cannot become ready when gardenlet is unavailable.
  • Designed a target architecture where the OSC contains no node-specific or dynamic content: CA secrets (including private keys) are pushed via OSC, and backup-restore generates node-specific certificates on startup. etcd-druid/backup-restore dynamically manage the etcd member list without needing Etcd.spec.externallyManagedMemberAddresses.
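
The certificate-generation part of that target picture can be sketched as follows. This is a hypothetical illustration under the assumption that the CA certificate and key are available on disk via the OSC and the node IP via an environment variable; file paths, the variable name, and the certificate attributes are placeholders.

```go
// Hypothetical sketch: issue a node-specific etcd serving certificate from the
// CA material delivered via the OSC, once the node's IP is known at startup.
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"math/big"
	"net"
	"os"
	"time"
)

func main() {
	caCert, caKey := mustLoadCA("/var/lib/etcd-backup-restore/ca.crt", "/var/lib/etcd-backup-restore/ca.key")

	nodeIP := net.ParseIP(os.Getenv("NODE_IP")) // only known after the node was created
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		panic(err)
	}

	template := &x509.Certificate{
		SerialNumber: big.NewInt(time.Now().UnixNano()),
		Subject:      pkix.Name{CommonName: "etcd-server"},
		NotBefore:    time.Now(),
		NotAfter:     time.Now().Add(365 * 24 * time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature | x509.KeyUsageKeyEncipherment,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth, x509.ExtKeyUsageClientAuth},
		IPAddresses:  []net.IP{nodeIP},
	}
	der, err := x509.CreateCertificate(rand.Reader, template, caCert, &key.PublicKey, caKey)
	if err != nil {
		panic(err)
	}
	_ = pem.Encode(os.Stdout, &pem.Block{Type: "CERTIFICATE", Bytes: der})
}

// mustLoadCA reads the CA certificate and private key pushed via the OSC.
func mustLoadCA(certPath, keyPath string) (*x509.Certificate, any) {
	certPEM, err := os.ReadFile(certPath)
	if err != nil {
		panic(err)
	}
	keyPEM, err := os.ReadFile(keyPath)
	if err != nil {
		panic(err)
	}
	certBlock, _ := pem.Decode(certPEM)
	keyBlock, _ := pem.Decode(keyPEM)
	cert, err := x509.ParseCertificate(certBlock.Bytes)
	if err != nil {
		panic(err)
	}
	key, err := x509.ParsePKCS8PrivateKey(keyBlock.Bytes)
	if err != nil {
		panic(err)
	}
	return cert, key
}
```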

Next Steps ​

  • Implement the target picture: backup-restore generates node-specific certificates from CA secrets in the OSC on startup.
  • Adapt CA rotation: preparing phase pushes new CA bundle to all nodes via OSC and waits; completing phase removes the old CA.
  • Remove Etcd.spec.externallyManagedMemberAddresses for self-hosted shoots once etcd-druid/backup-restore handle member management dynamically.

Code & Pull Requests ​

βš™οΈ [GEP-28] Run Garden and Seed in Self-Hosted Shoot Cluster on Managed Infrastructure (#55) ​

Tracking: hackathon#55

Problem Statement ​

Running gardener-operator and a Seed inside a self-hosted shoot cluster was already demonstrated for the unmanaged infrastructure scenario during the March 2026 hackathon. This topic extends that work to the managed infrastructure scenario, where nodes are provisioned via machine-controller-manager.

Achievements ​

  • Continued from rfranzke/gardener:gep28/promote-shs-as-seed and got a Shoot fully created in the managed infra local setup.
  • Added make ginkind-up SCENARIO=full to automate all required steps to reach a healthy seed.
  • Adjusted validations in provider-local: allow a kubeconfig secret in the provider secret; validate that a shoot uses the seed's podCIDR as its nodeCIDR.
  • Worked around overlay-fs nesting limitations by switching to native snapshotting for double-nested machine pods.
  • Adjusted NetworkPolicys to allow traffic to registries.
  • As a side-track, explored running local provider "machines" directly as Docker containers to avoid nesting issues and enable control plane migrations without machine recreation; a first successful local control plane migration without workload downtime was completed.

Next Steps ​

  • Merge prerequisite gardener/gardener#14747, then clean up and open a PR for the managed infra scenario.
  • Propose the machine-provider-local:docker-mode change separately.

Code & Pull Requests ​

πŸ‘οΈ Allow Admins to Easily Use a Viewer Kubeconfig by Default (#71) ​

Tracking: hackathon#71

Problem Statement ​

When using gardenctl, admins always receive an admin kubeconfig when connecting to shoots or seeds. There is no way to default to a viewer kubeconfig, which would reduce the blast radius of accidental writes while browsing clusters.

Achievements ​

  • Implemented a new defaultKubeconfigAccessLevel configuration field per garden in the gardenctl config, supporting admin, viewer, and auto values for both shoots and managedSeeds (see the sketch after this list).
  • Added convenience support: gardenctl config set-garden can modify the access level and automatically refreshes the symlinked kubeconfig when the current garden's access level changes.
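
As a reading aid, here is a hypothetical shape of that configuration field; actual names and structure in gardenctl may differ.

```go
// Illustrative sketch of the per-garden access level configuration; not the
// actual gardenctl types.
package config

// AccessLevel selects which kubeconfig gardenctl requests by default.
type AccessLevel string

const (
	AccessLevelAdmin  AccessLevel = "admin"
	AccessLevelViewer AccessLevel = "viewer"
	AccessLevelAuto   AccessLevel = "auto"
)

// DefaultKubeconfigAccessLevel configures the default per target kind.
type DefaultKubeconfigAccessLevel struct {
	Shoots       AccessLevel `json:"shoots,omitempty" yaml:"shoots,omitempty"`
	ManagedSeeds AccessLevel `json:"managedSeeds,omitempty" yaml:"managedSeeds,omitempty"`
}
```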

Code & Pull Requests ​

πŸ“ Stage confineSpecUpdateRollout Changes in Annotation (#64) ​

Tracking: hackathon#64

Problem Statement ​

With confineSpecUpdateRollout enabled, spec changes to a Shoot are written directly to .spec but only reconciled during the next maintenance window. This means .spec reflects a desired future state rather than what is currently running β€” confusing for users and tooling alike.

Achievements ​

  • Demonstrated an approach where an admission plugin blocks direct user updates to .spec when confineSpecUpdateRollout is set, staging the desired changes in a ConfigMap instead.
  • At the start of the maintenance window, the staged state is applied to .spec and reconciliation proceeds as usual.
  • This keeps .spec as a faithful representation of the currently running state, while making staged changes inspectable and cancellable.
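
A rough sketch can make the staging step concrete. The snippet below is not the actual admission plugin; it only illustrates, under assumed names, how a confined spec update could be denied and recorded in a ConfigMap for the maintenance logic to apply later.

```go
// Hypothetical staging helper: when rollout is confined and the spec changed,
// persist the desired spec in a ConfigMap and tell the caller to deny the
// direct update. Names ("<shoot>.staged-spec", key "spec") are placeholders.
package staging

import (
	"context"
	"encoding/json"
	"fmt"
	"reflect"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// StageSpecUpdate returns deny=true if the update must be blocked because the
// change was staged instead of being applied to .spec directly.
func StageSpecUpdate(ctx context.Context, c kubernetes.Interface, namespace, shootName string, oldSpec, newSpec any, confined bool) (deny bool, err error) {
	if !confined || reflect.DeepEqual(oldSpec, newSpec) {
		return false, nil
	}
	raw, err := json.Marshal(newSpec)
	if err != nil {
		return false, err
	}
	cm := &corev1.ConfigMap{
		ObjectMeta: metav1.ObjectMeta{
			Name:      fmt.Sprintf("%s.staged-spec", shootName),
			Namespace: namespace,
		},
		Data: map[string]string{"spec": string(raw)},
	}
	// The maintenance logic reads this ConfigMap at the start of the next
	// maintenance window, applies it to .spec, and removes it.
	if _, err := c.CoreV1().ConfigMaps(namespace).Create(ctx, cm, metav1.CreateOptions{}); err != nil {
		if !apierrors.IsAlreadyExists(err) {
			return false, err
		}
		if _, err := c.CoreV1().ConfigMaps(namespace).Update(ctx, cm, metav1.UpdateOptions{}); err != nil {
			return false, err
		}
	}
	return true, nil
}
```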

πŸ’Ύ GardenState Resource for Automated Garden Cluster Disaster Recovery (#44) ​

Tracking: hackathon#44

Problem Statement ​

Recovering a destroyed Garden cluster today is fully manual: operators must know which secrets to back up (CA certs, etcd encryption keys, ServiceAccount signing keys), where to find them, and how to restore the Garden resource's status. There is no standardised, machine-readable snapshot, making disaster recovery error-prone.

Achievements ​

  • After considering a new CRD, the team settled on encoding the GardenState as a Kubernetes Secret (label operator.gardener.cloud/purpose: garden-state) in the runtime cluster's garden namespace, containing a JSON-encoded snapshot. Choosing a Secret over a CRD means it is available before CRDs are installed, trivially extractable with kubectl, and easy to back up externally.
  • The GardenState type (modelled after ShootState) captures: persist=true-labelled secrets (CA certs, etcd encryption key, SA signing key), Garden metadata and spec, extension state from DNSRecord/BackupEntry/Extension CRDs, and objects referenced by extension state. The Garden UID is persisted explicitly to preserve the etcd backup bucket name.
  • A bootstrapper was added to gardener-operator that detects a garden-state secret without a Garden resource on startup and initiates the restore: restores secrets, recreates the Garden resource, and patches its status type to Restore.
  • After successful Garden reconciliation, the garden-state secret is updated.
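
The snapshot-in-a-Secret idea can be sketched roughly as follows; the GardenState structure and the Secret name shown here are simplifications for illustration, not the types from the hackathon branch.

```go
// Simplified sketch of storing a GardenState snapshot as a labelled Secret in
// the runtime cluster's garden namespace. The Secret name is illustrative.
package gardenstate

import (
	"encoding/json"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// GardenState is a simplified snapshot of what is needed to restore a Garden.
type GardenState struct {
	GardenUID  string            `json:"gardenUID"`  // preserved to keep the etcd backup bucket name
	Garden     json.RawMessage   `json:"garden"`     // metadata and spec of the Garden resource
	Secrets    map[string][]byte `json:"secrets"`    // persist=true secrets: CAs, etcd encryption key, SA signing key
	Extensions json.RawMessage   `json:"extensions"` // state from DNSRecord/BackupEntry/Extension resources
}

// ToSecret encodes the snapshot into the Secret the bootstrapper looks for on
// operator startup.
func (s *GardenState) ToSecret() (*corev1.Secret, error) {
	raw, err := json.Marshal(s)
	if err != nil {
		return nil, err
	}
	return &corev1.Secret{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "garden-state",
			Namespace: "garden",
			Labels:    map[string]string{"operator.gardener.cloud/purpose": "garden-state"},
		},
		Data: map[string][]byte{"state": raw},
	}, nil
}
```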

Next Steps ​

  • Draft a GEP and present it in a TSC meeting.
  • Implement and wire up e2e tests.
  • Open a PR against gardener/gardener.

Code & Pull Requests ​

πŸ” Separately Encrypt etcd Backups (#69) ​

Tracking: hackathon#69

Problem Statement ​

etcd backups for all shoots on a seed are stored in a shared bucket. If one shoot's control plane is compromised, the backups of other shoots on the same seed could be read. A shoot-specific encryption key for each backup would limit the blast radius to the compromised shoot's own data.

Achievements ​

  • Implemented a prototype across three repositories: gardener generates a new shoot-specific encryption secret (persisted in ShootState), etcd-druid deploys etcd-backup-restore with an EncryptionConfig file (analogous to kube-apiserver encryption at rest), and etcd-backup-restore implements an AESGCM encryption package with full key rotation support.
  • Key rotation works via an encrypted keyring file stored alongside the backups: the keyring contains all keys used over time and is always re-encrypted with the latest key on startup (a minimal sketch of the encryption primitive follows this list).
  • The encryption package is wired into the backup and restore flows.
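
For context, the core AES-GCM primitive used by such a package looks roughly like the sketch below; the real etcd-backup-restore change additionally manages the keyring file and key rotation, which are omitted here.

```go
// Minimal AES-GCM helpers: Encrypt prepends a random nonce to the ciphertext,
// Decrypt splits it off again. The key must be 16, 24, or 32 bytes long.
package backupcrypt

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"io"
)

// Encrypt seals plaintext with the given key.
func Encrypt(key, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := io.ReadFull(rand.Reader, nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// Decrypt opens data produced by Encrypt.
func Decrypt(key, data []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	if len(data) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, ciphertext := data[:gcm.NonceSize()], data[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ciphertext, nil)
}
```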

Next Steps ​

  • More extensive testing.
  • Hide behind a feature gate.
  • Support additional encryption provider types and provider type migration.

Code & Pull Requests ​

⚑ Reduce Secret Watch Pressure by Splitting ManagedResource Data (#61) ​

Tracking: hackathon#61

Problem Statement ​

gardener-resource-manager stores all rendered manifests in Secret objects referenced by ManagedResource.spec.secretRefs, even when those manifests contain no sensitive data. This causes elevated memory usage in the informer cache and unnecessary etcd watch pressure on seeds.

Achievements ​

Initial measurements on production-like seed clusters confirmed the problem:

Cluster   Shoots   ManagedResources   Secrets   In-memory size
1         193      11,753             12,453    204.90 MB
2         272      15,605             16,374    281.22 MB

On average, ManagedResource secret data accounts for more than half of all secret data on seed clusters, roughly 1 MB per shoot control plane.

A sample implementation was developed introducing a new ManagedResourceData CRD for non-sensitive manifests. The ManagedResource spec was extended with dataRefs alongside the existing secretRefs. The split is determined at write time: Secret-kind manifests stay in Secret objects; everything else moves to ManagedResourceData. In a local test with a single shoot, the vast majority of manifests (38 out of 51 in the shoot control plane) moved out of Secrets.
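
The write-time split can be expressed as a small classification helper. The sketch below uses the simple kind-based rule described above (the next steps note that the current classification should become an explicit interface); package and function names are illustrative.

```go
// Partition rendered manifests: Secret manifests keep going into Secret
// objects (secretRefs), everything else into ManagedResourceData (dataRefs).
package split

import (
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

// isSensitive reports whether a manifest must stay in a Secret.
func isSensitive(obj *unstructured.Unstructured) bool {
	gvk := obj.GroupVersionKind()
	return gvk.Group == "" && gvk.Kind == "Secret"
}

// Split partitions the rendered manifests of a ManagedResource.
func Split(objs []*unstructured.Unstructured) (secrets, data []*unstructured.Unstructured) {
	for _, obj := range objs {
		if isSensitive(obj) {
			secrets = append(secrets, obj)
		} else {
			data = append(data, obj)
		}
	}
	return secrets, data
}
```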

Next Steps ​

  • Decide on the ManagedResourceData schema (spec/status fields vs. flat key/value).
  • Replace the fragile key-name-based classification with an explicit interface so callers declare intent.
  • Identify which controllers still watch Secrets and measure whether the feature remains worthwhile.
  • Design the garbage collection strategy for ManagedResourceData (currently mutable, unlike the immutable hashed Secrets).

Code & Pull Requests ​

