Helm‑Based Operator for Cloudflare Zero Trust — Declarative Tunnels, DNS, and Access from Kubernetes

I run OpenShift clusters at work: the whole Red Hat ecosystem with RHCOS nodes, Operators everywhere, and API‑driven management. So when I tackled Cloudflare Zero Trust in my homelab, I didn't want to click through dashboards or write fragile shell scripts.

I already run cloudflared in my cluster, with all external access flowing through a single Cloudflare tunnel, and I already have Access policies defined in Cloudflare that I wanted to reuse. What I didn't have was a clean, automated way to publish new applications to that tunnel and apply those existing policies to them without clicking through the Cloudflare dashboard every time. I also wanted to stay on the Kubernetes Gateway API so that if I ever swap Traefik out for Nginx or anything else, most of my code doesn't need to change. (I'll have a separate post on Gateway API.)

The goal was simple: add a handful of annotations to an HTTPRoute and have everything wired up automatically (tunnel routes, DNS records, Access application bindings, service tokens), using reusable templates where possible. Equally important: if I remove that route from the cluster or swap out an Access policy, the operator should clean everything up automatically too. That's how the Cloudflare Zero Trust Operator came to be.

Why This Operator?

There are plenty of ways to manage Cloudflare Zero Trust: the web UI, cloudflared CLI, Terraform, or even K3s with a few Helm charts. But I wanted one unified control plane that:

  • Listens to Gateway API HTTPRoute resources (the modern way to define HTTP routing in Kubernetes)
  • Applies templates for reusable configuration
  • Stores state in Kubernetes so I can inspect and debug
  • Uses least‑privilege secrets and RBAC
  • Is self‑healing — if a tunnel disappears or DNS gets touched manually, the operator puts it back
  • Runs in my Talos‑powered Raspberry Pi cluster alongside everything else

The result is an operator that treats Cloudflare Zero Trust as a first‑class Kubernetes extension. You annotate a route, and the operator provisions the corresponding Cloudflare resources automatically. No external CI/CD pipelines, no separate state store — just pure Kubernetes reconciliation.

Architecture at a Glance

The operator is a single‑process Python container built on the kopf framework. It watches three resource types:

| Resource | Purpose |
|---|---|
| HTTPRoute (gateway.networking.k8s.io) | Trigger reconciliation when annotated |
| CloudflareZeroTrustTenant | Holds Cloudflare account ID, tunnel ID, credential secret |
| CloudflareZeroTrustTemplate | Reusable configuration presets |

All Cloudflare interactions happen through the official cloudflare-python SDK. State is stored in ConfigMaps in the operator namespace — one per managed HTTPRoute. No database, no PVC, just native Kubernetes objects.

Processing an Event: The Full Flow

When an HTTPRoute is created, updated, or resumed, the operator runs this sequence:

Step‑by‑Step

  1. Annotation check: The operator aborts early if cfzt.cloudflare.com/tenant or cfzt.cloudflare.com/hostname is missing.
  2. Tenant lookup: The operator searches for the CloudflareZeroTrustTenant CR in the route's namespace first, then in the operator's namespace. If none is found, it raises kopf.TemporaryError(delay=30), so the reconcile is retried and you can create the tenant later.
  3. Template lookup: The operator gathers the per‑route template (if the cfzt.cloudflare.com/template annotation is set) and the base template (base-{tenant}). Both are looked up in the route's namespace first, then the operator's namespace.
  4. Settings merge: config.merge_settings() combines annotations, per‑route template, base template, and hard‑coded defaults in that priority order. The special {{ hostname }} variable is replaced with the actual hostname.
  5. Change detection: config.compute_annotation_hash() hashes the annotations plus the template specs. If the hash matches the annotation_hash stored in the state ConfigMap, the reconcile is skipped. This keeps reconciles cheap, and hashing the template specs is what makes template changes propagate.
  6. Cloudflare client: The operator reads the API token from the Secret referenced by the tenant's spec.credentialRef and builds the cloudflare.Cloudflare client.
  7. Zone resolution: cloudflare_api.resolve_zone_id() finds the zone ID from the hostname. On failure, TemporaryError(delay=60) schedules a retry.
  8. Sub‑reconcilers: Depending on dns_only.enabled, one of two paths runs:
    • Tunnel mode (_reconcile_tunnel) creates a tunnel hostname route and a CNAME record
    • DNS‑only mode (_reconcile_dns_only) creates an A record (IP from staticIp or a LoadBalancer Service)
  9. Optional features: If access.enabled is true, _reconcile_access creates an Access Application and optionally a policy. If serviceToken.enabled is true, _reconcile_service_token creates a Service Token and stores its credentials in a Kubernetes Secret.
  10. Persist state: The operator writes all Cloudflare resource IDs, the hash, and timestamps to the state ConfigMap cfzt-{route_namespace}-{route_name}.
  11. Patch annotations: The operator writes result IDs (hostnameRouteId, cnameRecordId, accessAppId, etc.) and lastReconcile back to the HTTPRoute. If the route was deleted mid‑reconcile, the 404 is caught and logged.
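
Steps 1 to 3 boil down to an annotation guard plus a two‑namespace lookup. The following is a minimal sketch of that idea, not the operator's actual code: the helper names and the dict standing in for the Kubernetes API are hypothetical, and the operator namespace is assumed to be cloudflare-zero-trust as in the examples in this post.

```python
from typing import Optional

# Assumption: the operator runs in this namespace (it matches the examples in this post).
OPERATOR_NAMESPACE = "cloudflare-zero-trust"

REQUIRED_ANNOTATIONS = (
    "cfzt.cloudflare.com/tenant",
    "cfzt.cloudflare.com/hostname",
)


def has_required_annotations(annotations: dict) -> bool:
    """Step 1: both annotations must be present and non-empty."""
    return all(annotations.get(key) for key in REQUIRED_ANNOTATIONS)


def find_with_fallback(store: dict, name: str, route_namespace: str) -> Optional[dict]:
    """Steps 2 and 3: try the route's namespace first, then the operator's.

    `store` is a hypothetical stand-in for the Kubernetes API,
    keyed by (namespace, name).
    """
    for namespace in (route_namespace, OPERATOR_NAMESPACE):
        found = store.get((namespace, name))
        if found is not None:
            return found
    return None
```

A caller would treat a None result for the tenant as the retryable case (the TemporaryError in step 2), while a missing per‑route template simply contributes nothing to the merge.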

Template System: Layered Merge with Variable Substitution

Templates let you define reusable configuration blocks — think of them as classes for Cloudflare settings. The merge chain is:

HTTPRoute Annotations   (highest priority)
   ↓
Per‑route Template      (cfzt.cloudflare.com/template)
   ↓
Base Template           (base-{tenant})
   ↓
Hardcoded Defaults      (lowest)

For any setting, the first non‑empty source wins. The {{ hostname }} placeholder in template fields (e.g., originServerName) gets replaced with the actual hostname at merge time.
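
To make "first non‑empty source wins" concrete, here is a small sketch of a layered merge with hostname substitution. It is an illustration under assumptions, not the body of config.merge_settings(): the function names are hypothetical, "empty" is taken to mean None, an empty string, or an empty mapping, and nested mappings are assumed to merge key by key.

```python
def is_empty(value) -> bool:
    # Assumption: False and 0 are real values, so "enabled: false" is not empty.
    return value is None or value == "" or value == {}


def merge_layers(*layers: dict) -> dict:
    """Merge dicts given highest priority first; the first non-empty value wins."""
    merged: dict = {}
    for layer in layers:
        for key, value in layer.items():
            current = merged.get(key)
            if isinstance(current, dict) and isinstance(value, dict):
                merged[key] = merge_layers(current, value)  # merge nested settings
            elif is_empty(current) and not is_empty(value):
                merged[key] = value
    return merged


def substitute_hostname(value, hostname: str):
    """Replace the {{ hostname }} placeholder in every string of a nested structure."""
    if isinstance(value, str):
        return value.replace("{{ hostname }}", hostname)
    if isinstance(value, dict):
        return {k: substitute_hostname(v, hostname) for k, v in value.items()}
    if isinstance(value, list):
        return [substitute_hostname(v, hostname) for v in value]
    return value


# Layers in priority order: annotations, per-route template, base template, defaults.
annotations = {}
per_route = {"accessApplication": {"enabled": True, "sessionDuration": "8h"}}
base = {
    "originTLS": {"originServerName": "{{ hostname }}"},
    "accessApplication": {"enabled": False},
}
defaults = {"originTLS": {"tlsTimeout": 10, "noTLSVerify": False}}

settings = substitute_hostname(
    merge_layers(annotations, per_route, base, defaults), "admin.example.com"
)
```

With these inputs, the per‑route template's enabled: true beats the base template's enabled: false, while tlsTimeout falls through to the defaults because no higher layer sets it.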

Example base template (base-home):

apiVersion: cfzt.cloudflare.com/v1alpha1
kind: CloudflareZeroTrustTemplate
metadata:
  name: base-home
  namespace: cloudflare-zero-trust
spec:
  originService:
    url: "https://traefik.traefik.svc:443"
    httpRedirect: true
    originTLS:
      noTLSVerify: false
      originServerName: "{{ hostname }}"
      tlsTimeout: 10
  accessApplication:
    enabled: false
  serviceToken:
    enabled: false

Per‑route template (protected) overriding Access settings:

spec:
  accessApplication:
    enabled: true
    sessionDuration: "8h"
    existingPolicyNames:
      - "Allow Home Users"

Route annotations activating the template:

metadata:
  annotations:
    cfzt.cloudflare.com/enabled: "true"
    cfzt.cloudflare.com/tenant: "home"
    cfzt.cloudflare.com/hostname: "admin.example.com"
    cfzt.cloudflare.com/template: "protected"

Sub‑Reconcilers in Detail

Tunnel Mode

Creates a Cloudflare Tunnel hostname route and a CNAME DNS record.

| Cloudflare field | Source |
|---|---|
| hostname | settings.hostname |
| service | settings.origin_service (merged URL) |
| originRequest.noTLSVerify | settings.origin_tls.no_tls_verify |
| originRequest.originServerName | settings.origin_tls.origin_server_name (auto‑defaults to hostname if empty) |
| originRequest.caPool | settings.origin_tls.ca_pool |
| originRequest.connectTimeout | settings.origin_tls.tls_timeout (as "{n}s" string) |
| originRequest.http2Origin | settings.origin_tls.http2_origin |
| originRequest.httpHostHeader | hostname (when match_sni_to_host is true) |

The CNAME points to {tunnel_id}.cfargotunnel.com.
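
The field mapping for tunnel mode can be sketched as a pure function from merged settings to a tunnel ingress rule. This is an illustration only: the dataclasses, their default values, the placement of match_sni_to_host under origin_tls, and the choice to omit caPool when it is empty are all assumptions, not the operator's real types.

```python
from dataclasses import dataclass, field


@dataclass
class OriginTLS:
    # Field names follow the settings described above; defaults are assumed.
    no_tls_verify: bool = False
    origin_server_name: str = ""
    ca_pool: str = ""
    tls_timeout: int = 10
    http2_origin: bool = False
    match_sni_to_host: bool = False  # assumed to live alongside the TLS settings


@dataclass
class Settings:
    hostname: str
    origin_service: str
    origin_tls: OriginTLS = field(default_factory=OriginTLS)


def build_ingress_rule(settings: Settings) -> dict:
    """Map merged settings onto the Cloudflare fields for a hostname route."""
    tls = settings.origin_tls
    origin_request = {
        "noTLSVerify": tls.no_tls_verify,
        # Falls back to the hostname when no explicit server name is set.
        "originServerName": tls.origin_server_name or settings.hostname,
        "connectTimeout": f"{tls.tls_timeout}s",
        "http2Origin": tls.http2_origin,
    }
    if tls.ca_pool:
        origin_request["caPool"] = tls.ca_pool
    if tls.match_sni_to_host:
        origin_request["httpHostHeader"] = settings.hostname
    return {
        "hostname": settings.hostname,
        "service": settings.origin_service,
        "originRequest": origin_request,
    }


def cname_target(tunnel_id: str) -> str:
    """The DNS target that the CNAME record points at."""
    return f"{tunnel_id}.cfargotunnel.com"
```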

DNS‑Only Mode

Creates a plain A record. IP comes from:

  1. dnsOnly.staticIp (direct string) or
  2. dnsOnly.ingressServiceRef → look up LoadBalancer IP of the referenced Service

Additional settings: proxied (orange cloud) and ttl. No tunnel route is created.
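
The IP selection for DNS‑only mode can be sketched as follows. This is a hedged illustration: PermanentConfigError is a local stand‑in for kopf.PermanentError so the snippet runs without kopf, lookup_lb_ip is a hypothetical callback that would query the referenced Service, and validating as IPv4 is my assumption based on the record being an A record.

```python
import ipaddress
from typing import Callable, Optional


class PermanentConfigError(Exception):
    """Stand-in for kopf.PermanentError: retrying will not help, the config must change."""


def resolve_dns_only_ip(
    dns_only: dict,
    lookup_lb_ip: Callable[[dict], Optional[str]],
) -> str:
    """Pick the IP for the A record: staticIp first, else the Service's LoadBalancer IP."""
    candidate = dns_only.get("staticIp")
    service_ref = dns_only.get("ingressServiceRef")
    if not candidate and service_ref:
        candidate = lookup_lb_ip(service_ref)
    if not candidate:
        raise PermanentConfigError(
            "no IP available: set dnsOnly.staticIp or dnsOnly.ingressServiceRef"
        )
    try:
        # An A record carries an IPv4 address, so validate as IPv4 (assumption).
        return str(ipaddress.IPv4Address(candidate))
    except ValueError as exc:
        raise PermanentConfigError(f"invalid IP address: {candidate!r}") from exc
```

Both failure paths raise the permanent error type, matching the error table later in this post: a missing or malformed IP is something only a config change can fix.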

Access Application

Creates/updates a Cloudflare Access Application named cfzt-{hostname}. If existingPolicyNames are listed, they're resolved to UUIDs and attached. If allowGroups or allowEmails are set, a new policy cfzt-{hostname}-allow is created.

All Access Application settings (sessionDuration, skipInterstitial, autoRedirectToIdentity, etc.) are configurable via template or per‑route annotations.

Service Token

Creates a Cloudflare Service Token (machine‑to‑machine) and stores client_id / client_secret in a Kubernetes Secret named cfzt-svctoken-{route_name}. Tokens are created only once — subsequent reconciles carry forward the existing token ID and secret name.
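
The create‑once behaviour can be sketched like this. It is an assumption‑laden illustration rather than the body of _reconcile_service_token: create_token is a hypothetical callback standing in for the Cloudflare API call, and its return shape (id, client_id, client_secret) is assumed.

```python
from typing import Callable, Dict, Optional, Tuple


def ensure_service_token(
    state: Dict[str, str],
    route_name: str,
    create_token: Callable[[], Dict[str, str]],
) -> Tuple[Dict[str, str], Optional[Dict[str, str]]]:
    """Return (state_update, secret_data).

    secret_data is None when a token already exists, so the caller
    leaves the existing Kubernetes Secret untouched.
    """
    secret_name = f"cfzt-svctoken-{route_name}"
    existing_id = state.get("service_token_id")
    if existing_id:
        # Carry the stored ID and Secret name forward without calling Cloudflare.
        return (
            {
                "service_token_id": existing_id,
                "service_token_secret_name": state.get(
                    "service_token_secret_name", secret_name
                ),
            },
            None,
        )
    token = create_token()
    return (
        {"service_token_id": token["id"], "service_token_secret_name": secret_name},
        {"client_id": token["client_id"], "client_secret": token["client_secret"]},
    )
```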

State Management & Change Detection

Each managed HTTPRoute gets a state ConfigMap in the operator namespace:

  • Name: cfzt-{route_namespace}-{route_name}
  • Labels: app.kubernetes.io/managed-by=cfzt-operator, cfzt.cloudflare.com/httproute-name, cfzt.cloudflare.com/httproute-namespace, cfzt.cloudflare.com/tenant
  • Data keys include:
    • annotation_hash — SHA‑256 over annotations + template specs
    • hostname, tenant_name, tunnel_id, zone_id
    • hostname_route_id, cname_record_id (tunnel mode)
    • dns_record_id, dns_record_ip (DNS‑only mode)
    • access_app_id, access_policy_ids
    • service_token_id, service_token_secret_name
    • last_reconcile, httproute_namespace, httproute_name

The hash check prevents unnecessary Cloudflare API calls. By including the template specs in the hash, any change to a used template automatically invalidates the cache for all routes that reference it — even if their own annotations haven't changed. This is why updating a CloudflareZeroTrustTemplate triggers a full re‑reconcile for all matching routes.
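
A hash like this is straightforward to build from canonical JSON. The sketch below is one way to do it, not the implementation of config.compute_annotation_hash(): the function names are hypothetical, and it assumes the caller passes only the input annotations, since including the result annotations the operator writes back (such as lastReconcile) would presumably change the hash on every run.

```python
import hashlib
import json
from typing import Dict, List


def annotation_hash_sketch(annotations: Dict[str, str], template_specs: List[dict]) -> str:
    """SHA-256 over annotations plus template specs, independent of key order."""
    payload = json.dumps(
        {"annotations": annotations, "templates": template_specs},
        sort_keys=True,          # canonical key order, so equal inputs hash equally
        separators=(",", ":"),   # no whitespace variation
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def needs_reconcile(state: Dict[str, str], new_hash: str) -> bool:
    """Reconcile unless the stored hash matches the freshly computed one."""
    return state.get("annotation_hash") != new_hash
```

Because the template specs are part of the payload, editing a template changes the hash for every route that uses it, even when the route's own annotations are unchanged.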

Delete Flow & Orphan Cleanup

When an HTTPRoute is deleted, the operator runs a best‑effort cleanup. The delete handler is registered with optional=True, so kopf does not add a finalizer to the route and the operator never blocks the deletion itself. The sequence:

  1. Load the state ConfigMap
  2. Look up the tenant
  3. Call delete_all_resources() (Access App → Service Token → Tunnel Route → DNS Record)
  4. Delete the service token Secret (if any) and the state ConfigMap

If Cloudflare objects are already gone (or credentials missing), deletions are silent or logged as warnings — the state is still removed to prevent infinite retries.
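
The tolerant, ordered cleanup can be sketched as a loop that logs failures instead of raising. This is an illustration, not the real delete_all_resources(): the resource keys and the deleter callbacks are hypothetical stand‑ins for the Cloudflare API calls.

```python
import logging
from typing import Callable, Dict, List

log = logging.getLogger("cfzt-sketch")

# Same order as described above: Access App, Service Token, Tunnel Route, DNS Record.
DELETE_ORDER = ("access_app", "service_token", "tunnel_route", "dns_record")


def delete_all(deleters: Dict[str, Callable[[], None]]) -> List[str]:
    """Run each available deleter in order; return the kinds that failed."""
    failed: List[str] = []
    for kind in DELETE_ORDER:
        deleter = deleters.get(kind)
        if deleter is None:
            continue  # nothing of this kind was recorded in state
        try:
            deleter()
        except Exception as exc:
            # Already gone or unreachable: warn and keep going so state can be removed.
            log.warning("could not delete %s: %s", kind, exc)
            failed.append(kind)
    return failed
```

Catching every exception is deliberate in this sketch: the point of the flow is that one stuck resource should not keep the rest of the cleanup, or the removal of state, from happening.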

Orphan timer — A kopf timer runs on each CloudflareZeroTrustTenant every 300 s (initial delay 60 s). It scans the tenant's state ConfigMaps and deletes any that no longer have a corresponding, enabled HTTPRoute. This catches cases where the operator was down when a route was removed, or where the route was manually deleted from the cluster.
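
The orphan scan is essentially a set difference between state ConfigMaps and live, enabled routes. A minimal sketch, assuming both are available as plain dicts shaped like Kubernetes objects and using the labels and the enabled annotation described in this post; find_orphans is a hypothetical name.

```python
from typing import List

ENABLED = "cfzt.cloudflare.com/enabled"
LABEL_NS = "cfzt.cloudflare.com/httproute-namespace"
LABEL_NAME = "cfzt.cloudflare.com/httproute-name"


def find_orphans(state_configmaps: List[dict], routes: List[dict]) -> List[str]:
    """Return names of state ConfigMaps with no matching, enabled HTTPRoute."""
    live = {
        (route["metadata"]["namespace"], route["metadata"]["name"])
        for route in routes
        if route["metadata"].get("annotations", {}).get(ENABLED) == "true"
    }
    orphans = []
    for configmap in state_configmaps:
        labels = configmap["metadata"].get("labels", {})
        key = (labels.get(LABEL_NS), labels.get(LABEL_NAME))
        if key not in live:
            orphans.append(configmap["metadata"]["name"])
    return orphans
```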

Error Handling & Observability

| Situation | kopf response |
|---|---|
| Tenant not found | TemporaryError(delay=30); retry after 30 s |
| Zone resolution failed | TemporaryError(delay=60) |
| No IP for DNS‑only mode | PermanentError; user must fix config |
| Invalid IP address | PermanentError |
| HTTPRoute 404 mid‑reconcile | Caught, logged, skip annotation patch |
| Credential read failure on delete | Log warning, delete state only |
| General exception | Exponential back‑off: 1 s → 5 s → 15 s (max) |
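
The capped back‑off for general exceptions amounts to a lookup into a fixed schedule. A tiny sketch of that schedule, assuming a zero‑based retry counter; backoff_delay is a hypothetical helper, and how the operator actually wires the delays into kopf is not shown here.

```python
BACKOFF_SECONDS = (1, 5, 15)  # the last value is the cap


def backoff_delay(retry: int) -> int:
    """Delay in seconds before the given retry (0 = first retry)."""
    index = max(0, min(retry, len(BACKOFF_SECONDS) - 1))
    return BACKOFF_SECONDS[index]
```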

Log levels are controlled via the LOG_LEVEL environment variable (DEBUG/INFO/WARNING/ERROR). DEBUG gives full handler traces; INFO shows reconcile actions; WARNING and ERROR hide successful reconciles.

Inspect state:

kubectl get configmap -n cloudflare-zero-trust \
  -l app.kubernetes.io/managed-by=cfzt-operator -o yaml

Watch logs:

kubectl logs -n cloudflare-zero-trust deployment/cloudflare-zero-trust-operator -f

List managed routes:

kubectl get httproutes -A -o jsonpath='{range .items[?(@.metadata.annotations.cfzt\.cloudflare\.com/enabled=="true")]}{.metadata.namespace}/{.metadata.name}{"\n"}{end}'

Real‑World Usage in My Homelab

In my Talos‑powered Raspberry Pi cluster, this operator manages:

  • www.swheetlife.com — my Ghost blog, exposed via tunnel behind Cloudflare
  • admin.swheetlife.com — an Access‑protected dashboard
  • api.swheetlife.com — DNS‑only A record pointing at my Traefik LoadBalancer
  • internal-api.swheetlife.com — service token for automated clients

All of these routes live in different application namespaces, yet they share the same home tenant and base-home template defined in the cloudflare-zero-trust namespace. The operator is deployed via Helm with 1 replica, 200 Mi memory limit, and runs as non‑root with a read‑only root filesystem.

Deploying the Operator

# Add the Helm repo
helm repo add wheetazlab https://wheetazlab.github.io/cloudflare-zero-trust-operator
helm repo update

# Install
helm install czt-operator wheetazlab/cloudflare-zero-trust-operator \
  -n cloudflare-zero-trust --create-namespace \
  --set operator.logLevel=INFO

Then create your CloudflareZeroTrustTenant and CloudflareZeroTrustTemplate resources, and start annotating HTTPRoute objects. The operator takes care of the rest.

TL;DR

  • What: A Helm‑deployed operator that turns HTTPRoute annotations into Cloudflare Zero Trust resources.
  • How: Template‑driven merge, hash‑based change detection, state stored in ConfigMaps.
  • Why: Declarative, GitOps‑friendly, self‑healing, no external dependencies.
  • Works on: Any Kubernetes cluster — including my Talos‑based Raspberry Pi homelab.

If you're already using Gateway API and want to bring Cloudflare Zero Trust under the same declarative umbrella, this operator is the missing piece. Check out the repo for installation details, examples, and troubleshooting tips.

All technical details are sourced from the operator's own architecture and flow documentation.
