Helm‑Based Operator for Cloudflare Zero Trust — Declarative Tunnels, DNS, and Access from Kubernetes
I run OpenShift clusters at work: the whole Red Hat ecosystem, with RHCOS nodes, Operators everywhere, and API-driven management. So when I tackled Cloudflare Zero Trust in my homelab, I didn't want to click through dashboards or write fragile shell scripts.
I already run cloudflared in my cluster, and all external access flows through a single Cloudflare tunnel. I also have Access policies defined in Cloudflare that I wanted to reuse. What I didn't have was a clean, automated way to publish new applications to that tunnel and apply those existing policies to them without clicking through the Cloudflare dashboard every single time. I also wanted to stay on the Kubernetes Gateway API, so that if I ever swap Traefik out for Nginx or anything else, most of my configuration doesn't need to change. I'll cover Gateway API in a separate post.
The goal was simple: add a handful of annotations to an HTTPRoute and have everything wired up automatically (tunnel routes, DNS records, Access application bindings, service tokens), using reusable templates where possible. Equally important: if I remove that route from the cluster or swap out an access policy, everything should be cleaned up automatically too. That's how the Cloudflare Zero Trust Operator came to be.
Why This Operator?
There are plenty of ways to manage Cloudflare Zero Trust: the web UI, the cloudflared CLI, Terraform, or even a few Helm charts on K3s. But I wanted one unified control plane that:
- Listens to Gateway API HTTPRoute resources (the modern way to define HTTP routing in Kubernetes)
- Applies templates for reusable configuration
- Stores state in Kubernetes so I can inspect and debug
- Uses least‑privilege secrets and RBAC
- Is self‑healing — if a tunnel disappears or DNS gets touched manually, the operator puts it back
- Runs in my Talos‑powered Raspberry Pi cluster alongside everything else
The result is an operator that treats Cloudflare Zero Trust as a first‑class Kubernetes extension. You annotate a route, and the operator provisions the corresponding Cloudflare resources automatically. No external CI/CD pipelines, no separate state store — just pure Kubernetes reconciliation.
Architecture at a Glance
The operator is a single‑process Python container built on the kopf framework. It watches three resource types:
| Resource | Purpose |
|---|---|
| HTTPRoute (gateway.networking.k8s.io) | Triggers reconciliation when annotated |
| CloudflareZeroTrustTenant | Holds the Cloudflare account ID, tunnel ID, and credential secret reference |
| CloudflareZeroTrustTemplate | Reusable configuration presets |
All Cloudflare interactions happen through the official cloudflare-python SDK. State is stored in ConfigMaps in the operator namespace — one per managed HTTPRoute. No database, no PVC, just native Kubernetes objects.
Processing an Event: The Full Flow
When an HTTPRoute is created, updated, or resumed, the operator runs this sequence:
Step‑by‑Step
- Annotation check: The operator aborts early if cfzt.cloudflare.com/tenant or cfzt.cloudflare.com/hostname is missing.
- Tenant lookup: Searches for the CloudflareZeroTrustTenant CR first in the route's namespace, then in the operator's namespace. Not found? kopf.TemporaryError(delay=30) schedules a retry, so you can create the tenant later.
- Template lookup: Gathers the per-route template (if the cfzt.cloudflare.com/template annotation is set) and the base template (base-{tenant}). Both are looked up in the route's namespace first, then in the operator's namespace.
- Settings merge: config.merge_settings() combines annotations, the per-route template, the base template, and hard-coded defaults, in that priority order. The special {{ hostname }} variable is replaced with the actual hostname.
- Change detection: config.compute_annotation_hash() hashes the annotations plus the template specs. If the hash matches the annotation_hash stored in the state ConfigMap, the reconcile is skipped. This avoids redundant Cloudflare API calls. Because the template specs are part of the hash, template changes still propagate.
- Cloudflare client: Reads the API token from the Secret referenced by the tenant's spec.credentialRef and builds the cloudflare.Cloudflare client.
- Zone resolution: Calls cloudflare_api.resolve_zone_id() to find the zone ID for the hostname. On failure, TemporaryError(delay=60) retries later.
- Sub-reconcilers: Depending on dns_only.enabled:
  - Tunnel mode (_reconcile_tunnel) creates a tunnel hostname route and a CNAME record
  - DNS-only mode (_reconcile_dns_only) creates an A record (IP from staticIp or a LoadBalancer Service)
- Optional features: If access.enabled is true, _reconcile_access creates an Access Application and, optionally, a policy. If serviceToken.enabled is true, _reconcile_service_token creates a Service Token and stores its credentials in a Kubernetes Secret.
- Persist state: Writes all Cloudflare resource IDs, the hash, and timestamps to the state ConfigMap cfzt-{route_namespace}-{route_name}.
- Patch annotations: Writes the result IDs (hostnameRouteId, cnameRecordId, accessAppId, etc.) and lastReconcile back to the HTTPRoute. If the route was deleted mid-reconcile, the resulting 404 is caught and logged.
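The following minimal Python sketch makes the early steps concrete. It covers the annotation check, the lookup that tries the route's namespace before the operator's, and zone resolution. It is not the operator's code. TemporaryError is a local stand-in for kopf.TemporaryError, and the in-memory store replaces API-server lookups. The longest-suffix zone match is an assumption about how a hostname might be mapped to a zone. The same hypothetical lookup_with_fallback helper would also serve the per-route and base-{tenant} template lookups.

```python
# Hypothetical sketch of the early reconcile steps; not the operator's actual code.
from typing import Dict, Mapping, Optional, Tuple

TENANT_ANNOTATION = "cfzt.cloudflare.com/tenant"
HOSTNAME_ANNOTATION = "cfzt.cloudflare.com/hostname"


class TemporaryError(Exception):
    """Stand-in for kopf.TemporaryError: ask the framework to retry after `delay` seconds."""

    def __init__(self, message: str, delay: float) -> None:
        super().__init__(message)
        self.delay = delay


def required_annotations(annotations: Mapping[str, str]) -> Optional[Tuple[str, str]]:
    """Return (tenant, hostname), or None if the route is not ours to reconcile."""
    tenant = annotations.get(TENANT_ANNOTATION)
    hostname = annotations.get(HOSTNAME_ANNOTATION)
    if not tenant or not hostname:
        return None  # abort early, nothing to do
    return tenant, hostname


def lookup_with_fallback(store: Mapping[Tuple[str, str], dict], name: str,
                         route_ns: str, operator_ns: str) -> Optional[dict]:
    """Try the route's namespace first, then the operator's namespace.

    `store` maps (namespace, name) to an object; a real operator would query the API server.
    """
    for namespace in (route_ns, operator_ns):
        obj = store.get((namespace, name))
        if obj is not None:
            return obj
    return None


def load_tenant(store: Mapping[Tuple[str, str], dict], tenant_name: str,
                route_ns: str, operator_ns: str) -> dict:
    tenant = lookup_with_fallback(store, tenant_name, route_ns, operator_ns)
    if tenant is None:
        # Retry later so the tenant can be created after the route.
        raise TemporaryError(f"tenant {tenant_name!r} not found", delay=30)
    return tenant


def resolve_zone_id(hostname: str, zones: Dict[str, str]) -> str:
    """Pick the longest zone name that equals the hostname or is a parent domain of it.

    `zones` maps zone name to zone ID and is assumed to be listed from the account beforehand.
    """
    matches = [z for z in zones if hostname == z or hostname.endswith("." + z)]
    if not matches:
        raise TemporaryError(f"no zone found for {hostname}", delay=60)
    return zones[max(matches, key=len)]
```

Matching on "." plus the zone name keeps a hostname such as notexample.com from matching the zone example.com.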
Template System: 3‑Way Merge with Variable Substitution
Templates let you define reusable configuration blocks — think of them as classes for Cloudflare settings. The merge chain is:
HTTPRoute Annotations (highest priority)
↓
Per‑route Template (cfzt.cloudflare.com/template)
↓
Base Template (base-{tenant})
↓
Hardcoded Defaults (lowest)
For any setting, the first non‑empty source wins. The {{ hostname }} placeholder in template fields (e.g., originServerName) gets replaced with the actual hostname at merge time.
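The merge chain can be sketched as a recursive, priority-ordered dict merge followed by placeholder substitution. This is an illustrative stand-in for config.merge_settings(), not its implementation. It assumes the annotations have already been parsed into the same nested shape as a template spec. merge_layers and merge_settings_sketch are hypothetical names.

```python
# Minimal sketch of a layered merge; the real config.merge_settings() may differ.
from typing import Any, Dict, Optional


def _is_empty(value: Any) -> bool:
    # None, "" and empty containers count as "not set", so a lower layer can fill them.
    # False and 0 are real values and are kept.
    return value is None or value == "" or value == {} or value == []


def merge_layers(*layers: Dict[str, Any]) -> Dict[str, Any]:
    """Merge dicts given highest priority first; nested dicts merge key by key."""
    result: Dict[str, Any] = {}
    for layer in reversed(layers):  # apply lowest priority first, higher layers overwrite
        for key, value in layer.items():
            if _is_empty(value):
                continue
            if isinstance(value, dict):
                lower = result.get(key)
                result[key] = merge_layers(value, lower if isinstance(lower, dict) else {})
            else:
                result[key] = value
    return result


def substitute_hostname(value: Any, hostname: str) -> Any:
    """Recursively replace the {{ hostname }} placeholder inside string values."""
    if isinstance(value, str):
        return value.replace("{{ hostname }}", hostname)
    if isinstance(value, dict):
        return {k: substitute_hostname(v, hostname) for k, v in value.items()}
    if isinstance(value, list):
        return [substitute_hostname(v, hostname) for v in value]
    return value


def merge_settings_sketch(annotations: Dict[str, Any], route_template: Optional[Dict[str, Any]],
                          base_template: Optional[Dict[str, Any]], defaults: Dict[str, Any],
                          hostname: str) -> Dict[str, Any]:
    """Annotations > per-route template > base template > defaults, then substitute."""
    merged = merge_layers(annotations, route_template or {}, base_template or {}, defaults)
    return substitute_hostname(merged, hostname)
```

Treating False and 0 as real values matters here. A base template that sets enabled: false must still win over the hard-coded defaults.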
Example base template (base-home):
```yaml
apiVersion: cfzt.cloudflare.com/v1alpha1
kind: CloudflareZeroTrustTemplate
metadata:
  name: base-home
  namespace: cloudflare-zero-trust
spec:
  originService:
    url: "https://traefik.traefik.svc:443"
    httpRedirect: true
  originTLS:
    noTLSVerify: false
    originServerName: "{{ hostname }}"
    tlsTimeout: 10
  accessApplication:
    enabled: false
  serviceToken:
    enabled: false
```
Per‑route template (protected) overriding Access settings:
```yaml
spec:
  accessApplication:
    enabled: true
    sessionDuration: "8h"
    existingPolicyNames:
      - "Allow Home Users"
```
Route annotations activating the template:
```yaml
metadata:
  annotations:
    cfzt.cloudflare.com/enabled: "true"
    cfzt.cloudflare.com/tenant: "home"
    cfzt.cloudflare.com/hostname: "admin.example.com"
    cfzt.cloudflare.com/template: "protected"
```
Sub‑Reconcilers in Detail
Tunnel Mode
Creates a Cloudflare Tunnel hostname route and a CNAME DNS record.
| Cloudflare field | Source |
|---|---|
| hostname | settings.hostname |
| service | settings.origin_service (merged URL) |
| originRequest.noTLSVerify | settings.origin_tls.no_tls_verify |
| originRequest.originServerName | settings.origin_tls.origin_server_name (defaults to the hostname if empty) |
| originRequest.caPool | settings.origin_tls.ca_pool |
| originRequest.connectTimeout | settings.origin_tls.tls_timeout (as a "{n}s" string) |
| originRequest.http2Origin | settings.origin_tls.http2_origin |
| originRequest.httpHostHeader | hostname (when match_sni_to_host is true) |
The CNAME points to {tunnel_id}.cfargotunnel.com.
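The following sketch shows how the merged settings might map onto the originRequest fields in the table above, plus the CNAME target. The snake_case keys follow the table's settings.* column. Placing match_sni_to_host under origin_tls and the exact dict shapes are assumptions. No Cloudflare API call is made.

```python
# Sketch: merged settings -> tunnel ingress rule and CNAME target (no API calls).
from typing import Any, Dict


def build_ingress_rule(settings: Dict[str, Any]) -> Dict[str, Any]:
    hostname = settings["hostname"]
    tls = settings.get("origin_tls", {})
    origin_request: Dict[str, Any] = {
        "noTLSVerify": bool(tls.get("no_tls_verify", False)),
        # Fall back to the public hostname when no SNI name was configured.
        "originServerName": tls.get("origin_server_name") or hostname,
    }
    if tls.get("ca_pool"):
        origin_request["caPool"] = tls["ca_pool"]
    if tls.get("tls_timeout") is not None:
        origin_request["connectTimeout"] = f"{tls['tls_timeout']}s"  # e.g. 10 -> "10s"
    if tls.get("http2_origin"):
        origin_request["http2Origin"] = True
    if tls.get("match_sni_to_host"):  # assumed location of this flag
        origin_request["httpHostHeader"] = hostname
    return {"hostname": hostname, "service": settings["origin_service"], "originRequest": origin_request}


def cname_target(tunnel_id: str) -> str:
    """Tunnel hostnames are CNAMEs to <tunnel-id>.cfargotunnel.com."""
    return f"{tunnel_id}.cfargotunnel.com"
```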
DNS‑Only Mode
Creates a plain A record. IP comes from:
- dnsOnly.staticIp: a literal IP address string, or
- dnsOnly.ingressServiceRef: the operator looks up the LoadBalancer IP of the referenced Service
Additional settings: proxied (orange cloud) and ttl. No tunnel route is created.
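The IP selection can be sketched with the standard ipaddress module. PermanentError stands in for kopf.PermanentError, matching the error-handling table later in this post. Two choices are this sketch's assumptions: checking staticIp before the Service, and reading status.loadBalancer.ingress from an already-fetched Service dict.

```python
# Sketch of DNS-only IP selection; the Service is assumed to be fetched already.
import ipaddress
from typing import Any, Dict, Optional


class PermanentError(Exception):
    """Stand-in for kopf.PermanentError: the user must fix the configuration."""


def loadbalancer_ip(service: Dict[str, Any]) -> Optional[str]:
    """Return the first ingress IP from a Service's status.loadBalancer, if any."""
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    for entry in ingress:
        if entry.get("ip"):
            return entry["ip"]
    return None


def resolve_dns_only_ip(dns_only: Dict[str, Any], service: Optional[Dict[str, Any]] = None) -> str:
    candidate = dns_only.get("staticIp")
    if not candidate and service is not None:
        candidate = loadbalancer_ip(service)
    if not candidate:
        raise PermanentError("dnsOnly needs staticIp or a LoadBalancer Service with an IP")
    try:
        ip = ipaddress.ip_address(candidate)
    except ValueError as exc:
        raise PermanentError(f"invalid IP address: {candidate!r}") from exc
    if ip.version != 4:
        # An A record holds IPv4 only; IPv6 would need an AAAA record.
        raise PermanentError("an A record needs an IPv4 address")
    return str(ip)
```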
Access Application
Creates/updates a Cloudflare Access Application named cfzt-{hostname}. If existingPolicyNames are listed, they're resolved to UUIDs and attached. If allowGroups or allowEmails are set, a new policy cfzt-{hostname}-allow is created.
All Access Application settings (sessionDuration, skipInterstitial, autoRedirectToIdentity, etc.) are configurable via template or per‑route annotations.
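The naming and policy rules above can be sketched as a pure planning function that makes no API calls. It assumes the existing policy names have already been listed into a name-to-UUID map and that allowGroups holds Access group IDs. The include-rule shapes follow Cloudflare's Access rule format. Raising on an unknown policy name is this sketch's choice, not documented operator behavior.

```python
# Sketch: decide which Access objects a route needs; nothing is sent to Cloudflare.
from typing import Any, Dict, List, Mapping


def plan_access(hostname: str, access: Dict[str, Any],
                policy_ids_by_name: Mapping[str, str]) -> Dict[str, Any]:
    """Return the app name, existing policy UUIDs to attach, and an optional new allow policy."""
    attached: List[str] = []
    for name in access.get("existingPolicyNames", []):
        if name not in policy_ids_by_name:
            raise ValueError(f"Access policy {name!r} not found")
        attached.append(policy_ids_by_name[name])

    groups = access.get("allowGroups", [])
    emails = access.get("allowEmails", [])
    new_policy = None
    if groups or emails:
        new_policy = {
            "name": f"cfzt-{hostname}-allow",
            "decision": "allow",
            "include": [{"group": {"id": g}} for g in groups]
            + [{"email": {"email": e}} for e in emails],
        }
    return {"app_name": f"cfzt-{hostname}", "policy_ids": attached, "new_policy": new_policy}
```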
Service Token
Creates a Cloudflare Service Token (machine‑to‑machine) and stores client_id / client_secret in a Kubernetes Secret named cfzt-svctoken-{route_name}. Tokens are created only once — subsequent reconciles carry forward the existing token ID and secret name.
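The create-once rule and the resulting Secret can be sketched as follows. create_token is a hypothetical callable standing in for the Cloudflare call. The Secret's namespace and its client_id/client_secret key names are assumptions. The values are base64-encoded because a Secret's data field requires that.

```python
# Sketch of the create-once rule and the Secret holding the token credentials.
import base64
from typing import Any, Callable, Dict, Optional, Tuple


def _b64(value: str) -> str:
    # Secret .data values must be base64-encoded strings.
    return base64.b64encode(value.encode()).decode()


def token_secret_manifest(route_name: str, namespace: str,
                          client_id: str, client_secret: str) -> Dict[str, Any]:
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": f"cfzt-svctoken-{route_name}", "namespace": namespace},
        "type": "Opaque",
        "data": {"client_id": _b64(client_id), "client_secret": _b64(client_secret)},
    }


def ensure_service_token(state: Dict[str, str], route_name: str, namespace: str,
                         create_token: Callable[[], Tuple[str, str, str]]
                         ) -> Tuple[Dict[str, str], Optional[Dict[str, Any]]]:
    """Create a token only if the state has none; otherwise carry the existing IDs forward.

    `create_token` returns (token_id, client_id, client_secret).
    Returns (state updates, Secret manifest or None).
    """
    if state.get("service_token_id"):
        return {
            "service_token_id": state["service_token_id"],
            "service_token_secret_name": state.get("service_token_secret_name")
            or f"cfzt-svctoken-{route_name}",
        }, None
    token_id, client_id, client_secret = create_token()
    secret = token_secret_manifest(route_name, namespace, client_id, client_secret)
    return {"service_token_id": token_id,
            "service_token_secret_name": secret["metadata"]["name"]}, secret
```

Carrying the IDs forward matters because Cloudflare only reveals a service token's client secret when the token is created. Re-creating the token on every reconcile would rotate credentials under running clients.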
State Management & Change Detection
Each managed HTTPRoute gets a state ConfigMap in the operator namespace:
- Name: cfzt-{route_namespace}-{route_name}
- Labels: app.kubernetes.io/managed-by=cfzt-operator, cfzt.cloudflare.com/httproute-name, cfzt.cloudflare.com/httproute-namespace, cfzt.cloudflare.com/tenant
- Data keys include:
  - annotation_hash: SHA-256 over annotations + template specs
  - hostname, tenant_name, tunnel_id, zone_id
  - hostname_route_id, cname_record_id (tunnel mode)
  - dns_record_id, dns_record_ip (DNS-only mode)
  - access_app_id, access_policy_ids
  - service_token_id, service_token_secret_name
  - last_reconcile, httproute_namespace, httproute_name
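As a sketch, the state ConfigMap for one route could be assembled as follows. The name and labels come from the list above. JSON-encoding non-string values such as access_policy_ids is an assumption, needed because ConfigMap data values must be strings.

```python
# Sketch of the per-route state ConfigMap manifest.
import json
from datetime import datetime, timezone
from typing import Any, Dict


def state_configmap(operator_ns: str, route_ns: str, route_name: str, tenant: str,
                    data: Dict[str, Any]) -> Dict[str, Any]:
    # ConfigMap data values must be strings; lists (e.g. access_policy_ids) are JSON-encoded.
    body = {k: v if isinstance(v, str) else json.dumps(v) for k, v in data.items()}
    body.setdefault("last_reconcile", datetime.now(timezone.utc).isoformat())
    body["httproute_namespace"] = route_ns
    body["httproute_name"] = route_name
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            "name": f"cfzt-{route_ns}-{route_name}",
            "namespace": operator_ns,
            "labels": {
                "app.kubernetes.io/managed-by": "cfzt-operator",
                "cfzt.cloudflare.com/httproute-name": route_name,
                "cfzt.cloudflare.com/httproute-namespace": route_ns,
                "cfzt.cloudflare.com/tenant": tenant,
            },
        },
        "data": body,
    }
```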
The hash check prevents unnecessary Cloudflare API calls. Because the template specs are part of the hash, any change to a template invalidates the stored hash for every route that references it, even if those routes' own annotations haven't changed. That is why updating a CloudflareZeroTrustTemplate leads to a full re-reconcile of every route that uses it: the recomputed hash no longer matches, so the skip check doesn't fire.
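The change-detection idea can be sketched with hashlib and a canonical JSON encoding. compute_hash and should_skip are hypothetical stand-ins for config.compute_annotation_hash() and the skip check. The sketch assumes the caller passes only the input annotations. Hashing the IDs and lastReconcile that the operator writes back as annotations would change the hash on every patch.

```python
# Sketch of hash-based change detection over input annotations plus template specs.
import hashlib
import json
from typing import Any, Dict, Optional


def compute_hash(input_annotations: Dict[str, str],
                 route_template_spec: Optional[Dict[str, Any]],
                 base_template_spec: Optional[Dict[str, Any]]) -> str:
    payload = {
        "annotations": input_annotations,
        "route_template": route_template_spec or {},
        "base_template": base_template_spec or {},
    }
    # sort_keys plus fixed separators give a canonical encoding, so dict order cannot change the hash.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


def should_skip(state: Dict[str, str], new_hash: str) -> bool:
    """Skip the reconcile when nothing that feeds the hash has changed."""
    return state.get("annotation_hash") == new_hash
```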
Delete Flow & Orphan Cleanup
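When an HTTPRoute is deleted, the operator gets a best-effort chance to clean up. The delete handler is registered with optional=True, so kopf does not hold the route back with a finalizer for it. Cleanup runs only if the operator observes the deletion, and anything it misses is caught by the orphan timer described below. The sequence: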
- Load the state ConfigMap
- Look up the tenant
- Call delete_all_resources() (Access App → Service Token → Tunnel Route → DNS Record)
- Delete the service token Secret (if any) and the state ConfigMap
If Cloudflare objects are already gone (or credentials missing), deletions are silent or logged as warnings — the state is still removed to prevent infinite retries.
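The following is a sketch of that ordered, best-effort teardown. NotFound stands in for a 404 from the Cloudflare API, and the deleters map is a hypothetical stand-in for delete_all_resources(). The state keys match the ConfigMap data keys listed earlier.

```python
# Sketch of ordered, best-effort teardown that tolerates already-deleted objects.
import logging
from typing import Callable, Dict, List, Tuple

log = logging.getLogger("cfzt.delete")


class NotFound(Exception):
    """Hypothetical stand-in for a 404 from the Cloudflare API."""


# Access App -> Service Token -> Tunnel Route -> DNS Record (CNAME or A record).
DELETE_ORDER = ["access_app_id", "service_token_id", "hostname_route_id",
                "cname_record_id", "dns_record_id"]


def delete_all(state: Dict[str, str],
               deleters: Dict[str, Callable[[str], None]]) -> List[Tuple[str, str]]:
    """Delete every resource recorded in `state`; return (key, outcome) pairs."""
    outcomes = []
    for key in DELETE_ORDER:
        resource_id = state.get(key)
        if not resource_id:
            continue  # never created in this mode, e.g. no A record in tunnel mode
        try:
            deleters[key](resource_id)
            outcomes.append((key, "deleted"))
        except NotFound:
            outcomes.append((key, "already gone"))  # silent: the goal state is reached
        except Exception as exc:  # log and keep going so one failure doesn't block the rest
            log.warning("failed to delete %s=%s: %s", key, resource_id, exc)
            outcomes.append((key, "failed"))
    return outcomes
```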
Orphan timer — A kopf timer runs on each CloudflareZeroTrustTenant every 300 s (initial delay 60 s). It scans the tenant's state ConfigMaps and deletes any that no longer have a corresponding, enabled HTTPRoute. This catches cases where the operator was down when a route was removed, or where the route was manually deleted from the cluster.
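The orphan check itself can be sketched as a pure function over already-listed ConfigMaps and HTTPRoutes, relying on the state labels described earlier. The 300 s schedule and any Cloudflare cleanup for an orphan are left to the timer handler and not shown.

```python
# Sketch: find state ConfigMaps whose HTTPRoute is gone or no longer enabled.
from typing import Any, Dict, Iterable, List

ENABLED = "cfzt.cloudflare.com/enabled"


def find_orphans(configmaps: Iterable[Dict[str, Any]], routes: Iterable[Dict[str, Any]],
                 tenant: str) -> List[str]:
    """Return names of this tenant's state ConfigMaps without a live, enabled route."""
    live = set()
    for route in routes:
        meta = route["metadata"]
        if meta.get("annotations", {}).get(ENABLED) == "true":
            live.add((meta["namespace"], meta["name"]))
    orphans = []
    for cm in configmaps:
        labels = cm["metadata"].get("labels", {})
        if labels.get("cfzt.cloudflare.com/tenant") != tenant:
            continue  # belongs to another tenant's timer
        key = (labels.get("cfzt.cloudflare.com/httproute-namespace"),
               labels.get("cfzt.cloudflare.com/httproute-name"))
        if key not in live:
            orphans.append(cm["metadata"]["name"])
    return orphans
```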
Error Handling & Observability
| Situation | kopf response |
|---|---|
| Tenant not found | TemporaryError(delay=30) — retry after 30 s |
| Zone resolution failed | TemporaryError(delay=60) |
| No IP for DNS‑only mode | PermanentError — user must fix config |
| Invalid IP address | PermanentError |
| HTTPRoute 404 mid‑reconcile | Caught, logged, skip annotation patch |
| Credential read failure on delete | Log warning, delete state only |
| General exception | Exponential back‑off: 1 s → 5 s → 15 s (max) |
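The 1 s, 5 s, 15 s schedule can be expressed as a capped lookup on the retry count. kopf passes handlers a retry keyword argument. Raising kopf.TemporaryError(..., delay=backoff_delay(retry)) is one way to get this behavior. Whether the operator does exactly that is an assumption.

```python
# Sketch: map a handler's retry count onto a capped back-off schedule.
BACKOFF_SECONDS = (1, 5, 15)


def backoff_delay(retry: int) -> int:
    """Delay before the next attempt; the last step repeats once the schedule runs out."""
    if retry < 0:
        raise ValueError("retry must be >= 0")
    return BACKOFF_SECONDS[min(retry, len(BACKOFF_SECONDS) - 1)]
```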
The log level is controlled by the LOG_LEVEL environment variable (DEBUG/INFO/WARNING/ERROR). DEBUG gives full handler traces, INFO shows reconcile actions, and WARNING/ERROR hide successful operations.
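The following is a minimal sketch of honoring LOG_LEVEL with the standard logging module. Falling back to INFO for unknown values is an assumption, not documented behavior.

```python
# Sketch: read LOG_LEVEL from the environment and configure the root logger.
import logging
import os
from typing import Mapping, Optional

VALID_LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR")


def configure_logging(env: Optional[Mapping[str, str]] = None) -> int:
    env = os.environ if env is None else env
    name = env.get("LOG_LEVEL", "INFO").strip().upper()
    if name not in VALID_LEVELS:
        name = "INFO"  # unknown values fall back instead of crashing the operator
    level = getattr(logging, name)
    logging.basicConfig(level=level, format="%(asctime)s %(levelname)s %(name)s: %(message)s")
    return level
```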
Inspect state:
```shell
kubectl get configmap -n cloudflare-zero-trust \
  -l app.kubernetes.io/managed-by=cfzt-operator -o yaml
```
Watch logs:
```shell
kubectl logs -n cloudflare-zero-trust deployment/cloudflare-zero-trust-operator -f
```
List managed routes:
```shell
kubectl get httproutes -A -o jsonpath='{range .items[?(@.metadata.annotations.cfzt\.cloudflare\.com/enabled=="true")]}{.metadata.namespace}/{.metadata.name}{"\n"}{end}'
```
Real‑World Usage in My Homelab
In my Talos‑powered Raspberry Pi cluster, this operator manages:
- www.swheetlife.com: my Ghost blog, exposed via the tunnel behind Cloudflare
- admin.swheetlife.com: an Access-protected dashboard
- api.swheetlife.com: a DNS-only A record pointing at my Traefik LoadBalancer
- internal-api.swheetlife.com: a service token for automated clients
All of these routes live in different application namespaces, yet they share the same home tenant and base-home template defined in the cloudflare-zero-trust namespace. The operator is deployed via Helm as a single replica with a 200Mi memory limit, and runs as non-root with a read-only root filesystem.
Deploying the Operator
```shell
# Add the Helm repo
helm repo add wheetazlab https://wheetazlab.github.io/cloudflare-zero-trust-operator
helm repo update

# Install
helm install czt-operator wheetazlab/cloudflare-zero-trust-operator \
  -n cloudflare-zero-trust --create-namespace \
  --set operator.logLevel=INFO
```
Then create your CloudflareZeroTrustTenant and CloudflareZeroTrustTemplate resources, and start annotating HTTPRoute objects. The operator takes care of the rest.
TL;DR
- What: A Helm-deployed operator that turns HTTPRoute annotations into Cloudflare Zero Trust resources.
- How: Template-driven merge, hash-based change detection, and state stored in ConfigMaps.
- Why: Declarative, GitOps‑friendly, self‑healing, no external dependencies.
- Works on: Any Kubernetes cluster — including my Talos‑based Raspberry Pi homelab.
If you're already using Gateway API and want to bring Cloudflare Zero Trust under the same declarative umbrella, this operator is the missing piece. Check out the repo for installation details, examples, and troubleshooting tips.
All technical details are sourced from the operator's own architecture and flow documentation.