Zero Trust Kubernetes: a 2026 checklist

Practical hardening checklist for production Kubernetes — admission policies, identity, network, and supply chain.

A checklist we run before any cluster takes production traffic. Not exhaustive — opinionated.

Identity

  • No long-lived credentials in pods. Use IRSA (AWS), Workload Identity (Azure / GCP), or SPIRE.
  • CI / CD runners use OIDC federation. No static AWS access keys or service-principal secrets.
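  • Service-to-service auth via mTLS, with plaintext rejected rather than tolerated: Istio with PeerAuthentication set to STRICT, or Linkerd with an authenticated-only inbound policy.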
  • Human access via short-lived tokens through your IdP — no long-lived kubeconfig files distributed to laptops.
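
One way to spot-check the last item is to scan a kubeconfig for credentials embedded in the file. The sketch below is a minimal illustration, not a complete audit: it assumes the kubeconfig has already been parsed into a dict (for example with a YAML library), and the user names and plugin command in the example are hypothetical. It treats any embedded token, password, or client key as long-lived because it does not inspect lifetimes.

```python
"""Flag kubeconfig users that embed a static credential in the file."""

# Fields under `users[].user` that place a credential directly in the kubeconfig.
STATIC_FIELDS = ("token", "password", "client-key-data", "client-key")


def static_credential_users(kubeconfig: dict) -> dict:
    """Return {user name: [static fields found]} for a parsed kubeconfig."""
    findings = {}
    for entry in kubeconfig.get("users") or []:
        user = entry.get("user") or {}
        found = [field for field in STATIC_FIELDS if user.get(field)]
        if found:
            findings[entry.get("name", "<unnamed>")] = found
    return findings


# Hypothetical example: one user behind an exec credential plugin,
# one user with a token pasted into the file.
example = {
    "users": [
        {
            "name": "alice-oidc",
            "user": {
                "exec": {
                    "apiVersion": "client.authentication.k8s.io/v1",
                    "command": "example-idp-login",  # hypothetical plugin name
                }
            },
        },
        {"name": "legacy-admin", "user": {"token": "REDACTED"}},
    ]
}

print(static_credential_users(example))
```

Users that authenticate through an `exec` plugin pass the check, since the plugin fetches a token at call time instead of storing one.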

Admission

  • Pod Security Standards: restricted profile in production namespaces. baseline is not enough.
  • OPA Gatekeeper or Kyverno for custom policies — required labels, registry allowlist, image-signature verification.
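  • Image signature verification at admission. Unsigned images are denied. (Verify Cosign or Notation, formerly Notary v2, signatures with Kyverno, Sigstore policy-controller, or Ratify.)
  • Resource quotas per namespace. Caps the CPU, memory, and GPU a namespace can request, so a misconfigured deployment cannot drive the autoscaler to 100 GPU nodes.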
  • NetworkPolicies default-deny. Explicit allow rules for required traffic.
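
To make the gap between baseline and restricted concrete, here is a rough sketch of a pre-merge check that applies a subset of the restricted profile to a pod spec. It is an illustration only: the function name is made up, the input is assumed to be the `spec` of a Pod as a dict, and it skips parts of the real profile (ephemeral containers, volume-type allowlist, added capabilities, and more). The built-in Pod Security Admission controller remains the enforcement point.

```python
"""Check a pod spec against a subset of the `restricted` Pod Security Standard."""


def restricted_violations(pod_spec: dict) -> list:
    """Return human-readable violations; an empty list means no issue was found."""
    problems = []
    pod_sc = pod_spec.get("securityContext") or {}

    # Host namespaces and hostPath volumes are already rejected by `baseline`.
    for field in ("hostNetwork", "hostPID", "hostIPC"):
        if pod_spec.get(field):
            problems.append(f"pod: {field} must not be true")
    for volume in pod_spec.get("volumes") or []:
        if "hostPath" in volume:
            problems.append(f"volume {volume.get('name')}: hostPath is not allowed")

    containers = (pod_spec.get("initContainers") or []) + (pod_spec.get("containers") or [])
    for container in containers:
        name = container.get("name", "<unnamed>")
        sc = container.get("securityContext") or {}

        if sc.get("privileged"):
            problems.append(f"{name}: privileged must not be true")

        # The checks below are what `restricted` adds on top of `baseline`.
        if sc.get("allowPrivilegeEscalation") is not False:
            problems.append(f"{name}: allowPrivilegeEscalation must be false")
        if sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot")) is not True:
            problems.append(f"{name}: runAsNonRoot must be true")
        drops = (sc.get("capabilities") or {}).get("drop") or []
        if "ALL" not in drops:
            problems.append(f"{name}: capabilities.drop must include ALL")
        seccomp = sc.get("seccompProfile") or pod_sc.get("seccompProfile") or {}
        if seccomp.get("type") not in ("RuntimeDefault", "Localhost"):
            problems.append(f"{name}: seccompProfile must be RuntimeDefault or Localhost")

    return problems


# A pod spec with no securityContext at all fails every restricted-only check.
bare = {"containers": [{"name": "app", "image": "registry.example.com/app:1.0"}]}
for problem in restricted_violations(bare):
    print(problem)
```

A spec like `bare` is typically admitted under baseline, which is why the checklist asks for restricted in production namespaces.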

Supply chain

  • SBOM generated on every image build (Syft / Trivy).
  • Image signed at build time (Cosign / Sigstore or Notary v2).
  • Vulnerability scan as a required CI check. Block on any CRITICAL / HIGH finding that has no approved waiver.
  • Base images pinned by digest, not tag.
  • Quarantine registry layer — incoming images held until scans pass, then promoted to the production registry.
  • Secrets scanning in CI — Gitleaks or similar, fails build on hit.
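
The digest-pinning item is easy to lint for. The following is a minimal sketch of such a check, assuming a simple Dockerfile without line continuations or `ARG`-substituted base images (those would be reported as unpinned). The function names and the example registry are illustrative, and the digest in the example is a placeholder, not a real image.

```python
"""List Dockerfile base images that are not pinned by digest."""
import re

# A pinned reference ends in @sha256:<64 hex characters>.
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_pinned(image_ref: str) -> bool:
    return DIGEST_RE.search(image_ref) is not None


def unpinned_base_images(dockerfile_text: str) -> list:
    """Return FROM references that use a tag (or nothing) instead of a digest."""
    stage_names = set()
    unpinned = []
    for line in dockerfile_text.splitlines():
        parts = line.split()
        if len(parts) < 2 or parts[0].upper() != "FROM":
            continue
        # Drop flags such as --platform=linux/amd64.
        args = [p for p in parts[1:] if not p.startswith("--")]
        if not args:
            continue
        ref = args[0]
        # `scratch` and earlier build stages are not registry images.
        if ref != "scratch" and ref not in stage_names and not is_pinned(ref):
            unpinned.append(ref)
        if len(args) >= 3 and args[1].upper() == "AS":
            stage_names.add(args[2])
    return unpinned


PLACEHOLDER_DIGEST = "sha256:" + "a" * 64  # placeholder, not a real digest
dockerfile = "\n".join([
    "FROM python:3.12-slim AS build",
    "FROM build AS test",
    f"FROM --platform=linux/amd64 registry.example.com/base@{PLACEHOLDER_DIGEST}",
    "FROM scratch",
])
print(unpinned_base_images(dockerfile))
```

Run as a CI step, a non-empty result would fail the build, in the same way as the vulnerability and secrets checks above.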

Privileged access

  • No standing privileged access to the cluster. Just-in-time elevation through your PAM (CyberArk, Vault, BeyondTrust).
  • Privileged sessions recorded for compliance.
  • Break-glass account exists, its credentials are stored offline, and rotation is scheduled on the calendar.
  • Audit logs shipped off-cluster to immutable storage. A compromised cluster cannot delete its own logs.

Network

  • Calico or Cilium with network policy enforcement — not just installed, actually enforcing.
  • Egress controls — explicit allowlist for outbound traffic. No surprise curl evil.example.com from a worker pod.
  • Ingress behind a WAF. Managed (AWS WAF on an ALB, Azure WAF, Cloudflare) or in-cluster (e.g. ModSecurity).
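
As a starting point for default-deny, the sketch below generates two NetworkPolicy manifests for a namespace: one that denies all ingress and egress, and one that re-allows DNS so pods can still resolve names. It rests on assumptions that should be checked per cluster: DNS pods run in `kube-system` with the label `k8s-app: kube-dns`, and namespaces carry the automatic `kubernetes.io/metadata.name` label. The policy names and the `payments` namespace are placeholders. Output is JSON, which `kubectl apply -f` accepts.

```python
"""Generate a default-deny NetworkPolicy plus a DNS egress allowance."""
import json


def default_deny(namespace: str) -> dict:
    # An empty podSelector selects every pod in the namespace; listing both
    # policy types with no rules denies all ingress and egress.
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {"podSelector": {}, "policyTypes": ["Ingress", "Egress"]},
    }


def allow_dns(namespace: str) -> dict:
    # Assumption: cluster DNS pods are labeled k8s-app=kube-dns in kube-system.
    dns_peer = {
        "namespaceSelector": {
            "matchLabels": {"kubernetes.io/metadata.name": "kube-system"}
        },
        "podSelector": {"matchLabels": {"k8s-app": "kube-dns"}},
    }
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-dns-egress", "namespace": namespace},
        "spec": {
            "podSelector": {},
            "policyTypes": ["Egress"],
            "egress": [
                {
                    "to": [dns_peer],
                    "ports": [
                        {"protocol": "UDP", "port": 53},
                        {"protocol": "TCP", "port": 53},
                    ],
                }
            ],
        },
    }


def render(namespace: str) -> str:
    """Return both policies as a single JSON List document."""
    items = [default_deny(namespace), allow_dns(namespace)]
    return json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2)


print(render("payments"))  # "payments" is a placeholder namespace
```

Every other flow then needs its own explicit allow rule. These policies only take effect if the CNI actually enforces NetworkPolicy, which is the point of the first item in this section.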

Secrets

  • No secrets in env vars in the pod spec. Mount them as files through the Secrets Store CSI Driver, backed by Vault / AWS Secrets Manager / Azure Key Vault.
  • Secret rotation automated. Manual rotation rots.
  • No secrets in image layers. Trivy / Gitleaks check.
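
The first item can be checked mechanically before deploy. Below is a small sketch that walks a pod spec (assumed to be a dict) and reports environment variables that pull from a Secret, plus literal values whose names look secret-like. The name pattern is a crude heuristic chosen for illustration, and the example values are dummies.

```python
"""Report secrets that reach a container through environment variables."""
import re

# Heuristic only: variable names that commonly hold credentials.
SECRET_NAME_RE = re.compile(r"PASSWORD|SECRET|TOKEN|API_?KEY", re.IGNORECASE)


def secret_env_findings(pod_spec: dict) -> list:
    findings = []
    containers = (pod_spec.get("initContainers") or []) + (pod_spec.get("containers") or [])
    for container in containers:
        name = container.get("name", "<unnamed>")
        for env in container.get("env") or []:
            var = env.get("name", "")
            if "secretKeyRef" in (env.get("valueFrom") or {}):
                findings.append(f"{name}: {var} is read from a Secret via secretKeyRef")
            elif env.get("value") and SECRET_NAME_RE.search(var):
                findings.append(f"{name}: {var} looks like a secret set as a literal value")
        for source in container.get("envFrom") or []:
            if "secretRef" in source:
                secret = (source.get("secretRef") or {}).get("name")
                findings.append(f"{name}: envFrom imports Secret {secret}")
    return findings


# Dummy example spec with three findings and one harmless variable.
spec = {
    "containers": [
        {
            "name": "api",
            "env": [
                {"name": "LOG_LEVEL", "value": "info"},
                {"name": "DB_PASSWORD", "value": "not-a-real-password"},
                {
                    "name": "STRIPE_KEY",
                    "valueFrom": {"secretKeyRef": {"name": "billing", "key": "key"}},
                },
            ],
            "envFrom": [{"secretRef": {"name": "api-secrets"}}],
        }
    ]
}
for finding in secret_env_findings(spec):
    print(finding)
```

The usual reason for preferring a mounted file is that environment variables are inherited by child processes and tend to leak into crash dumps and debug output.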

Observability

  • Audit logs to SIEM with alerting on suspicious patterns (privilege escalation, exec into pods, secret access).
  • Runtime threat detection (Falco, Sysdig, or equivalent) alerting on shells spawned in containers and on unexpected outbound connections.
  • Failed authentication attempts dashboarded.
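
The suspicious patterns in the first item map onto fields of the Kubernetes audit event (`verb`, `objectRef`, `user`). The sketch below shows one possible classifier over JSON-lines audit output. In practice this logic would normally live in SIEM rules. The labels are invented for this example, the sample events are synthetic, and secret reads in particular need tuning because controllers generate many legitimate ones.

```python
"""Classify Kubernetes audit events (JSON lines) into a few alert categories."""
import json
from typing import Iterable, Iterator, Optional, Tuple


def classify(event: dict) -> Optional[str]:
    ref = event.get("objectRef") or {}
    verb = event.get("verb")
    resource = ref.get("resource")

    # exec/attach show up as subresources of pods.
    if resource == "pods" and ref.get("subresource") in ("exec", "attach"):
        return "pod-exec"
    if resource == "secrets" and verb in ("get", "list", "watch"):
        return "secret-read"
    # New or changed role bindings are a common privilege-escalation path.
    if (
        ref.get("apiGroup") == "rbac.authorization.k8s.io"
        and resource in ("rolebindings", "clusterrolebindings")
        and verb in ("create", "update", "patch")
    ):
        return "rbac-binding-change"
    return None


def suspicious(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (label, username, namespace/name) for events worth alerting on."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        label = classify(event)
        if label:
            ref = event.get("objectRef") or {}
            target = f"{ref.get('namespace', '')}/{ref.get('name', '')}"
            username = (event.get("user") or {}).get("username", "<unknown>")
            yield label, username, target


# Synthetic sample events, not real log data.
sample = [
    json.dumps({"verb": "create", "user": {"username": "dev@example.com"},
                "objectRef": {"resource": "pods", "subresource": "exec",
                              "namespace": "prod", "name": "api-0"}}),
    json.dumps({"verb": "get", "user": {"username": "system:serviceaccount:ci:runner"},
                "objectRef": {"resource": "secrets", "namespace": "prod", "name": "db"}}),
    json.dumps({"verb": "get", "user": {"username": "dev@example.com"},
                "objectRef": {"resource": "pods", "namespace": "prod", "name": "api-0"}}),
]
for hit in suspicious(sample):
    print(hit)
```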

Common gotchas

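  • baseline PSS in prod: privilege escalation, running as root, and containers without a seccomp profile are all still permitted. (baseline already blocks privileged containers, hostPath volumes, and host networking.) Use restricted.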
  • Dangling Service Accounts with cluster-admin — left over from migrations. Audit every quarter.
  • Default-deny egress NetworkPolicy without a DNS allow rule: pods can’t resolve names. Fix with explicit egress to kube-dns on UDP and TCP port 53.
  • Image signing without verification at admission — half the value. The verifier closes the loop.
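
For the quarterly service-account audit, a short script over the cluster's ClusterRoleBindings is often enough to surface the worst cases. This sketch assumes its input is the parsed output of `kubectl get clusterrolebindings -o json`. It only looks for direct bindings to the built-in `cluster-admin` ClusterRole, so custom roles with wildcard rules would need a separate check. The binding and account names in the example are made up.

```python
"""List service accounts bound directly to the cluster-admin ClusterRole."""


def cluster_admin_service_accounts(bindings: dict) -> list:
    """`bindings` is a parsed ClusterRoleBindingList (a dict with an `items` key)."""
    hits = []
    for item in bindings.get("items") or []:
        role = item.get("roleRef") or {}
        if role.get("kind") != "ClusterRole" or role.get("name") != "cluster-admin":
            continue
        binding = (item.get("metadata") or {}).get("name", "<unnamed>")
        for subject in item.get("subjects") or []:
            if subject.get("kind") == "ServiceAccount":
                account = f"{subject.get('namespace')}/{subject.get('name')}"
                hits.append(f"{account} via {binding}")
    return sorted(hits)


# Made-up example data in the shape kubectl returns.
example_bindings = {
    "items": [
        {
            "metadata": {"name": "migration-tmp"},
            "roleRef": {"kind": "ClusterRole", "name": "cluster-admin"},
            "subjects": [
                {"kind": "ServiceAccount", "name": "migrator", "namespace": "tools"},
                {"kind": "Group", "name": "platform-admins"},
            ],
        },
        {
            "metadata": {"name": "view-all"},
            "roleRef": {"kind": "ClusterRole", "name": "view"},
            "subjects": [
                {"kind": "ServiceAccount", "name": "dashboard", "namespace": "monitoring"}
            ],
        },
    ]
}
print(cluster_admin_service_accounts(example_bindings))
```

Each hit should either be justified by an owner or removed.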
