MLOps stack: zero to canary in 6 weeks

An opinionated, six-week plan to take a model from notebook to canary-rollout production on Kubernetes. It is designed for teams that already have a model worth shipping but no production discipline around it.

Week 1 — Container and registry hygiene

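  • Wrap the model in a deterministic Docker image: pinned base image, dependencies installed from a lock file resolved from pyproject.toml (e.g. poetry.lock or uv.lock), no pip install at runtime
  • Push to a registry with tag immutability enabled (ECR / ACR) and reference images by content-addressable digest
  • Sign images with Cosign and make signature verification a required admission policy (e.g. Sigstore policy-controller or Kyverno)
  • Add a CI pipeline with required checks: Trivy and Grype for image vulnerabilities, Semgrep for source code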
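
A cheap required check to run alongside the scanners: fail the build if any deployment references an image by tag rather than by digest. A minimal sketch, assuming your pipeline can already extract image references from its manifests; the `unpinned_images` and `main` names are hypothetical, and the check validates only the reference format, not that the digest exists or is signed.

```python
import re
import sys

# Digest-pinned reference: no whitespace or "@" in the repository part, then
# "@sha256:" and exactly 64 lowercase hex characters. "repo:tag@sha256:..."
# also passes, because the digest, not the tag, decides what gets pulled.
DIGEST_REF = re.compile(r"[^\s@]+@sha256:[0-9a-f]{64}")


def unpinned_images(image_refs):
    """Return the image references that are not pinned by digest."""
    return [ref for ref in image_refs if not DIGEST_REF.fullmatch(ref)]


def main(argv):
    """CI entry point: report unpinned references and return a process exit code."""
    bad = unpinned_images(argv)
    for ref in bad:
        print(f"not digest-pinned: {ref}", file=sys.stderr)
    return 1 if bad else 0
```

Wire it into a required CI step as `sys.exit(main(sys.argv[1:]))`.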

Week 2 — Inference service on Kubernetes

  • Deploy a minimal KServe InferenceService with a digest-pinned image
  • Set up the Istio IngressGateway: mTLS for in-mesh traffic, TLS at the external edge
  • Add liveness / readiness probes with timeouts that match real model warmup
  • Define resource requests and limits based on a load test, not a guess
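
The Week 2 items can live in one manifest generator, so the digest pin, probe timings and resource numbers come from measurements instead of hand-edited YAML. A sketch assuming a custom predictor container that serves the Open Inference Protocol (v2) health endpoints on port 8080; the function name, the 1.5x warmup headroom and the example values are illustrative assumptions. Check which probe fields your KServe deployment mode (Serverless via Knative, or RawDeployment) accepts.

```python
import math


def inference_service(name, image, warmup_s, cpu, memory, headroom=1.5, period_s=10):
    """Build a KServe v1beta1 InferenceService manifest as a plain dict.

    warmup_s: measured cold start (weights loaded, first inference served).
    cpu, memory: requests taken from a load test, e.g. "2" and "4Gi".
    """
    if "@sha256:" not in image:
        raise ValueError("image must be pinned by digest, not by tag")
    # Liveness checks wait until warmup is over (plus headroom); otherwise the
    # kubelet restarts a pod that is still loading weights.
    liveness_delay = math.ceil(warmup_s * headroom)
    container = {
        "name": "kserve-container",
        "image": image,
        "ports": [{"containerPort": 8080, "protocol": "TCP"}],
        # Readiness keeps traffic away until the model reports it can serve.
        "readinessProbe": {
            "httpGet": {"path": "/v2/health/ready", "port": 8080},
            "periodSeconds": period_s,
            "timeoutSeconds": 5,
        },
        "livenessProbe": {
            "httpGet": {"path": "/v2/health/live", "port": 8080},
            "initialDelaySeconds": liveness_delay,
            "periodSeconds": period_s,
            "timeoutSeconds": 5,
            "failureThreshold": 3,
        },
        # Memory limit equals the request; no CPU limit, to avoid throttling.
        "resources": {
            "requests": {"cpu": cpu, "memory": memory},
            "limits": {"memory": memory},
        },
    }
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": {"predictor": {"containers": [container]}},
    }
```

Serialize the result with `json.dumps` and pipe it to `kubectl apply -f -`; kubectl accepts JSON manifests as well as YAML. Leaving the CPU limit unset is a common choice to avoid throttling during bursts; set one if your cluster policy requires it.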

Week 3 — Observability before scale

Before adding any traffic, instrument:

  • Prometheus + Grafana — inference latency p50 / p95 / p99, throughput, error rate
  • Jaeger / OpenTelemetry — distributed traces with span breakdown by stage
  • Loki / Splunk — structured logging with request IDs

If you don’t have a Grafana dashboard you’d be willing to put on a wall during launch, you’re not ready for traffic.
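
For the structured-logging item, the property that matters is that every line emitted while handling a request carries the same request ID, so Loki or Splunk queries can pull one request's full story. A minimal sketch using only the Python standard library; the logger name, field names and the placeholder scoring line are hypothetical.

```python
import contextvars
import json
import logging
import sys
import uuid

# Holds the current request's ID so every log line emitted while handling it carries it.
request_id_var = contextvars.ContextVar("request_id", default="-")


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line, which log pipelines parse easily."""

    def format(self, record):
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
            "request_id": request_id_var.get(),
        }
        # Merge structured fields passed as logger.info(..., extra={"fields": {...}}).
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


def make_logger(stream=sys.stdout):
    logger = logging.getLogger("inference")
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers[:] = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def handle_request(logger, features, request_id=None):
    # Reuse an upstream ID if one was passed in, otherwise mint a fresh one.
    token = request_id_var.set(request_id or uuid.uuid4().hex)
    try:
        score = sum(features) / len(features)  # placeholder for model.predict
        logger.info("prediction", extra={"fields": {"score": score, "n_features": len(features)}})
        return score
    finally:
        request_id_var.reset(token)
```

In a real server the ID would typically come from an incoming request header or the active trace context, so logs and Jaeger traces can be joined on it.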

Week 4 — Model registry and promotion

  • Stand up MLflow for experiment tracking and registry
  • Adopt alias-based promotion (@champion, @challenger) — never deploy by version number
  • Build a promotion pipeline that bakes the promoted artifact into the inference image
  • Add evaluation gates against @champion — block promotion on quality, latency, or memory regressions
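
The eval gate reduces to comparing the challenger's metrics against the current @champion with explicit tolerances. A sketch with a hypothetical three-metric `EvalResult` and illustrative thresholds; substitute whatever your evaluation suite actually reports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    """Hypothetical metric set; swap in what your evaluation suite reports."""
    auc: float             # quality: higher is better
    p99_latency_ms: float  # latency: lower is better
    peak_memory_mb: float  # memory: lower is better


def gate(challenger, champion, max_auc_drop=0.002,
         max_latency_regression=0.10, max_memory_regression=0.10):
    """Compare a candidate against @champion; return failure reasons (empty = promote)."""
    failures = []
    if challenger.auc < champion.auc - max_auc_drop:
        failures.append(f"quality: auc {challenger.auc:.4f} vs champion {champion.auc:.4f}")
    if challenger.p99_latency_ms > champion.p99_latency_ms * (1 + max_latency_regression):
        failures.append(
            f"latency: p99 {challenger.p99_latency_ms:.1f} ms vs champion {champion.p99_latency_ms:.1f} ms")
    if challenger.peak_memory_mb > champion.peak_memory_mb * (1 + max_memory_regression):
        failures.append(
            f"memory: peak {challenger.peak_memory_mb:.0f} MB vs champion {champion.peak_memory_mb:.0f} MB")
    return failures
```

When the gate returns no failures, the pipeline moves the alias instead of deploying a version number, for example with MLflow's `MlflowClient().set_registered_model_alias(name, "champion", version)`, and the image build resolves `models:/<name>@champion`.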

Week 5 — Drift detection and automated retraining

  • Wire up Evidently to monitor input distributions (PSI / KS / KL divergence)
  • Add output score drift monitoring with a sliding production window
  • Connect drift alerts to a Kubeflow retraining pipeline
  • Promotion still requires the eval gate from Week 4
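
Evidently computes these statistics for you, but it helps to know what a PSI number means before you set an alert threshold on it. A from-scratch sketch of the Population Stability Index with quantile bins taken from the reference window; this is not Evidently's implementation, and its binning and smoothing may differ.

```python
import math
from bisect import bisect_right


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index of `actual` against the reference sample `expected`.

    Bin edges are interior quantiles of the reference sample, so each bin holds
    roughly 1/bins of the reference data.
    """
    ref = sorted(expected)
    edges = [ref[int(len(ref) * i / bins)] for i in range(1, bins)]

    def proportions(sample):
        counts = [0] * bins
        for x in sample:
            counts[bisect_right(edges, x)] += 1
        # eps keeps empty bins from producing log(0) or a division by zero.
        return [max(c / len(sample), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    # PSI = sum over bins of (actual - expected) * ln(actual / expected)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))
```

A commonly quoted rule of thumb reads PSI below 0.1 as stable, 0.1 to 0.25 as moderate shift, and above 0.25 as significant; calibrate the alert threshold on your own historical windows rather than trusting the heuristic.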

Week 6 — Canary rollout discipline

  • Configure Knative traffic splitting for staged rollouts (5% → 25% → 50% → 100%)
  • Define automated guardrails at each stage: error rate, p99, output sanity
  • Wire instant rollback on guardrail breach: keep the last known-good model under a @previous alias so reverting is one API call
  • Practice the rollback in staging before you need it in production
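
The staged rollout and its guardrails fit in a small control loop. A sketch assuming a Knative Service with two named revisions; `observe` and `apply` are hypothetical hooks into your metrics backend and cluster client, and the guardrail thresholds are illustrative.

```python
from dataclasses import dataclass

STAGES = (5, 25, 50, 100)  # percent of traffic on the canary at each stage


@dataclass
class StageMetrics:
    error_rate: float        # fraction of requests that failed
    p99_ms: float
    sane_output_frac: float  # fraction of outputs passing a sanity check, e.g. score in [0, 1]


def breaches(m, baseline, max_error_rate=0.01, max_p99_ratio=1.2, min_sane=0.999):
    """Return the guardrails this stage breached (empty list means pass)."""
    out = []
    if m.error_rate > max_error_rate:
        out.append("error_rate")
    if m.p99_ms > baseline.p99_ms * max_p99_ratio:
        out.append("p99")
    if m.sane_output_frac < min_sane:
        out.append("output_sanity")
    return out


def traffic(stable_rev, canary_rev, percent):
    """Knative Service `spec.traffic` block sending `percent` to the canary."""
    return [
        {"revisionName": stable_rev, "percent": 100 - percent},
        {"revisionName": canary_rev, "percent": percent},
    ]


def run_rollout(observe, apply, stable_rev, canary_rev, baseline):
    """Walk the stages; on any breach, send all traffic back to the stable revision.

    observe(percent) -> StageMetrics and apply(traffic_block) are hypothetical hooks.
    """
    for pct in STAGES:
        apply(traffic(stable_rev, canary_rev, pct))
        failed = breaches(observe(pct), baseline)
        if failed:
            apply(traffic(stable_rev, canary_rev, 0))  # rollback: 100% to stable
            return f"rolled back at {pct}%: {', '.join(failed)}"
    return "promoted"
```

In practice `observe` would wait out a soak period at each stage before reading metrics. If you drive the rollout through KServe rather than raw Knative, the predictor's `canaryTrafficPercent` field plays the role of `traffic()`.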

What “done” looks like

  • A drift alert fires
  • Retraining runs unattended
  • Eval gate passes
  • New revision rolls out at 5%
  • Guardrails pass at each stage
  • Traffic reaches 100% within a few hours
  • Nobody got paged

If any one of those doesn’t work, you’re not done.

What this is not

  • A way to ship sloppy science. Quality of the underlying model is yours to own.
  • A substitute for human judgment on high-impact decisions. Auto-promotion is for routine retraining; novel model architectures should still go through review.