GPU FinOps Audit: scope, deliverables, and the spreadsheet
Exactly what we look at in a two-week GPU FinOps audit, what we deliver, and the categories we use to attribute spend.
This is the audit template we use for GPU FinOps engagements. It runs for two weeks, with a fixed scope and fixed deliverables.
Week 1 — Discovery and instrumentation
Read-only access required
- Cloud cost data (AWS Cost Explorer, Azure Cost Management, or Harness CCM / Kubecost export)
- Kubernetes API read access (cluster-wide `view` is enough)
- Prometheus / Grafana read access
- Read access to GitOps repo (Argo / Flux) for deployment manifests
What we instrument
If not already in place:
- DCGM Exporter → Prometheus for per-pod GPU utilization
- kube-state-metrics for label correlation
- A shared Grafana dashboard for the audit period
We bring this stack up via Helm in under a day. If you don't adopt it, we remove it cleanly at the end of the engagement.
Week 2 — Analysis and remediation plan
Spend categorization
Every dollar of spend lands in one of these buckets:
| Category | Definition |
|---|---|
| Productive | Workload running, utilization above threshold, attributable to a product / cost center |
| Allocated idle | Workload running, utilization below threshold (often headroom or over-provisioning) |
| Unallocated idle | Resources running, no workload (stale node groups, churn cycles) |
| Unattributed | Resources missing labels — unknown owner |
| Tax | Cluster overhead — control plane, observability, system pods |
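A minimal sketch of how the bucketing above could be expressed in code. Everything here is illustrative: the `SpendItem` fields, the `0.30` utilization threshold, and the order in which the rules are checked are assumptions, since the table fixes neither a threshold nor a precedence when a resource matches more than one bucket.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional


class Category(Enum):
    PRODUCTIVE = "Productive"
    ALLOCATED_IDLE = "Allocated idle"
    UNALLOCATED_IDLE = "Unallocated idle"
    UNATTRIBUTED = "Unattributed"
    TAX = "Tax"


@dataclass
class SpendItem:
    # Hypothetical record shape; real cost exports will differ.
    cost_usd: float
    has_workload: bool
    utilization: float           # 0.0-1.0, averaged over the audit window
    cost_center: Optional[str]   # None when the label is missing
    is_system: bool = False      # control plane, observability, system pods


# Assumption: placeholder value, not a threshold stated in the playbook.
UTILIZATION_THRESHOLD = 0.30


def categorize(item: SpendItem, threshold: float = UTILIZATION_THRESHOLD) -> Category:
    """Assign one bucket per item. The check order is an assumed precedence."""
    if item.is_system:
        return Category.TAX
    if not item.has_workload:
        return Category.UNALLOCATED_IDLE
    if item.cost_center is None:
        return Category.UNATTRIBUTED
    if item.utilization > threshold:
        return Category.PRODUCTIVE
    return Category.ALLOCATED_IDLE


def summarize(items: Iterable[SpendItem]) -> dict:
    """Total cost per category, so every dollar lands in exactly one bucket."""
    totals = {category: 0.0 for category in Category}
    for item in items:
        totals[categorize(item)] += item.cost_usd
    return totals
```

Because each item returns from the first matching rule, the per-category totals always sum to the total spend passed in.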
What we look for
- Stale node groups — utilization < 1% for the audit window
- GPU node churn cycles — count of GPU node creations / deletions per day
- CPU pools at < 5% utilization — over-provisioning
- Pods missing cost-center labels — unattributable
- Inference pools sized for peak, not p95 — over-provisioned by definition
- GPU workloads that don’t saturate a full GPU — candidates for time slicing
- Long-running pods with low utilization — candidates for scale-to-zero
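Three of the checks above reduce to simple filters once the metrics are exported. The sketch below is illustrative only: the input shapes, the function names, and the `cost-center` label key are assumptions, while the 1% cutoff comes from the checklist itself.

```python
from collections import Counter
from datetime import datetime

# Threshold taken from the checklist above.
STALE_UTILIZATION = 0.01      # stale node group: < 1% over the audit window


def stale_node_groups(avg_utilization: dict) -> list:
    """Node groups whose average utilization (0.0-1.0) is below 1%."""
    return sorted(
        group for group, util in avg_utilization.items()
        if util < STALE_UTILIZATION
    )


def churn_per_day(events: list) -> dict:
    """Count GPU node creations and deletions per calendar day.

    `events` is a list of (timestamp, kind) tuples; this shape is hypothetical.
    Events of any other kind are ignored.
    """
    counts = Counter(
        ts.date() for ts, kind in events if kind in ("create", "delete")
    )
    return dict(counts)


def pods_missing_label(pods: list, label: str = "cost-center") -> list:
    """Names of pods that lack the attribution label (key name is assumed)."""
    return [p["name"] for p in pods if label not in p.get("labels", {})]
```

For example, `stale_node_groups({"gpu-a": 0.004, "gpu-b": 0.4})` flags only `gpu-a`.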
Deliverables
At end of week 2, you receive:
- A line-item findings spreadsheet — every issue with cost impact, severity, and remediation effort
- A prioritized remediation plan — quick wins (≤1 week), medium (≤1 month), longer projects
- A label and policy proposal — what to enforce via OPA / Kyverno to keep the savings
- A Grafana dashboard pack for ongoing monitoring
- One executive-summary slide for the cost story
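To make the first two deliverables concrete, here is a rough sketch of how findings could be ordered into the remediation tiers and written out as spreadsheet rows. The `Finding` fields, the column names, and the reading of "1 month" as 30 days are assumptions for illustration; the actual spreadsheet layout may differ.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Finding:
    # Hypothetical columns; the playbook only names cost impact,
    # severity, and remediation effort.
    issue: str
    monthly_cost_usd: float
    severity: str
    effort_days: int


def effort_bucket(effort_days: int) -> str:
    """Map remediation effort to the plan's three tiers."""
    if effort_days <= 7:        # quick win: up to 1 week
        return "quick win"
    if effort_days <= 30:       # medium: up to 1 month (assumed 30 days)
        return "medium"
    return "longer project"


def findings_to_csv(findings: list) -> str:
    """Sort by tier, then by cost impact (largest first), and emit CSV text."""
    tier_order = {"quick win": 0, "medium": 1, "longer project": 2}
    rows = sorted(
        findings,
        key=lambda f: (tier_order[effort_bucket(f.effort_days)], -f.monthly_cost_usd),
    )
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["issue", "monthly_cost_usd", "severity", "effort_days", "tier"])
    for f in rows:
        writer.writerow([
            f.issue,
            f"{f.monthly_cost_usd:.2f}",
            f.severity,
            f.effort_days,
            effort_bucket(f.effort_days),
        ])
    return buffer.getvalue()
```

Sorting by tier first and cost second puts the cheapest-to-fix, highest-impact items at the top of the sheet.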
Optional follow-on
- 4–6 week remediation engagement to implement the prioritized plan
- Monthly cost review with the platform team
Common findings (across engagements)
The Pareto we see most often:
- Roughly 70% of unallocated spend comes from two or three large issues, not a hundred small ones
- The biggest hidden cost is GPU node churn, not over-provisioning
- The biggest preventable cost is missing labels — without attribution, no team owns the waste
- Quick wins recover 50–60% of waste in the first month; the rest requires structural change (workload migration, app re-architecture)
What this is not
- A cost-cutting hatchet job. We won’t recommend changes that risk SLOs.
- A replacement for ongoing FinOps practice. The audit is the diagnosis. The team owns the cure.
Interested? Email us with a short summary of your environment and we’ll come back with a fixed quote.