2026

Recovering $170K/month in wasted GPU spend

A healthcare AI client running real-time RAG on EKS was burning ~$170–180K/month on idle GPUs and over-provisioned compute. We traced the 70% of EKS spend that was unallocated to its root causes and remediated them.

EKS, Karpenter, KEDA, Nvidia DCGM, Datadog, Harness CCM

Context

A Fortune 100 healthcare AI program ran a real-time Retrieval-Augmented Generation pipeline for clinical decision support on AWS EKS. Inference workloads were GPU-heavy (CUDA / PyTorch) and bursty — long quiet periods punctuated by traffic spikes during clinic hours.

Problem

Cloud bills had ballooned to over $1M/month. 70% of EKS spend was not allocated to any product team, so it did not show up in Harness CCM cost dashboards. Engineering had no visibility into per-model GPU utilization, and the autoscalers were thrashing, repeatedly adding and removing GPU nodes.
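
To make "unallocated" concrete, here is a minimal sketch of how an unallocated share can be computed from cost line items. It is an illustration, not the client's tooling or Harness CCM's logic: the `allocation_report` function, the line-item shape, the label keys, and the dollar figures are all hypothetical, and it assumes a line item counts as allocated only when it carries every required label.

```python
from collections import defaultdict

# Assumed label keys, chosen for illustration only.
REQUIRED_LABELS = ("namespace", "cost-center")


def allocation_report(line_items):
    """Split spend into allocated (per cost center) and unallocated.

    Each line item is a dict with a 'cost' in USD and a 'labels' dict.
    An item missing any required label is treated as unallocated.
    """
    allocated = defaultdict(float)
    unallocated = 0.0
    for item in line_items:
        labels = item.get("labels", {})
        if all(labels.get(key) for key in REQUIRED_LABELS):
            allocated[labels["cost-center"]] += item["cost"]
        else:
            unallocated += item["cost"]
    total = unallocated + sum(allocated.values())
    share = unallocated / total if total else 0.0
    return dict(allocated), unallocated, share


# Made-up figures, not client data.
items = [
    {"cost": 300.0, "labels": {"namespace": "rag-api", "cost-center": "cds"}},
    {"cost": 500.0, "labels": {"namespace": "batch"}},  # no cost-center label
    {"cost": 200.0, "labels": {}},                      # no labels at all
]
by_team, unallocated, share = allocation_report(items)
print(by_team, unallocated, f"{share:.0%}")
```

Treating partially labeled items as unallocated is the stricter choice: it surfaces every workload that a per-team dashboard would otherwise drop.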

Approach

A two-week diagnostic followed by a structured remediation:

  1. Stand up GPU observability — DCGM Exporter to Prometheus, Datadog GPU Fleet integration, Splunk dashboards for per-pod utilization, memory pressure, and idle-node signals.
  2. Root-cause the unallocated spend — traced to: KEDA/Karpenter misconfiguration causing cyclic GPU node churn, CPU pools at 0.1% utilization, stale node groups at 0.001% utilization, and pods missing namespace/cost-center labels.
  3. Consolidate workloads — merged 200+ namespaces onto shared node pools with appropriate taints/tolerations and resource quotas; right-sized inference pools by p95 utilization.
  4. GPU time slicing — enabled multiple inference containers to share a single GPU without contention, deferring expensive scale-outs.
  5. Scale-to-zero between batches — KEDA ScaledObjects on queue-depth signals; idle GPU pools drop to zero between jobs.
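
Steps 3 and 5 rest on two small calculations, sketched below under stated assumptions. `right_sized_gpus` sizes a pool so that p95 demand lands at a target utilization; the 0.7 target and the idea of measuring demand in GPU-equivalents are our illustrative choices, not figures from the engagement. `desired_replicas` is a simplified model of queue-depth scaling with scale-to-zero: it mirrors the general shape of KEDA handing off to the HPA (zero while the trigger is inactive, otherwise the ceiling of queue depth over the per-replica target) but ignores cooldown periods and stabilization windows. All function and parameter names are hypothetical.

```python
import math


def p95(samples):
    """Nearest-rank 95th percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    # Integer form of ceil(0.95 * n), avoiding float rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def right_sized_gpus(demand_samples, target_utilization=0.7):
    """GPUs needed so that p95 demand sits at the target utilization.

    demand_samples: pool-wide demand per scrape interval, expressed in
    GPU-equivalents (for example, the sum of per-GPU busy fractions).
    """
    return max(1, math.ceil(p95(demand_samples) / target_utilization))


def desired_replicas(queue_depth, per_replica_target, max_replicas,
                     activation_depth=0):
    """Simplified queue-depth scaling that drops to zero when idle."""
    if queue_depth <= activation_depth:
        return 0  # nothing to process: release the GPU pool
    wanted = math.ceil(queue_depth / per_replica_target)
    return min(max_replicas, max(1, wanted))


# Illustrative inputs only: mostly-quiet pool with a short burst.
samples = [2.0] * 90 + [6.0] * 10
print(right_sized_gpus(samples))
print(desired_replicas(0, 10, 8), desired_replicas(25, 10, 8))
```

Sizing to p95 rather than peak accepts brief saturation during the busiest few percent of intervals; time slicing and queue-driven scale-out are what absorb those moments.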

Outcome

  • ~$170–180K/month recovered in wasted GPU + compute spend
  • 20%+ reduction in baseline Kubernetes infrastructure cost
  • Allocated spend visibility went from ~30% to ~95% across product teams
  • p99 inference latency held stable through migration (no SLO regression)

Stack

EKS, Karpenter, KEDA, Nvidia Device Plugin + DCGM Exporter, Datadog (APM, GPU Fleet, Cost), Splunk, Harness CCM, Terraform.