The Datadog Cost Problem
Datadog is essential for DevOps teams but becomes prohibitively expensive at scale:
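- Logs: $0.10-0.25 per GB ingested (100 GB/day is roughly 3,000 GB/month, or $300-750/month for ingestion alone; indexing is billed separately per million log events and can push the total to $3K/month or more)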
- Metrics: $0.05 per custom metric per month (1,000 metrics = $50/month, but many orgs run 5K-10K metrics, or $250-500/month)
- APM (Application Performance Monitoring): billed per APM host, plus roughly $0.10 per GB of ingested spans beyond the included allotment
- Real User Monitoring (RUM): $1-3 per 1,000 sessions
- Typical mid-market org (100-500 engineers): $50K-$300K annually
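A few lines of Python are enough to sanity-check list-price arithmetic like the figures above. This sketch assumes a 30-day month and covers only log ingestion and custom metrics at the quoted rates. Log indexing, APM host fees, and negotiated discounts are deliberately left out, and the function names are illustrative.

```python
# Rough monthly cost estimates at the list prices quoted above.
# Assumptions: a 30-day month; log indexing, APM host fees, and
# negotiated discounts are NOT included.

DAYS_PER_MONTH = 30


def log_ingestion_cost(gb_per_day: float, price_per_gb: float) -> float:
    """Monthly log *ingestion* cost; indexing is billed separately."""
    return gb_per_day * DAYS_PER_MONTH * price_per_gb


def custom_metric_cost(num_metrics: int, price_per_metric: float = 0.05) -> float:
    """Monthly custom-metric cost at a flat per-metric price."""
    return num_metrics * price_per_metric


if __name__ == "__main__":
    low, high = log_ingestion_cost(100, 0.10), log_ingestion_cost(100, 0.25)
    print(f"100 GB/day of logs: ${low:,.0f}-${high:,.0f}/month (ingestion only)")
    for n in (1_000, 5_000, 10_000):
        print(f"{n:,} custom metrics: ${custom_metric_cost(n):,.0f}/month")
```

At 100 GB/day this comes to about $300-750/month for ingestion, so a monthly log bill in the thousands points at indexing or other charges rather than ingestion.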
Key problem: Most orgs don't know what they're ingesting. Log volume can double within a quarter, metrics accumulate, and trace volume grows with each deployment.
7 Proven Cost Reduction Tactics
Tactic #1: Sample Logs by Severity
Most teams send far more to Datadog than they need. Debug logs, verbose request/response payloads, and repetitive health checks often account for 40-60% of ingestion.
Action: Review your intake and enable sampling. In production, send 100% of ERROR logs, 50% of WARN, 10% of INFO, and 0% of DEBUG.
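The severity rates above can be sketched as a per-line keep/drop decision. This is a minimal, pipeline-agnostic sketch: `SAMPLE_RATES` and `should_ship` are hypothetical names, and where the check actually runs (an application logging filter, a log shipper, or a collector) depends on your setup.

```python
import random

# Per-level keep rates from the Action above (production only).
# "WARNING" is included because Python's logging module uses that name.
SAMPLE_RATES = {"ERROR": 1.0, "WARN": 0.5, "WARNING": 0.5, "INFO": 0.1, "DEBUG": 0.0}

_rng = random.Random()


def should_ship(level: str, rng: random.Random = _rng) -> bool:
    """Decide whether a log line at `level` is forwarded to Datadog.

    Unknown levels (e.g. CRITICAL) default to a rate of 1.0, so anything
    unexpected is kept rather than silently dropped.
    """
    rate = SAMPLE_RATES.get(level.upper(), 1.0)
    # random() returns a float in [0.0, 1.0), so rate 1.0 always keeps
    # and rate 0.0 always drops.
    return rng.random() < rate


if __name__ == "__main__":
    seeded = random.Random(42)
    kept = sum(should_ship("INFO", seeded) for _ in range(10_000))
    print(f"kept {kept} of 10,000 INFO lines")  # expected value is about 1,000
```

Per-line random sampling can keep some INFO lines from a request and drop others; if you need whole requests, hash a request ID instead of calling `random()`.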
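Tactic #2: Filter Logs Before Ingestion
Drop low-value logs before they reach Datadog. Logs dropped at the source, for example by the Datadog Agent's log processing rules, are never ingested and never billed. Index exclusion filters inside Datadog work differently: excluded logs still incur ingestion charges but skip indexing charges.
Action: Create exclusion rules: drop health check logs and exclude noisy sources (CDN logs, load balancer health checks, etc.).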
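In the Datadog Agent, drop rules like these are expressed as `log_processing_rules` entries with `type: exclude_at_match` and a regex `pattern`, so matching lines are never sent. The Python sketch below mirrors that behavior so you can test candidate patterns against sample log lines before deploying them. The patterns and function names are hypothetical examples, and the Agent uses its own regex engine, so re-check anything beyond simple patterns like these.

```python
import re

# Hypothetical exclusion patterns. In the Datadog Agent, each would be a
# log_processing_rules entry with type: exclude_at_match and this pattern.
EXCLUDE_PATTERNS = [
    re.compile(r"GET /(healthz|health|ping)\b"),  # health-check endpoints
    re.compile(r"ELB-HealthChecker"),             # load balancer probes
]


def is_excluded(line: str) -> bool:
    """True if the line matches any exclusion pattern (i.e., drop it)."""
    return any(p.search(line) for p in EXCLUDE_PATTERNS)


def filter_logs(lines):
    """Yield only the lines that survive the exclusion rules."""
    for line in lines:
        if not is_excluded(line):
            yield line


if __name__ == "__main__":
    sample = [
        '10.0.0.1 - - "GET /healthz HTTP/1.1" 200',
        '10.0.0.2 - - "GET /api/orders HTTP/1.1" 500',
        '10.0.0.3 - - "GET / HTTP/1.1" 200 "ELB-HealthChecker/2.0"',
    ]
    for line in filter_logs(sample):
        print(line)  # only the /api/orders line is printed
```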
Tactic #3: Reduce Metric Cardinality
High-cardinality tags (user_id, request_id, order_id) multiply your metric count, because Datadog bills each unique combination of metric name and tag values as a separate custom metric. 100 base metrics × 1,000 unique tag values = 100K billable metrics.
Action: Audit high-cardinality metrics. Remove user_id and session_id tags, and use low-cardinality alternatives (region, environment, service).
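To find high-cardinality metrics, open the Metrics Summary page in the Datadog UI, which lists each metric's tag keys and their distinct values. A query such as avg:system.disk.used{*} by {host} shows how many separate series one tag key produces for a metric.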
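The 100 × 1,000 example above generalizes: for a single metric name, the worst-case number of series is the product of the distinct-value counts of its tag keys. This is a minimal sketch, assuming every combination of tag values can occur (real counts are usually lower); the function names and the tag counts in the usage example are hypothetical.

```python
from math import prod


def series_per_metric(distinct_values: dict[str, int]) -> int:
    """Upper bound on time series (custom metrics) for ONE metric name.

    `distinct_values` maps each tag key to its number of distinct values.
    The real count can be lower because not every combination occurs.
    """
    return prod(distinct_values.values())  # prod of no values is 1


def billable_upper_bound(base_metrics: int, distinct_values: dict[str, int]) -> int:
    """Upper bound across `base_metrics` names that share the same tags."""
    return base_metrics * series_per_metric(distinct_values)


if __name__ == "__main__":
    low_card = {"env": 3, "region": 4, "service": 10}
    print(billable_upper_bound(100, low_card))                       # 12000
    print(billable_upper_bound(100, {**low_card, "user_id": 1000}))  # 12000000
```

Adding a single user_id tag with 1,000 values multiplies the bound by 1,000, which is why dropping that one tag matters more than trimming several low-cardinality ones.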
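Tactic #4: Sample Traces
Most teams send 100% of traces, but you don't need every request. Keep 100% of errors, 10% of normal requests, and 1% of fast, successful operations.
Action: Set trace ingestion controls in Datadog APM settings (per-service rates can also be set in the tracer, for example with DD_TRACE_SAMPLE_RATE). This alone can cut trace costs by 80-90%.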
Tactic #5: Consolidate Monitoring Agents
Many orgs run the Datadog Agent alongside an OpenTelemetry Collector and other proprietary monitoring tools, which duplicates metrics and logs.
Action: Audit all monitoring tools. Retire tools such as New Relic or SolarWinds if Datadog covers 90% of their use cases, and run only the Datadog Agent.
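The 90% rule in the Action above is a set-coverage check. This is a minimal sketch, assuming you have already listed each tool's use cases during the audit. The function names are hypothetical, and the inventory in the usage example is illustrative, not a claim about what any product supports.

```python
def coverage(tool_uses: set[str], datadog_uses: set[str]) -> float:
    """Fraction of a tool's use cases that Datadog already covers."""
    if not tool_uses:
        return 1.0  # nothing listed means nothing Datadog would need to replace
    return len(tool_uses & datadog_uses) / len(tool_uses)


def retirement_candidates(tools: dict[str, set[str]], datadog_uses: set[str],
                          threshold: float = 0.9) -> list[str]:
    """Tools whose use cases Datadog covers at or above `threshold`."""
    return sorted(name for name, uses in tools.items()
                  if coverage(uses, datadog_uses) >= threshold)


if __name__ == "__main__":
    # Hypothetical inventory; replace with your own audit results.
    datadog = {"apm", "logs", "infra_metrics", "rum", "synthetics"}
    tools = {
        "New Relic": {"apm", "infra_metrics", "rum"},
        "SolarWinds": {"infra_metrics", "network_flows"},
    }
    print(retirement_candidates(tools, datadog))  # ['New Relic']
```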
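Tactic #6: Shorten Indexed Log Retention
Datadog charges for log indexing, and the price rises with the retention period. 30-day retention is a common default, but you rarely need instant search over older logs.
Action: Keep 7-14 days indexed in Datadog and archive logs to S3 for compliance (S3 storage is not free, but it is far cheaper than indexing). If you need older logs, Datadog can rehydrate them from the archive.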
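Tactic #7: Negotiate a Multi-Year Commitment
Datadog typically offers 20-30% discounts for 3-year commitments. If your cost is $150K/year, negotiate down to about $110K/year with a multi-year commitment.
Action: Contact your Datadog rep with your audit results. Show that you've already optimized, and offer a 3-year term in exchange for a 25% discount.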
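The negotiation numbers above are easy to check. This sketch shows that going from $150K to $110K is about a 26.7% discount (inside the quoted 20-30% range), while exactly 25% off $150K lands at $112.5K. The helper names are illustrative.

```python
def discounted_cost(annual_cost: float, discount: float) -> float:
    """Annual cost after a fractional discount (0.25 means 25% off)."""
    return annual_cost * (1 - discount)


def implied_discount(before: float, after: float) -> float:
    """Fractional discount implied by a before/after annual price."""
    return 1 - after / before


if __name__ == "__main__":
    print(f"{implied_discount(150_000, 110_000):.1%}")  # 26.7%
    print(f"${discounted_cost(150_000, 0.25):,.0f}")    # $112,500
```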
Real Case Studies
Case Study #1: Series B SaaS (200 engineers)
Optimization tactics: Log sampling + filter (drop health checks), trace sampling (100% errors, 10% success), cardinality reduction
Tactics used: #1, #2, #3, #4
New cost: $108K/year (after optimizations) → $84K/year (with 3-year discount)
Savings: $96K/year total, a 53% reduction from the implied $180K/year starting cost ($72K from optimizations + $24K from the multi-year discount)
Case Study #2: Enterprise (500 engineers, high ingestion)
Optimization tactics: Full audit + intake filtering, APM consolidation (removed third-party RUM tool), archive logs after 14 days
Tactics used: #1, #2, #5, #6
New cost: $252K/year
Savings: $168K/year (40% reduction)
Case Study #3: Mid-market (50 engineers, scattered setup)
Optimization tactics: Retire New Relic (duplicate), implement log sampling, remove high-cardinality metric tags
Tactics used: #1, #3, #5
New cost: $38K/year (Datadog only, optimized)
Savings: $57K/year (60% reduction)
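The case studies report the new cost and the savings but not the starting cost, which you can recover as their sum. This sketch does that and recomputes each percentage reduction using only the figures reported above; for Case Study #1 it uses the final, post-discount cost. The names are illustrative.

```python
# Figures as reported in the case studies: (new annual cost, annual savings).
# Case Study #1 uses the final, post-discount cost.
CASES = {
    "Case Study #1": (84_000, 96_000),
    "Case Study #2": (252_000, 168_000),
    "Case Study #3": (38_000, 57_000),
}


def implied_original(new_cost: float, savings: float) -> float:
    """Starting cost implied by the reported new cost and savings."""
    return new_cost + savings


def reduction_pct(new_cost: float, savings: float) -> float:
    """Savings as a percentage of the implied starting cost."""
    return 100 * savings / implied_original(new_cost, savings)


if __name__ == "__main__":
    for name, (new, saved) in CASES.items():
        print(f"{name}: ${implied_original(new, saved):,.0f} -> ${new:,.0f} "
              f"({reduction_pct(new, saved):.1f}% reduction)")
```

For Case Study #1, the implied $180K starting cost means the $96K splits into $72K from optimization ($180K → $108K) and $24K from the multi-year discount ($108K → $84K). The other two cases reproduce their stated 40% and 60% exactly.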
Quick Implementation Timeline
- Week 1: Audit Datadog intake (Logs, APM, Metrics). Identify high-volume sources. Estimate potential savings using tactics above.
- Week 2: Implement log sampling + intake filtering. Deploy trace sampling in staging. Measure the ingestion reduction over 48-72 hours.
- Week 3: Roll out to production. Fix any monitoring gaps (ensure high-priority alerts still trigger). Monitor dashboards for anomalies.
- Week 4: Audit cardinality + consolidate agents. Archive logs. Prepare for the negotiation with your Datadog rep using before/after numbers.
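The Week 2 before/after comparison is only meaningful per unit of time, because a 72-hour baseline and a 48-hour trial window hold different amounts of data. This is a minimal sketch with hypothetical volumes and names. Compare windows with similar traffic patterns (for example, the same weekdays) so normal traffic swings aren't mistaken for savings.

```python
def gb_per_hour(total_gb: float, window_hours: float) -> float:
    """Average ingestion rate over a measurement window."""
    return total_gb / window_hours


def ingestion_reduction_pct(before_gb: float, before_hours: float,
                            after_gb: float, after_hours: float) -> float:
    """Percent drop in ingestion rate, normalized for unequal windows."""
    before = gb_per_hour(before_gb, before_hours)
    after = gb_per_hour(after_gb, after_hours)
    return 100 * (before - after) / before


if __name__ == "__main__":
    # Hypothetical numbers: 72 h baseline vs. 48 h after sampling.
    print(f"{ingestion_reduction_pct(300, 72, 80, 48):.1f}%")  # 60.0%
```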
Expected Results After Optimization
- Log ingestion: 50-70% reduction (by sampling + filtering)
- Trace volume: 80-90% reduction (by sampling, keeping errors + critical paths)
- Metrics: 30-50% reduction (by removing high-cardinality tags)
- Total cost reduction: 40-60% typical, 60-80% aggressive
- Monitoring quality: Often improves (less noise, clearer signals)
Note: You'll maintain 100% visibility for errors and critical paths. You're just reducing noise.
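The total reduction depends on how your bill is split, because each per-product cut applies only to its own line item and any untouched spend dilutes the total. This is a minimal sketch, assuming a hypothetical spend mix (including an unoptimized infrastructure-host line) and that volume cuts translate one-to-one into cost; the cut percentages are the midpoints of the ranges above.

```python
def blended_reduction(spend: dict[str, float], cut_pct: dict[str, float]) -> float:
    """Overall % cost reduction from per-product spend and per-product % cuts.

    Products missing from `cut_pct` are treated as unchanged (0% cut).
    """
    total = sum(spend.values())
    saved = sum(amount * cut_pct.get(product, 0.0) / 100
                for product, amount in spend.items())
    return 100 * saved / total


if __name__ == "__main__":
    # Hypothetical annual spend mix; cuts are midpoints of the ranges above.
    spend = {"logs": 40_000, "traces": 30_000, "metrics": 10_000, "infra_hosts": 20_000}
    cuts = {"logs": 60, "traces": 85, "metrics": 40}
    print(f"{blended_reduction(spend, cuts):.1f}%")  # 53.5%
```

With this mix, the blended result lands inside the 40-60% typical range even though logs and traces individually drop by 60-85%.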