A Series B startup came to us with a problem: AWS costs were growing faster than revenue.
They were spending $87,000/month on AWS. Engineering leadership had tried to optimize but wasn't sure where to focus. The board was asking questions.
Three weeks later, we'd identified $180,000 in annual savings. Here's exactly what we did.
A first pass at the bill showed the usual suspects.
Nothing obviously broken. But the devil's in the details.
They were running 100% on-demand. No Savings Plans. No Reserved Instances.
This is startup default mode — move fast, don't lock in — but they were two years old with predictable workloads.
Immediate opportunity: Compute Savings Plans could save 30% on EC2.

The production database: db.r5.4xlarge. 16 vCPUs, 128 GB RAM.
Average CPU utilization: 12%. Peak CPU utilization: 34%.
This is a classic over-provisioned database. They'd sized for expected growth that hadn't materialized.
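Utilization numbers like these come straight from CloudWatch. A minimal boto3 sketch, assuming a placeholder instance identifier and a 30-day window:

```python
# Sketch: pull 30 days of RDS CPU utilization from CloudWatch.
# "prod-primary" is a placeholder instance identifier.
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")

end = datetime.now(timezone.utc)
start = end - timedelta(days=30)

stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/RDS",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "prod-primary"}],
    StartTime=start,
    EndTime=end,
    Period=3600,               # hourly datapoints, 720 total
    Statistics=["Average", "Maximum"],
)

points = stats["Datapoints"]
avg = sum(p["Average"] for p in points) / len(points)
peak = max(p["Maximum"] for p in points)
print(f"30-day average CPU: {avg:.1f}%  peak: {peak:.1f}%")
```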
Immediate opportunity: Downsize to db.r5.2xlarge (50% cost reduction).

Two full environments — staging and dev — running production-equivalent infrastructure. 24 hours a day. 7 days a week.
Developer activity: primarily 9 AM - 7 PM, Monday-Friday.
That's 50 hours of usage out of 168 hours per week. 70% waste.
Immediate opportunity: Auto-shutdown non-production during off-hours.

Next came the zombie hunt: resources still running, and still billing, with nothing using them anymore.
Total waste: ~$1,200/month
Small in the scheme of things, but pure waste is pure waste.
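The exact zombies vary by account. As an illustration, a sketch that sweeps two common categories, unattached EBS volumes and unassociated Elastic IPs:

```python
# Sketch: two common zombie categories. Anything in "available" state
# (EBS) or unassociated (Elastic IP) is billing with nothing using it.
import boto3

ec2 = boto3.client("ec2")

# Unattached EBS volumes
for v in ec2.describe_volumes(
    Filters=[{"Name": "status", "Values": ["available"]}]
)["Volumes"]:
    print(f"unattached volume {v['VolumeId']}: {v['Size']} GiB")

# Elastic IPs not associated with any instance or ENI
for a in ec2.describe_addresses()["Addresses"]:
    if "AssociationId" not in a:
        print(f"unassociated EIP {a['PublicIp']}")
```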
CloudWatch Logs costs: $2,800/month.
Log retention: "Never expire" (default). Oldest logs: 26 months. Logs accessed older than 30 days: 0.
Immediate opportunity: Set 30-day retention, archive security logs to S3.

Now the work begins. We prioritized by effort-to-savings ratio.
Total quick wins: $3,340/month.
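The log retention change is mechanical enough to script across every log group. A sketch, assuming a blanket 30-day policy (the S3 archival of security logs is a separate export step, not shown):

```python
# Sketch: apply a 30-day retention policy to every log group that still
# has the "Never expire" default. S3 archival of security logs not shown.
import boto3

logs = boto3.client("logs")

for page in logs.get_paginator("describe_log_groups").paginate():
    for group in page["logGroups"]:
        if "retentionInDays" not in group:   # never expires
            logs.put_retention_policy(
                logGroupName=group["logGroupName"],
                retentionInDays=30,
            )
            print(f"set 30-day retention on {group['logGroupName']}")
```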
We used AWS Instance Scheduler to shut down dev and staging outside working hours and bring them back up each weekday morning. There were a few wrinkles to work through.
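Instance Scheduler itself is deployed as a CloudFormation stack and configured with schedules and periods; the underlying idea is simple. A minimal tag-driven sketch of that idea, with an assumed `Schedule=office-hours` tag and the 9 AM - 7 PM weekday window from above:

```python
# Sketch of the idea behind the scheduler: stop tagged non-prod instances
# outside working hours, start them again in the morning. Intended to run
# on a cron (e.g., an EventBridge schedule). The Schedule=office-hours tag
# is an assumption, and "now" is assumed to be in the office's timezone.
from datetime import datetime

import boto3

ec2 = boto3.client("ec2")
TAG_FILTER = {"Name": "tag:Schedule", "Values": ["office-hours"]}


def tagged_instances(states):
    resp = ec2.describe_instances(
        Filters=[TAG_FILTER, {"Name": "instance-state-name", "Values": states}]
    )
    return [
        i["InstanceId"]
        for r in resp["Reservations"]
        for i in r["Instances"]
    ]


def handler(event=None, context=None):
    now = datetime.now()
    office_hours = now.weekday() < 5 and 9 <= now.hour < 19   # Mon-Fri, 9-19
    if office_hours:
        ids = tagged_instances(["stopped"])
        if ids:
            ec2.start_instances(InstanceIds=ids)
    else:
        ids = tagged_instances(["running"])
        if ids:
            ec2.stop_instances(InstanceIds=ids)
```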
Results: ~$8,500/month saved on dev/staging infrastructure.
Compute Savings Plans: we analyzed 30 days of usage patterns, calculated the stable baseline, and purchased 1-year Compute Savings Plans covering 70% of that baseline.
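The baseline analysis can be approximated from Cost Explorer. A sketch that pulls 30 days of daily EC2 spend and sizes a commitment at 70% of a conservative baseline; the 10th-percentile cut is our assumption, not a rule:

```python
# Sketch: estimate a stable compute baseline from 30 days of Cost Explorer
# data and size a 1-year commitment at 70% of it. The 10th-percentile cut
# is an assumption -- commit only to spend you see on nearly every day.
from datetime import date, timedelta

import boto3

ce = boto3.client("ce")

end = date.today()
start = end - timedelta(days=30)

resp = ce.get_cost_and_usage(
    TimePeriod={"Start": start.isoformat(), "End": end.isoformat()},
    Granularity="DAILY",
    Metrics=["UnblendedCost"],
    Filter={
        "Dimensions": {
            "Key": "SERVICE",
            "Values": ["Amazon Elastic Compute Cloud - Compute"],
        }
    },
)

daily = sorted(
    float(day["Total"]["UnblendedCost"]["Amount"])
    for day in resp["ResultsByTime"]
)
baseline_per_day = daily[len(daily) // 10]        # ~10th percentile
commitment_per_hour = baseline_per_day / 24 * 0.70
print(f"Suggested 1-year Savings Plan commitment: ${commitment_per_hour:.2f}/hour")
```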
Results: ~$4,800/month saved on committed compute.
This required more care. Production database.
Process:
1. Took a snapshot
2. Created a smaller replica
3. Tested the application against the replica
4. Scheduled a maintenance window
5. Performed failover to the smaller instance
6. Monitored for 48 hours
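Steps 1 and 2 can be done well ahead of the maintenance window. A boto3 sketch with placeholder identifiers (the cutover itself depends on how the application connects and isn't shown):

```python
# Sketch of steps 1-2: snapshot the primary, then stand up a smaller read
# replica to test against. Identifiers are placeholders; the actual cutover
# (step 5) depends on how the application connects and isn't shown.
import boto3

rds = boto3.client("rds")

# 1. Safety snapshot of the current primary
rds.create_db_snapshot(
    DBSnapshotIdentifier="prod-primary-pre-downsize",
    DBInstanceIdentifier="prod-primary",
)

# 2. Smaller replica to run the application tests against
rds.create_db_instance_read_replica(
    DBInstanceIdentifier="prod-primary-downsize-test",
    SourceDBInstanceIdentifier="prod-primary",
    DBInstanceClass="db.r5.2xlarge",
)
```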
Results: $4,200/month saved.
Graviton Migration (Partial): we identified 12 services that could move to Graviton.
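Finding candidates is mostly a matter of mapping instance families. A sketch that flags running x86 instances with a common Graviton equivalent (the mapping shown is the usual m5 to m6g style pairing; actual savings depend on the workload):

```python
# Sketch: flag running x86 instances whose family has a Graviton equivalent.
# The mapping below is the common pairing; actual savings vary by workload.
import boto3

GRAVITON_EQUIVALENT = {"m5": "m6g", "c5": "c6g", "r5": "r6g", "t3": "t4g"}

ec2 = boto3.client("ec2")
resp = ec2.describe_instances(
    Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
)
for reservation in resp["Reservations"]:
    for inst in reservation["Instances"]:
        family, _, size = inst["InstanceType"].partition(".")
        if family in GRAVITON_EQUIVALENT:
            print(
                f"{inst['InstanceId']}: {inst['InstanceType']} -> "
                f"{GRAVITON_EQUIVALENT[family]}.{size} candidate"
            )
```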
Migrated 8 services in week 2. Remaining 4 needed more testing.
Results: $1,600/month saved (8 services), more coming.
Finally, we created runbooks for each of these changes so the team can keep them running on their own.
| Category | Monthly Savings | Annual Savings |
|----------|-----------------|----------------|
| Zombie Resources | $1,200 | $14,400 |
| Log Retention | $1,900 | $22,800 |
| Dev/Staging Shutdown | $8,500 | $102,000 |
| Savings Plans | $4,800 | $57,600 |
| RDS Right-Sizing | $4,200 | $50,400 |
| Graviton (Partial) | $1,600 | $19,200 |
| Total | $22,200 | $266,400 |
Wait, that's more than $180K.
The blog title says $180K because that's what we committed to. We over-delivered.
More importantly: no performance impact. No engineering burden beyond the initial migration. Sustainable savings.
Zombie resources. Log retention. Dev environment schedules.
None of this is technically sophisticated. It just requires someone to look.
If you've been running stable workloads for 6+ months without commitments, you're overpaying. Period.
Nobody wants to downsize production infrastructure. "What if we need it?"
You almost certainly don't. Monitor, test, and be willing to reverse if needed.
We generated $5,000/month in savings on Day 1 with 2 hours of work.
That bought credibility and breathing room for the harder changes later.
The savings only stick if you maintain them. Zombie resources will return. Logs will grow. New waste will appear.
Ongoing vigilance matters more than one-time optimization.
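One way to keep that vigilance cheap is a scheduled drift report rather than a quarterly scramble. A sketch that flags the two kinds of creep we saw, new unattached volumes and log groups with no retention policy (where it reports to is up to you):

```python
# Sketch: a weekly drift report -- new zombies and log groups that have lost
# (or never had) a retention policy. Run it on a schedule and send the output
# wherever the team will actually read it.
import boto3


def drift_report():
    findings = []

    ec2 = boto3.client("ec2")
    vols = ec2.describe_volumes(Filters=[{"Name": "status", "Values": ["available"]}])
    for v in vols["Volumes"]:
        findings.append(f"unattached volume: {v['VolumeId']} ({v['Size']} GiB)")

    logs = boto3.client("logs")
    for page in logs.get_paginator("describe_log_groups").paginate():
        for group in page["logGroups"]:
            if "retentionInDays" not in group:
                findings.append(f"no retention policy: {group['logGroupName']}")

    return findings


if __name__ == "__main__":
    for finding in drift_report():
        print(finding)
```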