Got Drift? The Infrastructure Reality Check Nobody Talks About

January 3, 2025

Infrastructure

Terraform

DevOps

SRE

Got Drift? The Infrastructure Reality Check Nobody Talks About

No, this isn’t another AI hype article. It’s about the unsexy infrastructure work everyone’s skipping.

Infrastructure as Code is everywhere now—Terraform and OpenTofu manage 64% of cloud deployments. Teams are automating infrastructure like it’s going out of style, and honestly, that’s great progress.

Except most teams stop at deployment and completely forget about drift governance.

Then reality comes knocking:

Manual console fixes during 3 AM incidents
“Temporary” changes that become permanent fixtures
State files that bear no resemblance to what’s actually running

The infrastructure you think you have isn’t the infrastructure you’ve got. And that gap? It’s where outages live.

This Isn’t a Tooling Problem

Here’s the thing—this isn’t about finding the perfect drift detection tool. It’s about planning for the inevitable. As Sol Rashidi wisely notes, don’t reach for the jackhammer when a hammer will do, but also don’t skip the boring foundational work that makes your strategy actually work.

There is no free lunch in infrastructure management.

Three Questions Every IaC Team Should Answer

If you’re pushing Infrastructure as Code adoption, ask yourself:

Got automated drift detection in your CI/CD pipeline? Not just alerts when things break, but proactive scanning that catches drift before it causes problems.
Have you defined your drift remediation process? When (not if) someone makes a manual change, what happens next? Who gets notified? How do you get back to a known state?
Is drift monitoring part of your definition of “done”? If your deployment pipeline doesn’t include ongoing governance, you’re not actually done.

The Real Value of IaC

The value of Infrastructure as Code isn’t in the initial deployment—it’s in maintaining that single source of truth over time. That requires the unsexy work: monitoring, governance, and discipline.

It’s the difference between having infrastructure that works today and infrastructure that works reliably for years.

Because the alternative is infrastructure archaeology—digging through console changes and trying to reverse-engineer what someone did at 2 AM six months ago.

And nobody has time for that.