How to Ace a DevOps Interview: DORA, Kubernetes, and Terraform


Most DevOps interview candidates walk in knowing their tools. They can rattle off Kubernetes commands, describe Terraform state management, and explain the difference between blue-green and canary deployments. Then the interviewer asks them to explain why Knight Capital lost $440 million in 45 minutes—and the room goes quiet. That gap between tool knowledge and engineering judgment is exactly what separates a callback from a rejection.

DevOps interviews test three things simultaneously: whether you understand the philosophy behind modern software delivery, whether you have the technical depth to back it up, and whether you can communicate under pressure. Here’s how to prepare for each layer.

What the Interview Actually Measures

Before you start memorizing Kubernetes pod states, understand what the interviewer is really after. DevOps roles have evolved significantly—you’re no longer just automating deployments. Modern interviews assess your ability to bridge software development and IT operations, which means philosophical alignment matters as much as syntax recall.

The DevOps Research and Assessment (DORA) framework gives you the vocabulary interviewers expect. The four metrics you need to own cold:

Deployment Frequency: How often you release to production. Elite teams deploy multiple times per day.
Lead Time for Changes: How long a change takes to get from code commit to production. Shorter is better.
Change Failure Rate: The percentage of deployments that cause production incidents.
Failed Deployment Recovery Time: How fast you recover when things break (formerly called Mean Time to Restore).
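To make the four definitions concrete, here is a minimal sketch of how they could be computed from a list of deployment records. The Deployment record shape and the dora_metrics function are hypothetical, invented for illustration; real teams usually pull this data from their CI/CD and incident tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    """One production deployment (hypothetical record shape)."""
    committed_at: datetime                   # when the change was committed
    deployed_at: datetime                    # when it reached production
    failed: bool = False                     # did it cause a production incident?
    recovered_at: Optional[datetime] = None  # when service was restored, if it failed


def dora_metrics(deployments: list[Deployment], window_days: int) -> dict:
    """Compute the four DORA metrics over a window of `window_days` days."""
    if not deployments:
        raise ValueError("need at least one deployment")
    failures = [d for d in deployments if d.failed]
    # Recovery time is measured from the failed deployment to restored service.
    recoveries = [d.recovered_at - d.deployed_at for d in failures if d.recovered_at]
    return {
        "deployment_frequency_per_day": len(deployments) / window_days,
        "median_lead_time": median(d.deployed_at - d.committed_at for d in deployments),
        "change_failure_rate": len(failures) / len(deployments),
        "median_recovery_time": median(recoveries) if recoveries else None,
    }
```

The sketch uses medians rather than means so that one unusually slow change does not dominate the figure; that is a design choice here, not a requirement of the metrics.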

These aren’t trivia—they’re the lens through which senior engineers evaluate architectural decisions. When an interviewer asks “how would you improve deployment reliability?” they expect an answer grounded in these metrics, not just “I’d add more tests.”

Pair DORA with the CALMS model (Culture, Automation, Lean, Measurement, Sharing) for behavioral questions. When you’re asked about a past conflict with a developer or how you drove adoption of a new tool, CALMS gives you a framework for structuring the answer.

Key Insight: Most candidates over-prepare on tool syntax and under-prepare on engineering philosophy. The DORA metrics and CALMS model tell you what interviewers are actually grading.

The Technical Baselines That Get You Past the First Screen

Technical interviews follow predictable patterns. The topics shift by seniority level, but two technologies show up in nearly every DevOps interview regardless of company size: Kubernetes and Terraform. If you’re not fluent in both, you’re not ready.

Kubernetes: Know the Lifecycle, Not Just the Commands

Interviewers will ask you to walk through the Kubernetes pod lifecycle. Kubernetes calls these states phases, and there are five you need to explain clearly:

Pending: Pod accepted by the cluster; containers not yet created (scheduling or image download in progress).
Running: Pod bound to a node and all containers created; at least one is running, starting, or restarting.
Succeeded: All containers terminated successfully; won’t restart.
Failed: All containers terminated; at least one exited with a non-zero code.
Unknown: Can’t retrieve pod state, usually due to node communication failure.

The follow-up is almost always about CrashLoopBackOff. This isn’t a lifecycle phase; it’s the waiting reason reported when a container keeps crashing and the kubelet keeps restarting it with an increasing back-off delay. Your answer: check kubectl logs <pod> --previous for the error from the crashed container, then kubectl describe pod <pod> for events such as OOM kills or failed liveness probes.
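The same triage can be sketched in code. This is an illustrative helper, not part of any Kubernetes client: triage_pod is a hypothetical function that reads a pod object already parsed from kubectl get pod <pod> -o json and reports what it finds in the container statuses.

```python
def triage_pod(pod: dict) -> list[str]:
    """Return human-readable findings for a pod given as parsed JSON."""
    status = pod.get("status", {})
    findings = [f"phase={status.get('phase', 'Unknown')}"]
    for cs in status.get("containerStatuses", []):
        name = cs.get("name", "?")
        waiting = cs.get("state", {}).get("waiting", {})
        last = cs.get("lastState", {}).get("terminated", {})
        # CrashLoopBackOff is a waiting reason on the container, not a pod phase.
        if waiting.get("reason") == "CrashLoopBackOff":
            findings.append(
                f"{name}: CrashLoopBackOff after {cs.get('restartCount', 0)} restarts"
            )
        # The previous termination explains why the container keeps restarting.
        if last.get("reason") == "OOMKilled":
            findings.append(f"{name}: last run was OOM-killed; review memory limits")
        elif last.get("exitCode", 0) != 0:
            findings.append(
                f"{name}: last run exited with code {last['exitCode']}; read the previous logs"
            )
    return findings
```

Note that a crash-looping pod usually still reports the Running phase, which is why the phase alone is not enough to diagnose it.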

Know the core cluster components well enough to explain what each one does. The first four make up the control plane; the kubelet is a node component that runs on every worker:

kube-apiserver: The single entry point for all cluster API calls. The only component that talks to etcd directly.
etcd: Distributed key-value store. Single source of truth for cluster state.
kube-scheduler: Assigns new pods to nodes based on resource availability and constraints.
kube-controller-manager: Runs the reconciliation loops that keep actual state matching desired state.
kubelet: Agent on each worker node that ensures containers are running per pod spec.
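The phrase “reconciliation loop” is easier to explain with a toy example. The sketch below is not how kube-controller-manager is implemented; reconcile is a hypothetical function, and the replica-count dictionaries are an assumed, simplified stand-in for cluster state.

```python
def reconcile(desired: dict[str, int], actual: dict[str, int]) -> list[str]:
    """One pass of a control loop: list the actions that move actual toward desired.

    Both dicts map a workload name to a replica count (hypothetical shape).
    """
    actions = []
    for name in sorted(desired.keys() | actual.keys()):
        want, have = desired.get(name, 0), actual.get(name, 0)
        if have < want:
            actions.append(f"start {want - have} x {name}")
        elif have > want:
            actions.append(f"stop {have - want} x {name}")
    return actions
```

When desired and actual already match, the function returns an empty list. That is the property worth pointing out in an interview: the loop compares states rather than replaying commands, so running it repeatedly is harmless.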

Terraform: State Management Is Where Interviews Are Won or Lost

Most candidates know how to write a resource block. Fewer can explain Terraform state management under adversarial conditions, which is where the real questions live.

Remote state: Always store terraform.tfstate remotely (S3, Azure Blob, GCS). This enables collaboration, supports versioning, and keeps the state file, which can contain secrets in plain text, off individual laptops.
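State locking: Prevents concurrent terraform apply runs from corrupting state. On the AWS S3 backend, locking has traditionally relied on a DynamoDB table: a lock entry with a lock ID is written to the table, and concurrent runs are blocked until it’s released. Recent Terraform releases can also lock natively in S3 with a lock file, so check which mechanism your version uses.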
Drift detection: terraform plan refreshes the state from the real infrastructure and compares the result with your configuration. If someone made a manual change in the console, plan shows the diff.
Replacing resources: The old terraform taint command is deprecated. Use terraform apply -replace="resource_address" instead.
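Importing unmanaged resources: The terraform import command brings existing infrastructure into Terraform state, but it does not write the configuration for you. You write the matching HCL yourself, then import. Terraform 1.5 and later can also draft configuration from import blocks with terraform plan -generate-config-out, which is worth mentioning if the interviewer probes.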
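If you are asked how state locking works, it helps to show the underlying idea: a conditional write that succeeds only when no lock entry exists. The sketch below is an in-memory toy, not Terraform’s implementation; StateLockTable and LockHeldError are hypothetical names, and a real backend would need the check-and-write to be atomic on the server side.

```python
import uuid


class LockHeldError(Exception):
    """Raised when another run already holds the state lock."""


class StateLockTable:
    """In-memory stand-in for a lock table with conditional writes."""

    def __init__(self) -> None:
        self._locks: dict[str, str] = {}  # state path -> lock ID

    def acquire(self, state_path: str) -> str:
        # Conditional write: succeed only if no lock entry exists yet.
        if state_path in self._locks:
            raise LockHeldError(f"{state_path} is locked by {self._locks[state_path]}")
        lock_id = str(uuid.uuid4())
        self._locks[state_path] = lock_id
        return lock_id

    def release(self, state_path: str, lock_id: str) -> None:
        # Only the holder of the matching lock ID may release the lock.
        if self._locks.get(state_path) != lock_id:
            raise LockHeldError("lock ID does not match the current holder")
        del self._locks[state_path]
```

A second acquire on the same state path raises LockHeldError until the first holder calls release with its lock ID, which mirrors the behavior you see when two applies collide.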

Pro Tip: When asked about Terraform drift, don’t just say “run terraform plan.” Explain what causes drift (manual console changes, other automation tools touching the same resources), why it matters (state and reality diverge), and how you’d prevent it (enforce IaC-only changes via policy).
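Drift itself is just a difference between what was recorded and what exists now. Here is a rough sketch of that comparison, assuming both sides are available as plain dictionaries keyed by resource address; detect_drift and the data shape are hypothetical and far simpler than what terraform plan actually does.

```python
def detect_drift(recorded: dict[str, dict], live: dict[str, dict]) -> dict[str, dict]:
    """Compare recorded attributes with live ones, per managed resource address.

    Returns {address: {attribute: (recorded_value, live_value)}} for each mismatch.
    A managed resource that no longer exists is reported under the key "<exists>".
    Resources created outside the tool are not in `recorded`, so they are not seen.
    """
    drift = {}
    for address in sorted(recorded):
        if address not in live:
            drift[address] = {"<exists>": (True, False)}
            continue
        old, new = recorded[address], live[address]
        changed = {
            key: (old.get(key), new.get(key))
            for key in sorted(old.keys() | new.keys())
            if old.get(key) != new.get(key)
        }
        if changed:
            drift[address] = changed
    return drift
```

The blind spot in the docstring is worth saying out loud in an interview: anything created by hand and never imported is invisible to this kind of comparison, which is one more argument for enforcing IaC-only changes.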

System Design: The Questions That Separate Senior from Junior

System design questions reveal how you think about failure, scale, and tradeoffs. Two scenarios come up repeatedly.

Migrating a Monolith to Microservices

The interviewer wants to hear “Strangler Fig Pattern”—a method described by Martin Fowler for incrementally extracting functionality from a legacy system. The key points:

You don’t rewrite the monolith all at once. Incremental extraction is the only safe path.
An API gateway or proxy intercepts requests. Migrated functionality routes to new microservices; everything else continues hitting the monolith.
The monolith eventually handles nothing and can be retired.
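The routing step at the heart of the pattern can be sketched in a few lines. The upstream URLs and the route function are hypothetical; in practice this logic lives in your API gateway or proxy configuration rather than in application code.

```python
MONOLITH = "http://monolith.internal"  # hypothetical legacy upstream


def route(path: str, migrated: dict[str, str], default: str = MONOLITH) -> str:
    """Pick the upstream for a request path using longest-prefix match.

    `migrated` maps a path prefix to the microservice that now owns it.
    Anything without a matching prefix still goes to the monolith.
    """
    best = ""
    for prefix in migrated:
        # Match the prefix exactly or on a path-segment boundary.
        matches = path == prefix or path.startswith(prefix.rstrip("/") + "/")
        if matches and len(prefix) > len(best):
            best = prefix
    return migrated[best] if best else default
```

Each extraction adds one entry to the migrated table. When every route has an entry, the default is never used and the monolith can be retired.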

The trap answer is “I’d rewrite it.” That’s a resume-generating event disguised as a solution.

Observability vs. Monitoring

Modern interviews distinguish between monitoring (“is the system up?”) and observability (“why is it broken?”). The three pillars of observability are:

Metrics: Numerical measurements over time—CPU usage, request latency, error rate. They tell you something is wrong.
Logs: Timestamped records of discrete events. They tell you what happened.
Traces: Distributed request journeys across microservices. They tell you where the bottleneck is.

Walk the interviewer through how these interact: a metric (latency spike) triggers an alert, traces show which service is slow, logs in that service reveal the root cause. That’s the answer they’re looking for.
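That metric-to-trace-to-log flow can be illustrated with toy data. Everything in this sketch is an assumption made for the example: the Span and LogLine shapes, the investigate function, and the idea of ranking services by total span duration. Real tracing backends offer much richer queries.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    trace_id: str
    service: str
    duration_ms: float


@dataclass
class LogLine:
    trace_id: str
    service: str
    level: str
    message: str


def investigate(
    p99_ms: float, threshold_ms: float, spans: list[Span], logs: list[LogLine]
) -> Optional[tuple[str, list[str]]]:
    """Metric -> traces -> logs: return (slowest service, its error messages)."""
    # Step 1, metric: only investigate when latency crosses the alert threshold.
    if p99_ms <= threshold_ms or not spans:
        return None
    # Step 2, traces: find the service that accounts for the most time.
    totals: dict[str, float] = {}
    for span in spans:
        totals[span.service] = totals.get(span.service, 0.0) + span.duration_ms
    slowest = max(totals, key=totals.get)
    # Step 3, logs: pull the error lines from that service.
    errors = [
        line.message
        for line in logs
        if line.service == slowest and line.level == "ERROR"
    ]
    return slowest, errors
```

Each pillar narrows the search: the metric says when to look, the traces say where, and the logs say why.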

Behavioral Questions: Structure First, Story Second

Behavioral interviews trip up technical candidates because the skill being tested—structured communication—is different from the skill being asked about. The STAR method (Situation, Task, Action, Result) gives you a scaffold that works for any scenario.

What each component actually requires:

Situation: Brief context. One or two sentences maximum. Don’t narrate the entire backstory.
Task: Your specific responsibility, not your team’s. “We needed to fix it” is not a Task. “I was the on-call engineer responsible for restoring service” is.
Action: The specific steps you took. This is where interviewers are actually listening. Vague answers lose points here.
Result: Quantified outcome. “Latency dropped 50%” beats “things got better.”

The three scenarios you must have prepared answers for:

Production failure you caused: Focus on the fix and the post-mortem, not the blame. Interviewers want to see that you learned something and changed something.
Conflict with a developer: Data-driven resolution. “I showed them the deployment frequency metric and we agreed the release cadence was the root cause” is a good answer. “We had different opinions” is not.
Learning a new tool under pressure: What was the learning strategy? What worked? What would you do differently?

Sound familiar? These scenarios are practically identical across companies. Prepare three solid STAR stories and you’ll cover 80% of the behavioral round.

The Case Study You Need in Your Back Pocket

One case study consistently impresses senior interviewers when used correctly: Knight Capital Group. In 2012, Knight Capital lost $440 million in 45 minutes due to a deployment failure. The root causes:

A deployment flag was repurposed in new code but triggered dormant old code on any server running the old version.
The update was manually deployed to seven of eight servers. The eighth ran the old code with the new flag configuration—and executed erratic trades for 45 minutes.

The DevOps lessons are exactly what interviewers want to hear you articulate:

Deployments must be automated and atomic. Manual processes in deployment pipelines are a risk factor, not a control.
Configuration flags should not be repurposed. Dead code should be removed.
Idempotency matters: the same deployment run against any server should produce the same result.
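One way to turn these lessons into a concrete control is a pre-flight gate that refuses to enable a flag until every server reports the same build. The sketch below is hypothetical and is not a reconstruction of Knight Capital’s systems; safe_to_enable_flag and the version map are invented for illustration.

```python
def safe_to_enable_flag(
    server_versions: dict[str, str], required_version: str
) -> tuple[bool, list[str]]:
    """Allow the flag only if every server reports the required build.

    Returns (ok, stale_hosts). An empty fleet is treated as unsafe.
    """
    stale = sorted(
        host for host, version in server_versions.items()
        if version != required_version
    )
    return (not stale and bool(server_versions), stale)
```

With seven servers on the new build and one left on the old one, the gate returns False and names the stale host, which is the check a manual rollout never ran.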

Bring this up when asked about deployment reliability. The point is not to show off; it demonstrates that you understand DevOps failures have real business consequences.

Platform Engineering: The Territory You Need to Know

Interviews for senior roles increasingly touch on Platform Engineering—building the internal infrastructure that other developers use. If you’re applying for anything above mid-level, prepare for questions about Internal Developer Platforms (IDPs).

The core concepts:

Internal Developer Platform (IDP): A self-service layer that lets developers provision infrastructure without filing tickets. Engineers can spin up environments, manage deployments, and access shared services through standardized interfaces.
Golden Paths: Pre-approved templates and workflows that guide developers toward secure, compliant defaults without restricting them.
Cognitive load reduction: The goal is to let application developers focus on application code. Every ticket they have to file or manual step they have to perform is a tax on their attention.
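A Golden Path can be pictured as a set of defaults that a self-service request starts from. In this sketch the default values, setting names, and the provision_request function are all hypothetical; the point is only that developers get safe defaults for free and any deviation is visible rather than forbidden.

```python
GOLDEN_PATH_DEFAULTS = {  # hypothetical platform defaults
    "encryption": True,
    "public_access": False,
    "region": "eu-west-1",
    "size": "small",
}


def provision_request(overrides: dict) -> tuple[dict, list[str]]:
    """Merge a developer's overrides onto the golden-path defaults.

    Returns (final_config, deviations), where deviations lists the settings
    that differ from the defaults so the platform team can review them.
    """
    unknown = sorted(set(overrides) - set(GOLDEN_PATH_DEFAULTS))
    if unknown:
        raise ValueError(f"unknown settings: {unknown}")
    config = {**GOLDEN_PATH_DEFAULTS, **overrides}
    deviations = sorted(
        key for key in overrides if overrides[key] != GOLDEN_PATH_DEFAULTS[key]
    )
    return config, deviations
```

A request with no overrides returns the defaults and an empty deviation list, which is the low-cognitive-load case the platform is designed for.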

The DORA research shows that organizations using IDPs typically deliver software faster and perform better operationally—though implementation often comes with a temporary performance dip as teams adapt. Knowing that J-curve exists, and being able to explain it, is the kind of detail that signals operational maturity.

Reality Check: If you only know how to use tools, you’re a junior candidate with a senior’s years of experience. The questions that advance your candidacy are the ones where you explain tradeoffs, quantify outcomes, and reference how real systems have failed.

What to Actually Do in the Next Two Weeks

The gap between “I know DevOps” and “I can ace a DevOps interview” closes faster with targeted preparation than with broad review. Here’s where to focus:

DORA metrics and CALMS: Understand both frameworks well enough to discuss them in the context of your own experience—not just as definitions.
Kubernetes pod lifecycle and control plane: Practice explaining each component out loud. If you can’t explain it simply, you don’t know it well enough yet.
Terraform state management: Know locking, drift, import, and the deprecation of terraform taint. These come up at every seniority level.
Three STAR stories: Production failure, developer conflict, and rapid skill acquisition. Have quantified results for each.
Knight Capital case study: Know the root cause and the DevOps lessons by heart.
Strangler Fig Pattern and observability pillars: The go-to answers for system design questions on monolith migration and debugging distributed systems.
Platform Engineering basics: IDP, Golden Paths, cognitive load. Required for senior roles, differentiating for mid-level.