Terraform — Theory (interview deep-dive)

State — why it matters

State is what makes Terraform work, and what most often breaks it.

Maps your HCL resources to real cloud IDs.
Tracks attributes Terraform discovered after apply.
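Without it, Terraform could not tell which real objects belong to which resource blocks; every terraform plan would have to re-discover everything and would propose creating it all anew.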
Plain JSON, may contain secrets (never commit local state).

State must be shared across the team: use a remote backend with locking (e.g., S3 + DynamoDB, GCS, Terraform Cloud).
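
A minimal S3 backend sketch, assuming AWS; the bucket name, key, and lock table name are hypothetical. The DynamoDB table must already exist with a string partition key named LockID.

```hcl
terraform {
  backend "s3" {
    bucket         = "acme-terraform-state"           # hypothetical bucket (enable versioning on it)
    key            = "prod/network/terraform.tfstate" # one key per state file
    region         = "us-east-1"
    encrypt        = true                             # server-side encryption at rest
    dynamodb_table = "terraform-locks"                # hypothetical lock table, partition key "LockID"
  }
}
```

Backend blocks cannot reference variables or locals, so per-env values are usually hard-coded per env dir or passed via terraform init -backend-config. Recent Terraform releases (1.10+) can also lock with a lock file in the bucket itself (use_lockfile = true); check the S3 backend docs for the version you run.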

State management commands

terraform state list — what’s managed.
terraform state show <addr> — current attrs.
terraform state mv — rename (refactor without recreate).
terraform state rm — forget without destroying (then import elsewhere).
terraform import <addr> <id> — bring existing resource under management.
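terraform refresh: re-read real infrastructure and write the result into state. Deprecated in favor of terraform apply -refresh-only, which shows the changes for approval first.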
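
Newer Terraform versions also have declarative, plan-reviewed counterparts to import and state rm: the import block (1.5+) and the removed block (1.7+). A sketch with hypothetical resource names and bucket IDs:

```hcl
# Declarative alternative to `terraform import` (Terraform 1.5+).
# The import shows up in `terraform plan` and happens on apply.
import {
  to = aws_s3_bucket.logs
  id = "acme-app-logs" # hypothetical existing bucket name
}

resource "aws_s3_bucket" "logs" {
  bucket = "acme-app-logs"
}

# Declarative alternative to `terraform state rm` (Terraform 1.7+).
# The resource block for aws_s3_bucket.old_logs must be deleted from config;
# destroy = false makes Terraform forget the bucket without deleting it.
removed {
  from = aws_s3_bucket.old_logs # hypothetical address

  lifecycle {
    destroy = false
  }
}
```

terraform plan -generate-config-out=generated.tf (1.5+) can draft the resource block for an import target that has no HCL yet; review the generated file before committing it.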

Drift

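When real infra changes outside Terraform, state goes stale. terraform plan refreshes state and shows the drift; apply then reverts the infra to match config. To keep the out-of-band change instead, update the HCL to match (or import it, if it is a whole new resource).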

Drift causes: console clicks, other tools, auto-scaling.

Detection: a scheduled terraform plan -detailed-exitcode in CI (exit code 2 means changes are pending); tools like driftctl.
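
For drift that is expected (like auto-scaling), a common approach is to tell Terraform to ignore the field rather than fight it. A sketch assuming an AWS Auto Scaling group; the names, the variable, and the launch template are hypothetical:

```hcl
resource "aws_autoscaling_group" "web" {
  name                = "web-asg" # hypothetical
  min_size            = 2
  max_size            = 10
  desired_capacity    = 2                        # initial value only
  vpc_zone_identifier = var.private_subnet_ids   # hypothetical variable

  launch_template {
    id      = aws_launch_template.web.id # hypothetical launch template
    version = "$Latest"
  }

  lifecycle {
    # Scaling policies change desired_capacity at runtime; ignoring it keeps
    # plan from reporting that as drift and apply from resetting it to 2.
    ignore_changes = [desired_capacity]
  }
}
```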

Workspaces vs separate state files

Workspace — same config, different state (terraform workspace new prod). Quick to switch, easy to forget which one you’re in. Good for short-lived envs.
Separate state files / dirs — fully isolated configs per env. Recommended for prod. Use directory structure or Terragrunt.
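
With workspaces, the same code branches on terraform.workspace, which is why it is easy to apply to the wrong env. A sketch with hypothetical sizes and a hypothetical ami_id variable:

```hcl
locals {
  # terraform.workspace is "default" until you run `terraform workspace new/select`.
  instance_type_by_ws = {
    default = "t3.micro"
    dev     = "t3.micro"
    prod    = "m5.large"
  }
  instance_type = lookup(local.instance_type_by_ws, terraform.workspace, "t3.micro")
}

resource "aws_instance" "app" {
  ami           = var.ami_id # hypothetical variable
  instance_type = local.instance_type

  tags = {
    Name = "app-${terraform.workspace}"
  }
}
```

With the S3 backend, non-default workspaces are stored under a workspace key prefix (env: by default) in the same bucket, so they share one backend config and its access controls.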

A common pattern:

infra/
  modules/{vpc, app, db}/
  envs/
    dev/main.tf
    stg/main.tf
    prod/main.tf

Each env dir has its own backend block and references modules.
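
A sketch of what envs/prod/main.tf in that tree might contain. The bucket, module inputs (cidr, vpc_id, instance_type), and module output (vpc_id) are hypothetical; they depend on how modules/vpc and modules/app are written.

```hcl
# envs/prod/main.tf
terraform {
  backend "s3" {
    bucket = "acme-terraform-state"   # hypothetical
    key    = "prod/terraform.tfstate" # unique key per env = separate state
    region = "us-east-1"
  }
}

provider "aws" {
  region = "us-east-1"
}

module "vpc" {
  source = "../../modules/vpc"
  cidr   = "10.30.0.0/16" # hypothetical module input
}

module "app" {
  source        = "../../modules/app"
  vpc_id        = module.vpc.vpc_id # hypothetical output of modules/vpc
  instance_type = "m5.large"        # prod-specific value; dev/stg pass their own
}
```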

Modules

Best practices:

One responsibility per module (e.g., a VPC, a service).
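Pin versions (e.g., version = "~> 5.0" for registry modules; Git sources do not support version, so pin with ?ref=<tag>).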
Don’t expose every variable; provide sensible defaults.
Use outputs for stable contracts.
Avoid deep module nesting (max 2-3 levels).
Standard modules: terraform-aws-modules registry has battle-tested ones.
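
A sketch of calling a pinned registry module and re-exporting only what callers need. It assumes the terraform-aws-modules/vpc/aws module at 5.x; the name, CIDRs, and AZs are illustrative.

```hcl
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0" # any 5.x at or above 5.0, never 6.0

  name = "main"
  cidr = "10.0.0.0/16"

  azs             = ["us-east-1a", "us-east-1b"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24"]
}

# Stable contract: expose subnet IDs, not the whole module object.
output "private_subnet_ids" {
  value = module.vpc.private_subnets
}
```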

Plan and apply mechanics

plan:

Read state and refresh it against real infrastructure (skippable with -refresh=false).
Diff against config (and provider’s resource schema).
Compute action graph: create / update (in-place or replace) / destroy.
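Output symbols: + create, ~ update in place, -/+ destroy then re-create (replace), +/- create then destroy (replace with create_before_destroy), - destroy.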

apply executes the graph respecting dependencies. Implicit deps via reference (aws_instance.x.id creates ordering); explicit via depends_on.

Implicit vs explicit dependencies

# implicit — Terraform sees the reference
resource "aws_eip" "ip" { instance = aws_instance.web.id }

# explicit — for non-reference relationships (e.g., IAM eventual consistency)
resource "aws_lambda_function" "fn" {
  # ... other arguments elided
  depends_on = [aws_iam_role_policy_attachment.logs]
}

What forces replacement?

Changes to fields marked ForceNew in provider schema (e.g., aws_instance.subnet_id).
Plan shows # forces replacement.
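Risk: outage during replace (the default order is destroy, then create). Mitigate with lifecycle { create_before_destroy = true } when the old and new objects can coexist (e.g., no fixed unique name).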
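
A sketch of create-before-destroy on a resource whose name would otherwise collide, assuming an AWS security group; the vpc_id variable is hypothetical.

```hcl
resource "aws_security_group" "web" {
  # name_prefix lets AWS generate a unique name, so old and new groups can
  # coexist briefly; a fixed `name` would collide during create-before-destroy.
  name_prefix = "web-"
  vpc_id      = var.vpc_id # hypothetical variable; changing it forces replacement

  lifecycle {
    create_before_destroy = true
  }
}
```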

Provisioners

local-exec, remote-exec, file. Avoid them if alternatives exist (Ansible, cloud-init). They run only at creation (or at destroy with when = destroy), not when arguments change.
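
A sketch of the cloud-init route, plus a way to re-run a provisioner on change if one is unavoidable. It assumes an AWS provider version that supports user_data_replace_on_change and Terraform 1.4+ for terraform_data; the variables, cloud-init file, and script are hypothetical.

```hcl
# Preferred: bootstrap with cloud-init via user_data instead of remote-exec.
resource "aws_instance" "web" {
  ami           = var.ami_id                            # hypothetical variable
  instance_type = "t3.micro"
  user_data     = file("${path.module}/cloud-init.yaml") # hypothetical file

  # user_data only runs at first boot; replace the instance when it changes.
  user_data_replace_on_change = true
}

# If a provisioner is unavoidable: a new triggers_replace value replaces this
# terraform_data resource, which re-runs its creation-time provisioner.
resource "terraform_data" "db_migrate" {
  triggers_replace = [var.schema_version] # hypothetical variable

  provisioner "local-exec" {
    command = "./scripts/migrate.sh" # hypothetical script
  }
}
```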

Sensitive data

Mark variables/outputs sensitive = true to redact from logs.
State still contains values in plaintext! Encrypt backend, restrict access.
Pull secrets at runtime from KMS / Vault / Secrets Manager — don’t bake them into TF.
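
A sketch of keeping a secret's value out of state by handing Terraform only its identity, assuming AWS Secrets Manager; the secret name and api_token variable are hypothetical. The aws_secretsmanager_secret data source reads metadata (ARN), not the value; aws_secretsmanager_secret_version would pull the value into state.

```hcl
# Reads only metadata; the secret value never enters state.
data "aws_secretsmanager_secret" "db" {
  name = "prod/app/db-password" # hypothetical secret name
}

# Pass the ARN to the workload (task definition, Lambda env, etc.),
# which fetches the value at runtime with its own IAM role.
output "db_secret_arn" {
  value = data.aws_secretsmanager_secret.db.arn
}

variable "api_token" {
  type      = string
  sensitive = true # redacted in plan/apply output, still stored in state if used
}
```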

Provider versioning

terraform {
  required_version = ">= 1.6"
  required_providers {
    aws = { source = "hashicorp/aws", version = "~> 5.0" }
  }
}

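Lock file .terraform.lock.hcl: commit it. It records the exact provider versions and checksums selected by terraform init; change them deliberately with terraform init -upgrade.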

Terraform vs alternatives

Terraform: HCL, broad provider ecosystem, mature, simple model.
Pulumi: real programming languages (TS, Go, Python). Easier loops/conditionals. Steeper for non-coders.
OpenTofu: open-source (MPL) fork of Terraform, created after HashiCorp's 2023 switch to the BSL. Largely a drop-in replacement, though the two are diverging on newer features.
CDK / CDKTF: typed code that synthesizes CloudFormation (AWS CDK) or Terraform configuration (CDKTF).
Ansible: imperative configuration management; a different problem domain.
CloudFormation: AWS-only, slow, sometimes still required.
Crossplane: K8s-native IaC.

Common interview Qs

What’s in state, and why does it matter? Resource ID + attributes; needed to compute diff.
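Two engineers run apply concurrently. What happens? Without locking, both runs start from the same state and the last write wins, so resources can be orphaned or state corrupted. Use a backend with locking (e.g., S3 + DynamoDB); the second run then fails with a lock error.
You imported an existing bucket; plan still wants to recreate it. Some attribute in the HCL differs from the real resource, and that field forces replacement (look for # forces replacement in the plan). Align the HCL with reality.
A resource was deleted manually. What does terraform plan do? Refresh finds it gone, so plan proposes re-creating it. If the deletion was intentional, remove the block from config.
Module versioning: pin or float? Root modules (what you deploy): pin (~> X.Y) and commit the lock file. Reusable modules (libraries): declare only a minimum provider version (>= X.Y) so callers can resolve compatible versions.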
count vs for_each — when? for_each for stable keys; count for symmetric replicas. Avoid mid-list deletes with count.
How do you organize multi-env infra? Module per concern; env dirs reference modules with different vars; per-env state.
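Secrets in TF: how? Keep them out of code and, where possible, out of state. Values read through data sources (e.g., aws_secretsmanager_secret_version) are still written to state, so prefer passing only the secret's ARN or name and letting the workload fetch the value at runtime. Recent Terraform releases (1.10+) also add ephemeral resources, whose values are not persisted to state.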
Refactoring resource without destroy? terraform state mv for renames; moved {} block (TF 1.1+) for declarative moves.
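TF apply taking too long? Raise -parallelism (default 10), split the monolithic state into smaller ones, use -target only for one-off fixes, skip refresh with -refresh=false when safe; some of the time is simply slow provider APIs.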
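
The count-vs-for_each and refactoring answers meet in one common task: migrating a count resource to for_each without destroying anything. With count, removing "bob" from a list shifts "carol" from index [2] to [1], so plan would rename one user and destroy another; with for_each each instance is keyed by value. A sketch with hypothetical IAM user names, assuming the old count version created them in this list order:

```hcl
variable "user_names" {
  type    = list(string)
  default = ["alice", "bob", "carol"] # hypothetical names
}

# New version: instances keyed by name, e.g. aws_iam_user.this["carol"].
resource "aws_iam_user" "this" {
  for_each = toset(var.user_names)
  name     = each.key
}

# Old version was: count = length(var.user_names), name = var.user_names[count.index].
# moved blocks (TF 1.1+) re-key the existing state entries instead of
# planning destroy + create for every user.
moved {
  from = aws_iam_user.this[0]
  to   = aws_iam_user.this["alice"]
}

moved {
  from = aws_iam_user.this[1]
  to   = aws_iam_user.this["bob"]
}

moved {
  from = aws_iam_user.this[2]
  to   = aws_iam_user.this["carol"]
}
```

After this migration, removing "bob" from the list only destroys aws_iam_user.this["bob"]. A plain rename works the same way: moved { from = aws_iam_user.old to = aws_iam_user.new }.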

Common pitfalls

Local state in production.
Massive single state file — slow plans, blast radius.
Console hand edits that never get reconciled into HCL (or imported).
Provisioners on every resource.
Hard-coded IDs (use data sources or remote state outputs).
Unpinned providers / modules.
terraform destroy in prod by accident — guard with prevent_destroy + IAM.
Letting state drift; never running plan.
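
Two of these pitfalls have small config-level fixes: look IDs up instead of hard-coding them, and guard critical resources with prevent_destroy. A sketch with a hypothetical state bucket/key, a hypothetical private_subnet_ids output on the network stack, and hypothetical resource names:

```hcl
# Read another stack's outputs instead of copy-pasting subnet IDs.
data "terraform_remote_state" "network" {
  backend = "s3"
  config = {
    bucket = "acme-terraform-state"           # hypothetical
    key    = "prod/network/terraform.tfstate" # hypothetical
    region = "us-east-1"
  }
}

resource "aws_db_subnet_group" "main" {
  name       = "main"
  subnet_ids = data.terraform_remote_state.network.outputs.private_subnet_ids # assumed output
}

resource "aws_s3_bucket" "audit_logs" {
  bucket = "acme-audit-logs" # hypothetical

  lifecycle {
    # Any plan that would destroy this bucket (including terraform destroy)
    # fails with an error. It only protects while this block stays in config.
    prevent_destroy = true
  }
}
```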