Terraform — Theory

Terraform — Theory (interview deep-dive)

State is the thing that makes Terraform work and break.

  • Maps your HCL resources to real cloud IDs.
  • Tracks attributes Terraform discovered after apply.
  • Without it, Terraform could not tell which real resources it manages, and every terraform plan would have to rediscover everything.
  • Plain JSON that may contain secrets, so never commit local state to version control.

State must be shared across the team: use a remote backend with locking.
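
A remote backend with locking might look like the sketch below. It assumes AWS, with an S3 bucket holding the state object and a DynamoDB table holding the lock; the bucket, key, and table names are placeholders, not values from this guide.

```hcl
terraform {
  backend "s3" {
    bucket         = "example-tf-state"            # hypothetical bucket name
    key            = "envs/prod/terraform.tfstate" # path of the state object
    region         = "us-east-1"
    encrypt        = true                          # server-side encryption of the state object
    dynamodb_table = "example-tf-locks"            # hypothetical lock table
  }
}
```

The DynamoDB table needs a string partition key named LockID. Terraform 1.10 and later also offer S3-native locking through the use_lockfile argument, which may be preferable on new setups.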

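  • terraform state list: lists the resource addresses under management.
  • terraform state show <addr>: prints the recorded attributes of one resource.
  • terraform state mv <src> <dst>: renames an address (refactor without recreating).
  • terraform state rm <addr>: forgets a resource without destroying it (it can then be imported elsewhere).
  • terraform import <addr> <id>: brings an existing resource under management.
  • terraform refresh: re-reads real infrastructure into state. Deprecated since Terraform 0.15.4; prefer terraform apply -refresh-only, which shows the detected changes and asks for approval first.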
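
Several of these commands have declarative counterparts that go through the normal plan/apply review. A minimal sketch, assuming Terraform 1.7 or later (import blocks need 1.5, removed blocks need 1.7); all resource and bucket names are hypothetical:

```hcl
# Counterpart of `terraform import`: adopt an existing bucket on the next apply.
import {
  to = aws_s3_bucket.logs
  id = "example-logs-bucket" # hypothetical bucket name
}

resource "aws_s3_bucket" "logs" {
  bucket = "example-logs-bucket"
}

# Counterpart of `terraform state mv`: the object formerly addressed as
# aws_s3_bucket.old_assets is now aws_s3_bucket.assets, with no destroy/create.
moved {
  from = aws_s3_bucket.old_assets
  to   = aws_s3_bucket.assets
}

resource "aws_s3_bucket" "assets" {
  bucket = "example-assets-bucket"
}

# Counterpart of `terraform state rm`: drop the entry from state but leave the
# real bucket alone. The matching resource block must already be deleted.
removed {
  from = aws_s3_bucket.legacy

  lifecycle {
    destroy = false
  }
}
```

Because these blocks live in version control, the state change is reviewed like any other code change instead of being run by hand.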

When real infrastructure changes outside Terraform, state goes stale. terraform plan refreshes state and shows the drift; apply then reverts the resource to match the config. To keep the out-of-band change instead, update the HCL to match it.

Drift causes: console clicks, other tools, auto-scaling.

Detection: scheduled terraform plan -detailed-exitcode in CI (exit code 2 means changes are pending); tools like driftctl.
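
Some drift is expected, auto-scaling being the usual case. One way to stop it from showing up in every plan is to tell Terraform to ignore the attribute the scaler owns. A sketch, assuming an AWS Auto Scaling group; the group name and both variables are hypothetical:

```hcl
variable "subnet_ids" {
  type = list(string)
}

variable "launch_template_id" {
  type = string
}

resource "aws_autoscaling_group" "app" {
  name                = "example-app" # hypothetical name
  min_size            = 2
  max_size            = 10
  desired_capacity    = 2 # initial value only; scaling policies change it later
  vpc_zone_identifier = var.subnet_ids

  launch_template {
    id      = var.launch_template_id
    version = "$Latest"
  }

  lifecycle {
    # Changes the autoscaler makes to desired_capacity are no longer reported
    # as drift, and apply will not reset them.
    ignore_changes = [desired_capacity]
  }
}
```

Keep ignore_changes narrow: every attribute listed there is one Terraform will no longer correct.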

  • Workspace — same config, different state (terraform workspace new prod). Quick to switch, easy to forget which one you’re in. Good for short-lived envs.
  • Separate state files / dirs — fully isolated configs per env. Recommended for prod. Use directory structure or Terragrunt.
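
With workspaces, the config can branch on the built-in terraform.workspace value. A small sketch, assuming workspaces named dev and prod; the sizing map is hypothetical:

```hcl
locals {
  # Hypothetical per-workspace sizing.
  instance_type_by_workspace = {
    default = "t3.micro"
    dev     = "t3.micro"
    prod    = "m5.large"
  }

  # Fall back to the smallest size for any workspace not listed above.
  instance_type = lookup(local.instance_type_by_workspace, terraform.workspace, "t3.micro")
}

output "active_workspace" {
  value = terraform.workspace
}

output "instance_type" {
  value = local.instance_type
}
```

Printing the active workspace as an output is a cheap guard against the "forgot which one you're in" problem.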

A common pattern:

infra/
  modules/{vpc, app, db}/
  envs/
    dev/main.tf
    stg/main.tf
    prod/main.tf

Each env dir has its own backend block and references modules.
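
As an illustration of that layout, an env's main.tf might look roughly like this. The bucket name, state key, and the module inputs and outputs (cidr_block, vpc_id) are assumptions about modules this guide does not show:

```hcl
# envs/prod/main.tf (sketch)
terraform {
  backend "s3" {
    bucket = "example-tf-state"       # hypothetical bucket
    key    = "prod/terraform.tfstate" # each env dir uses its own key
    region = "us-east-1"
  }
}

module "vpc" {
  source     = "../../modules/vpc"
  cidr_block = "10.20.0.0/16" # assumed input variable of the vpc module
}

module "app" {
  source = "../../modules/app"
  vpc_id = module.vpc.vpc_id # assumed output of the vpc module
}
```

dev and stg would carry the same module calls with different inputs and a different state key, so a mistake in one env cannot touch another env's state.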

Module best practices:

  • One responsibility per module (e.g., a VPC, a service).
  • Pin module and provider versions (e.g., version = "~> 5.0").
  • Don’t expose every variable; provide sensible defaults.
  • Use outputs as the stable contract between modules.
  • Avoid deep module nesting (max 2-3 levels).
  • Standard modules: the terraform-aws-modules namespace on the Terraform Registry has battle-tested ones.
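
A sketch of what these practices can look like in code. The app module, its variables, and its output are hypothetical; the registry call uses the public terraform-aws-modules/vpc/aws module with only two of its inputs:

```hcl
# modules/app/variables.tf: a small interface with sensible defaults
variable "name" {
  type        = string
  description = "Service name, used as a prefix for resource names."
}

variable "instance_count" {
  type        = number
  default     = 2
  description = "Number of instances; the default suits most callers."
}

# modules/app/main.tf
resource "aws_security_group" "app" {
  name_prefix = "${var.name}-"
}

# modules/app/outputs.tf: the stable contract callers depend on
output "security_group_id" {
  description = "ID of the service security group."
  value       = aws_security_group.app.id
}

# Caller side: a registry module pinned to a major version
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = "example"
  cidr = "10.0.0.0/16"
}
```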

plan:

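  1. Read state and refresh it against the real infrastructure.
  2. Diff the refreshed state against the config (validated against the provider’s resource schema).
  3. Compute the action graph: create / update (in-place or replace) / destroy.
  4. Print the actions with symbols: + create, ~ update in place, -/+ replace, - destroy.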

apply executes the graph in dependency order. Implicit dependencies come from references (using aws_instance.x.id makes the referencing resource wait for the instance); explicit ones are declared with depends_on.

# implicit: Terraform sees the reference
resource "aws_eip" "ip" {
  instance = aws_instance.web.id
}

# explicit: for non-reference relationships (e.g., IAM eventual consistency)
resource "aws_lambda_function" "fn" {
  # ...
  depends_on = [aws_iam_role_policy_attachment.logs]
}
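
Forced replacement:

  • Triggered by changes to fields marked ForceNew in the provider schema (e.g., aws_instance.subnet_id).
  • The plan marks the offending attribute with # forces replacement.
  • Risk: outage during the replace. Mitigate with create_before_destroy, provided the old and new resource can exist side by side (no unique-name conflict, for example).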
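
A minimal sketch of the create_before_destroy mitigation, assuming a security group in an existing VPC. The variable and the "web-" prefix are hypothetical:

```hcl
variable "vpc_id" {
  type = string
}

resource "aws_security_group" "web" {
  # name_prefix (instead of a fixed name) lets the replacement be created while
  # the old group still exists; two groups with the same name would collide.
  name_prefix = "web-"
  vpc_id      = var.vpc_id

  lifecycle {
    # On a forced replacement, create the new group first, repoint dependents,
    # then destroy the old one.
    create_before_destroy = true
  }
}
```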

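Provisioners: local-exec, remote-exec, file. Avoid them if alternatives exist (Ansible, cloud-init). By default they run only when the resource is created (destroy-time provisioners need when = destroy) and never on in-place updates.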
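
The sketch below contrasts the preferred cloud-init route with creation-time and destroy-time provisioners. The AMI variable, the marker file, and the host-list files are hypothetical:

```hcl
variable "ami_id" {
  type = string # hypothetical: supply an AMI that runs cloud-init
}

resource "aws_instance" "web" {
  ami           = var.ami_id
  instance_type = "t3.micro"

  # Preferred: bootstrap through cloud-init user data instead of remote-exec.
  user_data = <<-EOT
    #!/bin/bash
    echo "bootstrapped" > /var/tmp/bootstrap-marker
  EOT

  # Creation-time provisioner: runs once, right after the instance is created.
  provisioner "local-exec" {
    command = "echo ${self.private_ip} >> created_hosts.txt"
  }

  # Destroy-time provisioner: runs before the instance is destroyed.
  provisioner "local-exec" {
    when    = destroy
    command = "echo ${self.id} >> destroyed_hosts.txt"
  }
}
```

Neither provisioner reruns when an attribute such as tags is updated in place, which is why they are a poor fit for ongoing configuration.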

Secrets:

  • Mark variables/outputs sensitive = true to redact them from plan and apply output.
  • State still contains the values in plaintext! Encrypt the backend and restrict access.
  • Pull secrets at runtime from KMS / Vault / Secrets Manager; don’t bake them into Terraform config.
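
A short sketch of both ideas, with a hypothetical variable and secret name. The first half only redacts the value from CLI output; the second half keeps the value out of Terraform entirely by managing just the secret's container and handing its ARN to the workload:

```hcl
variable "db_password" {
  type      = string
  sensitive = true # redacted in plan/apply output, but still stored in state
}

output "db_password" {
  value     = var.db_password
  sensitive = true # Terraform rejects an output that exposes a sensitive value unmarked
}

# Alternative: Terraform creates the secret's container only. The value is set
# out of band, and the application reads it at runtime using the ARN.
resource "aws_secretsmanager_secret" "db" {
  name = "example/db-password" # hypothetical secret name
}

output "db_secret_arn" {
  value = aws_secretsmanager_secret.db.arn
}
```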

Pin the Terraform and provider versions:

terraform {
  required_version = ">= 1.6"

  required_providers {
    aws = { source = "hashicorp/aws", version = "~> 5.0" }
  }
}

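Commit the lock file .terraform.lock.hcl: it records the exact provider versions and checksums Terraform selected, so every run installs the same builds.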

Terraform vs alternatives:

  • Terraform: HCL, broad provider ecosystem, mature, simple model.
  • Pulumi: real programming languages (TS, Go, Python). Easier loops/conditionals. Steeper learning curve for non-coders.
  • OpenTofu: open-source fork of Terraform created after the 2023 BSL license change. Drop-in replacement for most existing configs.
  • CDK / CDKTF: typed code that generates CloudFormation / Terraform configuration.
  • Ansible: imperative configuration management; a different problem domain.
  • CloudFormation: AWS-only, slow, sometimes still required.
  • Crossplane: K8s-native IaC.
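
Interview questions:

  1. What’s in state, and why does it matter? Resource IDs plus attributes; Terraform needs them to compute the diff between config and reality.
  2. Two engineers run apply concurrently. What happens? Without locking, the state can be corrupted. Use a backend that locks, e.g. S3 with a DynamoDB lock table.
  3. You imported an existing bucket; plan still wants to recreate it. Why? A config attribute doesn’t match reality, and that attribute forces replacement. Adjust the HCL until the plan is clean.
  4. A resource was deleted manually. What does terraform plan show? It proposes to re-create the resource. If the deletion was intentional, remove the block from config (and use state rm if a stale entry remains in state).
  5. Module versioning: pin or float? Pin in prod root configs (~> X.Y); in reusable library modules, keep constraints loose (e.g., >= X.Y) so callers choose the exact version.
  6. count vs for_each: when? for_each for resources with stable keys; count for identical replicas. Avoid mid-list deletes with count, because the indexes shift and every later instance is destroyed and re-created.
  7. How do you organize multi-env infra? Module per concern; env dirs reference modules with different vars; per-env state.
  8. Secrets in TF: how? Keep them out of config and, where possible, out of state. Values read through data sources (Secrets Manager, Vault) are still written to state, so encrypt the backend, or pass only the secret’s ARN/path and let the workload fetch the value at runtime.
  9. Refactoring a resource without destroy? terraform state mv for renames; moved {} block (TF 1.1+) for declarative moves.
  10. terraform apply is taking too long. What do you check? Raise -parallelism (default 10), break a monolithic state into smaller ones, use -target for one-off targeted plans, and check whether slow provider APIs are the bottleneck.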
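
To make the count vs for_each answer concrete, here is a sketch with hypothetical names. Removing "assets" from the set destroys only that one bucket, while removing an element from the middle of a count-indexed list would renumber everything after it:

```hcl
variable "bucket_names" {
  type    = set(string)
  default = ["logs", "assets", "backups"]
}

variable "ami_id" {
  type = string # hypothetical AMI
}

# for_each: instances are addressed by key, e.g. aws_s3_bucket.by_key["logs"].
resource "aws_s3_bucket" "by_key" {
  for_each = var.bucket_names
  bucket   = "example-${each.key}"
}

# count: fine for interchangeable replicas, addressed as aws_instance.replica[0..2].
resource "aws_instance" "replica" {
  count         = 3
  ami           = var.ami_id
  instance_type = "t3.micro"

  tags = {
    Name = "replica-${count.index}"
  }
}
```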

Anti-patterns:

  • Local state in production.
  • Massive single state file: slow plans, large blast radius.
  • Hand edits via the console that are never imported or reflected in config.
  • Provisioners on every resource.
  • Hard-coded IDs (use data sources or remote state outputs).
  • Unpinned providers / modules.
  • terraform destroy in prod by accident: guard with prevent_destroy and restrictive IAM.
  • Letting state drift; never running plan.
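
Two of these have direct fixes in HCL. The sketch below shows a prevent_destroy guard and a data source lookup in place of a hard-coded VPC ID; the bucket name, tag value, and CIDR are hypothetical:

```hcl
# Guard: any plan that would destroy this bucket fails with an error.
resource "aws_s3_bucket" "audit" {
  bucket = "example-audit-logs"

  lifecycle {
    prevent_destroy = true
  }
}

# Look the VPC up by tag instead of pasting its ID into the config.
data "aws_vpc" "main" {
  tags = {
    Name = "main" # hypothetical tag value
  }
}

resource "aws_subnet" "app" {
  vpc_id     = data.aws_vpc.main.id
  cidr_block = "10.0.42.0/24"
}
```

prevent_destroy only protects the resource while its block stays in the config; deleting the block removes the guard too, which is why IAM restrictions are still needed.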