Flyway / Migrations — Theory

Going forward, schema changes ship before the code that depends on them; rollbacks reverse the order (code first, then schema).

For zero-downtime deploys, every migration must leave the schema usable by the previously deployed code, and the new code must still work against the old schema. Any breaking change therefore has to be split into several compatible steps.

  • Forward compat: old code works with new schema (the migration has run, but the rolling deploy still has old pods serving traffic).
  • Backward compat: new code works with old schema (new pods are already up, but the schema change they expect has not been applied yet or was backed out).

A bare ALTER that drops a column breaks this during the rolling transition window: old pods still query a column that no longer exists.
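The multi-step alternative to a bare breaking ALTER can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 as a stand-in database; the `users` table, the `fullname` and `display_name` columns, and the helper functions are all hypothetical. It covers only the expand, dual-write, and backfill steps; dropping the old column would be a later, separate deploy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, fullname TEXT)")
conn.execute("INSERT INTO users (fullname) VALUES ('Ada'), ('Linus')")

# Expand: add the new column as nullable. Old code never mentions it,
# so the previously deployed version keeps working.
conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")


def old_code_insert(name):
    # Previously deployed code (hypothetical): only knows the old column.
    conn.execute("INSERT INTO users (fullname) VALUES (?)", (name,))


def new_code_insert(name):
    # New code dual-writes so that either version can read the row.
    conn.execute(
        "INSERT INTO users (fullname, display_name) VALUES (?, ?)", (name, name)
    )


def new_code_read(user_id):
    # New code prefers the new column and falls back to the old one
    # for rows the backfill has not reached yet.
    row = conn.execute(
        "SELECT COALESCE(display_name, fullname) FROM users WHERE id = ?",
        (user_id,),
    ).fetchone()
    return row[0]


# Both code versions write during the rolling transition window.
old_code_insert("Grace")
new_code_insert("Barbara")
names_before_backfill = [new_code_read(i) for i in (1, 2, 3, 4)]

# Backfill: copy old values into the new column for rows written by old code.
conn.execute("UPDATE users SET display_name = fullname WHERE display_name IS NULL")
remaining = conn.execute(
    "SELECT COUNT(*) FROM users WHERE display_name IS NULL"
).fetchone()[0]
```

At no point in this sequence does either code version see a schema it cannot handle, which is the property the rolling window requires.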

Two styles of tooling:

  • Versioned (Flyway, Liquibase, Alembic): imperative scripts, ordered, append-only.
  • Declarative (Atlas, pgroll): you define the desired schema and the tool computes the diff.

Versioned is simpler: what runs is exactly the script that was reviewed. Declarative reduces hand-written DDL, but a computed diff is harder to stage across several deploys.
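To make the "tool computes diff" idea concrete, here is a deliberately naive sketch of a column-level diff. It is not how any real declarative tool works internally; the function `diff_schema` and the dictionaries describing the schema are invented for illustration.

```python
def diff_schema(table, current, desired):
    """Return DDL statements that turn `current` columns into `desired` columns.

    Both arguments map column name -> column type (hypothetical representation).
    """
    statements = []
    for name, col_type in desired.items():
        if name not in current:
            statements.append(f"ALTER TABLE {table} ADD COLUMN {name} {col_type}")
        elif current[name] != col_type:
            statements.append(
                f"ALTER TABLE {table} ALTER COLUMN {name} TYPE {col_type}"
            )
    for name in current:
        if name not in desired:
            statements.append(f"ALTER TABLE {table} DROP COLUMN {name}")
    return statements


current = {"id": "bigint", "fullname": "text"}
desired = {"id": "bigint", "display_name": "text"}
plan = diff_schema("users", current, desired)
for statement in plan:
    print(statement)
```

Note what the naive diff does with a rename: it plans an ADD COLUMN followed by a DROP COLUMN, which loses data and breaks old code. That is one reason a computed plan needs review and staging before it touches prod.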

Flyway records a checksum for each applied script and fails validation if the file changes afterwards. Never edit an applied migration; add a new migration file instead.
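The checksum idea can be sketched in a few lines. This is a simplified illustration, not Flyway's actual algorithm or storage: the `history` dictionary stands in for the schema history table, and `record` and `validate` are hypothetical helpers.

```python
import zlib


def checksum(script: str) -> int:
    # Normalise line endings so a CRLF/LF change alone does not count as an edit.
    normalised = "\n".join(script.splitlines())
    return zlib.crc32(normalised.encode("utf-8"))


# Hypothetical stand-in for the history table: version -> checksum at apply time.
history = {}


def record(version, script):
    history[version] = checksum(script)


def validate(version, script):
    """Return True if the script on disk still matches what was applied."""
    return history[version] == checksum(script)


v1 = "CREATE TABLE users (id BIGINT PRIMARY KEY);\n"
record("1", v1)

# Same content with different line endings still validates.
unchanged = validate("1", v1.replace("\n", "\r\n"))
# Any real edit to an applied script is detected.
edited = validate("1", v1.replace("BIGINT", "INT"))
```

Because the recorded value lives in the database and the script lives in version control, an edit made after the fact shows up as a mismatch in every environment where the original was already applied.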

If you need to fix a wrong migration:

  • Add a new migration that corrects it.
  • In dev/staging only, run flyway repair to realign the recorded checksums, or rebuild the database from scratch.

If multiple app instances boot together and try to migrate, they race. Solutions:

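  • Run the migration exactly once, outside app startup (a Kubernetes Job paired with an init container, or an Argo CD PreSync hook).
  • Rely on Flyway's own locking: it takes a lock while migrating (an advisory lock on PostgreSQL), so concurrent runs serialize instead of colliding.
  • Some teams gate migrations behind a single CI step.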
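The lock-based option can be illustrated with a small simulation. It assumes SQLite's write lock (taken by BEGIN IMMEDIATE) as a stand-in for the lock a real migration tool would take; the `schema_history` table, the `MIGRATIONS` list, and the `migrate` function are hypothetical. Four threads play the role of four app instances booting together.

```python
import os
import sqlite3
import tempfile
import threading

# Hypothetical ordered migrations: (version, DDL).
MIGRATIONS = [
    ("1", "CREATE TABLE users (id INTEGER PRIMARY KEY, fullname TEXT)"),
    ("2", "ALTER TABLE users ADD COLUMN display_name TEXT"),
]


def migrate(db_path, applied_log):
    # isolation_level=None: we issue BEGIN/COMMIT ourselves.
    conn = sqlite3.connect(db_path, timeout=30, isolation_level=None)
    try:
        # BEGIN IMMEDIATE takes the write lock up front, so other
        # instances block here until the current migrator commits.
        conn.execute("BEGIN IMMEDIATE")
        conn.execute(
            "CREATE TABLE IF NOT EXISTS schema_history (version TEXT PRIMARY KEY)"
        )
        done = {row[0] for row in conn.execute("SELECT version FROM schema_history")}
        for version, ddl in MIGRATIONS:
            if version in done:
                continue  # another instance already applied it
            conn.execute(ddl)
            conn.execute("INSERT INTO schema_history (version) VALUES (?)", (version,))
            applied_log.append(version)
        conn.execute("COMMIT")
    except Exception:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise
    finally:
        conn.close()


db_path = os.path.join(tempfile.mkdtemp(), "app.db")
applied = []  # every migration actually executed, across all instances
threads = [
    threading.Thread(target=migrate, args=(db_path, applied)) for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The key design point is that the "which versions are done?" check happens inside the lock. Checking first and locking afterwards would reintroduce the race.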

Most migrations are not safely reversible in prod:

  • DROP COLUMN loses data.
  • DROP TABLE loses data.
  • A failed forward migration is not undone automatically; the right response is to “fix forward”.

Backups + point-in-time recovery are the real rollback for data-destructive ops.

Common questions:

  1. Rename a column without downtime? Add the new column → dual-write → backfill → switch reads → drop the old column. Expect 3-4 deploys.
  2. NOT NULL on a 100M-row table? Add the column as nullable, backfill in batches, then add the NOT NULL constraint once no NULLs remain.
  3. Migration runs 6 hours and locks the table: what now? Cancel it and redesign it as an online change (chunks, gh-ost, pt-osc, pgroll).
  4. Deploy succeeded but the migration failed midway: now what? Inspect the state with flyway info, then fix manually or run flyway repair; never silently ignore it.
  5. How to test migrations? Apply them to a fresh DB and to a snapshot of prod, and run integration tests in CI.
  6. Why not ORM auto-migrate (e.g. TypeORM synchronize: true)? Unsafe in prod: no review, no audit trail, no rollback story.
  7. MongoDB migrations? Tools such as migrate-mongo or mongock; a version field on each document; lazy migration on read.
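The batched backfill from question 2 can be sketched like this. It is a small-scale illustration with Python's built-in sqlite3; the `orders` table, the `status` column, the default value `'unknown'`, and the `backfill` helper are hypothetical, and 1,000 rows stand in for 100M.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO orders (id) VALUES (?)", [(i,) for i in range(1, 1001)])
conn.commit()

# Step 1: add the column as nullable, so existing rows need no rewrite.
conn.execute("ALTER TABLE orders ADD COLUMN status TEXT")


def backfill(conn, batch_size=100):
    """Fill NULL statuses in primary-key order, committing after every batch."""
    batches = 0
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT id FROM orders WHERE id > ? AND status IS NULL "
            "ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return batches
        first_id, last_id = rows[0][0], rows[-1][0]
        conn.execute(
            "UPDATE orders SET status = 'unknown' "
            "WHERE id BETWEEN ? AND ? AND status IS NULL",
            (first_id, last_id),
        )
        # Short transactions: locks are released between batches.
        conn.commit()
        batches += 1


# Step 2: backfill in small batches instead of one giant UPDATE.
batches_run = backfill(conn, batch_size=100)

# Step 3 (not shown): only once this count is zero is it safe to add the
# NOT NULL constraint. SQLite cannot add one in place, so it is omitted here.
nulls_left = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE status IS NULL"
).fetchone()[0]
```

Walking the primary key keeps each batch an index range scan, and committing per batch keeps any single transaction short. On a real system a pause between batches is often added to limit replication lag and load.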

Common mistakes:

  • DDL + DML in the same migration on huge tables.
  • Re-using version numbers.
  • “I’ll fix it on staging”: editing an applied script there means the checksum mismatch carries to prod.
  • A one-shot script touching 50 tables.
  • No backup before a destructive migration.