Flyway / Migrations — Theory (concise)
The fundamental rule
Schema changes lead code changes: migrate the schema first, then deploy the code that depends on it. Rollbacks reverse the order: revert the code first, then the schema.
For zero-downtime deploys, every migration must be backwards-compatible with the previously deployed code, AND the new code must work with the old schema. This forces multi-step migrations for any breaking change.
Forward-compatible vs backward-compatible
- Forward compat: old code works with new schema (you just deployed schema, code update next).
- Backward compat: new code works with old schema (new pods can start before the migration has been applied, or after the schema is rolled back).
A bare ALTER that drops a column breaks both during the rolling transition window.
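As a minimal sketch of the old-code side of that window, the snippet below runs a query that stands in for already-deployed code against three schemas. The `users` table, its columns, and the query are hypothetical, and SQLite is used only because it runs anywhere; the destructive case is modelled as the end state after the column has been dropped.

```python
import sqlite3

# What the already-deployed (old) pods run. Hypothetical query for illustration.
OLD_CODE_QUERY = "SELECT id, fullname FROM users"

OLD_SCHEMA = "CREATE TABLE users (id INTEGER PRIMARY KEY, fullname TEXT);"
# Additive change: the old column stays and a new one appears next to it.
ADDITIVE = OLD_SCHEMA + "ALTER TABLE users ADD COLUMN display_name TEXT;"
# Destructive change, modelled as the schema after fullname has been dropped.
DESTRUCTIVE = "CREATE TABLE users (id INTEGER PRIMARY KEY, display_name TEXT);"


def old_code_works(schema_sql):
    """Return True if the old code's query succeeds against the given schema."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_sql)
        conn.execute(OLD_CODE_QUERY).fetchall()
        return True
    except sqlite3.OperationalError:
        # Raised by SQLite when the query references a missing column.
        return False
    finally:
        conn.close()


if __name__ == "__main__":
    for label, schema in [("old", OLD_SCHEMA), ("additive", ADDITIVE), ("destructive", DESTRUCTIVE)]:
        print(label, old_code_works(schema))
```

The additive schema keeps the old query working, which is why breaking changes are usually split into an additive step first and a destructive step only after no running code needs the old column.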
Versioned vs declarative
- Versioned (Flyway, Liquibase, Alembic): imperative scripts, ordered, append-only.
- Declarative (Atlas, pgroll): define desired schema, tool computes diff.
Versioned is simpler. Declarative reduces hand-written DDL but is harder to stage.
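To make the declarative idea concrete, here is a toy sketch of "define the desired schema, let the tool compute the diff". It only handles added columns on a single SQLite table; real declarative tools cover far more (types, indexes, constraints, drops). The function name `plan_additive_diff` and the `users` table are hypothetical.

```python
import sqlite3


def plan_additive_diff(conn, table, desired_columns):
    """Compare live columns with the desired ones and return the DDL still needed."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    live = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    return [
        f"ALTER TABLE {table} ADD COLUMN {name} {sql_type}"
        for name, sql_type in desired_columns.items()
        if name not in live
    ]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    desired = {"id": "INTEGER", "email": "TEXT", "created_at": "TEXT"}

    plan = plan_additive_diff(conn, "users", desired)
    for statement in plan:
        conn.execute(statement)

    # Once the live schema matches the desired one, the plan is empty.
    second_plan = plan_additive_diff(conn, "users", desired)
    conn.close()
    return plan, second_plan


if __name__ == "__main__":
    print(demo())
```

Because the diff is derived from the end state, the tool decides the steps. That is what makes multi-step rollouts harder to control than with hand-ordered versioned scripts.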
Idempotency vs ordering
Flyway uses checksums to detect scripts that were modified after they were applied. Never edit applied migrations; add new migration files only.
If you need to fix a wrong migration:
- Add a new migration that corrects it.
- For dev/staging, use flyway repair or rebuild.
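The checksum check described above can be sketched roughly as follows. CRC32 over the script text is assumed here purely for illustration, and the `history` dictionary is a hypothetical stand-in for the tool's history table; this is not Flyway's implementation.

```python
import zlib


def checksum(script_text):
    """CRC32 of the script text, used as a stand-in for the tool's checksum."""
    return zlib.crc32(script_text.encode("utf-8"))


def validate(history, scripts_on_disk):
    """Return versions whose on-disk script no longer matches what was applied."""
    return [
        version
        for version, applied_checksum in sorted(history.items())
        if version in scripts_on_disk
        and checksum(scripts_on_disk[version]) != applied_checksum
    ]


def demo():
    v1 = "CREATE TABLE users (id INTEGER PRIMARY KEY);"
    v2 = "ALTER TABLE users ADD COLUMN email TEXT;"
    # Checksums recorded at the time each script was applied.
    history = {"1": checksum(v1), "2": checksum(v2)}

    # Wrong: version 1 was edited in place after it had been applied.
    edited = {"1": v1 + " -- tweaked later", "2": v2}
    # Right: applied scripts untouched, the fix lives in a new version.
    corrected = {"1": v1, "2": v2, "3": "ALTER TABLE users ADD COLUMN name TEXT;"}

    return validate(history, edited), validate(history, corrected)


if __name__ == "__main__":
    print(demo())
```

Editing an applied script trips validation on every environment that already ran it, while adding a corrective version leaves the recorded history intact.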
Concurrency
If multiple app instances boot together and try to migrate, they race. Solutions:
- Migration job runs once (init container with Job or argo PreSync).
- Flyway uses an internal advisory lock.
- Some teams gate migration on a single CI step.
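The locking approach can be illustrated with a small simulation: several "pods" (threads) boot at once, each takes a database-level write lock, checks the history table, and applies the migration only if nobody else has. SQLite's `BEGIN IMMEDIATE` stands in for the lock here; this is an assumption for the sketch, not a description of how Flyway implements its lock. The `schema_history` table and pod names are hypothetical.

```python
import os
import sqlite3
import tempfile
import threading

MIGRATION_VERSION = "1"
MIGRATION_SQL = "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)"


def migrate(db_path, pod_name, applied_by):
    # isolation_level=None: BEGIN/COMMIT are issued explicitly below.
    conn = sqlite3.connect(db_path, timeout=30, isolation_level=None)
    try:
        # Take the write lock up front; other instances wait here.
        conn.execute("BEGIN IMMEDIATE")
        conn.execute(
            "CREATE TABLE IF NOT EXISTS schema_history "
            "(version TEXT PRIMARY KEY, applied_by TEXT)"
        )
        already = conn.execute(
            "SELECT 1 FROM schema_history WHERE version = ?", (MIGRATION_VERSION,)
        ).fetchone()
        if already is None:
            conn.execute(MIGRATION_SQL)
            conn.execute(
                "INSERT INTO schema_history VALUES (?, ?)", (MIGRATION_VERSION, pod_name)
            )
            applied_by.append(pod_name)
        conn.execute("COMMIT")
    except Exception:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise
    finally:
        conn.close()


def run_instances(n=4):
    """Boot n instances concurrently; return who migrated and the history row count."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "app.db")
        applied_by = []
        threads = [
            threading.Thread(target=migrate, args=(db_path, f"pod-{i}", applied_by))
            for i in range(n)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        conn = sqlite3.connect(db_path)
        count = conn.execute("SELECT COUNT(*) FROM schema_history").fetchone()[0]
        conn.close()
        return applied_by, count


if __name__ == "__main__":
    print(run_instances())
```

Which pod wins is nondeterministic; the point is that exactly one applies the migration and the rest see it in the history table and skip it.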
Reversibility
Most migrations are not safely reversible in prod:
- DROP COLUMN loses data.
- DROP TABLE loses data.
- A failed forward isn’t undone; the right action is “fix forward”.
Backups + point-in-time recovery are the real rollback for data-destructive ops.
Common interview Qs
- Rename a column without downtime? Add new col → dual-write → backfill → switch reads → drop old. 3-4 deploys.
- NOT NULL on 100M-row table? Add nullable, backfill in batches, then add NOT NULL constraint.
- Migration runs 6 hours and locks the table — what now? Cancel; redesign as online change (chunks, gh-ost, pt-osc, pgroll).
- Deploy succeeded, migration failed mid-way. Now what? Investigate via flyway info; manual fix or flyway repair; never silently ignore.
- Test migrations? Apply to fresh DB; apply to a snapshot of prod; CI integration tests.
- Why not ORM auto-migrate (e.g. TypeORM synchronize:true)? Unsafe in prod, no review, no audit, no rollback story.
- MongoDB migrations? Tools: migrate-mongo, mongock; document version field; lazy migration on read.
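The "add nullable, backfill in batches" step shared by the rename and NOT NULL answers can be sketched as below. It walks primary-key ranges so that each batch is its own short transaction. The table, column names, and batch size are hypothetical, and SQLite stands in for the production database.

```python
import sqlite3


def backfill_in_batches(conn, batch_size=1000):
    """Copy fullname into display_name in primary-key ranges; return the batch count."""
    max_id = conn.execute("SELECT COALESCE(MAX(id), 0) FROM users").fetchone()[0]
    batches = 0
    for start in range(0, max_id, batch_size):
        with conn:  # commits after each batch, keeping transactions short
            conn.execute(
                "UPDATE users SET display_name = fullname "
                "WHERE id > ? AND id <= ? AND display_name IS NULL",
                (start, start + batch_size),
            )
        batches += 1
    return batches


def demo(rows=2500, batch_size=1000):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, fullname TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO users (fullname) VALUES (?)", [(f"user {i}",) for i in range(rows)]
    )
    conn.commit()

    # Step 1: add the new column as nullable.
    conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")
    # Step 2: backfill existing rows in batches.
    batches = backfill_in_batches(conn, batch_size)
    # Step 3: enforce NOT NULL only once this count reaches zero.
    remaining = conn.execute(
        "SELECT COUNT(*) FROM users WHERE display_name IS NULL"
    ).fetchone()[0]
    conn.close()
    return batches, remaining


if __name__ == "__main__":
    print(demo())
```

The `display_name IS NULL` filter makes the backfill safe to re-run after an interruption, and it avoids overwriting values that dual-writing code has already set.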
Anti-patterns
- DDL + DML in same migration on huge tables.
- Re-using version numbers.
- “I’ll fix it on staging” — checksum mismatch carries to prod.
- One-shot script touching 50 tables.
- No backup before destructive migration.
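A reused version number is cheap to catch before it reaches a database, for example as a CI check over the migration directory. The sketch below assumes Flyway's default `V<version>__<description>.sql` naming and treats `1_1` and `1.1` as the same version; it is a simplified check, not Flyway's own validation, and `find_reused_versions` is a hypothetical helper.

```python
import re
from collections import defaultdict

# Default versioned-migration naming: V<version>__<description>.sql
VERSIONED = re.compile(
    r"^V(?P<version>[0-9]+(?:[._][0-9]+)*)__(?P<description>.+)\.sql$"
)


def find_reused_versions(filenames):
    """Map each version used by more than one file to the sorted offending files."""
    by_version = defaultdict(list)
    for name in filenames:
        match = VERSIONED.match(name)
        if match:
            # Normalise separators so 1_1 and 1.1 count as the same version.
            version = match.group("version").replace("_", ".")
            by_version[version].append(name)
    return {v: sorted(names) for v, names in by_version.items() if len(names) > 1}


if __name__ == "__main__":
    files = [
        "V1__init.sql",
        "V2__add_email.sql",
        "V2__add_name.sql",  # clashes with the previous file
        "README.md",         # ignored: not a versioned migration
    ]
    print(find_reused_versions(files))
```

Clashes like this typically come from two branches each adding "the next" version, so running the check on merge catches them before either script is applied.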