Testing Strategy — Basics

The test pyramid

        /\        E2E (few, slow, brittle)
       /  \
      /----\      Integration / Contract (some)
     /------\
    /--------\    Unit (many, fast, isolated)

Idea: many fast unit tests at the base, fewer mid-level integration tests, very few slow E2E tests at the top.

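Modern variants: Testing Honeycomb (Spotify) and Testing Trophy (Kent C. Dodds). Both lean heavier on integration tests than the pyramid suggests; the honeycomb's argument is that in microservices most of the risk sits at service boundaries, not inside a single unit.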

Test types

Unit

One function/class in isolation. No I/O.
Fast (ms). Many. Run on every save.
Mock external collaborators if needed; prefer pure logic.
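
A minimal sketch of a unit test in pytest style. The function apply_discount is hypothetical and only serves as a small piece of pure logic: no I/O, no collaborators, nothing to mock.

```python
def apply_discount(price_cents: int, percent: int) -> int:
    """Hypothetical pure function: price after a percentage discount, in cents."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price_cents - (price_cents * percent) // 100


def test_apply_discount_reduces_price_by_percent():
    assert apply_discount(1000, 25) == 750


def test_apply_discount_rejects_percent_above_100():
    try:
        apply_discount(1000, 101)
    except ValueError:
        return
    raise AssertionError("expected ValueError")
```

With pytest installed, the second test would more commonly use pytest.raises; the plain try/except keeps the sketch dependency-free.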

Integration

Multiple modules working together. Hits real DB / Redis / queue (often via Testcontainers).
Slower (100ms-1s). Catches wiring, schema, transaction bugs.
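
A sketch of an integration test that exercises a real SQL engine, schema constraint, and transaction together. To stay runnable without Docker it uses a SQLite file from the standard library as a stand-in; in a real suite the connection would more likely point at a Testcontainers-managed instance of the production engine. The accounts table and the transfer function are hypothetical.

```python
import sqlite3
import tempfile
from pathlib import Path


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE accounts ("
        " id INTEGER PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )


def transfer(conn: sqlite3.Connection, src: int, dst: int, amount: int) -> None:
    """Hypothetical code under test: move money between accounts atomically."""
    # 'with conn' commits on success and rolls back if an exception escapes.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))


def test_transfer_rolls_back_when_source_is_overdrawn():
    with tempfile.TemporaryDirectory() as tmp:
        conn = sqlite3.connect(Path(tmp) / "test.db")
        try:
            create_schema(conn)
            with conn:
                conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 50), (2, 0)])
            try:
                transfer(conn, src=1, dst=2, amount=80)
            except sqlite3.IntegrityError:
                pass  # the CHECK constraint rejected the negative balance
            else:
                raise AssertionError("expected the CHECK constraint to fire")
            # The credit to account 2 must have been rolled back as well.
            balances = dict(conn.execute("SELECT id, balance FROM accounts"))
            assert balances == {1: 50, 2: 0}
        finally:
            conn.close()
```

A unit test with a mocked connection could not catch a missing rollback here, because the bug only shows up when the constraint and the transaction interact.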

Contract (consumer-driven)

Verifies that a producer satisfies a consumer’s expected request/response shape — without running both at once.
Tools: Pact, Spring Cloud Contract.
Critical in microservices to prevent breaking downstream.
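
The idea can be sketched by hand, without any contract-testing library. This is not Pact's or Spring Cloud Contract's API; those tools record the consumer's expectations in a shared file and replay them against the producer. ORDER_CONTRACT, violations, and handle_get_order are all hypothetical names.

```python
# Contract published by the consumer: the fields it reads and their types.
ORDER_CONTRACT = {
    "request": {"method": "GET", "path": "/orders/42"},
    "response": {"status": 200, "body": {"id": int, "status": str, "total_cents": int}},
}


def violations(contract_body: dict, actual_body: dict) -> list:
    """Return the ways the actual body fails the consumer's expectations."""
    problems = []
    for field, expected_type in contract_body.items():
        if field not in actual_body:
            problems.append(f"missing field: {field}")
        elif not isinstance(actual_body[field], expected_type):
            problems.append(f"wrong type for {field}")
    # Extra fields are allowed: the consumer ignores what it does not read.
    return problems


def handle_get_order(order_id: int) -> tuple:
    """Hypothetical producer handler, called directly without a network."""
    return 200, {"id": order_id, "status": "shipped", "total_cents": 1999, "currency": "EUR"}


def test_producer_satisfies_order_contract():
    status, body = handle_get_order(42)
    expected = ORDER_CONTRACT["response"]
    assert status == expected["status"]
    assert violations(expected["body"], body) == []
```

The test runs in the producer's build only. If the producer renames total_cents, its own pipeline fails before any consumer is deployed against the change.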

End-to-end (E2E)

Full system, real or near-real environment. Often through UI.
Slow, flaky. Use sparingly for golden paths.
Tools: Playwright, Cypress, Selenium.

Other

Smoke: quick post-deploy checks (“is it up?”).
Acceptance / behavior: written in business language (Gherkin syntax, run by Cucumber).
Property-based: generate inputs, check invariants (fast-check, Hypothesis, jqwik).
Mutation testing: make small changes to the code and check that at least one test fails (Stryker, mutmut, PIT).
Load / performance: k6, Locust, JMeter, wrk.
Chaos: inject failures (Chaos Mesh, Toxiproxy, Gremlin).
Security: static analysis (SAST), dynamic analysis (DAST), dependency scanning (Snyk, Dependabot).
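
To illustrate the property-based entry, here is a hand-rolled sketch using only the standard library's random module. Dedicated libraries such as Hypothesis generate inputs more cleverly and shrink failing cases to a minimal example; this version only shows the core idea of checking invariants over many generated inputs. The run-length encoder is a hypothetical function under test.

```python
import random


def rle_encode(text: str) -> list:
    """Hypothetical function under test: run-length encode a string."""
    runs = []
    for ch in text:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs


def rle_decode(runs: list) -> str:
    return "".join(ch * count for ch, count in runs)


def test_decode_inverts_encode_for_random_strings():
    rng = random.Random(1234)  # fixed seed keeps any failure reproducible
    for _ in range(500):
        # A tiny alphabet makes repeated characters, the interesting case, likely.
        text = "".join(rng.choice("ab ") for _ in range(rng.randint(0, 20)))
        runs = rle_encode(text)
        # Invariant 1: the round trip returns the input.
        assert rle_decode(runs) == text
        # Invariant 2: adjacent runs never share a character.
        assert all(a[0] != b[0] for a, b in zip(runs, runs[1:]))
```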

Test doubles (Meszaros taxonomy)

Type	Purpose
Dummy	passed but not used (filler)
Stub	returns canned response
Spy	stub + records calls
Mock	preprogrammed expectations; verifies them
Fake	working impl, simplified (in-memory DB)

Prefer fakes for collaborators when feasible (fast, real-ish behavior). Use mocks sparingly — they couple tests to implementation.
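
A short sketch of three of these doubles using unittest.mock from the standard library. The register function, the repository, and the mailer are hypothetical. Note that unittest.mock's Mock class, when checked after the fact with assert_called_once_with, behaves like a spy in the taxonomy above.

```python
from unittest.mock import Mock


class InMemoryUserRepo:
    """Fake: a working but simplified repository backed by a dict."""

    def __init__(self):
        self._users = {}

    def save(self, email: str, name: str) -> None:
        self._users[email] = name

    def exists(self, email: str) -> bool:
        return email in self._users


def register(repo, mailer, email: str, name: str) -> bool:
    """Hypothetical code under test: register a user and send a welcome mail."""
    if repo.exists(email):
        return False
    repo.save(email, name)
    mailer.send(email, "Welcome!")
    return True


def test_register_saves_user_and_sends_welcome():
    repo = InMemoryUserRepo()  # fake: the test asserts on resulting state
    mailer = Mock()            # used as a spy: records calls for later checks

    assert register(repo, mailer, "a@example.com", "Ann") is True

    assert repo.exists("a@example.com")
    mailer.send.assert_called_once_with("a@example.com", "Welcome!")


def test_register_skips_existing_user():
    repo = Mock()
    repo.exists.return_value = True  # stub: canned answer
    mailer = Mock()

    assert register(repo, mailer, "a@example.com", "Ann") is False

    repo.save.assert_not_called()
    mailer.send.assert_not_called()
```

The first test survives a refactor of how register talks to the repository, as long as the user ends up stored. The second test names specific methods and would need updating if those calls changed, which is the coupling the note above warns about.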

TDD

Red → Green → Refactor.

Write a failing test capturing the next behavior.
Write minimum code to pass.
Refactor without changing behavior.

Benefits: forces design from the caller’s perspective, leaves a test behind for every behavior added this way, gives confidence to refactor.
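
Two passes through the cycle, sketched on a hypothetical slugify function. In a real codebase the function would be edited in place; the second version gets its own name here only so that both stages fit in one listing.

```python
# Cycle 1, red: the test is written first and fails because slugify does not exist yet.
def test_slugify_lowercases_and_joins_words_with_hyphens():
    assert slugify("Hello World") == "hello-world"


# Cycle 1, green: the minimum code that makes the test pass.
def slugify(title: str) -> str:
    return title.lower().replace(" ", "-")


# Cycle 2, red: the next behavior gets its own failing test.
# Against the cycle-1 code, "Hello   World" would give "hello---world".
def test_slugify_collapses_repeated_spaces():
    assert slugify_v2("Hello   World") == "hello-world"


# Cycle 2, green and refactor: split() handles any run of whitespace,
# and the cycle-1 test still passes against this version.
def slugify_v2(title: str) -> str:
    return "-".join(title.lower().split())
```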

Coverage

Statement / line coverage = baseline metric.
Branch coverage = better.
100% is not a goal — diminishing returns. Target ~80% with judgment.
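Mutation score is more meaningful than coverage: coverage shows which lines ran, mutation score shows whether a test would fail if those lines were wrong.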
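
A hand-made illustration of why full line coverage can still miss a bug. The mutant is written out manually here; tools such as Stryker, mutmut, or PIT generate changes like this automatically. All names are hypothetical.

```python
def is_adult(age: int) -> bool:
    """Hypothetical function under test."""
    return age >= 18


def is_adult_mutant(age: int) -> bool:
    """The kind of change a mutation tool makes: '>=' becomes '>'."""
    return age > 18


def weak_suite(fn) -> bool:
    """Executes every line of fn (100% line coverage) but skips the boundary."""
    return fn(30) is True and fn(5) is False


def strong_suite(fn) -> bool:
    """Same checks plus the boundary value."""
    return weak_suite(fn) and fn(18) is True


# Both suites pass on the original code.
assert weak_suite(is_adult) and strong_suite(is_adult)
# The weak suite also passes on the mutant: the mutant survives.
assert weak_suite(is_adult_mutant)
# The strong suite fails on the mutant: the mutant is killed.
assert not strong_suite(is_adult_mutant)
```

Mutation score is the fraction of generated mutants that the suite kills, so the weak suite would score lower despite identical coverage.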

Test naming & structure

Arrange / Act / Assert (AAA) or Given / When / Then.
Descriptive name: creates_user_when_email_is_valid, not test1.
One concept per test.
Independent — any order, any subset.

Production-grade tactics

Test fixtures: factories for realistic data (factory_bot, faker).
Snapshot tests: serialize output, diff against committed snapshot. Useful for HTML, large JSON, GraphQL responses.
Golden file tests for ETL / data pipelines.
Test parallelism: separate DB schemas/transactions per worker.
Flaky test policy: quarantine, fix, re-add. Don’t ignore.
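
A minimal sketch of the snapshot technique, written by hand for illustration; real projects usually rely on a snapshot plugin for their test framework. render_report and matches_snapshot are hypothetical, and the temporary directory stands in for a snapshot directory that would normally be committed to the repository.

```python
import json
import tempfile
from pathlib import Path


def render_report(orders: list) -> str:
    """Hypothetical function whose output is large enough to snapshot."""
    report = {"count": len(orders), "total_cents": sum(o["total_cents"] for o in orders)}
    # sort_keys and indent keep the output stable and the diff readable.
    return json.dumps(report, indent=2, sort_keys=True)


def matches_snapshot(actual: str, snapshot_path: Path, update: bool = False) -> bool:
    """Compare against the stored snapshot; record it when missing or when update is set."""
    if update or not snapshot_path.exists():
        snapshot_path.write_text(actual, encoding="utf-8")
        return True
    return snapshot_path.read_text(encoding="utf-8") == actual


def test_report_matches_snapshot():
    orders = [{"total_cents": 1999}, {"total_cents": 501}]
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "report.snap.json"
        assert matches_snapshot(render_report(orders), path)          # first run records
        assert matches_snapshot(render_report(orders), path)          # unchanged output passes
        assert not matches_snapshot(render_report(orders[:1]), path)  # changed output fails
```

Recording a missing snapshot automatically is convenient locally but risky in CI, where a missing file should more likely fail the build; the update flag is the place to enforce that choice.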

CI considerations

Run unit + most integration tests on every push.
Run E2E + heavy load tests on main branch / nightly.
Fail fast: lint → unit → integration → e2e.
Cache deps for speed.
Test against the same DB version as prod.
Reproduce locally: run the same containerized DB and the same test commands as CI.
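
The fail-fast ordering is normally expressed in the CI system's own configuration. As a language-neutral illustration, here is a small Python sketch that runs the stages in order and stops at the first failure; the same script could be run locally to mirror CI. The commands and the tests/unit, tests/integration, tests/e2e layout are assumptions, not something the notes above prescribe.

```python
import subprocess

# Hypothetical commands and paths, ordered cheapest first.
STAGES = [
    ("lint", ["ruff", "check", "."]),
    ("unit", ["pytest", "tests/unit"]),
    ("integration", ["pytest", "tests/integration"]),
    ("e2e", ["pytest", "tests/e2e"]),
]


def first_failing_stage(stages, run=subprocess.call):
    """Run each command in order; return the first failing stage name, or None."""
    for name, command in stages:
        print(f"== {name}: {' '.join(command)}")
        if run(command) != 0:
            return name  # fail fast: slower stages after this one never start
    return None
```

Calling first_failing_stage(STAGES) runs the real commands. The run parameter exists so the ordering logic itself can be checked with a stand-in that returns exit codes without starting any process.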