Building a Three-Layer Test Foundation that Supports Continuous Improvement

jest
playwright
testinglibrary
react
expressjs

Published on 2024/08/10

2024/11/19

 Summary

Perspective	Content
Issue	Because we depended on manual verification, regressions occurred, verification costs increased, and procedures became person-dependent, which stalled the pace of improvements and large-scale upgrades.
Response	Clearly defined three layers of Unit / Integration / E2E, and prepared an execution environment (CI, DB, mocks, data seeding) that makes tests easy to implement.
Operation	Reduced writing and maintenance costs through test templating and shared Fixtures / Builders. Established naming conventions that let you understand the cause of failure at a glance.
Results / Outcomes	By achieving a state where we can have “confidence that nothing is broken,” we can safely perform refactoring and dependency upgrades. Manual checks were also greatly reduced.

 Background and Issues
Because all pre-release quality checks were done manually, the following issues had become apparent:
Verification costs increased with every feature addition or refactor
We could not fully prevent regressions (breaking existing features) caused by subtle spec changes
Person-dependent “verification procedures” were difficult to share, so we could not maintain a reproducible quality assurance process
Also, although introducing TypeScript ensured a certain level of type safety, it did not cover actual behavior verification, and there still remained areas where “we couldn’t notice that things were broken.”
As a result, developers could not refactor with confidence, and the team’s improvement speed hit a ceiling.
We also could not proceed with upgrades of high-impact libraries (React, webpack, express, etc.).
 Research and Measurement Phase
Before introducing automated tests, we mapped out the existing quality assurance process and risk structure to clarify “what must be protected.”

The goal was not simply to increase the number of tests, but to guarantee system behavior at minimal cost.
 Inventory of the Quality Assurance Process
Starting from the manual checklists, we mapped items along three axes: change frequency, incident rate, and user impact.

This allowed us to define “features that change frequently and have high impact when they fail” as high priority.

We ranked each area by its “priority for behavior assurance” and clarified where to invest in test development.
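As a rough illustration of the three-axis mapping, the sketch below scores each checklist item and sorts the items by priority. The 1-to-5 rating scale, the multiplicative score, and every name in it (`ChecklistItem`, `priorityScore`, `rankByPriority`, the sample items) are assumptions made for this example; the post does not state how the axes were actually combined.

```typescript
// Hypothetical model of one manual-checklist item, rated 1 (low) to 5 (high) per axis.
interface ChecklistItem {
  name: string;
  changeFrequency: number;
  incidentRate: number;
  userImpact: number;
}

// Assumed scoring rule: multiply the axes, so an item has to be high on
// several axes at once to rank near the top.
function priorityScore(item: ChecklistItem): number {
  return item.changeFrequency * item.incidentRate * item.userImpact;
}

// Returns a new array sorted from highest to lowest priority.
function rankByPriority(items: ChecklistItem[]): ChecklistItem[] {
  return [...items].sort((a, b) => priorityScore(b) - priorityScore(a));
}

const ranked = rankByPriority([
  { name: "profile settings", changeFrequency: 1, incidentRate: 1, userImpact: 2 },
  { name: "checkout", changeFrequency: 4, incidentRate: 3, userImpact: 5 },
  { name: "search filters", changeFrequency: 5, incidentRate: 2, userImpact: 3 },
]);
console.log(ranked.map((item) => item.name));
```

A weighted sum would work as well; the point is only that the ranking becomes explicit and reviewable rather than living in someone's head.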
 Evaluation of Testability
For the main React and Express modules, we identified functions with many side effects and areas with strong state dependence.

We planned improvements such as function separation and dependency injection (DI) for structures that hinder testability, and built a foundation for continuous test development.
 Introduction and Design (System Setup)
Before “writing” tests, we prioritized preparing the environment and design so that tests run correctly.
 Design of the Test Foundation and Layer Structure
We defined a three-layer structure of unit tests, integration tests, and E2E tests, and clarified the role of each.

Layer	Main Purpose	Scope of Verification	Granularity of Target	Main Verification Viewpoints
Unit	Validity of logic, pure functions, and methods	Single module (no external dependencies)	Function / class level	Branch conditions, consistency of input and output, exception handling
Integration	Inter-module collaboration and data flow	Including DB, API, and external services	Component level / API endpoint	Request/response consistency
E2E (Scenario)	Actual user operations and consistency of the entire system	Browser + server	Screen operations / scenario level	UI flow, state transitions, UX reproducibility

We documented the responsibilities of each layer and unified the granularity and execution scope of test code.

We also redesigned the structure so that side effects can be injected from the outside, ensuring testability from the design stage.
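Example)
Implementation side
// app/http/HttpClient.ts
// Abstraction over HTTP so callers can receive a real client or a test double.
export interface HttpClient {
  get<T>(url: string): Promise<T>
  post<T>(url: string, body: unknown): Promise<T>
}

// Production implementation backed by the global fetch API.
export const fetchClient: HttpClient = {
  async get(url) {
    const r = await fetch(url)
    return r.json()
  },
  async post(url, body) {
    const r = await fetch(url, {
      method: "POST",
      headers: { "Content-Type": "application/json" }, // declare the JSON body
      body: JSON.stringify(body),
    })
    return r.json()
  },
}

// The HTTP dependency is injected through the constructor, so tests never hit the network.
export class ListingQuery {
  constructor(private http: HttpClient) {}
  async byId(id: string) { return this.http.get(`/api/listings/${id}`) }
}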
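Test code
// Import path is an assumption: the test file is placed next to the implementation.
import { ListingQuery, type HttpClient } from "./HttpClient"

const mockClient: HttpClient = {
  get: jest.fn().mockResolvedValue({ id: 1, title: "mock" }),
  post: jest.fn().mockResolvedValue({ ok: true }),
}

test("byId fetches the listing through the injected client", async () => {
  const query = new ListingQuery(mockClient)
  expect(await query.byId("1")).toEqual({ id: 1, title: "mock" })
  expect(mockClient.get).toHaveBeenCalledWith("/api/listings/1")
})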
 Preparation of Execution Environment and Operational Foundation
We adopted Jest (unit and integration) and Playwright (E2E) as the test foundation.

We designed the test environment and CI foundation with top priority on “being able to reliably reproduce failures.”
 Measures to Maintain Reproducibility
Fixing dependency versions (eliminating environment differences via package-lock)
Initializing test data and fixing seeds to maintain consistent state
Mocking external APIs (msw / nock) to remove network dependencies
Fixing time and random numbers to suppress non-deterministic behavior
This enabled a test environment where “failures can be reproduced under the same conditions” both locally and in CI.
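To make the “fixing time and random numbers” item concrete, here is a minimal sketch in which the code under test receives a clock and a random source instead of calling `Date.now()` or `Math.random()` directly. The `Clock` and `Random` types, `fixedClock`, `seededRandom`, and `issueCoupon` are hypothetical names introduced for this example, not code from the project; the seeded generator is the small public-domain mulberry32 PRNG.

```typescript
// Hypothetical abstractions injected into code that depends on time or randomness.
interface Clock {
  now(): Date;
}
type Random = () => number;

// A clock frozen at a given instant, for tests.
const fixedClock = (iso: string): Clock => ({ now: () => new Date(iso) });

// mulberry32: a tiny seeded PRNG; the same seed always yields the same sequence.
function seededRandom(seed: number): Random {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Example code under test: a coupon that expires 24 hours from "now"
// and carries a random six-digit suffix.
function issueCoupon(clock: Clock, random: Random) {
  const expiresAt = new Date(clock.now().getTime() + 24 * 60 * 60 * 1000);
  const suffix = Math.floor(random() * 1000000).toString().padStart(6, "0");
  return { code: `CPN-${suffix}`, expiresAt: expiresAt.toISOString() };
}

// Two runs with the same clock and seed produce identical results.
const first = issueCoupon(fixedClock("2024-08-10T00:00:00.000Z"), seededRandom(42));
const second = issueCoupon(fixedClock("2024-08-10T00:00:00.000Z"), seededRandom(42));
console.log(first.code === second.code, first.expiresAt);
```

Where injecting a clock is impractical, Jest's fake timers (`jest.useFakeTimers()` together with `jest.setSystemTime()`) can pin the system time for a test instead.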
 Mechanisms to Ensure Reliable Re-runs
Job design that considers parallel execution performance and cache characteristics (improved stability on CI)
Flexible adjustment of timeouts and retries to control execution independent of environment load
Automatic saving of logs, screenshots, and traces on failure to make debugging on re-run easier
In the CI environment, we built automatic execution per PR on GitHub Actions.

We realized an operation that emphasizes reproducibility, where tests can be re-run and analyzed under the same conditions even if they fail.
 Implementation (Writing and Operating Tests)
Based on the foundation prepared in the design phase, we moved to the stage of “building up” tests.

The goal was not simply to increase coverage, but to build a mechanism that reliably detects when something breaks.
 Establishing Unit Tests
We rigorously verified the correspondence between input and output, focusing on functions.
Unified naming conventions and purposes of test cases (happy path / error path / boundary values)
Aligned test file structure one-to-one with implementation files to ensure ease of reference
This made it possible to immediately identify the smallest broken unit when code changes.
Example)
describe('addUser', () => {
  describe('happy path', () => {
    it('registers a new user', () => { ... })
  })

  describe('error path', () => {
    it('returns an error when name is empty', () => { ... })
  })

  describe('boundary values', () => {
    it('registers a user whose name is 1 character long', () => { ... })
  })
})
 Integration Tests (API, DB, Inter-Module Collaboration)
As a middle layer between unit tests and E2E tests, we designed tests to narrowly verify “dependencies that cannot be covered by a single module alone.”
The target was not “an entire feature,” but limited to the scope of one module plus its direct dependencies.
Integration tests for the API layer

Using supertest, we sent real requests at the Express handler level. We verified connections with the business logic layer and authentication middleware, and checked consistency of status codes, response structures, and validation errors.
Integration tests for the DB access layer

We executed CRUD operations against a real MongoDB (local / container environment). We checked the impact of schema changes and index settings, and ensured consistency of type definitions, persistence, and restoration.
Integration tests for external integration modules

For Webhooks and external API calls, we used msw/node to stub them. We hooked actual HTTP requests and verified retry control, error handling, and request structure consistency. By not fully mocking the communication layer and leaving HTTP-level interactions, we achieved integration assurance in a form close to the real environment.
To prevent data races during parallel execution, we generated independent schema names and temporary data per test, thoroughly designing tests to be re-runnable.
This layer enabled us to detect “integration inconsistencies that previously could only be noticed by E2E” in advance.
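The per-test isolation described above can be sketched as a small naming helper. `uniqueDbName` and its parameters are hypothetical; the post does not show how names were actually generated. The worker id is passed in as a plain string (under Jest it could come from the `JEST_WORKER_ID` environment variable).

```typescript
// Counter that keeps names unique within one process.
let sequence = 0;

// Hypothetical helper: derives an isolated database (or schema) name per test.
// Parts are kept short because MongoDB limits the length of database names.
function uniqueDbName(prefix: string, workerId: string): string {
  sequence += 1;
  // Short random suffix to avoid collisions between separate CI runs
  // that share the same database server.
  const nonce = Math.random().toString(36).slice(2, 8);
  return `${prefix}_w${workerId}_${sequence}_${nonce}`;
}

const nameA = uniqueDbName("listing_test", "1");
const nameB = uniqueDbName("listing_test", "1");
console.log(nameA, nameB);
```

Each test would connect to its own name in a setup hook and drop that database in teardown, so parallel tests never read each other's data and a failed run leaves nothing behind that affects the next one.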
 E2E Tests (Self-Contained and Reproducibility-Oriented)
We designed E2E tests based on the principles of “self-contained” and “full reproducibility.” We built a configuration that completes within a single CI job, without depending on external environments or manual operations.
 Execution Pipeline (Single-Job Completion on GitHub Actions)
Install dependencies & build
Start the app (in the background)
Initialize data
Run Playwright
Trace collection: on-first-retry
Report output: html

Artifact collection & cleanup
Save screenshots / reports / traces
Ensure processes are terminated in the final step (equivalent to finally)

 Parallelization Strategy (Avoiding Instability)
Prioritize sharding: split the entire test suite into multiple jobs (shards) to shorten time.

→ Safely scale via CI matrix. Less likely to create differences from local runs.
Be cautious with workers: use workers=1 as the default.

→ Avoid test flakiness caused by port conflicts, shared state, and I/O load.
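The Playwright-related settings mentioned in this section could be collected in a config roughly like the following. This is a sketch under assumptions, not the project's actual file: the `testDir` and `baseURL` values are hypothetical, and the object is written without imports so it stands alone (in a real `playwright.config.ts` it would be the default export, usually wrapped in `defineConfig` from `@playwright/test`).

```typescript
// Sketch of the options a playwright.config.ts might contain.
const config = {
  testDir: "./e2e", // hypothetical location of the E2E specs
  workers: 1, // single worker by default: avoids port conflicts and shared state
  retries: 2, // re-run a failing test before reporting it as failed
  reporter: "html", // HTML report, saved as a CI artifact
  use: {
    baseURL: "http://localhost:3000", // hypothetical URL of the app started in the job
    trace: "on-first-retry", // record a trace only when a test is retried
    screenshot: "only-on-failure", // keep screenshots for failed tests
  },
};

// In playwright.config.ts this object would be exported:
// export default config;
console.log(config.workers, config.use.trace);
```

Sharding is then applied from the command line rather than in the config, for example `npx playwright test --shard=1/4` in the first of four CI matrix jobs (the count of four is only illustrative).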
 Operation and Improvement Phase
After introduction, we focused operations on continuously detecting and preventing regressions.
 Points of Attention During and After Setup
Avoid “tests for the sake of tests”

Add tests only in the necessary scope, starting from actual bugs or requirement changes. Position tests as a means for quality assurance, not as an end in themselves.
Write tests that reveal the cause when they fail

Clarify test names and output messages. For example, enforce naming that conveys intent and preconditions in one line, such as “should return 400 when missing header.”
Reduce test maintenance costs

Introduce shared fixtures / builders and centrally manage test data.

Concentrate follow-up changes in one place and increase refactor tolerance.
Use “reliability” rather than “coverage” as the metric

Instead of chasing coverage numbers, adopt “whether we can reliably notice when something breaks” as the primary metric.

In test reviews, we also discussed whether there is a “guarantee of noticing.”
Balance with CI execution time

Optimize parallel execution and cache strategies, and maintain a test foundation that completes within 10 minutes.
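For the shared fixtures / builders point above, a minimal builder might look like the sketch below. The `User` type and `buildUser` are hypothetical stand-ins for whatever domain objects the project uses; the idea is that each test overrides only the fields it cares about, so a schema change is absorbed in one place.

```typescript
// Hypothetical domain type, used only for this illustration.
interface User {
  id: string;
  name: string;
  email: string;
  role: "admin" | "member";
}

let nextId = 1;

// Builder with sensible defaults. Tests pass only the fields relevant to the case.
function buildUser(overrides: Partial<User> = {}): User {
  const id = String(nextId++);
  return {
    id,
    name: `user-${id}`,
    email: `user-${id}@example.com`,
    role: "member",
    ...overrides, // explicit values win over defaults
  };
}

// A test about permissions only states the role...
const admin = buildUser({ role: "admin" });
// ...and a validation test only states the invalid field.
const unnamed = buildUser({ name: "" });
console.log(admin, unnamed);
```

If a required field is later added to `User`, only `buildUser` needs a new default; existing tests keep compiling and keep expressing just their own intent.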
 Detection and Isolation of Flaky Tests
Failure re-run policy: retries: 2 / trace: on-first-retry.
Flake rate threshold: if it exceeds X%, attach a quarantine label, exclude it from the E2E suite, and improve it separately.
Standardization of failure logs: always save screenshots, videos, and traces as artifacts, and automatically attach reproduction steps.
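The flake-rate rule can be sketched as a small function over recorded CI outcomes. The data shape, the function names, and the sample spec names are assumptions for illustration. The threshold is a parameter because the post leaves the value of X open; the `10` used below is only a placeholder.

```typescript
// One recorded CI result per run; "flaky" means the test failed first and passed on retry.
type Outcome = "passed" | "failed" | "flaky";

interface TestHistory {
  testId: string;
  outcomes: Outcome[];
}

// Share of runs in which the test only passed after a retry.
function flakeRate(history: TestHistory): number {
  if (history.outcomes.length === 0) return 0;
  const flaky = history.outcomes.filter((o) => o === "flaky").length;
  return flaky / history.outcomes.length;
}

// thresholdPercent corresponds to the "X%" in the text; its real value is not given.
function selectForQuarantine(histories: TestHistory[], thresholdPercent: number): string[] {
  return histories
    .filter((h) => flakeRate(h) * 100 > thresholdPercent)
    .map((h) => h.testId);
}

const quarantined = selectForQuarantine(
  [
    { testId: "checkout.spec.ts", outcomes: ["passed", "flaky", "flaky", "passed"] },
    { testId: "login.spec.ts", outcomes: ["passed", "passed", "passed", "passed"] },
  ],
  10, // placeholder threshold, for illustration only
);
console.log(quarantined);
```

The returned ids would then be labeled as quarantined and excluded from the main E2E suite until they are fixed.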
 Reducing Slow Tests
Categorization of bottlenecks: network waits, DB initialization, excessive rendering, excessive dependence on E2E.
Countermeasure catalog:
Convert API/DB checks into integration tests (reduce dependence on E2E)
Differential initialization of fixtures

Instead of resetting all data every time, initialize only the range needed by the test.

We designed it so that running the same process multiple times does not break the state, and greatly shortened execution time while maintaining reproducibility.
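The idea of differential, repeatable initialization can be illustrated with an in-memory stand-in for a collection. `FakeCollection`, `seed`, and the fixture data are hypothetical; the sketch only shows the two properties named above: seed just what the test needs, and make repeated seeding harmless.

```typescript
// In-memory stand-in for a database collection, keyed by document id.
type Doc = { id: string } & Record<string, unknown>;

class FakeCollection {
  private docs = new Map<string, Doc>();
  // Insert or overwrite by id, so repeating the call does not duplicate data.
  upsert(doc: Doc): void {
    this.docs.set(doc.id, doc);
  }
  count(): number {
    return this.docs.size;
  }
}

// Seeds only the fixtures a test asks for instead of resetting everything.
function seed(collection: FakeCollection, fixtures: Doc[], neededIds: string[]): void {
  for (const doc of fixtures) {
    if (neededIds.includes(doc.id)) collection.upsert(doc);
  }
}

const allFixtures: Doc[] = [
  { id: "user-1", name: "Alice" },
  { id: "user-2", name: "Bob" },
  { id: "listing-1", title: "Sample" },
];

const users = new FakeCollection();
seed(users, allFixtures, ["user-1", "user-2"]);
seed(users, allFixtures, ["user-1", "user-2"]); // second run leaves the state unchanged
console.log(users.count());
```

Against a real MongoDB the same property is commonly obtained with upsert-style writes, for example `updateOne` with the `upsert: true` option, though the post does not say which approach was used.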

 Outcomes
We gained “confidence that nothing is broken,” which sped up decision-making for large refactors and dependency upgrades.
We formalized knowledge and understanding of specifications gained through incident response not as documents but as test code.
 Next Steps
 Introduction of Differential Test Execution (Test Selection)
Instead of running all tests every time, we will introduce a mechanism that re-runs only the affected scope based on change diffs (paths, commit history, dependency graph).
Automatically analyze test files corresponding to code changes
Cache the dependency graph of tests and run selected tests
Accumulate results as metadata to improve the accuracy of impact range estimation
This aims to shorten CI time while maintaining regression detection accuracy.
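One way to picture the selection step is a walk over a reversed import graph: start from the changed files and collect every test file that depends on them, directly or transitively. The graph format, the `selectAffectedTests` function, and the file names below are assumptions for illustration, since this mechanism is still a planned step.

```typescript
// Hypothetical dependency graph: each file maps to the files it imports.
type DependencyGraph = Record<string, string[]>;

function selectAffectedTests(
  graph: DependencyGraph,
  changedFiles: string[],
  isTestFile: (file: string) => boolean,
): string[] {
  // Invert the edges: imported file -> files that import it.
  const dependents = new Map<string, string[]>();
  for (const [file, imports] of Object.entries(graph)) {
    for (const imported of imports) {
      dependents.set(imported, [...(dependents.get(imported) ?? []), file]);
    }
  }

  // Breadth-first walk from the changed files through their dependents.
  const visited = new Set<string>(changedFiles);
  const queue = [...changedFiles];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const dependent of dependents.get(current) ?? []) {
      if (!visited.has(dependent)) {
        visited.add(dependent);
        queue.push(dependent);
      }
    }
  }
  return Array.from(visited).filter(isTestFile).sort();
}

const graph: DependencyGraph = {
  "src/price.ts": [],
  "src/cart.ts": ["src/price.ts"],
  "src/user.ts": [],
  "src/price.test.ts": ["src/price.ts"],
  "src/cart.test.ts": ["src/cart.ts"],
  "src/user.test.ts": ["src/user.ts"],
};

// Changing price.ts affects its own test and, through cart.ts, the cart test.
const selected = selectAffectedTests(graph, ["src/price.ts"], (f) => f.endsWith(".test.ts"));
console.log(selected);
```

Jest's `--findRelatedTests` and `--changedSince` options address a similar need and may be a reasonable starting point before building a custom graph.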
 Dynamic Optimization of CI Parallelism (History-Based)
We will dynamically optimize CI parallelism based on test execution history.
Collect average and P95 execution times for each test suite
Analyze execution history and automatically adjust --shard count and workers count in the next job
Periodically rebalance and visualize resource utilization
This will allow us to control CI load in a data-driven way rather than with fixed values, optimizing the balance of time, resources, and reliability.
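A possible shape for the history-based calculation: take the P95 duration of each suite, sum them, and divide by a target duration per shard. The nearest-rank percentile, the target of 150 seconds, the cap of 8 shards, and all names here are assumptions for illustration; the actual policy is still to be designed.

```typescript
// Nearest-rank percentile over a list of samples (p between 0 and 100).
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("no samples");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// history: suite name -> durations in seconds from past CI runs.
function recommendShardCount(
  history: Record<string, number[]>,
  targetSecondsPerShard: number,
  maxShards: number,
): number {
  // Use P95 rather than the average so that slow runs are budgeted for.
  const totalSeconds = Object.values(history).reduce(
    (sum, runs) => sum + percentile(runs, 95),
    0,
  );
  const needed = Math.ceil(totalSeconds / targetSecondsPerShard);
  return Math.min(Math.max(needed, 1), maxShards);
}

const shards = recommendShardCount(
  {
    checkout: [100, 120, 110],
    search: [200, 240, 220],
    login: [30, 40, 35],
  },
  150, // assumed target: each shard should finish within about 150 seconds
  8, // assumed upper bound on parallel jobs
);
console.log(shards);
```

The result would feed the `--shard` total of the next CI run; in practice it should be smoothed over several runs so that one slow job does not cause the shard count to oscillate.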
 Related Blog Posts
https://shinagawa-web.com/en/blogs/test-automation-enhancement
https://shinagawa-web.com/en/blogs/nextjs-app-router-testing-setup
 References
 Identifying and Improving Slow Tests
 1. Network / External Dependencies (DB, API, S3, etc.)
Symptom: Multi-second blocking due to HTTP waits, DNS delays, and external rate limits.
Countermeasures (TypeScript)
Mock HTTP with nock / msw
Block external APIs with Playwright’s page.route()
Speed up DB with mongodb-memory-server / SQLite (in-memory)
Run migrations only once before the suite
Countermeasures (Go)
Use httptest.Server to localize external APIs
Make container reuse the default for testcontainers / dockertest (reduce startup overhead)
 2. sleep / Timeout Waits / Polling
Symptom: Accumulation of sleep(1000) makes the entire test suite take minutes.
Countermeasures (TypeScript)
Use fake timers (Jest/Vitest)
Explicitly specify the minimum timeout for waitFor
Countermeasures (Go)
Abstract time dependence into a Clock interface and inject it
Eliminate direct use of time.After and use a fast clock in tests
 3. Heavy Crypto / Hash / Password Processing
Symptom: Cost of bcrypt / argon2 makes a single case take hundreds of ms to seconds.
Countermeasures
Lower the cost factor during tests
Swap the hash function for a faster implementation (switch via DI)
 4. Overuse of E2E
Symptom: E2E becomes the main tool and execution time grows to minutes.
Countermeasures
Optimize the test pyramid: downgrade E2E to Integration where possible
Limit E2E to critical paths
Eliminate unnecessary waitForTimeout
 Measurement and Visualization (TypeScript)
You can extract particularly slow tests with jest-slow-test-reporter.
https://github.com/jodonnell/jest-slow-test-reporter
Jest also provides options for inspecting resource consumption and handle leaks.
--logHeapUsage

Outputs heap usage at the end of each test file.

Enables early detection of memory leaks and cache bloat, and identification of heavy tests.
--detectOpenHandles

Detects handles that remain open after execution (unclosed sockets, timers, etc.).

Helps find missing awaits in asynchronous processing and contributes to stabilizing E2E and integration tests.

Use only for debugging, not as a default.