Building a Three-Layer Test Foundation that Supports Continuous Improvement
Summary
| Perspective | Content |
|---|---|
| Issue | Because we depended on manual verification, regressions occurred, verification costs increased, and procedures became person-dependent, which stalled the pace of improvements and large-scale upgrades. |
| Response | Clearly defined three layers of Unit / Integration / E2E, and prepared an execution environment (CI, DB, mocks, data seeding) that makes tests easy to implement. |
| Operation | Reduced writing and maintenance costs through test templating and shared Fixtures / Builders. Established naming conventions that let you understand the cause of failure at a glance. |
| Results / Outcomes | By achieving a state where we can have “confidence that nothing is broken,” we can safely perform refactoring and dependency upgrades. Manual checks were also greatly reduced. |
Background and Issues
Because all pre-release quality checks were done manually, the following issues had become apparent:
- Verification costs increased with every feature addition or refactor
- We could not fully prevent regressions (breaking existing features) caused by subtle spec changes
- The person-dependent "verification procedures" were difficult to share, so we could not maintain a reproducible quality assurance process
Also, although introducing TypeScript ensured a certain level of type safety, it did not cover actual behavior verification, and there still remained areas where “we couldn’t notice that things were broken.”
As a result, developers could not refactor with confidence, and the team’s improvement speed hit a ceiling.
We also could not proceed with upgrades of high-impact libraries (React, webpack, express, etc.).
Research and Measurement Phase
Before introducing automated tests, we visualized the existing quality assurance process and risk structure to clarify “what must be protected.”
The goal was not simply to increase the number of tests, but to guarantee system behavior at minimal cost.
Inventory of the Quality Assurance Process
Starting from the manual checklists, we mapped items along three axes: change frequency, incident rate, and user impact.
This allowed us to define “features that change frequently and have high impact when they fail” as high priority.
We ranked each area as “priority for behavior assurance” and clarified where to invest in test development.
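The three-axis mapping above can be sketched as a simple scoring function. The axes, the 1–5 scale, and the multiplicative weighting are illustrative assumptions for the sketch, not the team's actual formula:

```typescript
// Illustrative scoring for test-investment priority.
// Axes mirror the mapping described above; the 1-5 scale and the
// multiplicative weighting are assumptions made for this sketch.
interface FeatureRisk {
  name: string
  changeFrequency: number // 1 (rarely changes) .. 5 (changes every sprint)
  incidentRate: number    // 1 (never failed) .. 5 (frequent incidents)
  userImpact: number      // 1 (cosmetic) .. 5 (blocks core flows)
}

function priorityScore(f: FeatureRisk): number {
  return f.changeFrequency * f.incidentRate * f.userImpact
}

const features: FeatureRisk[] = [
  { name: "search", changeFrequency: 5, incidentRate: 3, userImpact: 5 },
  { name: "admin-export", changeFrequency: 1, incidentRate: 2, userImpact: 2 },
]

// Highest score first = first candidates for automated coverage.
const ranked = [...features].sort((a, b) => priorityScore(b) - priorityScore(a))
```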
Evaluation of Testability
For the main modules of React and Express, we extracted functions with many side effects and areas with strong state dependence.
We planned improvements such as function separation and dependency injection (DI) for structures that hinder testability, and built a foundation for continuous test development.
Introduction and Design (System Setup)
Before “writing” tests, we prioritized preparing the environment and design so that tests run correctly.
Design of the Test Foundation and Layer Structure
We defined a three-layer structure of unit tests, integration tests, and E2E tests, and clarified the role of each.
| Layer | Main Purpose | Scope of Verification | Granularity of Target | Main Verification Viewpoints |
|---|---|---|---|---|
| Unit | Validity of logic, pure functions, and methods | Single module (no external dependencies) | Function / class level | Branch conditions, consistency of input and output, exception handling |
| Integration | Inter-module collaboration and data flow | Including DB, API, and external services | Component level / API endpoint | Request/response consistency |
| E2E (Scenario) | Actual user operations and consistency of the entire system | Browser + server | Screen operations / scenario level | UI flow, state transitions, UX reproducibility |
We documented the responsibilities of each layer and unified the granularity and execution scope of test code.
We also redesigned the structure so that side effects can be injected from the outside, ensuring testability from the design stage.
Example)

Implementation side:

```typescript
// app/http/HttpClient.ts
export interface HttpClient {
  get<T>(url: string): Promise<T>
  post<T>(url: string, body: unknown): Promise<T>
}

export const fetchClient: HttpClient = {
  async get(url) {
    const r = await fetch(url)
    return r.json()
  },
  async post(url, body) {
    const r = await fetch(url, { method: "POST", body: JSON.stringify(body) })
    return r.json()
  },
}

export class ListingQuery {
  constructor(private http: HttpClient) {}

  async byId(id: string) {
    return this.http.get(`/api/listings/${id}`)
  }
}
```

Test code:

```typescript
const mockClient: HttpClient = {
  get: jest.fn().mockResolvedValue({ id: 1, title: "mock" }),
  post: jest.fn().mockResolvedValue({ ok: true }),
}

const query = new ListingQuery(mockClient)
expect(await query.byId("1")).toEqual({ id: 1, title: "mock" })
```
Preparation of Execution Environment and Operational Foundation
We adopted Jest (unit and integration) and Playwright (E2E) as the test foundation.
We designed the test environment and CI foundation with top priority on “being able to reliably reproduce failures.”
Measures to Maintain Reproducibility
- Fixing dependency versions (eliminating environment differences via package-lock)
- Initializing test data and fixing seeds to maintain consistent state
- Mocking external APIs (msw / nock) to remove network dependencies
- Fixing time and random numbers to suppress non-deterministic behavior
This enabled a test environment where “failures can be reproduced under the same conditions” both locally and in CI.
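Fixing time and random numbers usually comes down to injecting them instead of calling `Date` or `Math.random` directly. A minimal sketch, with illustrative `Clock` / `Random` interfaces and a seeded linear congruential generator standing in for whatever the real codebase uses:

```typescript
// Minimal sketch of pinning time and randomness so failures reproduce
// under the same conditions. The interfaces and the LCG are illustrative.
interface Clock { now(): Date }
interface Random { next(): number } // value in [0, 1)

// Deterministic test doubles: a frozen clock and a seeded generator.
const fixedClock: Clock = { now: () => new Date("2024-01-01T00:00:00Z") }

function seededRandom(seed: number): Random {
  let state = seed
  return {
    next() {
      // Linear congruential generator: same seed -> same sequence.
      state = (state * 1664525 + 1013904223) >>> 0
      return state / 0x100000000
    },
  }
}

// Code under test receives both via DI instead of reading global state.
function makeToken(clock: Clock, rng: Random): string {
  return `${clock.now().getTime()}-${Math.floor(rng.next() * 1e6)}`
}

// Two runs with the same seed and clock produce the same token.
const a = makeToken(fixedClock, seededRandom(42))
const b = makeToken(fixedClock, seededRandom(42))
```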
Mechanisms to Ensure Reliable Re-runs
- Job design that considers parallel execution performance and cache characteristics (improved stability on CI)
- Flexible adjustment of timeouts and retries to control execution independent of environment load
- Automatic saving of logs, screenshots, and traces on failure to make debugging on re-run easier
In the CI environment, we built automatic execution per PR on GitHub Actions.
We realized an operation that emphasizes reproducibility, where tests can be re-run and analyzed under the same conditions even if they fail.
Implementation (Writing and Operating Tests)
Based on the foundation prepared in the design phase, we moved to the stage of “building up” tests.
The goal was not simply to increase coverage, but to build a mechanism that reliably detects when something breaks.
Establishing Unit Tests
We rigorously verified the correspondence between input and output, focusing on functions.
- Unified naming conventions and purposes of test cases (happy path / error path / boundary values)
- Aligned test file structure one-to-one with implementation files to ensure ease of reference
This made it possible to immediately identify the smallest broken unit when code changes.
Example)
```typescript
describe('addUser', () => {
  describe('happy path', () => {
    it('registers a new user', () => { /* ... */ })
  })
  describe('error cases', () => {
    it('returns an error when name is empty', () => { /* ... */ })
  })
  describe('boundary values', () => {
    it('registers a user with a 1-character name', () => { /* ... */ })
  })
})
```
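For illustration, a minimal `addUser` that this naming structure could exercise; the `User` shape, the validation rule, and the in-memory store are assumptions made for the sketch:

```typescript
// Illustrative implementation behind the test structure above.
// The User shape and the validation rule are assumptions for this sketch.
interface User { name: string }

class ValidationError extends Error {}

function addUser(users: User[], name: string): User {
  // error case: an empty name is rejected
  if (name.length === 0) throw new ValidationError("name must not be empty")
  // boundary value: a 1-character name is the minimum accepted length
  const user: User = { name }
  users.push(user)
  return user
}

const users: User[] = []
addUser(users, "a") // boundary value: 1 character succeeds

let rejectedEmpty = false
try {
  addUser(users, "")
} catch (e) {
  rejectedEmpty = e instanceof ValidationError
}
```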
Integration Tests (API, DB, Inter-Module Collaboration)
As a middle layer between unit tests and E2E tests, we designed tests to narrowly verify “dependencies that cannot be covered by a single module alone.”
The target was not “an entire feature,” but limited to the scope of one module plus its direct dependencies.
- **Integration tests for the API layer**
  Using `supertest`, we sent real requests at the Express handler level. We verified connections with the business logic layer and authentication middleware, and checked consistency of status codes, response structures, and validation errors.
- **Integration tests for the DB access layer**
  We executed CRUD operations against a real MongoDB (local / container environment). We checked the impact of schema changes and index settings, and ensured consistency of type definitions, persistence, and restoration.
- **Integration tests for external integration modules**
  For Webhooks and external API calls, we used `msw/node` to stub them. We hooked actual HTTP requests and verified retry control, error handling, and request structure consistency. By not fully mocking the communication layer and leaving HTTP-level interactions, we achieved integration assurance in a form close to the real environment.
To prevent data races during parallel execution, we generated independent schema names and temporary data per test, thoroughly designing tests to be re-runnable.
This layer enabled us to detect “integration inconsistencies that previously could only be noticed by E2E” in advance.
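As a self-contained stand-in for the `supertest` setup (which needs a real Express app), the same handler-level checks can be sketched against a plain request-handler function; the route, payload shape, and error contract here are illustrative:

```typescript
// Self-contained stand-in for the supertest-style checks described above:
// instead of Express + supertest, a plain handler function is driven directly.
// The route, payload shape, and error contract are illustrative.
interface Req { method: string; path: string; body?: unknown }
interface Res { status: number; json: unknown }

function createListingHandler(req: Req): Res {
  const body = req.body as { title?: string } | undefined
  // validation error: missing title -> 400 with a structured error body
  if (!body?.title) {
    return { status: 400, json: { error: "title is required" } }
  }
  return { status: 201, json: { id: "listing-1", title: body.title } }
}

// Check status codes and response structure, as in the API-layer tests.
const ok = createListingHandler({ method: "POST", path: "/api/listings", body: { title: "desk" } })
const bad = createListingHandler({ method: "POST", path: "/api/listings", body: {} })
```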
E2E Tests (Self-Contained and Reproducibility-Oriented)
We designed E2E tests based on the principles of “self-contained” and “full reproducibility.” We built a configuration that completes within a single CI job, without depending on external environments or manual operations.
Execution Pipeline (Single-Job Completion on GitHub Actions)
- Install dependencies & build
- Start the app (in the background)
- Initialize data
- Run Playwright
  - Trace collection: `on-first-retry`
  - Report output: `html`
- Artifact collection & cleanup
  - Save screenshots / reports / traces
  - Ensure processes are terminated in the final step (equivalent to `finally`)
Parallelization Strategy (Avoiding Instability)
- Prioritize sharding: split the entire test suite into multiple jobs (shards) to shorten time.
  → Safely scale via CI matrix. Less likely to create differences from local runs.
- Be cautious with workers: use `workers=1` as the default.
  → Avoid test flakiness caused by port conflicts, shared state, and I/O load.
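In Playwright these choices end up as a handful of config fields. A sketch of the shape they take, written as a plain object to stay self-contained (a real `playwright.config.ts` would wrap this in `defineConfig` from `@playwright/test`, and sharding would be passed via the `--shard` CLI flag per CI job):

```typescript
// Sketch of the parallelization/retry strategy above as Playwright config
// fields. A real playwright.config.ts would use defineConfig().
const config = {
  workers: 1,  // default: avoid port conflicts, shared state, and I/O contention
  retries: 2,  // re-run a failure before marking the job red
  use: {
    trace: "on-first-retry" as const,       // collect a trace only when a retry occurs
    screenshot: "only-on-failure" as const, // artifacts for debugging failures
  },
  reporter: [["html"]] as const,            // HTML report saved as a CI artifact
}
```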
Operation and Improvement Phase
After introduction, we focused operations on continuously detecting and preventing regressions.
Points of Attention During and After Setup
- **Avoid "tests for the sake of tests"**
  Add tests only in the necessary scope, starting from actual bugs or requirement changes. Position tests as a means for quality assurance, not as an end in themselves.
- **Write tests that reveal the cause when they fail**
  Clarify test names and output messages. For example, enforce naming that conveys intent and preconditions in one line, such as `should return 400 when missing header`.
- **Reduce test maintenance costs**
  Introduce shared `fixtures` / `builders` and centrally manage test data. Concentrate follow-up changes in one place and increase refactor tolerance.
- **Use "reliability" rather than "coverage" as the metric**
  Instead of chasing coverage numbers, adopt "whether we can reliably notice when something breaks" as the primary metric. In test reviews, we also discussed whether there is a "guarantee of noticing."
- **Balance with CI execution time**
  Optimize parallel execution and cache strategies, and maintain a test foundation that completes within 10 minutes.
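The shared fixtures/builders point can be sketched as a builder with defaults plus per-test overrides; the `User` shape here is illustrative:

```typescript
// Sketch of the shared builder pattern: defaults live in one place,
// tests override only what they care about. The User shape is illustrative.
interface User { id: string; name: string; role: "member" | "admin" }

function buildUser(overrides: Partial<User> = {}): User {
  return { id: "user-1", name: "Test User", role: "member", ...overrides }
}

// A schema change (e.g. a new required field) is absorbed in buildUser
// alone; existing tests keep compiling without edits.
const admin = buildUser({ role: "admin" })
const plain = buildUser()
```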
Detection and Isolation of Flaky Tests
- Failure re-run policy: `retries: 2` / `trace: 'on-first-retry'`.
- Flake rate threshold: if it exceeds X%, attach a quarantine label, exclude it from the E2E suite, and improve it separately.
- Standardization of failure logs: always save screenshots, videos, and traces as artifacts, and automatically attach reproduction steps.
Reducing Slow Tests
- Categorization of bottlenecks: network waits, DB initialization, excessive rendering, excessive dependence on E2E.
- Countermeasure catalog:
  - Convert API/DB checks into integration tests (reduce dependence on E2E)
  - Differential initialization of fixtures: instead of resetting all data every time, initialize only the range needed by the test

We designed it so that running the same process multiple times does not break the state, and greatly shortened execution time while maintaining reproducibility.
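A sketch of what differential, idempotent initialization can look like; the in-memory `Map` and the collection names are stand-ins for the real MongoDB setup:

```typescript
// Sketch of differential, idempotent seeding: each test declares the
// collections it needs, and only those are reset. The in-memory Map and
// the collection/seed names stand in for the real MongoDB setup.
type Db = Map<string, unknown[]>

const seeds: Record<string, unknown[]> = {
  users: [{ id: "u1", name: "seed user" }],
  listings: [{ id: "l1", title: "seed listing" }],
}

function seedOnly(db: Db, collections: string[]): void {
  for (const name of collections) {
    // Overwriting (rather than appending) makes repeated runs safe;
    // the JSON round-trip gives each test its own copy of the seed data.
    db.set(name, JSON.parse(JSON.stringify(seeds[name] ?? [])))
  }
}

const db: Db = new Map()
seedOnly(db, ["users"]) // a users-only test touches nothing else
seedOnly(db, ["users"]) // running twice leaves the same state
```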
Outcomes
- We gained “confidence that nothing is broken,” which sped up decision-making for large refactors and dependency upgrades.
- We formalized knowledge and understanding of specifications gained through incident response not as documents but as test code.
Next Steps
Introduction of Differential Test Execution (Test Selection)
Instead of running all tests every time, we will introduce a mechanism that re-runs only the affected scope based on change diffs (paths, commit history, dependency graph).
- Automatically analyze test files corresponding to code changes
- Cache the dependency graph of tests and run selected tests
- Accumulate results as metadata to improve the accuracy of impact range estimation
This aims to shorten CI time while maintaining regression detection accuracy.
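The simplest version of this selection falls out of the 1:1 file convention from the unit-test section (`src/foo.ts` paired with `src/foo.test.ts`). A sketch of that mapping rule alone; a real implementation would also consult the dependency graph:

```typescript
// Sketch of diff-based test selection using a 1:1 file convention
// (src/foo.ts <-> src/foo.test.ts). The convention is assumed here;
// real selection would also walk the dependency graph.
function selectTests(changedFiles: string[]): string[] {
  const selected = new Set<string>()
  for (const file of changedFiles) {
    if (file.endsWith(".test.ts")) {
      selected.add(file) // a changed test file: run it directly
    } else if (file.endsWith(".ts")) {
      selected.add(file.replace(/\.ts$/, ".test.ts")) // a changed source: run its twin
    }
  }
  return [...selected].sort()
}

const toRun = selectTests(["src/user.ts", "src/listing.test.ts"])
```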
Dynamic Optimization of CI Parallelism (History-Based)
We will dynamically optimize CI parallelism based on test execution history.
- Collect average and P95 execution times for each test suite
- Analyze execution history and automatically adjust the `--shard` count and `workers` count in the next job
- Periodically rebalance and visualize resource utilization
This will allow us to control CI load in a data-driven way rather than with fixed values, optimizing the balance of time, resources, and reliability.
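One way to turn collected P95 timings into a shard plan is longest-processing-time-first greedy assignment. A sketch with made-up timings; a real version would read them from the accumulated CI metadata:

```typescript
// Sketch of history-based shard balancing: assign the slowest suites first
// so shard durations even out. The timings below are illustrative.
interface SuiteTiming { suite: string; p95Ms: number }

function balanceShards(timings: SuiteTiming[], shardCount: number): string[][] {
  const shards = Array.from({ length: shardCount }, () => ({ total: 0, suites: [] as string[] }))
  // Longest-processing-time-first greedy assignment.
  for (const t of [...timings].sort((a, b) => b.p95Ms - a.p95Ms)) {
    const lightest = shards.reduce((min, s) => (s.total < min.total ? s : min))
    lightest.suites.push(t.suite)
    lightest.total += t.p95Ms
  }
  return shards.map((s) => s.suites)
}

const plan = balanceShards(
  [
    { suite: "checkout", p95Ms: 90_000 },
    { suite: "search", p95Ms: 60_000 },
    { suite: "profile", p95Ms: 30_000 },
  ],
  2,
)
```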
References
Identifying and Improving Slow Tests
1. Network / External Dependencies (DB, API, S3, etc.)
Symptom: Multi-second blocking due to HTTP waits, DNS delays, and external rate limits.
Countermeasures (TypeScript)
- Mock HTTP with `nock` / `msw`
- Block external APIs with Playwright's `page.route()`
- Speed up DB with `mongodb-memory-server` / SQLite (in-memory)
- Run migrations only once before the suite

Countermeasures (Go)
- Use `httptest.Server` to localize external APIs
- Make container reuse the default for `testcontainers` / `dockertest` (reduce startup overhead)
2. sleep / Timeout Waits / Polling
Symptom: Accumulation of `sleep(1000)` calls makes the entire test suite take minutes.
Countermeasures (TypeScript)
- Use fake timers (Jest / Vitest)
- Explicitly specify the minimum timeout for `waitFor`

Countermeasures (Go)
- Abstract time dependence into a `Clock` interface and inject it
- Eliminate direct use of `time.After` and use a fast clock in tests
3. Heavy Crypto / Hash / Password Processing
Symptom: Cost of bcrypt / argon2 makes a single case take hundreds of ms to seconds.
Countermeasures
- Lower the cost factor during tests
- Swap the hash function for a faster implementation (switch via DI)
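A sketch of the DI swap: the `PasswordHasher` interface is illustrative, with a single unsalted SHA-256 round as the fast test double (fine for tests, never for production):

```typescript
// Sketch of swapping the password hasher via DI. The interface is
// illustrative; production would wire bcrypt/argon2 with a realistic
// cost factor, while tests inject this cheap stand-in.
import { createHash } from "node:crypto"

interface PasswordHasher {
  hash(plain: string): string
  verify(plain: string, hashed: string): boolean
}

// Fast test double: one unsalted SHA-256 round. Acceptable only in tests.
const fastHasher: PasswordHasher = {
  hash: (plain) => createHash("sha256").update(plain).digest("hex"),
  verify: (plain, hashed) => fastHasher.hash(plain) === hashed,
}

// Code under test depends on the interface, not a concrete library.
function registerUser(hasher: PasswordHasher, password: string): { passwordHash: string } {
  return { passwordHash: hasher.hash(password) }
}

const user = registerUser(fastHasher, "s3cret")
const verified = fastHasher.verify("s3cret", user.passwordHash)
```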
4. Overuse of E2E
Symptom: E2E becomes the main tool and execution time becomes minutes.
Countermeasures
- Optimize the test pyramid: downgrade E2E to Integration where possible
- Limit E2E to critical paths
- Eliminate unnecessary `waitForTimeout`
Measurement and Visualization (TypeScript)
You can extract particularly slow tests with `jest-slow-test-reporter`.
Jest also provides options for surfacing resource consumption and handle leaks:
- `--logHeapUsage`
  Outputs heap usage at the end of each test file. Enables early detection of memory leaks and cache bloat, and identification of heavy tests.
- `--detectOpenHandles`
  Detects handles that remain open after execution (unclosed sockets, timers, etc.). Helps find missing `await`s in asynchronous processing and contributes to stabilizing E2E and integration tests. Use only for debugging, not as a default.