Reducing origin load by introducing a reverse proxy (Nginx)
Summary
| Perspective | Content |
|---|---|
| Issue | Dynamic APIs were concentrated on the application servers, causing persistently high CPU usage and response variability. Due to the direct connection from CDN to app, the system could not absorb request load. |
| Action | Placed an Nginx relay layer between the CDN and the app, and introduced short-TTL caching and Keep-Alive connection reuse. |
| Operations | Adopted a phased rollout with a design that allows immediate rollback. Visualized HIT / MISS and continuously tuned TTL and excluded paths. |
| Result / Outcome | Significantly reduced origin-reaching requests, stabilizing application CPU load and response variability. Improved resilience during peak times. |
Background / Issues
Although overall service traffic was delivered via the CDN, CPU usage on the backend (application servers) remained high due to concentration of dynamic API requests.
Load was particularly skewed toward dynamic endpoints that are hard to cache, such as dashboards for logged-in users and suggestion APIs.
During peak times, application server CPU usage stayed high and response times degraded.
In the then-current architecture, the CDN reached the application servers directly, resulting in an inefficient state where:
- A TCP connection was established and torn down for each request
- The application layer was responsible for maintaining and reusing Keep-Alive
- Cache control for dynamic requests depended entirely on the application side
As a result, connection-establishment overhead and unnecessary regeneration processing at the application layer pushed up CPU load, and under high load, slow queries and timeouts occurred sporadically.
Approach
Taking the characteristics of dynamic requests into account, we adopted an architecture that inserts Nginx as a relay layer between the CDN and the backend.
The goal was to avoid direct connections to the application servers and simultaneously satisfy the following three points:
- Suppress repeated requests via short-term caching
  - Temporarily store API responses in Nginx that do not change for several seconds to several tens of seconds.
  - Example: cache the results of `/api/dashboard/stats` and `/api/search/suggest`.
- Reduce overhead via Keep-Alive and connection reuse
  - Pool and reuse connections on both the CDN→Nginx and Nginx→App legs.
  - Greatly reduce the number of connection establishments on the application servers and stabilize CPU load.
- Absorb spikes via buffering
  - Buffer responses on the Nginx side to absorb momentary bursts of concurrent access.
With this, we set a policy of simultaneously expanding the cacheable area and suppressing origin-reaching requests.
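As a sketch, the three points above might map onto an Nginx configuration of roughly the following shape (the hostname, port, paths, and zone name are illustrative assumptions, not the actual settings):

```nginx
# Upstream to the application side; the keepalive pool reuses
# Nginx→App connections instead of reconnecting per request.
upstream app_backend {
    server app-alb.internal:8080;   # assumed internal ALB endpoint
    keepalive 32;                   # idle connections kept pooled
}

# Short-TTL cache zone for dynamic API responses.
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=api_cache:10m
                 max_size=1g inactive=60s use_temp_path=off;

server {
    listen 80;

    location /api/ {
        proxy_pass http://app_backend;

        # (1) Short-term caching: hold responses for a few seconds.
        proxy_cache api_cache;
        proxy_cache_valid 200 10s;
        proxy_cache_key "$scheme$request_method$host$request_uri";

        # (2) Keep-Alive toward the upstream requires HTTP/1.1
        #     and a cleared Connection header.
        proxy_http_version 1.1;
        proxy_set_header Connection "";

        # (3) Buffering absorbs momentary bursts of concurrent access.
        proxy_buffering on;
        proxy_buffers 8 16k;
    }
}
```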
Comparison and evaluation phase
Even before introducing Nginx, we considered multiple options to reduce origin load.
After organizing the effects, costs, and risks of each, we chose Nginx because of its flexibility in cache control and short lead time for introduction.
| Overall rating | Option | Origin load reduction | Implementation cost | Impact scope | Operational load | Risk |
|---|---|---|---|---|---|---|
| ◎ | Introduce Nginx between CDN and app (short-term cache + Keep-Alive) | ◎ | ○ | ○ | ○ | ◎ |
| ○ | Implement cache control on the application side (Redis, etc.) | ○ | △ | ○ | △ | ○ (consistency / bug risk) |
| △ | Strengthen Auto Scaling (horizontal scaling based on CPU utilization threshold) | ○ | ○ | △ | ○ | △ (delayed scaling response) |
Nginx was a choice with low introduction and tuning cost that allowed combining short-term caching and connection reuse. The decisive factors were that no changes to existing application code were required and that integration into operations and monitoring was easy.
- We passed on application-side caching: although flexible, it risked complicating consistency management and failure handling.
Verification phase
To accurately understand the effect of origin load reduction and side effects from introducing Nginx, we conducted phased load tests.
At each stage, we quantitatively evaluated the impact of cache settings, connection reuse, and transparent proxy behavior.
① Individual component
Purpose: Verify basic performance and health of Nginx layer settings (first make it work as a transparent proxy)
Targets:
- Direct to ALB (private) — obtain baseline for app alone
- Nginx on ECS (Fargate) → ALB (private) — measure overhead when going via Nginx
Execution:
- Measured baseline with cache OFF
Migration criteria:
- Latency increase when passing through Nginx is within acceptable range
- No change in 5xx occurrence rate
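For the cache-OFF baseline in this step, a minimal pass-through configuration (endpoint name is an assumption) might look like:

```nginx
# Transparent proxy: no caching, no rewriting — used to measure the
# pure overhead of inserting the Nginx hop in front of the private ALB.
server {
    listen 80;

    location / {
        proxy_pass http://private-alb.internal;  # assumed private ALB endpoint
        proxy_cache off;                         # explicit: baseline has no cache
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```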
② Staging environment E2E
Purpose: Verify behavior on the actual CDN path (Cloudflare → ALB(public) → Nginx → ALB(private) → App).
Target endpoint:
- Send requests to the stg subdomain (e.g., `stg.example.com`) via Cloudflare
Execution:
- Experimented with two patterns: cold (uncached) and warm (cached)
- In logs, used cache status, backend response time, and total processing time as indicators to analyze cache hit rate and response speed improvements.
Success criteria (examples):
- Warm responses show improvement compared to the current staging environment
- MISS spikes are limited to the cold period immediately after deployment
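The three log indicators used in this analysis can be captured with a custom log format along these lines (format name and log path are illustrative):

```nginx
# Access-log format recording cache status (HIT/MISS/EXPIRED/STALE),
# backend response time, and total processing time per request.
log_format cache_probe '$remote_addr "$request" $status '
                       'cache=$upstream_cache_status '
                       'upstream=$upstream_response_time '
                       'total=$request_time';

access_log /var/log/nginx/access.log cache_probe;

# Optionally expose the cache status to load-test clients as well.
add_header X-Cache-Status $upstream_cache_status;
```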
Migration plan and recurrence prevention
Migration policy
- Phased deployment: First introduce to low-traffic tenants and check for anomalies via monitoring. If no issues, gradually expand to all.
- Rollout conditions: Proceed to the next stage only after confirming that there is no increase in 5xx at the Nginx layer, no drop in cache hit rate, and no persistent high CPU usage.
- Rollback procedure: On anomaly detection, immediately switch back to the direct ALB route. Nginx configuration is templated as immutable and can be redeployed.
Risk management
- Automatic collection of observation metrics: Visualize request counts between Nginx and app, cache hit rate, latency, and 5xx rate using CloudWatch Logs and CloudWatch Metrics.
- Maintain configuration consistency: Manage the Nginx layer configuration with Terraform to eliminate environment differences.
Continuous improvement
- Regularly analyze access logs to optimize cache TTL and excluded paths.
- When adding new APIs, document cache policies and make their consideration mandatory during design review.
- Periodically compare “load via Nginx” and “load when direct” to verify cache effectiveness and reusability.
Effects
By inserting Nginx between the CDN and the backend, we were able to offload processing such as caching, connection reuse, encryption, and compression to the relay layer.
As a result, application server load stabilized and processing became less likely to stall even during peak times.
Caching (temporary storage with short TTL)
- Temporarily stored results of high-traffic APIs in Nginx.
- Identical requests no longer reached the origin and could be returned immediately from Nginx.
- Origin CPU and DB load decreased, and responses became more stable.
Connection reuse (Keep-Alive)
- Reused communication between Nginx and origin without reconnecting.
- Eliminated the overhead of repeated connection establishment and teardown, reducing memory and thread consumption.
- Even under high traffic, the number of connections did not spike, smoothing resource usage.
TLS termination (offloading encryption processing)
- Performed encryption and decryption of communication on the Nginx side.
- The origin could communicate in plaintext, reducing CPU load.
- Centralized certificate management on Nginx, making updates and configuration changes easier.
Compression (efficient data transfer)
- Performed gzip or brotli compression on Nginx, reducing processing on the origin side.
- Could cache compressed data, eliminating the need for recompression for identical requests.
- Reduced data volume improved network speed and perceived response time.
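The TLS-termination and compression offloads above correspond to settings of roughly this shape (certificate paths, MIME types, and the upstream name are placeholders):

```nginx
server {
    # TLS terminates here; the hop to the origin stays plaintext HTTP.
    listen 443 ssl;
    ssl_certificate     /etc/nginx/certs/example.pem;  # placeholder path
    ssl_certificate_key /etc/nginx/certs/example.key;  # placeholder path

    # Compress responses once at the relay layer instead of on the origin.
    gzip on;
    gzip_types application/json text/css application/javascript;

    location / {
        proxy_pass http://app_backend;  # assumed upstream, reached over plain HTTP
    }
}
```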
Overall, Nginx functioned as a “layer that relays while offloading processing,” achieving load leveling on servers and stabilization of responses.
Points of ingenuity and challenges
- Eliminating issues from double caching
  - Since both the CDN and Nginx have cache layers, we carefully examined overlapping TTL and Cache-Control settings.
  - To prevent cache incidents (stale responses lingering) after origin changes, we gave only specific paths their own short TTL.
- Log extension (visualizing cache status)
  - Output the cache status (`HIT` / `MISS` / `EXPIRED` / `STALE`) to the access logs.
  - Visualization enabled quantitative analysis of cache efficiency and of room for TTL optimization.
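The double-cache separation described above could be expressed, for example, by giving only a volatile path its own short TTL on the Nginx side while telling the CDN layer not to retain it (the path, TTL, and upstream name are illustrative):

```nginx
location /api/dashboard/stats {
    proxy_pass http://app_backend;      # assumed upstream defined elsewhere
    proxy_cache api_cache;              # assumed cache zone defined elsewhere
    proxy_cache_valid 200 5s;           # short TTL only for this volatile path

    # Let Nginx's own TTL govern, regardless of upstream cache headers.
    proxy_ignore_headers Cache-Control Expires;

    # Keep the CDN layer from caching the same response as well, so stale
    # data cannot linger in two layers after an origin change.
    proxy_hide_header Cache-Control;
    add_header Cache-Control "no-store" always;
}
```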
Measures under consideration after introduction
- Selective caching of requests with high cache efficiency
  - Score cache effectiveness per API and make only high cost-effectiveness paths cache targets.
- Utilizing browser cache
  - Keep individual data that the CDN cannot cache only in the user's own browser for a short time to reduce re-fetching.
  - Improve perceived speed with short-term (e.g., per-session) caching while maintaining safety.
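As a sketch of the browser-cache idea (the path, TTL, and upstream name are hypothetical), Nginx could mark per-user responses as privately cacheable for a short time:

```nginx
location /api/me/ {
    proxy_pass http://app_backend;   # assumed upstream defined elsewhere
    proxy_cache off;                 # per-user data: never shared-cached

    # "private" keeps the CDN and Nginx from storing the response, while
    # max-age lets the user's own browser reuse it briefly.
    proxy_hide_header Cache-Control;
    add_header Cache-Control "private, max-age=30" always;
}
```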