Reducing origin load by introducing a reverse proxy (Nginx)
Summary
| Perspective | Content |
|---|---|
| Issue | Dynamic APIs were concentrated on the application servers, causing persistently high CPU usage and response variability. Due to the direct connection from CDN to app, the system could not absorb request load. |
| Action | Placed an Nginx relay layer between the CDN and the app, and introduced short-TTL caching and Keep-Alive connection reuse. |
| Operations | Adopted a phased rollout with a design that allows immediate rollback. Visualized HIT / MISS and continuously tuned TTL and excluded paths. |
| Result / Outcome | Significantly reduced origin-reaching requests, stabilizing application CPU load and response variability. Improved resilience during peak times. |
Background / Issues
Although overall service traffic was delivered via the CDN, CPU usage on the backend (application servers) remained high due to concentration of dynamic API requests.
Load was particularly skewed toward dynamic endpoints that are hard to cache, such as dashboards for logged-in users and suggestion APIs.
During peak times, application server CPU usage stayed high and response times degraded.
In the then-current architecture, the CDN reached the application servers directly, resulting in an inefficient state where:
- A TCP connection was established and torn down for each request
- The application layer was responsible for maintaining and reusing Keep-Alive
- Cache control for dynamic requests depended entirely on the application side
As a result, connection-establishment overhead and unnecessary regeneration processing at the application layer pushed up CPU load, and under high load, slow queries and timeouts occurred sporadically.
Approach
Taking the characteristics of dynamic requests into account, we adopted an architecture that inserts Nginx as a relay layer between the CDN and the backend.
The goal was to avoid direct connections to the application servers and simultaneously satisfy the following three points:
- Suppress repeated requests via short-term caching
  - Temporarily store API responses in Nginx that do not change for several seconds to several tens of seconds.
  - Example: cache the results of `/api/dashboard/stats` and `/api/search/suggest`.
- Reduce overhead via Keep-Alive and connection reuse
  - Pool and reuse connections on both the CDN→Nginx and Nginx→App legs.
  - Greatly reduce the number of connection establishments on the application servers and stabilize CPU load.
- Absorb spikes via buffering
  - Buffer responses on the Nginx side to absorb momentary bursts of concurrent access.
With this, we set a policy of simultaneously expanding the cacheable area and suppressing origin-reaching requests.
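As a sketch, the three points above might map onto an Nginx configuration of roughly the following shape (the hostname, port, paths, and zone name are illustrative assumptions, not the actual settings):

```nginx
# Upstream to the application side; the keepalive pool reuses
# Nginx→App connections instead of reconnecting per request.
upstream app_backend {
    server app-alb.internal:8080;   # assumed internal ALB endpoint
    keepalive 32;                   # idle connections kept pooled
}

# Short-TTL cache zone for dynamic API responses.
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=api_cache:10m
                 max_size=1g inactive=60s use_temp_path=off;

server {
    listen 80;

    location /api/ {
        proxy_pass http://app_backend;

        # (1) Short-term caching: hold responses for a few seconds.
        proxy_cache api_cache;
        proxy_cache_valid 200 10s;
        proxy_cache_key "$scheme$request_method$host$request_uri";

        # (2) Keep-Alive toward the upstream requires HTTP/1.1
        #     and a cleared Connection header.
        proxy_http_version 1.1;
        proxy_set_header Connection "";

        # (3) Buffering absorbs momentary bursts of concurrent access.
        proxy_buffering on;
        proxy_buffers 8 16k;
    }
}
```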
Comparison and evaluation phase
Even before introducing Nginx, we considered multiple options to reduce origin load.
After organizing the effects, costs, and risks of each, we chose Nginx because of its flexibility in cache control and short lead time for introduction.
| Overall rating | Option | Origin load reduction | Implementation cost | Impact scope | Operational load | Risk |
|---|---|---|---|---|---|---|
| ◎ | Introduce Nginx between CDN and app (short-term cache + Keep-Alive) | ◎ | ○ | ○ | ○ | ◎ |
| ○ | Implement cache control on the application side (Redis, etc.) | ○ | △ | ○ | △ | ○ (consistency / bug risk) |
| △ | Strengthen Auto Scaling (horizontal scaling based on CPU utilization threshold) | ○ | ○ | △ | ○ | △ (delayed scaling response) |
Nginx was a choice with low introduction and tuning cost that allowed combining short-term caching and connection reuse. The decisive factors were that no changes to existing application code were required and that integration into operations and monitoring was easy.
- We passed on application-side caching: although flexible, it risked complicating consistency management and failure handling.
Verification phase
To accurately understand the effect of origin load reduction and side effects from introducing Nginx, we conducted phased load tests.
At each stage, we quantitatively evaluated the impact of cache settings, connection reuse, and transparent proxy behavior.
① Individual component
Purpose: Verify basic performance and health of Nginx layer settings (first make it work as a transparent proxy)
Targets:
- Direct to ALB (private) — obtain baseline for app alone
- Nginx on ECS (Fargate) → ALB (private) — measure overhead when going via Nginx
Execution:
- Measured baseline with cache OFF
Migration criteria:
- Latency increase when passing through Nginx is within acceptable range
- No change in 5xx occurrence rate
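For the cache-OFF baseline in this step, a minimal pass-through configuration (endpoint name is an assumption) might look like:

```nginx
# Transparent proxy: no caching, no rewriting — used to measure the
# pure overhead of inserting the Nginx hop in front of the private ALB.
server {
    listen 80;

    location / {
        proxy_pass http://private-alb.internal;  # assumed private ALB endpoint
        proxy_cache off;                         # explicit: baseline has no cache
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```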
② Staging environment E2E
Purpose: Verify behavior on the actual CDN path (Cloudflare → ALB(public) → Nginx → ALB(private) → App).
Target endpoint:
- Send requests to the stg subdomain (e.g., `stg.example.com`) via Cloudflare
Execution:
- Experimented with two patterns: cold (uncached) and warm (cached)
- In logs, used cache status, backend response time, and total processing time as indicators to analyze cache hit rate and response speed improvements.
Success criteria (examples):
- Warm responses show improvement compared to the current staging environment
- MISS spikes are limited to the cold period immediately after deployment
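The three log indicators used in this analysis can be captured with a custom log format along these lines (format name and log path are illustrative):

```nginx
# Access-log format recording cache status (HIT/MISS/EXPIRED/STALE),
# backend response time, and total processing time per request.
log_format cache_probe '$remote_addr "$request" $status '
                       'cache=$upstream_cache_status '
                       'upstream=$upstream_response_time '
                       'total=$request_time';

access_log /var/log/nginx/access.log cache_probe;

# Optionally expose the cache status to load-test clients as well.
add_header X-Cache-Status $upstream_cache_status;
```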
Migration plan and recurrence prevention
Migration policy
- Phased deployment: First introduce to low-traffic tenants and check for anomalies via monitoring. If no issues, gradually expand to all.
- Rollout conditions: Proceed to the next stage only after confirming that there is no increase in 5xx at the Nginx layer, no drop in cache hit rate, and no persistent high CPU usage.
- Rollback procedure: On anomaly detection, immediately switch back to the direct ALB route. Nginx configuration is templated as immutable and can be redeployed.
Risk management
- Automatic collection of observation metrics: Visualize request counts between Nginx and app, cache hit rate, latency, and 5xx rate using CloudWatch Logs and CloudWatch Metrics.
- Maintain configuration consistency: Manage the Nginx layer configuration with Terraform to eliminate environment differences.
Continuous improvement
- Regularly analyze access logs to optimize cache TTL and excluded paths.
- When adding new APIs, document cache policies and make their consideration mandatory during design review.
- Periodically compare “load via Nginx” and “load when direct” to verify cache effectiveness and reusability.
Effects
By inserting Nginx between the CDN and the backend, we were able to offload processing such as caching, connection reuse, encryption, and compression to the relay layer.
As a result, application server load stabilized and processing became less likely to stall even during peak times.
Caching (temporary storage with short TTL)
- Temporarily stored results of high-traffic APIs in Nginx.
- Identical requests no longer reached the origin and could be returned immediately from Nginx.
- Origin CPU and DB load decreased, and responses became more stable.
Connection reuse (Keep-Alive)
- Reused communication between Nginx and origin without reconnecting.
- Eliminated the overhead of repeated connection establishment and teardown, reducing memory and thread consumption.
- Even under high traffic, the number of connections did not spike, smoothing resource usage.
TLS termination (offloading encryption processing)
- Performed encryption and decryption of communication on the Nginx side.
- The origin could communicate in plaintext, reducing CPU load.
- Centralized certificate management on Nginx, making updates and configuration changes easier.
Compression (efficient data transfer)
- Performed gzip or brotli compression on Nginx, reducing processing on the origin side.
- Could cache compressed data, eliminating the need for recompression for identical requests.
- Reduced data volume improved network speed and perceived response time.
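The TLS-termination and compression offloads above correspond to settings of roughly this shape (certificate paths, MIME types, and the upstream name are placeholders):

```nginx
server {
    # TLS terminates here; the hop to the origin stays plaintext HTTP.
    listen 443 ssl;
    ssl_certificate     /etc/nginx/certs/example.pem;  # placeholder path
    ssl_certificate_key /etc/nginx/certs/example.key;  # placeholder path

    # Compress responses once at the relay layer instead of on the origin.
    gzip on;
    gzip_types application/json text/css application/javascript;

    location / {
        proxy_pass http://app_backend;  # assumed upstream, reached over plain HTTP
    }
}
```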
Overall, Nginx functioned as a “layer that relays while offloading processing,” achieving load leveling on servers and stabilization of responses.
Points of ingenuity and challenges
- Eliminating issues from double caching
  - Since both the CDN and Nginx have cache layers, we carefully examined overlapping TTL and Cache-Control settings.
  - To prevent cache incidents (stale responses lingering) after origin changes, we gave only specific paths their own short TTL.
- Log extension (visualizing cache status)
  - Output the cache status (`HIT` / `MISS` / `EXPIRED` / `STALE`) to the access logs.
  - Visualization enabled quantitative analysis of cache efficiency and of room for TTL optimization.
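The double-cache separation described above could be expressed, for example, by giving only a volatile path its own short TTL on the Nginx side while telling the CDN layer not to retain it (the path, TTL, and upstream name are illustrative):

```nginx
location /api/dashboard/stats {
    proxy_pass http://app_backend;      # assumed upstream defined elsewhere
    proxy_cache api_cache;              # assumed cache zone defined elsewhere
    proxy_cache_valid 200 5s;           # short TTL only for this volatile path

    # Let Nginx's own TTL govern, regardless of upstream cache headers.
    proxy_ignore_headers Cache-Control Expires;

    # Keep the CDN layer from caching the same response as well, so stale
    # data cannot linger in two layers after an origin change.
    proxy_hide_header Cache-Control;
    add_header Cache-Control "no-store" always;
}
```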
Measures under consideration after introduction
- Selective caching of requests with high cache efficiency
  - Score cache effectiveness per API and make only high cost-effectiveness paths cache targets.
- Utilizing browser cache
  - Keep individual data that the CDN cannot cache only in the user's own browser for a short time to reduce re-fetching.
  - Improve perceived speed with short-term (e.g., per-session) caching while maintaining safety.
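As a sketch of the browser-cache idea (the path, TTL, and upstream name are hypothetical), Nginx could mark per-user responses as privately cacheable for a short time:

```nginx
location /api/me/ {
    proxy_pass http://app_backend;   # assumed upstream defined elsewhere
    proxy_cache off;                 # per-user data: never shared-cached

    # "private" keeps the CDN and Nginx from storing the response, while
    # max-age lets the user's own browser reuse it briefly.
    proxy_hide_header Cache-Control;
    add_header Cache-Control "private, max-age=30" always;
}
```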