Our Works

Published on 2024/06/11

Performance Optimization

Enhanced system response speed and stability through database and delivery route optimization.

Speeding Up Search Responses (Eliminating N+1)

In a ticket management SaaS, we eliminated N+1 queries by bulk-loading associations with Preload and batching lookups with IN clauses, improving list API response time from 1.4s to 0.2s (about 7x faster) and reducing DB query count by 80%.

Perspective / Details
Issue: A large number of N+1 queries occurred in the list API, issuing a huge number of SELECTs on every page fetch.
Action: Redesigned the data access layer to fetch related data in bulk with Preload, and optimized complex-condition lookups with batched retrieval using IN clauses.
Observation: Introduced structured query logs keyed by request correlation ID, visualizing query count, type, and latency in real time.
Result / Outcome: The reduced query count stabilized DB load and connection pool utilization, and established a state where performance regressions are detected continuously.
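The batching that replaces the N+1 pattern can be sketched as follows. This is a minimal Go illustration with an in-memory stand-in for the database; the Ticket / Comment models and the fakeDB type are hypothetical, and in the real system the single round trip would be GORM's Preload or a hand-built IN-clause query.

```go
package main

import "fmt"

// Ticket and Comment are hypothetical models standing in for the SaaS schema.
type Ticket struct {
	ID    int
	Title string
}

type Comment struct {
	TicketID int
	Body     string
}

// fakeDB stands in for the database; each call to commentsForTickets counts as one query.
type fakeDB struct {
	comments []Comment
	queries  int
}

// commentsForTickets fetches comments for many tickets in one round trip,
// the equivalent of `SELECT ... WHERE ticket_id IN (...)` (or GORM's Preload).
func (db *fakeDB) commentsForTickets(ids []int) map[int][]Comment {
	db.queries++
	want := make(map[int]bool, len(ids))
	for _, id := range ids {
		want[id] = true
	}
	out := make(map[int][]Comment)
	for _, c := range db.comments {
		if want[c.TicketID] {
			out[c.TicketID] = append(out[c.TicketID], c)
		}
	}
	return out
}

func main() {
	db := &fakeDB{comments: []Comment{
		{TicketID: 1, Body: "first"},
		{TicketID: 1, Body: "second"},
		{TicketID: 2, Body: "third"},
	}}
	tickets := []Ticket{{ID: 1, Title: "a"}, {ID: 2, Title: "b"}, {ID: 3, Title: "c"}}

	// Instead of one query per ticket (N+1), collect the IDs and batch-load once.
	ids := make([]int, 0, len(tickets))
	for _, t := range tickets {
		ids = append(ids, t.ID)
	}
	byTicket := db.commentsForTickets(ids)

	fmt.Println(db.queries, len(byTicket[1])) // → 1 2
}
```

The key property is that the query count stays constant (one batched SELECT) regardless of how many tickets the page lists.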

Recovery from MySQL Failure and Stabilization of Write Performance

Resolved the batch I/O load that had been causing production DB failures through monthly table partitioning and INSERT optimization, cutting batch runtime by 70% and stabilizing free memory. No recurrence in the following three months.

Perspective / Details
Issue: The production DB failed during business hours, resend retries spiked, and overall response degraded. Monitoring coverage was also insufficient, so the risk of recurrence could not be detected in advance.
Initial response: Immediately set up Aurora automatic failover and read-replica distribution, and established continuous monitoring of slow queries, CloudWatch metrics, and error logs.
Investigation: Correlated CloudWatch metrics with event logs and identified that the batch job's DELETE→INSERT pattern was pressuring memory and I/O.
Action: Partitioned the table by month and reworked the batch to TRUNCATE + multi-row INSERT. Operationally, combined phased migration, instant rollback, and weekly reporting.
Result / Outcome: Suppressed the risk of recurring unplanned failover. Batch time and load fluctuation stabilized, improving overall availability and decision-making speed.
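The partition-and-rebuild pattern can be sketched in MySQL DDL as follows; the table and column names are hypothetical stand-ins for the actual schema.

```sql
-- Hypothetical table; names illustrate the pattern, not the real schema.
-- The partition key must be part of the primary key.
CREATE TABLE daily_summaries (
    id            BIGINT NOT NULL AUTO_INCREMENT,
    summarized_on DATE   NOT NULL,
    payload       JSON,
    PRIMARY KEY (id, summarized_on)
)
PARTITION BY RANGE COLUMNS (summarized_on) (
    PARTITION p202406 VALUES LESS THAN ('2024-07-01'),
    PARTITION p202407 VALUES LESS THAN ('2024-08-01'),
    PARTITION p_max   VALUES LESS THAN (MAXVALUE)
);

-- Rebuild step: truncate one partition instead of row-by-row DELETE,
-- which avoids large undo/redo pressure and long-held locks...
ALTER TABLE daily_summaries TRUNCATE PARTITION p202406;

-- ...then reinsert with multi-row INSERT to cut round trips.
INSERT INTO daily_summaries (summarized_on, payload) VALUES
    ('2024-06-01', '{"total": 10}'),
    ('2024-06-02', '{"total": 12}'),
    ('2024-06-03', '{"total": 9}');
```

Truncating a partition is a metadata operation, so the rebuild cost no longer scales with the number of rows being replaced.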

Reducing origin load by introducing a reverse proxy (Nginx)

Introduced an Nginx reverse proxy between the CDN and the app to implement short-TTL caching and connection reuse. Reduced origin-reaching requests by 40% and stabilized CPU load and response variability.

Perspective / Details
Issue: Dynamic API traffic was concentrated on the application servers, keeping CPU usage persistently high and responses variable. Because the CDN connected directly to the app, there was no layer to absorb request spikes.
Action: Placed an Nginx relay layer between the CDN and the app, and introduced short-TTL caching and Keep-Alive connection reuse.
Operations: Rolled out in phases with a design that allows immediate rollback. Visualized HIT / MISS ratios and continuously tuned TTLs and excluded paths.
Result / Outcome: Significantly reduced origin-reaching requests, stabilizing application CPU load and response variability and improving resilience at peak times.
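A minimal sketch of what such a relay layer can look like; the upstream hosts, cache zone, paths, and TTLs below are illustrative assumptions, not the production configuration.

```nginx
# Short-TTL cache zone between CDN and app (names and sizes are illustrative).
proxy_cache_path /var/cache/nginx/api levels=1:2 keys_zone=api_cache:10m
                 max_size=1g inactive=60s;

upstream app {
    server app1.internal:8080;
    server app2.internal:8080;
    keepalive 64;                        # reuse upstream connections
}

server {
    listen 80;

    location /api/ {
        proxy_pass http://app;
        proxy_http_version 1.1;
        proxy_set_header Connection "";  # required for upstream keepalive

        proxy_cache api_cache;
        proxy_cache_valid 200 5s;        # short TTL: absorb bursts, stay fresh
        proxy_cache_use_stale updating;  # serve stale while revalidating
        proxy_cache_lock on;             # collapse concurrent cache misses
        add_header X-Cache-Status $upstream_cache_status;  # HIT / MISS visibility
    }
}
```

Even a TTL of a few seconds collapses identical requests arriving in a burst into a single origin hit, while `X-Cache-Status` supports the HIT / MISS visualization described above.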

Developer Productivity & Quality Automation

Maintained continuous development velocity through quality assurance automation and build pipeline improvements.

Building a Three-Layer Test Foundation that Supports Continuous Improvement

For an environment that depended on manual verification, we built a three-layer test foundation of Unit / Integration / E2E to prevent regressions, enabling continuous large-scale refactors and upgrades.

Perspective / Details
Issue: Dependence on manual verification meant regressions slipped through, verification costs grew, and procedures became person-dependent, stalling the pace of improvements and large-scale upgrades.
Response: Clearly defined the three layers of Unit / Integration / E2E, and prepared an execution environment (CI, DB, mocks, data seeding) that makes tests easy to write.
Operation: Reduced writing and maintenance costs through test templating and shared Fixtures / Builders, and established naming conventions that make the cause of a failure clear at a glance.
Result / Outcome: With confidence that "nothing is broken," refactoring and dependency upgrades can be performed safely, and manual checks were greatly reduced.
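What the shared Fixtures / Builders and at-a-glance naming look like in practice can be sketched as follows. The User type, the builder, and the canLogin check are hypothetical examples in Go, not the actual codebase.

```go
package main

import "fmt"

// User is a hypothetical domain type used to illustrate the builder pattern.
type User struct {
	Name   string
	Email  string
	Active bool
}

// UserBuilder supplies sensible defaults so each test only states what matters.
type UserBuilder struct{ u User }

func NewUser() *UserBuilder {
	return &UserBuilder{u: User{Name: "default", Email: "default@example.com", Active: true}}
}

func (b *UserBuilder) Inactive() *UserBuilder       { b.u.Active = false; return b }
func (b *UserBuilder) Named(n string) *UserBuilder  { b.u.Name = n; return b }
func (b *UserBuilder) Build() User                  { return b.u }

// canLogin is the (hypothetical) unit under test.
func canLogin(u User) bool { return u.Active }

func main() {
	// Case names state condition + expectation, so a failure reads at a glance.
	cases := []struct {
		name string
		user User
		want bool
	}{
		{"active user can log in", NewUser().Build(), true},
		{"inactive user cannot log in", NewUser().Inactive().Build(), false},
	}
	for _, c := range cases {
		if canLogin(c.user) != c.want {
			fmt.Println("FAIL:", c.name)
			return
		}
	}
	fmt.Println("ok") // → ok
}
```

Because defaults live in one place, tests stay short and churn from schema changes is absorbed by the builder rather than by every test file.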

Improving development speed by renewing the build environment

In a Webpack-based monorepo, heavy builds and sluggish HMR were stalling development. By migrating gradually to Rspack, we significantly improved perceived speed and the review cycle.

Perspective / Details
Issue: In a large Webpack-based monorepo, slow HMR, heavy initial/CI builds, and unstable caching had become chronic. As a result, review queues formed easily and it took a long time to reflect and verify changes.
Action: Switched gradually to Rspack, which is largely Webpack-compatible. Reorganized transforms, source maps, and code splitting, redesigned the configuration so caching works stably, and migrated with zero downtime while checking compatibility via automated tests and E2E.
Operation: Ran dual Webpack / Rspack builds on each PR and automatically checked differences (especially bundle size). Visualized metrics such as build time and cache efficiency to keep continuous improvement possible.
Result / Outcome: Clearly improved perceived HMR speed and build time (e.g. incremental build -81%, initial build -67%). Reviews became less likely to stall and the development cycle became lighter and more stable, while gaining benefits step by step without breaking existing mechanisms such as SSR / Storybook / Sentry.
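A gradual switch like this typically keeps the existing webpack.config.js side by side with an Rspack config and diffs the two builds per PR. The sketch below shows what the Rspack side can look like; the entry point, loader options, and file layout are illustrative assumptions, not the actual configuration.

```javascript
// rspack.config.js — illustrative sketch, not the production config.
const path = require('path');

module.exports = {
  entry: './src/index.tsx',
  output: {
    path: path.resolve(__dirname, 'dist'),
    filename: '[name].[contenthash].js',
  },
  devtool: 'source-map',              // keep source maps identical to the Webpack build
  module: {
    rules: [
      {
        test: /\.tsx?$/,
        loader: 'builtin:swc-loader', // Rspack's built-in SWC transform replaces babel-loader
        options: { jsc: { parser: { syntax: 'typescript', tsx: true } } },
      },
    ],
  },
  optimization: {
    splitChunks: { chunks: 'all' },   // match the Webpack code-splitting behavior
  },
};
```

Because the config schema is largely Webpack-compatible, each PR can compare the two builds' bundle sizes automatically and fall back to Webpack instantly if a difference appears.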

Enhanced User Experience

Improved usability and reliability from the user's perspective, including search experiences and reservation systems.

Improving the Search Experience and Reducing “0 Results” (Query Relaxation + Ranking Adjustment)

Improved a search experience that easily produced “0 results” due to exact-match search by gradually relaxing conditions and adjusting scoring. Alternative suggestions are now presented naturally, delivering convincing results while reducing churn.

Perspective / Details
Issue: With exact-match search, even slightly off conditions easily produced “0 results,” so users repeated searches with small changes and eventually churned.
Approach: Introduced a mechanism that scores search candidates and displays them in order of closeness to the conditions, tuning scores (function_score + gauss) so they decay smoothly as distance or match level diverges.
Operation: Continuously monitored relaxation steps, weights, and thresholds on a dashboard, and kept validating and tuning side effects through A/B testing.
Results: Significantly reduced “0-result” searches, decreased re-searches, and increased session duration. Better acceptance of alternative suggestions made the search experience smoother and less disruptive.
Impact: Delivered a search experience with a sense of substance, contributing to improved inventory utilization and booking rates, and left room for future ML-based ranking and expansion to vector search.
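The function_score + gauss tuning can be sketched as the following Elasticsearch query body. The field names (area, location, price) and all origin/scale/decay values are hypothetical examples, not the actual index schema.

```json
{
  "query": {
    "function_score": {
      "query": { "match": { "area": "shinjuku" } },
      "functions": [
        {
          "gauss": {
            "location": { "origin": "35.69,139.70", "scale": "2km", "decay": 0.5 }
          }
        },
        {
          "gauss": {
            "price": { "origin": 8000, "scale": 2000, "decay": 0.5 }
          }
        }
      ],
      "score_mode": "multiply",
      "boost_mode": "multiply"
    }
  }
}
```

With a Gaussian decay, a candidate 2km away or ¥2,000 off the target price still scores 0.5 instead of dropping to zero, which is what lets near-miss inventory surface as a natural alternative rather than a “0 results” page.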

Asynchronizing Search Index Updates to Improve Peak-Time Resilience

We reworked Elasticsearch updates from synchronous to an asynchronous Outbox → Pub/Sub → Indexer pipeline, achieving both stability and freshness of inventory reflection even at peak times. We then standardized this approach internally and rolled it out to other APIs, enabling load leveling and reduced operational costs.

Perspective / Details
Issue: Inventory and price updates were reflected to Elasticsearch synchronously, so writes were delayed or timed out when load concentrated, and availability became unstable during spikes such as sales.
Response: Reworked the flow into an asynchronous Outbox → Pub/Sub → Indexer pipeline, and adopted partial upsert so that only minimal diffs are applied, making updates lighter.
Outcome: Write-path stability was maintained while updates still reached Elasticsearch without significant delay; inventory freshness and search reliability now coexist even under peak load.
Ripple effects: This asynchronous update method became a standard pattern within the company. Rolled out horizontally to other APIs and projects, it provides an update foundation that combines load leveling with ease of extension, continuously reducing operational costs.
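The three stages can be sketched in Go with in-memory stand-ins: a slice for the outbox table, a channel for Pub/Sub, and a map of documents for the search index. All names are illustrative; in production the relay is a poller or CDC process and the indexer issues partial-upsert requests to Elasticsearch.

```go
package main

import "fmt"

// Event is an outbox row, written in the same transaction as the inventory update.
type Event struct {
	DocID string
	Patch map[string]any // only the changed fields (partial upsert)
}

// pipeline holds in-memory stand-ins for the outbox table, the message bus,
// and the search index documents.
type pipeline struct {
	outbox []Event
	pubsub chan Event
	index  map[string]map[string]any
}

// write records the change event alongside the business write; the request
// returns immediately instead of waiting for the index update.
func (p *pipeline) write(e Event) { p.outbox = append(p.outbox, e) }

// relay drains the outbox to Pub/Sub (a poller or CDC does this in production).
func (p *pipeline) relay() {
	for _, e := range p.outbox {
		p.pubsub <- e
	}
	p.outbox = nil
}

// indexer consumes n events and applies only the diff to each document.
func (p *pipeline) indexer(n int) {
	for i := 0; i < n; i++ {
		e := <-p.pubsub
		doc, ok := p.index[e.DocID]
		if !ok {
			doc = map[string]any{}
			p.index[e.DocID] = doc
		}
		for k, v := range e.Patch {
			doc[k] = v // partial upsert: untouched fields stay as-is
		}
	}
}

func main() {
	p := &pipeline{pubsub: make(chan Event, 16), index: map[string]map[string]any{
		"room-1": {"price": 9000, "stock": 3, "name": "Deluxe"},
	}}
	p.write(Event{DocID: "room-1", Patch: map[string]any{"stock": 2}})
	p.relay()
	p.indexer(1)
	fmt.Println(p.index["room-1"]["stock"], p.index["room-1"]["name"]) // → 2 Deluxe
}
```

Because events are committed with the business write and consumed asynchronously, spikes queue up in Pub/Sub instead of blocking the write path, which is the load-leveling property described above.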

Maintaining Inventory Consistency and Sales Opportunities by Automatically Releasing Expired Reservations

In response to expired reservations lingering in inventory, we designed Redis TTL + background release jobs for automatic recovery, reducing the unreleased rate from 7% to almost 0% and preventing lost sales opportunities.

Perspective / Details
Issue: Some reservations were never released, leaving inventory that “appeared fully booked.” This caused lost sales opportunities and forced staff to release reservations manually.
Response: Introduced a mechanism that automatically releases expired reservations using Redis TTL (expiration) and background jobs.
- Controlled with a unique key to prevent duplicate processing of the same reservation.
- Redis acts only as the trigger; the actual inventory updates are handled by the DB.
Visualization: Turned the number of pending reservations and the elapsed time until release into metrics, and monitored behavior constantly.
Outcome: Unreleased reservations dropped to almost zero, and inventory now updates quickly and accurately, preventing lost sales opportunities.
Effect: Reservation API responses became more stable, lock waits decreased, and manual release work became almost unnecessary.