Internal Development of a Go Code Quality Platform: Automating Review Culture with Ruleguard and Static Analysis
Summary
- Problem: Review comments were highly person-dependent, and quality assurance relied on individual skills.
- Response: Integrated static analysis tooling to automatically detect violations of boilerplate and “oral tradition” rules.
- Outcome: Reduced review workload and made quality standards reproducible at the code level.
Incident Overview
The quality of code reviews had become dependent on individual reviewers, and project-wide quality standards were not clearly documented.
In Go code in particular, many review comments focused on boilerplate “oral tradition rules” such as “test naming,” “struct tags,” and “forbidden dangerous APIs,” and the content and accuracy of comments varied by reviewer.
About 40% of review time was spent on items that could be mechanically judged, leaving insufficient resources for essential design and performance reviews.
Furthermore, rules restricting the use of os.Exit and panic were not documented, and there were confirmed cases where unexpected behavior occurred in the operations phase.
In response to this situation, we began building a quality platform by “automating review culture” and “codifying rules” using the static analysis ecosystem.
Investigation Phase
First, we collected six months of past code review comments and classified them along three axes: “reproducibility,” “automatability,” and “impact scope.”
As a result, a significant portion of all reviews fell into boilerplate patterns that could be detected by static analysis.
Next, we examined the rules included in the existing golangci-lint configuration and separated the areas that could be covered by off-the-shelf linters from those that required custom rules.
The latter included project-specific struct tag formats, missing authorization checks, and test stabilization rules.
- Struct tag unification: naming mismatches between `json` and `db` tags (`user_id` / `UserID`)
- Dangerous API detection: direct calls to `os.Exit` or `panic`
- Test stabilization: flaky tests that depend on `time.Sleep`
- Missing authorization checks: missing calls to `Authorize()`
During the investigation stage, we evaluated and selected automation targets from the following perspectives:
- Accuracy: target false positive rate of 10% or less
- Maintainability: rules must be easy to add and modify
- Educational value: developers should be able to understand the background of each rule
For each item, we compared three approaches—existing tools, custom rules, and combinations—and first experimented with codifying rules using ruleguard.
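As a concrete illustration of that experiment, two of the rules above (dangerous API prohibition and Sleep-dependent tests) can be expressed in a few lines of the ruleguard DSL. The sketch below is illustrative rather than our production rules file; it assumes the standard `github.com/quasilyte/go-ruleguard/dsl` package, and the rule names and messages are hypothetical.

```go
//go:build ruleguard

// rules.go: a minimal ruleguard sketch (rule names and messages are illustrative).
// The build tag keeps the rules file out of normal builds.
package gorules

import "github.com/quasilyte/go-ruleguard/dsl"

// noExit flags direct os.Exit calls so that errors are returned instead.
func noExit(m dsl.Matcher) {
	m.Match(`os.Exit($_)`).
		Report(`avoid os.Exit; return an error and let main decide the exit code`)
}

// noSleepInTests flags time.Sleep inside _test.go files, a common source of flaky tests.
func noSleepInTests(m dsl.Matcher) {
	m.Match(`time.Sleep($_)`).
		Where(m.File().Name.Matches(`_test\.go$`)).
		Report(`avoid time.Sleep in tests; synchronize explicitly or inject a fake clock`)
}
```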
Verification Phase
We verified in three stages: introduce as warn → tune false positives → promote to error.
We applied the rules to both sample and real projects, measuring detection recall, false positive rate, and CI execution time.
As a result, recall for boilerplate comments improved from 78% to 95%, the false positive rate dropped from 14% to 6%, and CI overhead was contained to an additional +28–35 seconds per job.
Procedure (Summary)
- Create a baseline (lock current violations into `issues-baseline.json`)
- Check coverage of existing rules with `golangci-lint`
- Prototype boilerplate rules with `ruleguard` and suppress false positives
- Promote high-value rules to in-house `go/analysis` analyzers using type analysis
- Output SARIF for PR visualization and update thresholds/rules weekly
Promotion policy
Only rules that maintain a false positive rate of ≤ 7% for two consecutive weeks are promoted to enforcement (`issues-exit-code: 1`).
Before promotion, we submit fixup PRs (auto-formatting and rule fixes) in advance to avoid development slowdowns.
We used golangci-lint as the core and enabled staticcheck, revive, gocritic, and gocyclo/gocognit/nestif/funlen.
We implemented 2–3 “oral tradition rules” in ruleguard (test stabilization / tag unification / dangerous API prohibition).
We promoted high-value rules to go/analysis (authorization pass guarantees, audit log coverage, type-based replacements).
For metrics, we combined off-the-shelf tools with SARIF output, automatically attaching the results to PRs to visualize increases and decreases in findings.
How go/analysis Works
go/analysis is Go’s official static analysis framework, provided under golang.org/x/tools.
You define dependencies (Requires) and the main execution body (Run) per Analyzer, and for each compilation unit you receive a Pass (AST, type information, and reporting API) and generate diagnostics.
Diagnostics are emitted via pass.Reportf and can be surfaced in PR reviews, and automatic fixes can be attached via SuggestedFix.
Basic Structure of an Analyzer
// analyzer/example/analyzer.go
package example
import (
"go/ast"
"golang.org/x/tools/go/analysis"
"golang.org/x/tools/go/analysis/passes/inspect"
"golang.org/x/tools/go/ast/inspector"
)
var Analyzer = &analysis.Analyzer{
Name: "noexit",
Doc: "reports os.Exit usage in app code",
Requires: []*analysis.Analyzer{
inspect.Analyzer, // dependency (AST traversal utility)
},
Run: func(pass *analysis.Pass) (any, error) {
insp := pass.ResultOf[inspect.Analyzer].(*inspector.Inspector)
insp.Preorder([]ast.Node{(*ast.CallExpr)(nil)}, func(n ast.Node) {
call := n.(*ast.CallExpr)
// simple detection: find os.Exit(...)
if sel, ok := call.Fun.(*ast.SelectorExpr); ok {
if ident, ok := sel.X.(*ast.Ident); ok && ident.Name == "os" && sel.Sel.Name == "Exit" {
pass.Reportf(call.Pos(), "avoid os.Exit; return error instead")
}
}
})
return nil, nil
},
}
Type Information and Higher Accuracy
By using types.Info, you can make judgments that are unaffected by package names or aliased imports.
You resolve with pass.TypesInfo.TypeOf and pass.TypesInfo.ObjectOf and compare function call symbols in fully qualified form.
import "go/types"
func isCallTo(pass *analysis.Pass, call *ast.CallExpr, pkg, name string) bool {
sel, ok := call.Fun.(*ast.SelectorExpr)
if !ok { return false }
obj := pass.TypesInfo.ObjectOf(sel.Sel)
if obj == nil { return false }
if fn, ok := obj.(*types.Func); ok {
if fn.Pkg() != nil && fn.Pkg().Path() == pkg && fn.Name() == name {
return true
}
}
return false
}
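With a helper like this, the name-based check in the earlier noexit analyzer can be swapped for a type-resolved one, so aliased imports (for example `import sys "os"`) are still caught. A minimal sketch of the call site, replacing the SelectorExpr comparison inside the Preorder callback shown above:

```go
// Inside the Preorder callback: report based on the resolved symbol,
// not on the literal package identifier "os".
if isCallTo(pass, call, "os", "Exit") {
	pass.Reportf(call.Pos(), "avoid os.Exit; return error instead")
}
```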
SuggestedFix (Automatic Fixes)
If you return fix candidates along with diagnostics, editors and CI can apply automatic fixes.
You construct analysis.TextEdit and return it as SuggestedFixes.
pass.Report(analysis.Diagnostic{
Pos: pos, End: end,
Message: "use context-aware logger",
SuggestedFixes: []analysis.SuggestedFix{{
Message: "replace with log.From(ctx)",
TextEdits: []analysis.TextEdit{{
Pos: pos, End: end, NewText: []byte("log.From(ctx)"),
}},
}},
})
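In this sketch, `pos` and `end` are placeholders; in practice they would typically be the `Pos()` and `End()` of the AST node being replaced, so the fix edits exactly the offending expression and nothing else.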
Composing Multiple Analyzers and Dependencies
You can reuse the results of other analyzers via Requires (for example, the result of inspect).
You modularize in-house common rules and plug them into each repository’s main (standalone execution).
// cmd/noexit/main.go
package main
import (
"golang.org/x/tools/go/analysis/singlechecker"
"your.org/analyzers/noexit"
)
func main() { singlechecker.Main(noexit.Analyzer) }
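As the number of in-house analyzers grows, they can also be bundled into one binary with `multichecker` instead of shipping one command per rule. A minimal sketch; the second analyzer package `your.org/analyzers/audittrace` is hypothetical:

```go
// cmd/qualitylint/main.go
// Bundles several in-house analyzers into a single CLI via multichecker.
package main

import (
	"golang.org/x/tools/go/analysis/multichecker"

	"your.org/analyzers/audittrace" // hypothetical audit-log coverage analyzer
	"your.org/analyzers/noexit"
)

func main() {
	multichecker.Main(
		audittrace.Analyzer,
		noexit.Analyzer,
	)
}
```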
Testing (analysistest)
With analysistest, you can declaratively verify “input code → expected diagnostics.”
`// want "message"` comments in the test sources specify the expected positions and messages of diagnostics.
// analyzer/example/analyzer_test.go
package example_test
import (
"testing"
"golang.org/x/tools/go/analysis/analysistest"
"your.org/analyzers/example"
)
func TestAnalyzer(t *testing.T) {
testdata := analysistest.TestData()
analysistest.Run(t, testdata, example.Analyzer, "a") // testdata/src/a/...
}
// testdata/src/a/main.go
package a
import "os"
func f() {
os.Exit(1) // want "avoid os.Exit"
}
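For analyzers that attach SuggestedFixes, the fixes themselves can also be tested declaratively: `analysistest.RunWithSuggestedFixes` applies the fixes and compares the result against `.golden` files in testdata. A minimal sketch, assuming a corresponding `testdata/src/a/main.go.golden` file exists:

```go
// Verifies that applying the analyzer's SuggestedFixes to testdata/src/a/...
// reproduces the matching .golden files.
func TestAnalyzerSuggestedFixes(t *testing.T) {
	testdata := analysistest.TestData()
	analysistest.RunWithSuggestedFixes(t, testdata, example.Analyzer, "a")
}
```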
Key Points for Pre-measurement and Tuning
- Narrow down target nodes (minimize the type array passed to Preorder).
- Keep Requires dependencies to the minimum necessary (e.g., inspect; type information is already available on the Pass itself).
- Keep expensive path resolution and string processing out of hot paths; share helper functions for common checks.
- In CI, use diff-based analysis (limit to changed files) plus nightly full scans.
Migration Plan and Recurrence Prevention
For introduction, we designed a phased migration under the principle of “do not overload CI and do not halt development.”
In the initial phase, we operated in warn mode and gradually tightened rules while monitoring the false positive rate.
Only stable rules were promoted to error, and we enforced them only after automatically fixing existing code via fixup PRs.
Phased Migration Process
- Phase 1 (detection only): set `issues-exit-code: 0` and limit notifications to Slack
- Phase 2 (warning level): output findings to CI results and aggregate false positive reports weekly
- Phase 3 (enforcement): promote rules that maintain a false positive rate of 7% or less for two weeks to `error`
- Phase 4 (analyzer integration): package go/analysis rules independently and move them into shared CI for all services
Operations and Recurrence Prevention Measures
- Separate the rule management repository and make rule updates visible via Pull Requests
- Automate false positive log collection in CI and adjust thresholds in monthly reviews
- Adopt SARIF output and display static analysis results directly in PR review screens
- Embed rule templates into scaffolding for new services to ensure consistent quality standards from the start
- Documentation: attach “background, intent, and avoidance examples” to each rule to reduce educational costs
This setup minimized fluctuations caused by tool updates and rule extensions and sustainably realized a state where “tools preserve review culture.”
Results and Outcomes
Quantitative Outcomes for the Go Code Quality Platform (Indicator-Based; Figures Not Disclosed)
We evaluated and improved based on the following indicators (figures are not disclosed for confidentiality). Each indicator includes its intent and measurement method.
| Indicator | Notes (Intent / How to Read) | Measurement Method / Caveats (Examples) |
|---|---|---|
| Review Comment Recall | How much static analysis (Analyzers) can substitute for human review comments. The “automation rate” of review culture. | Correlate PR review comments (tags such as style, safety, perf, etc.) with Analyzer detection results. Track over time. |
| False Positive Rate | Level of noise. A key factor in developer experience. | Aggregate the rate of Analyzer detections labeled won’t fix / false-positive. Monitor weekly per rule. |
| Number of Boilerplate Reviews (Reduction Rate) | Labor savings in human reviews through “automation of comments.” | Automatically classify and count PR comments that are template-like (stock phrases or bot-originated). |
| CI Overhead (Build Time Difference) | Whether the additional cost of quality assurance is within acceptable bounds. | Measure CI execution time with and without the Quality job (p50/p95/p99). Fix parallelism and cache conditions. |
| Number of Analyzer Rules / Commonization Rate | Maturity and cross-project reuse of custom rules. | Automatically compute total rule count and the ratio modularized as internal common modules in the management repository. |
| Block Rate by Severity | Balancing minimal release disruption with safety guarantees. | Extract block occurrence rates per severity (error/warn/info) from CI results. Validate relaxation scenarios when SLOs are exceeded. |
Qualitative Outcomes
- Establishment of Review Culture: By documenting “why each rule exists,” review comments became reproducible, and even new members could maintain the same quality standards in a short time.
- Balancing Quality and Speed: We established an operation where we manage rule enforcement levels in stages and insert automatic fixes before enforcement. This allowed us to build a cycle that improves automation accuracy without causing development slowdowns.
- Improved Cross-team Reusability: By modularizing custom Analyzers and reusing them in other projects, we built a foundation for sharing consistent quality standards and security policies across the organization.
- Reduced Educational and Review Burden: By expressing previously tacit oral tradition rules in code, we reduced the cost of onboarding and reviewer training and eliminated person-dependence.
References
| Item | Possible with ruleguard Alone | go/analysis Recommended | Off-the-shelf Linters / Other Means |
|---|---|---|---|
| Cross-layer Guard (forbid domain→infra) | △ (possible if simple conditions on file paths/imports) | ◎ (strict verification of package dependency graph and exception rules) | Tools like depguard are also options |
| Authorization / Permission Style (scope check omissions) | △ (limited to checking for calls directly under handlers) | ◎ (use type info and control flow to guarantee “always passes”) | — |
| Mandatory Audit Logs / Traces (entry/exit) | △ (can check for calls at entry points) | ◎ (ensure coverage including early returns and error paths) | — |
| Test Stabilization (`time.Sleep` ban, enforce `t.Parallel()`) | ◎ | — | Some can be substituted by existing tools (testpackage, gocritic) |
| Error Handling Policy (`%w` wrap, PII log prohibition) | ○ (can encourage `%w`) / △ (PII is heuristic) | ◎ (PII judgment and data-flow-like checks suit analyzers) | Partially covered by staticcheck |
| Struct Tag Unification (`json:"snake_case"`, required tags) | ◎ (good at regex checks on tag strings) | — | Partially possible with revive |
| API Migration (`io/ioutil` → `io` / `os` replacements) | ○ (simple replacements and import assistance) | ◎ (safe transformations including call-form and type differences) | Combine with existing migration tools if available |
| Bulk Fixes for Breaking Changes | △ (if pattern matching is simple) | ◎ (type resolution, argument reconstruction, safe SuggestedFix) | — |
| Large struct Value Passing → Pointer | △ (if only simple detection without size thresholds) | ◎ (estimate size with `go/types.Sizes` and propose changes) | — |
| Reporting CC / NPath / Nesting Depth | — | — | Existing tools (gocyclo, gocognit, nestif, funlen) |
| Automatic PR Comments (visualizing deltas) | — | — | CI-side features (SARIF / GitHub Code Scanning, danger, etc.) |