Internal Development of a Go Code Quality Platform: Automating Review Culture with Ruleguard and Static Analysis
Summary
- Problem: Review comments were highly person-dependent, and quality assurance relied on individual skills.
- Response: Integrated static analysis tooling to automatically detect violations of boilerplate and “oral tradition” rules.
- Outcome: Reduced review workload and made quality standards reproducible at the code level.
Incident Overview
The quality of code reviews had become dependent on individual reviewers, and project-wide quality standards were not clearly documented.
In Go code in particular, many review comments focused on boilerplate “oral tradition rules” such as “test naming,” “struct tags,” and “forbidden dangerous APIs,” and the content and accuracy of comments varied by reviewer.
About 40% of review time was spent on items that could be mechanically judged, leaving insufficient resources for essential design and performance reviews.
Furthermore, rules restricting the use of os.Exit and panic were not documented, and there were confirmed cases where unexpected behavior occurred in the operations phase.
In response to this situation, we began building a quality platform by “automating review culture” and “codifying rules” using the static analysis ecosystem.
Investigation Phase
First, we collected six months of past code review comments and classified them along three axes: “reproducibility,” “automatability,” and “impact scope.”
As a result, a significant portion of all reviews fell into boilerplate patterns that could be detected by static analysis.
Next, we examined the rules included in the existing golangci-lint configuration and separated the areas that could be covered by off-the-shelf linters from those that required custom rules.
The latter included project-specific struct tag formats, missing authorization checks, and test stabilization rules.
- Struct tag unification: naming mismatches between `json` and `db` tags (`user_id` / `UserID`)
- Dangerous API detection: direct calls to `os.Exit` or `panic`
- Test stabilization: flaky tests that depend on `time.Sleep`
- Missing authorization checks: missing calls to `Authorize()`
During the investigation stage, we evaluated and selected automation targets from the following perspectives:
- Accuracy: target false positive rate of 10% or less
- Maintainability: rules must be easy to add and modify
- Educational value: developers should be able to understand the background of each rule
For each item, we compared three approaches—existing tools, custom rules, and combinations—and first experimented with codifying rules using ruleguard.
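As a concrete illustration of that experiment, two of the rules above (dangerous API prohibition and Sleep-dependent tests) can be expressed in a few lines of the ruleguard DSL. The sketch below is illustrative rather than our production rules file; it assumes the standard `github.com/quasilyte/go-ruleguard/dsl` package, and the rule names and messages are hypothetical.

```go
//go:build ruleguard

// rules.go: a minimal ruleguard sketch (rule names and messages are illustrative).
// The build tag keeps the rules file out of normal builds.
package gorules

import "github.com/quasilyte/go-ruleguard/dsl"

// noExit flags direct os.Exit calls so that errors are returned instead.
func noExit(m dsl.Matcher) {
	m.Match(`os.Exit($_)`).
		Report(`avoid os.Exit; return an error and let main decide the exit code`)
}

// noSleepInTests flags time.Sleep inside _test.go files, a common source of flaky tests.
func noSleepInTests(m dsl.Matcher) {
	m.Match(`time.Sleep($_)`).
		Where(m.File().Name.Matches(`_test\.go$`)).
		Report(`avoid time.Sleep in tests; synchronize explicitly or inject a fake clock`)
}
```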
Verification Phase
We verified in three stages: introduce as warn → tune false positives → promote to error.
We applied the rules to both sample and real projects, measuring detection recall, false positive rate, and CI execution time.
As a result, recall for boilerplate comments improved from 78% to 95%, the false positive rate dropped from 14% to 6%, and CI overhead was contained to an additional +28–35 seconds per job.
Procedure (Summary)
- Create a baseline (lock current violations into `issues-baseline.json`)
- Check coverage of existing rules with `golangci-lint`
- Prototype boilerplate rules with `ruleguard` and suppress false positives
- Promote high-value rules to in-house `go/analysis` analyzers using type analysis
- Output SARIF for PR visualization and update thresholds/rules weekly
Promotion policy
Only rules that maintain a false positive rate of ≤ 7% for two consecutive weeks are promoted to enforcement (`issues-exit-code: 1`).
Before promotion, we submit fixup PRs (auto-formatting and rule fixes) in advance to avoid development slowdowns.
We used golangci-lint as the core and enabled staticcheck, revive, gocritic, and gocyclo/gocognit/nestif/funlen.
We implemented 2–3 “oral tradition rules” in ruleguard (test stabilization / tag unification / dangerous API prohibition).
We promoted high-value rules to go/analysis (authorization pass guarantees, audit log coverage, type-based replacements).
For metrics, we combined off-the-shelf tools with SARIF output, automatically attaching the results to PRs to visualize increases and decreases in findings.
How go/analysis Works
go/analysis is Go’s official static analysis framework, provided under golang.org/x/tools.
You define dependencies (Requires) and the main execution body (Run) per Analyzer, and for each compilation unit you receive a Pass (AST, type information, and reporting API) and generate diagnostics.
Diagnostics are emitted via pass.Reportf and can be surfaced in PR reviews, and automatic fixes can be attached via SuggestedFix.
Basic Structure of an Analyzer
// analyzer/example/analyzer.go
package example
import (
"go/ast"
"golang.org/x/tools/go/analysis"
"golang.org/x/tools/go/analysis/passes/inspect"
"golang.org/x/tools/go/ast/inspector"
)
var Analyzer = &analysis.Analyzer{
Name: "noexit",
Doc: "reports os.Exit usage in app code",
Requires: []*analysis.Analyzer{
inspect.Analyzer, // dependency (AST traversal utility)
},
Run: func(pass *analysis.Pass) (any, error) {
insp := pass.ResultOf[inspect.Analyzer].(*inspector.Inspector)
insp.Preorder([]ast.Node{(*ast.CallExpr)(nil)}, func(n ast.Node) {
call := n.(*ast.CallExpr)
// simple detection: find os.Exit(...)
if sel, ok := call.Fun.(*ast.SelectorExpr); ok {
if ident, ok := sel.X.(*ast.Ident); ok && ident.Name == "os" && sel.Sel.Name == "Exit" {
pass.Reportf(call.Pos(), "avoid os.Exit; return error instead")
}
}
})
return nil, nil
},
}
Type Information and Higher Accuracy
By using types.Info, you can make judgments that are unaffected by package names or aliased imports.
You resolve with pass.TypesInfo.TypeOf and pass.TypesInfo.ObjectOf and compare function call symbols in fully qualified form.
import "go/types"
func isCallTo(pass *analysis.Pass, call *ast.CallExpr, pkg, name string) bool {
sel, ok := call.Fun.(*ast.SelectorExpr)
if !ok { return false }
obj := pass.TypesInfo.ObjectOf(sel.Sel)
if obj == nil { return false }
if fn, ok := obj.(*types.Func); ok {
if fn.Pkg() != nil && fn.Pkg().Path() == pkg && fn.Name() == name {
return true
}
}
return false
}
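With a helper like this, the name-based check in the earlier noexit analyzer can be swapped for a type-resolved one, so aliased imports (for example `import sys "os"`) are still caught. A minimal sketch of the call site, replacing the SelectorExpr comparison inside the Preorder callback shown above:

```go
// Inside the Preorder callback: report based on the resolved symbol,
// not on the literal package identifier "os".
if isCallTo(pass, call, "os", "Exit") {
	pass.Reportf(call.Pos(), "avoid os.Exit; return error instead")
}
```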
SuggestedFix (Automatic Fixes)
If you return fix candidates along with diagnostics, editors and CI can apply automatic fixes.
You construct analysis.TextEdit and return it as SuggestedFixes.
pass.Report(analysis.Diagnostic{
Pos: pos, End: end,
Message: "use context-aware logger",
SuggestedFixes: []analysis.SuggestedFix{{
Message: "replace with log.From(ctx)",
TextEdits: []analysis.TextEdit{{
Pos: pos, End: end, NewText: []byte("log.From(ctx)"),
}},
}},
})
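In this sketch, `pos` and `end` are placeholders; in practice they would typically be the `Pos()` and `End()` of the AST node being replaced, so the fix edits exactly the offending expression and nothing else.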
Composing Multiple Analyzers and Dependencies
You can reuse the results of other analyzers via Requires (for example, the result of inspect).
You modularize in-house common rules and plug them into each repository’s main (standalone execution).
// cmd/noexit/main.go
package main
import (
"golang.org/x/tools/go/analysis/singlechecker"
"your.org/analyzers/noexit"
)
func main() { singlechecker.Main(noexit.Analyzer) }
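As the number of in-house analyzers grows, they can also be bundled into one binary with `multichecker` instead of shipping one command per rule. A minimal sketch; the second analyzer package `your.org/analyzers/audittrace` is hypothetical:

```go
// cmd/qualitylint/main.go
// Bundles several in-house analyzers into a single CLI via multichecker.
package main

import (
	"golang.org/x/tools/go/analysis/multichecker"

	"your.org/analyzers/audittrace" // hypothetical audit-log coverage analyzer
	"your.org/analyzers/noexit"
)

func main() {
	multichecker.Main(
		audittrace.Analyzer,
		noexit.Analyzer,
	)
}
```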
Testing (analysistest)
With analysistest, you can declaratively verify “input code → expected diagnostics.”
`// want "message"` comments in the test sources specify the expected positions and messages of diagnostics.
// analyzer/example/analyzer_test.go
package example_test
import (
"testing"
"golang.org/x/tools/go/analysis/analysistest"
"your.org/analyzers/example"
)
func TestAnalyzer(t *testing.T) {
testdata := analysistest.TestData()
analysistest.Run(t, testdata, example.Analyzer, "a") // testdata/src/a/...
}
// testdata/src/a/main.go
package a
import "os"
func f() {
os.Exit(1) // want "avoid os.Exit"
}
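For analyzers that attach SuggestedFixes, the fixes themselves can also be tested declaratively: `analysistest.RunWithSuggestedFixes` applies the fixes and compares the result against `.golden` files in testdata. A minimal sketch, assuming a corresponding `testdata/src/a/main.go.golden` file exists:

```go
// Verifies that applying the analyzer's SuggestedFixes to testdata/src/a/...
// reproduces the matching .golden files.
func TestAnalyzerSuggestedFixes(t *testing.T) {
	testdata := analysistest.TestData()
	analysistest.RunWithSuggestedFixes(t, testdata, example.Analyzer, "a")
}
```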
Key Points for Pre-measurement and Tuning
- Narrow down target nodes (minimize the type array passed to Preorder).
- Keep Requires dependencies to the minimum necessary (e.g., inspect; type information is already available on the Pass itself).
- Keep expensive path resolution and string processing out of hot paths; share helper functions for common checks.
- In CI, use diff-based analysis (limit to changed files) plus nightly full scans.
Migration Plan and Recurrence Prevention
For introduction, we designed a phased migration under the principle of “do not overload CI and do not halt development.”
In the initial phase, we operated in warn mode and gradually tightened rules while monitoring the false positive rate.
Only stable rules were promoted to error, and we enforced them only after automatically fixing existing code via fixup PRs.
Phased Migration Process
- Phase 1 (detection only): set `issues-exit-code: 0` and limit notifications to Slack
- Phase 2 (warning level): output findings to CI results and aggregate false positive reports weekly
- Phase 3 (enforcement): promote rules that maintain a false positive rate of 7% or less for two weeks to `error`
- Phase 4 (analyzer integration): package go/analysis rules independently and move them into shared CI for all services
Operations and Recurrence Prevention Measures
- Separate the rule management repository and make rule updates visible via Pull Requests
- Automate false positive log collection in CI and adjust thresholds in monthly reviews
- Adopt SARIF output and display static analysis results directly in PR review screens
- Embed rule templates into scaffolding for new services to ensure consistent quality standards from the start
- Documentation: attach “background, intent, and avoidance examples” to each rule to reduce educational costs
This setup minimized fluctuations caused by tool updates and rule extensions and sustainably realized a state where “tools preserve review culture.”
Results and Outcomes
Quantitative Outcomes for the Go Code Quality Platform (Indicator-Based; Figures Not Disclosed)
We evaluated and improved based on the following indicators (figures are not disclosed for confidentiality). Each indicator includes its intent and measurement method.
| Indicator | Notes (Intent / How to Read) | Measurement Method / Caveats (Examples) |
|---|---|---|
| Review Comment Recall | How much static analysis (Analyzers) can substitute for human review comments. The “automation rate” of review culture. | Correlate PR review comments (tags such as style, safety, perf, etc.) with Analyzer detection results. Track over time. |
| False Positive Rate | Level of noise. A key factor in developer experience. | Aggregate the rate of Analyzer detections labeled won’t fix / false-positive. Monitor weekly per rule. |
| Number of Boilerplate Reviews (Reduction Rate) | Labor savings in human reviews through “automation of comments.” | Automatically classify and count PR comments that are template-like (stock phrases or bot-originated). |
| CI Overhead (Build Time Difference) | Whether the additional cost of quality assurance is within acceptable bounds. | Measure CI execution time with and without the Quality job (p50/p95/p99). Fix parallelism and cache conditions. |
| Number of Analyzer Rules / Commonization Rate | Maturity and cross-project reuse of custom rules. | Automatically compute total rule count and the ratio modularized as internal common modules in the management repository. |
| Block Rate by Severity | Balancing minimal release disruption with safety guarantees. | Extract block occurrence rates per severity (error/warn/info) from CI results. Validate relaxation scenarios when SLOs are exceeded. |
Qualitative Outcomes
- Establishment of Review Culture: By documenting “why each rule exists,” review comments became reproducible, and even new members could maintain the same quality standards in a short time.
- Balancing Quality and Speed: We established an operation where we manage rule enforcement levels in stages and insert automatic fixes before enforcement. This allowed us to build a cycle that improves automation accuracy without causing development slowdowns.
- Improved Cross-team Reusability: By modularizing custom Analyzers and reusing them in other projects, we built a foundation for sharing consistent quality standards and security policies across the organization.
- Reduced Educational and Review Burden: By expressing previously tacit oral tradition rules in code, we reduced the cost of onboarding and reviewer training and eliminated person-dependence.
References
| Item | Possible with ruleguard Alone | go/analysis Recommended | Off-the-shelf Linters / Other Means |
|---|---|---|---|
| Cross-layer Guard (forbid domain→infra) | △ (possible if simple conditions on file paths/imports) | ◎ (strict verification of package dependency graph and exception rules) | Tools like depguard are also options |
| Authorization / Permission Style (scope check omissions) | △ (limited to checking for calls directly under handlers) | ◎ (use type info and control flow to guarantee “always passes”) | — |
| Mandatory Audit Logs / Traces (entry/exit) | △ (can check for calls at entry points) | ◎ (ensure coverage including early returns and error paths) | — |
| Test Stabilization (`time.Sleep` ban, enforce `t.Parallel()`) | ◎ | — | Some can be substituted by existing tools (testpackage, gocritic) |
| Error Handling Policy (`%w` wrap, PII log prohibition) | ○ (can encourage `%w`) / △ (PII is heuristic) | ◎ (PII judgment and data-flow-like checks suit analyzers) | Partially covered by staticcheck |
| Struct Tag Unification (`json:"snake_case"`, required tags) | ◎ (good at regex checks on tag strings) | — | Partially possible with revive |
| API Migration (`io/ioutil` → `io` / `os` replacements) | ○ (simple replacements and import assistance) | ◎ (safe transformations including call-form and type differences) | Combine with existing migration tools if available |
| Bulk Fixes for Breaking Changes | △ (if pattern matching is simple) | ◎ (type resolution, argument reconstruction, safe SuggestedFix) | — |
| Large struct Value Passing → Pointer | △ (if only simple detection without size thresholds) | ◎ (estimate size with `go/types.Sizes` and propose changes) | — |
| Reporting CC / NPath / Nesting Depth | — | — | Existing tools (gocyclo, gocognit, nestif, funlen) |
| Automatic PR Comments (visualizing deltas) | — | — | CI-side features (SARIF / GitHub Code Scanning, danger, etc.) |