reliability · incident response · SLO · safe change

Reliability for systems that have to keep working.

I embed with teams as an SRE to keep their systems running. Incident response, SLOs, and safe change are the day-to-day.

Consulting

Let's talk reliability

Incident response, SLOs, safe change, observability. If something about the reliability of a long-running system is weighing on you, tell me what's going on. I start by getting a clear read on where things actually stand.