Systems Thinking Posts

Systems Thinking

Quorum Math Meets Cache TTL Jitter In An At-Least-One Read Architecture

The bug that started it all. I ran into a weird production incident that looked like “random stale data.” The system was built around an at-least-one...

Jun 29, 2026

Incident Postmortems As Feedback Control For Queue Backlog Oscillations

I used to think incident postmortems were mostly for “remembering what happened.” Then I watched a system almost learn from its failures—and fail anyw...

Jun 22, 2026

The Queue Debt Ledger I Built For Incident-Free Deploys

I didn’t start out trying to build a “philosophy” tool. I started because my deploys kept “working” and still hurting us. Every time we shipped, the ...

Jun 21, 2026

Modeling Cache Stampedes With A Two-Delay Feedback Loop In Python

The tiny production fire I wanted to understand. A while back I chased a weird incident: response times would suddenly spike, then slowly recover, but...

Jun 19, 2026

Debugging Eventual Consistency With A Deterministic “Outbox Storm” Simulator

I ran into a bug that felt haunted: data looked correct most of the time, then occasionally—usually right after a deploy or a load spike—it “snapped” ...

Jun 12, 2026

PagerDuty Triage Bots And The “One-Line Blame” Trap

Last year I inherited an on-call rotation where every incident felt like the same small play: the pager went off, someone posted a terse message like ...

Jun 8, 2026

A Tiny Mental Model For Debugging Event Loops With Virtual Time

Last year I got bitten by a bug that “couldn’t possibly happen”: timers were firing, yet the system behaved like they weren’t. It turned out I was usi...

Apr 18, 2026

Mental Model For Debugging Event Loops With A Deterministic Time Tracer

I ran into a bug that looked “random” in production: a UI button sometimes didn’t update, but only when users clicked quickly. Locally it was fine. In...

Apr 13, 2026