Systems Thinking
July 5, 2026

Weekend Notes On Designing A Deterministic “Incident Timeline” For Event-Driven Systems

Written by Elena Holos

The problem I couldn’t ignore

I once ran an incident where the system behaved “correctly” by every metric—but the story didn’t line up. Alerts told one narrative, logs told another, and our runbook assumed a third. Afterward, we spent hours arguing about what happened first, because we treated the timeline as a byproduct of debugging instead of a first-class artifact.

That experience pushed me toward a very specific design goal:

Build a deterministic incident timeline generator that can turn a stream of events into a single, reproducible “what happened” narrative—even when events arrive out of order.

This is a philosophical choice as much as a technical one: when reality is messy (and in distributed systems it always is), I try to make the process of understanding it deterministic.


The philosophy choice: “Reproducibility beats intuition”

Systems thinking says components interact through feedback loops, delays, and couplings. In incidents, the coupling is between:

  • Event ingestion timing (arrival order)
  • Causality (what really caused what)
  • Human interpretation (how we narrate the incident)

If you can’t reproduce the same incident timeline from the same underlying events, you can’t reliably compare “today’s fix” against “yesterday’s failure.” So I designed the timeline builder to be deterministic.

Two principles guided me:

  1. Sort using explicit ordering rules, not arrival order.
  2. Break ties consistently using stable identifiers, so two runs produce byte-for-byte identical output.
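
The two principles can be sketched in a few lines. This is a minimal illustration, not the full builder: the tuple layout and the name ordering_key are assumptions made for the example, and the same list is fed in two different arrival orders.

```python
from typing import List, Tuple

# Hypothetical minimal record for illustration: (event_id, emitted_at, received_at)
Rec = Tuple[str, int, int]


def ordering_key(rec: Rec) -> Tuple[int, int, str]:
    """Explicit ordering rule: timestamps first, stable ID as the final tie-breaker."""
    event_id, emitted_at, received_at = rec
    return (emitted_at, received_at, event_id)


# "a" and "b" share both timestamps, so only event_id can separate them.
arrival_a: List[Rec] = [("b", 100, 105), ("a", 100, 105), ("c", 90, 200)]
arrival_b: List[Rec] = list(reversed(arrival_a))

sorted_a = sorted(arrival_a, key=ordering_key)
sorted_b = sorted(arrival_b, key=ordering_key)

print(sorted_a == sorted_b)          # True
print([r[0] for r in sorted_a])      # ['c', 'a', 'b']
```

Because event_id is unique, no two records ever compare equal under this key, so the sorted result cannot depend on the order in which the records arrived.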

A concrete model I used

Each event record had:

  • event_id: unique stable ID (string)
  • emitted_at: when the system claims it emitted the event (integer timestamp)
  • received_at: when my collector received it (integer timestamp)
  • correlation_id: to group related events (string)
  • type: e.g. request_started, request_failed, service_scaled
  • causal_ref: optional pointer to another event by event_id

Key detail: causal references

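If an event declares causal_ref and the referenced event is present in the same correlation group, I place it after the event it depends on, even when the timestamps disagree. If it has no causal_ref, or the reference can't be resolved, I fall back to time ordering.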
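
To make the three cases concrete, here is a small sketch that labels each event by how it would be placed. The function classify_refs, the label strings, and the tuple layout are all hypothetical names chosen for this illustration; they are not part of the builder itself.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical minimal record for illustration: (event_id, emitted_at, causal_ref)
Record = Tuple[str, int, Optional[str]]


def classify_refs(records: List[Record]) -> Dict[str, str]:
    """Label each event by whether its causal_ref can be used for placement."""
    known = {event_id for event_id, _, _ in records}
    labels: Dict[str, str] = {}
    for event_id, _, ref in records:
        if ref is None:
            labels[event_id] = "time-ordered"        # no causal claim at all
        elif ref in known:
            labels[event_id] = "causally-anchored"   # placed after its cause
        else:
            labels[event_id] = "dangling-ref"        # cause never collected
    return labels


records: List[Record] = [
    ("e1", 100, None),
    ("e2", 90, "e1"),    # skewed clock: claims to be earlier than its cause
    ("e3", 300, "e9"),   # references an event that was never collected
]
labels = classify_refs(records)
print(labels)
# {'e1': 'time-ordered', 'e2': 'causally-anchored', 'e3': 'dangling-ref'}
```

The e2 record is the interesting one: its timestamp says it came first, but the causal reference wins, so it is still placed after e1.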


Step-by-step: deterministic timeline builder in code

Below is a working Python implementation that:

  1. Builds a dependency graph from causal_ref.
  2. Produces a topological ordering (a linearization consistent with dependencies).
  3. Uses deterministic tie-breaking so the ordering is stable.
  4. Emits a timeline grouped by correlation_id.

from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, List, Dict, Tuple
import heapq
import json


@dataclass(frozen=True)
class Event:
    event_id: str
    emitted_at: int
    received_at: int
    correlation_id: str
    type: str
    causal_ref: Optional[str] = None


def deterministic_timeline(events: List[Event]) -> Dict[str, List[dict]]:
    """
    Returns a deterministic timeline grouped by correlation_id.

    Determinism rules:
    - Primary ordering comes from causal dependencies (causal_ref graph).
    - When multiple events are available to schedule next, ties are resolved using:
      (emitted_at, received_at, event_id)
    """
    # Group events by correlation id first (keeps timelines readable)
    by_corr: Dict[str, List[Event]] = {}
    for e in events:
        by_corr.setdefault(e.correlation_id, []).append(e)

    timeline_by_corr: Dict[str, List[dict]] = {}

    for corr_id, corr_events in by_corr.items():
        # Index for quick lookup
        by_id: Dict[str, Event] = {e.event_id: e for e in corr_events}

        # Build adjacency list: ref_event -> list of dependent events
        dependents: Dict[str, List[str]] = {e.event_id: [] for e in corr_events}
        indegree: Dict[str, int] = {e.event_id: 0 for e in corr_events}

        for e in corr_events:
            if e.causal_ref is not None and e.causal_ref in by_id:
                # Edge: causal_ref -> e
                dependents[e.causal_ref].append(e.event_id)
                indegree[e.event_id] += 1

        # Priority queue for "available" nodes (indegree == 0)
        # Heap key provides deterministic tie-breaking.
        def heap_key(event_id: str) -> Tuple[int, int, str]:
            ev = by_id[event_id]
            return (ev.emitted_at, ev.received_at, ev.event_id)

        heap: List[Tuple[Tuple[int, int, str], str]] = []
        for event_id, deg in indegree.items():
            if deg == 0:
                heapq.heappush(heap, (heap_key(event_id), event_id))

        ordered: List[str] = []
        while heap:
            _, event_id = heapq.heappop(heap)
            ordered.append(event_id)
            for child in dependents[event_id]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    heapq.heappush(heap, (heap_key(child), child))

        # If there are cycles (bad causal data), fall back deterministically by time.
        # Cycles should be rare; this is defensive programming.
        if len(ordered) != len(corr_events):
            ordered = sorted(
                [e.event_id for e in corr_events],
                key=lambda eid: heap_key(eid)
            )

        timeline_by_corr[corr_id] = [
            {
                "event_id": by_id[eid].event_id,
                "type": by_id[eid].type,
                "emitted_at": by_id[eid].emitted_at,
                "received_at": by_id[eid].received_at,
                "causal_ref": by_id[eid].causal_ref,
            }
            for eid in ordered
        ]

    return timeline_by_corr


if __name__ == "__main__":
    # Example: arrival order is intentionally scrambled.
    raw = [
        {"event_id": "e3", "emitted_at": 300, "received_at": 305,
         "correlation_id": "c1", "type": "request_failed", "causal_ref": "e2"},
        {"event_id": "e2", "emitted_at": 200, "received_at": 310,
         "correlation_id": "c1", "type": "timeout_detected", "causal_ref": "e1"},
        {"event_id": "e1", "emitted_at": 100, "received_at": 320,
         "correlation_id": "c1", "type": "request_started", "causal_ref": None},
        # Another correlation group
        {"event_id": "e4", "emitted_at": 150, "received_at": 400,
         "correlation_id": "c2", "type": "service_scaled", "causal_ref": None},
        {"event_id": "e5", "emitted_at": 160, "received_at": 410,
         "correlation_id": "c2", "type": "latency_recovered", "causal_ref": "e4"},
    ]

    events = [Event(**r) for r in raw]
    timeline = deterministic_timeline(events)

    # Print deterministically: sorting keys makes the output stable too.
    print(json.dumps(timeline, indent=2, sort_keys=True))

What each block is doing (and why)

  • Grouping by correlation_id: I want each timeline to represent one “thread” of causality (e.g., one request), not a blended soup of unrelated events.

  • Graph construction: If causal_ref is present and known, I create a directed edge: causal_ref -> event. This turns narrative into structure.

  • Topological ordering: A topological sort produces an ordering that respects dependencies. That means if event e2 claims it was caused by e1, the timeline won’t put e2 before e1.

  • Deterministic tie-breaking using a heap key: When multiple events are “available” (no unmet causal dependencies), I pick the next one using: (emitted_at, received_at, event_id)

    This is the crucial philosophy part: I don’t trust whatever order events came in from the network or collector. I trust the explicit rules.

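  • Cycle handling fallback: If the causal graph is inconsistent (cycles), the topological sort can’t place every event. In that case I re-sort the whole correlation group by (emitted_at, received_at, event_id). The output stays reproducible, but causal ordering is no longer guaranteed for that group, including for events that were not part of the cycle.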
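
One possible refinement of the cycle fallback, sketched here as an alternative rather than as what the implementation above does: keep whatever the topological pass managed to order, and apply time ordering only to the events stranded on or behind a cycle. The function name order_with_partial_fallback and the dictionary layout are assumptions made for this example.

```python
import heapq
from typing import Dict, List, Optional, Tuple

# Hypothetical layout for illustration: event_id -> (emitted_at, received_at, causal_ref)
Records = Dict[str, Tuple[int, int, Optional[str]]]


def order_with_partial_fallback(records: Records) -> Tuple[List[str], List[str]]:
    """Return (ordered_ids, stranded_ids). Stranded events sit on or behind a cycle."""
    def key(eid: str) -> Tuple[int, int, str]:
        emitted_at, received_at, _ = records[eid]
        return (emitted_at, received_at, eid)

    dependents: Dict[str, List[str]] = {eid: [] for eid in records}
    indegree: Dict[str, int] = {eid: 0 for eid in records}
    for eid, (_, _, ref) in records.items():
        if ref is not None and ref in records:
            dependents[ref].append(eid)
            indegree[eid] += 1

    # Same heap-driven topological pass as the main builder.
    heap = [(key(eid), eid) for eid, deg in indegree.items() if deg == 0]
    heapq.heapify(heap)
    ordered: List[str] = []
    while heap:
        _, eid = heapq.heappop(heap)
        ordered.append(eid)
        for child in dependents[eid]:
            indegree[child] -= 1
            if indegree[child] == 0:
                heapq.heappush(heap, (key(child), child))

    # Anything never released by the pass is stranded; order only those by time.
    placed = set(ordered)
    stranded = sorted((eid for eid in records if eid not in placed), key=key)
    return ordered + stranded, stranded


records: Records = {
    "e1": (100, 120, None),
    "e2": (90, 210, "e1"),    # skewed clock, but a valid causal edge
    "x1": (50, 300, "x2"),    # x1 and x2 reference each other: a cycle
    "x2": (60, 310, "x1"),
}
ordered, stranded = order_with_partial_fallback(records)
print(ordered)   # ['e1', 'e2', 'x1', 'x2']
print(stranded)  # ['x1', 'x2']
```

With a whole-group time sort, e2 (emitted_at 90) would land before e1 (emitted_at 100) even though that edge is valid. The trade-off is that stranded events always appear at the end, so returning the stranded list lets the timeline flag them as unreliable rather than silently blending them in.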


“What happens when I run this?”

In the example, the input list is shuffled so arrival order is misleading:

  • e3 arrives first in the raw list, but it depends on e2
  • e2 depends on e1

A deterministic timeline generator should still output:

  1. request_started (e1)
  2. timeout_detected (e2)
  3. request_failed (e3)

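When I run the script, the printed JSON shows exactly that ordering for c1, and service_scaled (e4) before latency_recovered (e5) for c2. The order won’t change across runs or across input shuffles: event_id is unique, so the key (emitted_at, received_at, event_id) never leaves two events tied.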
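
The "won't change across runs" claim can be checked mechanically. The sketch below shuffles the input repeatedly and compares a hash of the serialized output. The helper names (timeline_digest, is_arrival_order_independent) and the two stand-in builders are hypothetical; the stand-ins only exist to keep the example self-contained.

```python
import hashlib
import json
import random
from typing import Any, Callable, List

Builder = Callable[[List[Any]], Any]


def timeline_digest(build: Builder, events: List[Any]) -> str:
    """Hash the canonical JSON form of whatever the builder returns."""
    payload = json.dumps(build(events), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def is_arrival_order_independent(
    build: Builder, events: List[Any], trials: int = 50, seed: int = 0
) -> bool:
    """True if every shuffled input produces byte-identical serialized output."""
    rng = random.Random(seed)  # seeded so the check itself is reproducible
    baseline = timeline_digest(build, events)
    for _ in range(trials):
        shuffled = events[:]
        rng.shuffle(shuffled)
        if timeline_digest(build, shuffled) != baseline:
            return False
    return True


# Stand-in builders for illustration only.
def time_sorted(events: List[dict]) -> List[dict]:
    return sorted(events, key=lambda e: (e["emitted_at"], e["received_at"], e["event_id"]))


def arrival_order(events: List[dict]) -> List[dict]:
    return list(events)  # deliberately order-dependent


sample = [
    {"event_id": f"e{i}", "emitted_at": 100 * i, "received_at": 500 - i}
    for i in range(1, 6)
]
print(is_arrival_order_independent(time_sorted, sample))    # True
print(is_arrival_order_independent(arrival_order, sample))  # False
```

The same check should work with deterministic_timeline passed as the builder and a list of Event objects as input, since the function returns plain dictionaries and lists that json.dumps can serialize.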


The systems thinking connection

This small artifact helps with feedback loops in incident response:

  • Without determinism, teams create interpretation variance (“I think it happened first…”).
  • With determinism, the organization converges on a stable shared model: “Given these events, the timeline is X.”
  • That stabilizes learning—postmortems become comparisons against the same narrative, not re-litigated mysteries.

In other words, I treat debugging as a component in the system, not a side quest.


Practical takeaway I carried forward

I stopped thinking of logs as text we scan manually and started thinking of them as inputs to a deterministic transformation that produces an incident “story” we can trust. The philosophical shift is simple: make the understanding pipeline reproducible, so the system’s behavior can be improved rather than endlessly debated.

In the end, I learned that deterministic incident timelines aren’t about fancy algorithms—they’re a systems-thinking move that turns messy event streams into a stable narrative you can learn from.