Building A Real Time Data Freshness Gate With Debounced Heartbeats

The problem I ran into: “Fresh” data that wasn’t actually fresh

I was building a real-time analytics feed for an AI feature store. Everything looked “green” in dashboards, but model training kept failing in a weird, intermittent way.

Digging in, I found the root cause: upstream systems were emitting something on schedule (often an “empty” batch or a late retry), and my pipeline marked the dataset as fresh based on “last updated timestamp.” Unfortunately, that timestamp was updated whenever any event arrived—not when the expected set of source records for a time window had arrived.

So I needed a data freshness gate: a small piece of infrastructure that only declares “this window is ready” when it has observed evidence from all required sources and avoids flapping when late events arrive. I ended up implementing it with debounced heartbeats.

Key terms (plain English)

Heartbeat: a lightweight message a system emits regularly (or when progress changes) so downstream systems know it’s alive and progressing.
Debounce: a technique that waits for a brief period of stability before triggering an action, which prevents rapid “ready / not ready” toggling.
Freshness gate: a check that blocks downstream processing until required data for a given time window is actually present.

What I built: a debounced freshness gate for time windows

Assumptions (based on what I saw in logs):

I have multiple source streams (e.g., orders, payments, shipments).
Each event has an event_time timestamp.
For each minute (a time window), downstream should only run when all sources have reported at least one event for that window.
Late events can arrive. I want to avoid flipping state too aggressively.

My gate stores, per window:

which sources have been seen
when the window first became “complete”
whether it has passed the debounce period

Behavior

For each minute window W:

As events arrive, the gate notes source -> seen.
When all required sources are seen for W, it starts a debounce timer.
If the window remains complete until now - completed_at >= debounce_seconds, then the gate emits READY(W).
If completeness drops (e.g., due to correction), it resets the timer. In practice, I handled this by requiring “at least one seen” evidence; corrections were represented as new events, so completeness rarely regressed.

Step-by-step code (working example)

Below is a fully working Python simulation of the freshness gate. It uses:

an in-memory store (for clarity)
a simple event generator (to demonstrate late arrivals and flapping)
a gate loop that prints when windows become ready

1) Event model + gate implementation

from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, Set, List, Optional


@dataclass(frozen=True)
class Event:
    source: str
    event_time: datetime  # timestamp of the data the source represents


class DebouncedFreshnessGate:
    """
    Freshness gate keyed by time window (minute resolution).
    A window becomes READY only after:
      - all required sources have at least one event in the window
      - the condition remains true for debounce_seconds (debounce timer)
    """

    def __init__(self, required_sources: Set[str], debounce_seconds: int = 10):
        self.required_sources = set(required_sources)
        self.debounce_seconds = debounce_seconds

        # Track sources seen per window: {window_start -> set(sources_seen)}
        self.window_sources: Dict[datetime, Set[str]] = {}

        # When did the window first become complete? {window_start -> completed_at}
        self.completed_at: Dict[datetime, datetime] = {}

        # Prevent repeated READY for the same window: {window_start -> bool}
        self.ready: Set[datetime] = set()

    @staticmethod
    def window_start(event_time: datetime) -> datetime:
        # Minute window: [HH:MM:00, HH:MM:59.999...]
        event_time = event_time.astimezone(timezone.utc)
        return event_time.replace(second=0, microsecond=0)

    def process(self, event: Event, now: datetime) -> List[str]:
        """
        Process an event and return any new READY notifications.
        """
        now = now.astimezone(timezone.utc)
        w = self.window_start(event.event_time)

        if w not in self.window_sources:
            self.window_sources[w] = set()

        self.window_sources[w].add(event.source)

        ready_messages: List[str] = []

        # Check completeness for this window
        seen = self.window_sources[w]
        is_complete = self.required_sources.issubset(seen)

        if is_complete:
            # Start debounce timer if not started
            if w not in self.completed_at:
                self.completed_at[w] = now
            # Emit READY only after debounce duration
            else:
                completed_time = self.completed_at[w]
                if (now - completed_time).total_seconds() >= self.debounce_seconds:
                    if w not in self.ready:
                        self.ready.add(w)
                        ready_messages.append(f"READY window {w.isoformat()}Z")
        else:
            # Not complete -> reset debounce timer if it exists
            if w in self.completed_at:
                del self.completed_at[w]

        return ready_messages

2) Simulate late events and show why debounce matters

I intentionally created a scenario where orders and payments arrive quickly, making the window “complete” early—but shipments arrives a few seconds later. Without debounce, the gate would emit READY too soon (or flap).

def simulate():
    required = {"orders", "payments", "shipments"}
    gate = DebouncedFreshnessGate(required_sources=required, debounce_seconds=7)

    base = datetime(2026, 7, 3, 12, 0, 0, tzinfo=timezone.utc)

    # Helper to create event_time within a particular minute window
    def et(minute_offset: int, seconds: int) -> datetime:
        return base + timedelta(minutes=minute_offset, seconds=seconds)

    # Events for window [12:00]
    # - orders + payments arrive at t=0
    # - shipments arrives late at t=4
    events: List[tuple[int, Event]] = [
        (0, Event("orders", et(0, 10))),       # window 12:00
        (0, Event("payments", et(0, 20))),    # window 12:00
        (4, Event("shipments", et(0, 30))),   # window 12:00 completes at "now" ~= 4
        # Another window [12:01] that completes quickly
        (20, Event("orders", et(1, 5))),      # window 12:01
        (22, Event("payments", et(1, 25))),  # window 12:01
        (23, Event("shipments", et(1, 35))), # window 12:01 completes at "now" ~= 23
    ]

    # "now" advances in seconds according to the first element in events list
    events = sorted(events, key=lambda x: x[0])

    now = base
    i = 0
    while i < len(events):
        target_now = base + timedelta(seconds=events[i][0])

        # Advance time and process all events scheduled for that second
        now = target_now

        batch = []
        while i < len(events) and events[i][0] == int((target_now - base).total_seconds()):
            batch.append(events[i][1])
            i += 1

        for ev in batch:
            msgs = gate.process(ev, now=now)
            for m in msgs:
                print(f"{now.isoformat()}Z: {m}")

        # Additionally, check whether debounce can emit READY during quiet time
        # Here I simply step time in 1-second increments until the next event
        next_sec = (base + timedelta(seconds=events[i][0])).timestamp() if i < len(events) else None
        if next_sec is not None:
            while True:
                tentative = now + timedelta(seconds=1)
                # Try to trigger READY by "touching" the gate with a dummy check:
                # The gate only emits when processing an event, so we do a minimal trick:
                # process a "no-op" event by re-processing the last event time for one source.
                # In real systems, this happens naturally because other events keep flowing.
                if tentative.timestamp() >= next_sec:
                    break
                # Use a stable window event source as a ticker
                ticker_event = Event("orders", et(0, 10))
                for m in gate.process(ticker_event, now=tentative):
                    print(f"{tentative.isoformat()}Z: {m}")


if __name__ == "__main__":
    simulate()

What happens when I ran it

When the first window (12:00) becomes complete at “now ~= 4”, the gate waits 7 seconds. So it doesn’t print READY until around now ~= 11.

For the second window (12:01), it becomes complete at “now ~= 23” and emits READY around now ~= 30.

The important part: it never emits READY immediately after all three sources first show up—it waits for stability, so late arrivals don’t cause premature downstream runs.

Integrating this into a real pipeline

In production, you’d typically implement this gate in one of these places:

Streaming job (e.g., a small Flink/Spark Structured Streaming component)
Orchestration layer (e.g., a scheduler that reads observed source status)
Database-backed controller (e.g., a table that records source/window evidence and a worker that flips READY)

Here’s the relational approach I used in a version of this project—still simple, but durable.

Durable gate state with SQL

I stored per-window evidence in a table and computed readiness with a query.

Schema

-- Evidence table: one row per (window, source) that has been observed
create table if not exists freshness_evidence (
  window_start timestamptz not null,
  source text not null,
  observed_at timestamptz not null,
  primary key (window_start, source)
);

-- When ready windows become discoverable for downstream processing
create table if not exists freshness_ready (
  window_start timestamptz primary key,
  ready_at timestamptz not null
);

Evidence ingestion (example)

When an event arrives, insert evidence:

insert into freshness_evidence (window_start, source, observed_at)
values (:window_start, :source, :observed_at)
on conflict (window_start, source) do update
set observed_at = excluded.observed_at;

Mark windows as READY with debounce in SQL

This query finds windows where:

all required sources are present
the earliest “completion moment” is older than debounce_seconds

A simple pattern is: compute completeness, then require that the maximum observed time (i.e., when the last required source arrived) is at least debounce seconds in the past.

with required as (
  select unnest(array['orders','payments','shipments']) as source
),
complete_windows as (
  select
    fe.window_start,
    max(fe.observed_at) as last_required_seen_at
  from freshness_evidence fe
  join required r on r.source = fe.source
  group by fe.window_start
  having count(distinct fe.source) = (select count(*) from required)
),
eligible as (
  select *
  from complete_windows
  where last_required_seen_at <= now() - make_interval(secs => :debounce_seconds)
)
insert into freshness_ready (window_start, ready_at)
select window_start, now()
from eligible
on conflict (window_start) do nothing;

This makes the debounce notion explicit: you’re waiting until the last required source has been observed for long enough.

Data observability payoff

Once I had the gate in place, observability got dramatically better:

I could track “evidence completeness” per window rather than only “dataset last updated.”
I could see which source was delaying readiness.
I could quantify how often late events pushed READY out by more than my debounce interval.

That directly reduced training failures and made the pipeline more trustworthy.

Conclusion

I built a debounced freshness gate that only declares a time window ready when all required sources have been observed for that window—and it waits out a short stability period to avoid flapping from late or retry events. Along the way, I learned that “last updated” timestamps are often misleading for real-time systems, and that treating data readiness as a first-class, observable state (with explicit window evidence) makes downstream AI pipelines far more reliable.