Building a Threat Intelligence “STIX Mini-Linter” for Suspicious JARM Clusters

I fell into this rabbit hole because I kept seeing the same pattern in my telemetry: outbound connections from a service to a set of TLS endpoints kept failing in bursts, and the failures looked “random”… until I realized they weren’t.

The clue was JARM. JARM is an active TLS fingerprinting technique: it sends ten TLS Client Hello probes to a server and hashes the responses into a 62-character fingerprint. Servers with the same TLS stack and configuration tend to produce the same fingerprint, so endpoints (CDNs, load balancers, misconfigured hosts) can be clustered by it.
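
To make the clustering idea concrete, here is a minimal sketch that groups endpoints by fingerprint and derives a short cluster label. The fingerprints below are made-up placeholders of the right length, not real scan results, and the `jarm:cluster:<8 hex>` label taken from a SHA-256 prefix is my own assumption for illustration, not something JARM itself defines.

```python
import hashlib
from collections import defaultdict

# Hypothetical observations: (host, JARM fingerprint).
# The fingerprints are placeholders with the right length (62 characters).
FP_A = "2ad" * 20 + "00"
FP_B = "07d" * 20 + "ff"
observations = [
    ("198.51.100.10", FP_A),
    ("198.51.100.11", FP_A),
    ("203.0.113.7", FP_B),
]


def cluster_label(fingerprint: str) -> str:
    """Assumed labeling scheme: first 8 hex chars of the fingerprint's SHA-256."""
    digest = hashlib.sha256(fingerprint.encode("ascii")).hexdigest()
    return f"jarm:cluster:{digest[:8]}"


def cluster_by_fingerprint(rows):
    """Group hosts that presented the same fingerprint."""
    clusters = defaultdict(list)
    for host, fingerprint in rows:
        clusters[cluster_label(fingerprint)].append(host)
    return dict(clusters)


clusters = cluster_by_fingerprint(observations)
for label, hosts in sorted(clusters.items()):
    print(label, hosts)
```

The two hosts that share a fingerprint land in one cluster; the third gets its own.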

I wanted a way to automatically validate threat-intel indicators that referenced these clusters—specifically to catch malformed or self-contradicting STIX 2.1 bundles before they ever influenced enforcement decisions. So I built a tiny STIX mini-linter that checks whether JARM-derived cluster indicators are internally consistent.

This post walks through the project end-to-end: generating a STIX bundle, linting it, and explaining what the linter catches.

The niche problem I ran into

I had threat intel that came in as a STIX 2.1 bundle (STIX stands for Structured Threat Information eXpression—a standard JSON format for exchanging cyber threat intelligence).

The intel included indicator objects describing suspicious infrastructure. One indicator had a field meant to represent a JARM cluster—but occasionally the bundle was inconsistent:

An indicator claimed it was about a cluster, but the pattern didn’t actually include the cluster value.
The same indicator contradicted itself in more than one way (e.g., it carried an open-redirect label, but its network pattern didn’t match that label).
The bundle asserted the indicator had an observation time range that didn’t make sense (start time after end time).

When I feed a broken indicator into an enrichment/enforcement pipeline, I can get “confidence poisoning”: bad indicators can silently skew risk scoring.

So I built a linter that focuses on a very specific format: STIX indicators whose pattern encodes a JARM cluster match.

What I implemented: a “STIX Mini-Linter” for JARM cluster indicators

What the linter checks

Given a STIX bundle JSON file, it validates:

Bundle structure
- Must be type: "bundle" with objects.
Indicator objects
- For each object with type: "indicator", ensure:
  - pattern is a non-empty string.
  - x_jarm_cluster (a custom extension-like field I use in this mini example to carry the cluster ID) is present and has the form jarm:cluster: followed by 8 lowercase hex characters.
Internal consistency between metadata and pattern
- The x_jarm_cluster value must appear in pattern.
Valid time range
- If the indicator has valid_from and valid_until, ensure valid_from <= valid_until.

This is intentionally narrow and practical: it’s not a full STIX validator. It’s a guardrail for a specific threat-intel production pipeline.

Working code (Python): generate a sample bundle and lint it

1) Create a sample STIX bundle

Below I generate two bundles:

One valid
One invalid (pattern doesn’t include the cluster value; also has a time range issue)

import json
import uuid
from datetime import datetime, timedelta, timezone

# STIX timestamps are UTC and end in "Z".
STIX_TS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"

def make_bundle(valid: bool) -> dict:
    # Timezone-aware UTC; datetime.utcnow() is deprecated since Python 3.12.
    now_dt = datetime.now(timezone.utc)
    now = now_dt.strftime(STIX_TS_FORMAT)
    # timedelta avoids the ValueError that replace(year=year + 1) raises on Feb 29.
    next_year = (now_dt + timedelta(days=365)).strftime(STIX_TS_FORMAT)

    jarm_cluster = "jarm:cluster:82f3a1c0"
    if valid:
        valid_from = now
        valid_until = next_year
        # Simplified pattern string: the bare jarm_cluster term is shorthand,
        # not strict STIX patterning syntax.
        pattern = (
            "[network-traffic:src_ref.type = 'ipv4-addr' AND "
            "network-traffic:dst_port = 443 AND "
            f"jarm_cluster = '{jarm_cluster}']"
        )
    else:
        # Invalid case: the pattern names a different cluster than the metadata,
        # and the time range is inverted.
        valid_from = next_year
        valid_until = now
        pattern = (
            "[network-traffic:src_ref.type = 'ipv4-addr' AND "
            "network-traffic:dst_port = 443 AND "
            "jarm_cluster = 'jarm:cluster:DIFFERENT']"
        )

    bundle = {
        "type": "bundle",
        # STIX identifiers have the form <object-type>--<UUID>.
        "id": f"bundle--{uuid.uuid4()}",
        "objects": [
            {
                "type": "indicator",
                "spec_version": "2.1",
                "id": f"indicator--{uuid.uuid4()}",
                "created": now,
                "modified": now,
                "indicator_types": ["malicious-activity"],
                "pattern": pattern,
                # mini “extension fields” for our linter:
                "x_jarm_cluster": jarm_cluster,
                "valid_from": valid_from,
                "valid_until": valid_until,
            }
        ],
    }
    return bundle

if __name__ == "__main__":
    valid_bundle = make_bundle(True)
    invalid_bundle = make_bundle(False)

    print("=== Valid bundle ===")
    print(json.dumps(valid_bundle, indent=2))
    print("\n=== Invalid bundle ===")
    print(json.dumps(invalid_bundle, indent=2))

What this does (and why):

I’m producing a minimal STIX bundle that still looks like real data: type, id, objects[].
The indicator uses:
- pattern to encode a match condition (here it’s a simplified string).
- x_jarm_cluster as a separate metadata field.
- valid_from / valid_until to represent time constraints.

In the valid bundle, x_jarm_cluster matches what’s inside pattern. In the invalid bundle, it doesn’t—and time is inverted.

2) Lint the bundle

Now the actual linter.

import re
from datetime import datetime, timezone

# \Z anchors at the true end of the string; "$" would also accept a trailing newline.
JARM_CLUSTER_RE = re.compile(r"jarm:cluster:[0-9a-f]{8}\Z")

def parse_rfc3339(ts: str) -> datetime:
    # STIX timestamps are RFC 3339 strings in UTC, ending in "Z".
    # Before Python 3.11, datetime.fromisoformat() rejects the "Z" suffix,
    # so rewrite it as an explicit UTC offset first.
    if ts.endswith("Z"):
        ts = ts[:-1] + "+00:00"
    return datetime.fromisoformat(ts)

def lint_stix_bundle(bundle: dict) -> list[str]:
    errors: list[str] = []

    if bundle.get("type") != "bundle":
        errors.append("Top-level object must have type 'bundle'.")
        return errors

    objects = bundle.get("objects")
    if not isinstance(objects, list):
        errors.append("Bundle must contain an 'objects' array.")
        return errors

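    for idx, obj in enumerate(objects):
        # Skip anything that is not an indicator, including non-dict entries.
        if not isinstance(obj, dict) or obj.get("type") != "indicator":
            continue

        pattern = obj.get("pattern")
        x_cluster = obj.get("x_jarm_cluster")
        pattern_ok = isinstance(pattern, str) and bool(pattern.strip())
        cluster_ok = isinstance(x_cluster, str) and bool(x_cluster.strip())

        # Report missing fields but keep going, so one problem
        # does not hide the others (e.g. a bad time range).
        if not pattern_ok:
            errors.append(f"[objects[{idx}]] indicator.pattern must be a non-empty string.")
        if not cluster_ok:
            errors.append(f"[objects[{idx}]] indicator.x_jarm_cluster must be a non-empty string.")

        # 1) Validate JARM cluster format
        if cluster_ok and not JARM_CLUSTER_RE.match(x_cluster):
            errors.append(
                f"[objects[{idx}]] x_jarm_cluster has unexpected format: '{x_cluster}'. "
                "Expected: jarm:cluster:XXXXXXXX (8 hex chars)."
            )

        # 2) Ensure pattern contains the cluster value as a complete quoted
        #    literal, so a longer value with the same prefix does not pass.
        if pattern_ok and cluster_ok and f"'{x_cluster}'" not in pattern:
            errors.append(
                f"[objects[{idx}]] pattern does not include x_jarm_cluster value. "
                f"x_jarm_cluster='{x_cluster}', pattern='{pattern}'"
            )

        # 3) Validate time range if present.
        #    Note: STIX 2.1 requires valid_until to be strictly later than
        #    valid_from; this check only flags inverted ranges.
        valid_from = obj.get("valid_from")
        valid_until = obj.get("valid_until")
        if valid_from and valid_until:
            try:
                vf = parse_rfc3339(valid_from)
                vu = parse_rfc3339(valid_until)
                inverted = vf > vu
            except (AttributeError, TypeError, ValueError) as e:
                # Non-string values, malformed timestamps, or a mix of
                # offset-aware and naive timestamps all end up here.
                errors.append(f"[objects[{idx}]] time parsing failed: {e}")
            else:
                if inverted:
                    errors.append(
                        f"[objects[{idx}]] invalid time range: valid_from ({valid_from}) "
                        f"is after valid_until ({valid_until})."
                    )

    return errors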

if __name__ == "__main__":
    # Quick self-test using a copy of the generator above.
    import uuid
    from datetime import datetime, timedelta, timezone

    def make_bundle(valid: bool) -> dict:
        ts_format = "%Y-%m-%dT%H:%M:%SZ"
        # Timezone-aware UTC; datetime.utcnow() is deprecated since Python 3.12.
        now_dt = datetime.now(timezone.utc)
        now = now_dt.strftime(ts_format)
        # timedelta avoids the ValueError that replace(year=year + 1) raises on Feb 29.
        next_year = (now_dt + timedelta(days=365)).strftime(ts_format)

        jarm_cluster = "jarm:cluster:82f3a1c0"
        if valid:
            valid_from = now
            valid_until = next_year
            pattern = (
                "[network-traffic:src_ref.type = 'ipv4-addr' AND "
                "network-traffic:dst_port = 443 AND "
                f"jarm_cluster = '{jarm_cluster}']"
            )
        else:
            # Mismatched cluster in the pattern and an inverted time range.
            valid_from = next_year
            valid_until = now
            pattern = (
                "[network-traffic:src_ref.type = 'ipv4-addr' AND "
                "network-traffic:dst_port = 443 AND "
                "jarm_cluster = 'jarm:cluster:DIFFERENT']"
            )

        return {
            "type": "bundle",
            # STIX identifiers have the form <object-type>--<UUID>.
            "id": f"bundle--{uuid.uuid4()}",
            "objects": [
                {
                    "type": "indicator",
                    "spec_version": "2.1",
                    "id": f"indicator--{uuid.uuid4()}",
                    "created": now,
                    "modified": now,
                    "indicator_types": ["malicious-activity"],
                    "pattern": pattern,
                    "x_jarm_cluster": jarm_cluster,
                    "valid_from": valid_from,
                    "valid_until": valid_until,
                }
            ],
        }

    for name, bundle in [("valid", make_bundle(True)), ("invalid", make_bundle(False))]:
        print(f"\nLinting {name} bundle")
        errs = lint_stix_bundle(bundle)
        if not errs:
            print("No errors.")
        else:
            print("Errors:")
            for e in errs:
                print(" -", e)

What this linter does step-by-step:

JARM_CLUSTER_RE: enforces a strict format for the cluster string:
- jarm:cluster: + 8 hex chars.
parse_rfc3339(): converts timestamps ending in Z to a format datetime.fromisoformat() can parse.
For each indicator object:
1. Confirms pattern is a non-empty string.
2. Confirms x_jarm_cluster exists and matches the expected format.
3. Confirms pattern contains the same x_jarm_cluster string (metadata ↔ pattern consistency).
4. If both time fields exist, ensures the range isn’t inverted.
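
Step 3 relies on a containment test against the raw pattern string. A slightly stricter variant, sketched below, pulls the quoted value out of each `jarm_cluster = '...'` comparison and compares it exactly, so a longer value that merely contains the expected string does not pass. The helper names and both regexes are my own illustration, and they assume the simplified pattern syntax used in this post rather than full STIX patterning.

```python
import re

# Same format as the post's cluster check, applied with fullmatch so that
# nothing (not even a trailing newline) may follow the 8 hex characters.
CLUSTER_VALUE_RE = re.compile(r"jarm:cluster:[0-9a-f]{8}")

# Captures the quoted right-hand side of `jarm_cluster = '...'`.
# The \b keeps it from matching inside a longer name such as x_jarm_cluster.
PATTERN_CLUSTER_RE = re.compile(r"\bjarm_cluster\s*=\s*'([^']*)'")


def clusters_in_pattern(pattern: str) -> list[str]:
    """Return every cluster value the pattern compares against."""
    return PATTERN_CLUSTER_RE.findall(pattern)


def pattern_matches_metadata(pattern: str, x_cluster: str) -> bool:
    """True if x_cluster is well-formed and appears verbatim in the pattern."""
    return (
        CLUSTER_VALUE_RE.fullmatch(x_cluster) is not None
        and x_cluster in clusters_in_pattern(pattern)
    )


good = "[network-traffic:dst_port = 443 AND jarm_cluster = 'jarm:cluster:82f3a1c0']"
longer = "[network-traffic:dst_port = 443 AND jarm_cluster = 'jarm:cluster:82f3a1c0ff']"

print(pattern_matches_metadata(good, "jarm:cluster:82f3a1c0"))    # True
print(pattern_matches_metadata(longer, "jarm:cluster:82f3a1c0"))  # False
print("jarm:cluster:82f3a1c0" in longer)                          # True: plain substring test is fooled
```

The trade-off is that this variant is tied to one spelling of the comparison; a pattern that expresses the cluster differently would need its own extraction rule.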

This is exactly the kind of “small but dangerous” data problem that shows up in threat intel pipelines.

Running it

Put the first script (bundle generator) and the second script (linter) into a single file, or just run the second script as-is since it includes its own generator.

Expected behavior:

Valid bundle: No errors.
Invalid bundle: it should report:
- pattern mismatch (cluster in metadata not found inside pattern)
- invalid time range
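
For bundles that live on disk, a thin wrapper that turns linter findings into an exit code is usually enough. The sketch below is one way to do it: `lint_file` and the exit-code convention (0 clean, 1 findings, 2 unreadable) are my own assumptions, and `demo_lint` is a stand-in with the same signature as `lint_stix_bundle` so the block runs on its own.

```python
import json
import sys
import tempfile
from pathlib import Path
from typing import Callable

LintFn = Callable[[dict], list[str]]


def lint_file(path: Path, lint_fn: LintFn) -> int:
    """Lint one JSON file. Returns 0 if clean, 1 on findings, 2 if unreadable."""
    try:
        bundle = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError) as exc:
        print(f"{path.name}: cannot read bundle: {exc}", file=sys.stderr)
        return 2
    errors = lint_fn(bundle)
    for err in errors:
        print(f"{path.name}: {err}")
    return 1 if errors else 0


def demo_lint(bundle: dict) -> list[str]:
    """Stand-in for lint_stix_bundle: only checks the top-level type."""
    if not isinstance(bundle, dict) or bundle.get("type") != "bundle":
        return ["Top-level object must have type 'bundle'."]
    return []


# Exercise the three outcomes against throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    clean = Path(tmp) / "clean.json"
    clean.write_text(json.dumps({"type": "bundle", "objects": []}), encoding="utf-8")
    wrong = Path(tmp) / "wrong.json"
    wrong.write_text(json.dumps({"type": "indicator"}), encoding="utf-8")
    broken = Path(tmp) / "broken.json"
    broken.write_text("{not json", encoding="utf-8")
    codes = [lint_file(p, demo_lint) for p in (clean, wrong, broken)]

print(codes)
```

In a real script, passing `lint_stix_bundle` instead of `demo_lint` and handing the result to `sys.exit()` lets a shell or CI job react to the outcome.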

How this fits into preemptive defense (Zero Trust / DevSecOps mindset)

Even though this code is small, the operational lesson is big:

Threat intel is code-like input. It has schemas, invariants, and failure modes.
In a Zero Trust architecture (a design principle where nothing is implicitly trusted and each request is verified), indicators often become policy inputs.
In DevSecOps, I treat intel ingestion like CI: validate → lint → only then allow downstream systems (matching engines, scoring, enforcement) to use it.
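
The validate → lint → release flow can be sketched as a small gate that splits a bundle’s objects into those released downstream and those held back with their findings. Everything here is hypothetical scaffolding: `gate_bundle`, the quarantine structure, and `check_indicator`, which is a simplified stand-in for the linter’s per-indicator checks.

```python
from typing import Callable

CheckFn = Callable[[dict], list[str]]


def check_indicator(obj: dict) -> list[str]:
    """Simplified stand-in check: the metadata cluster must appear in the pattern."""
    cluster = obj.get("x_jarm_cluster")
    pattern = obj.get("pattern")
    if not isinstance(cluster, str) or not isinstance(pattern, str):
        return ["pattern and x_jarm_cluster must both be strings"]
    if f"'{cluster}'" not in pattern:
        return ["pattern does not reference x_jarm_cluster"]
    return []


def gate_bundle(bundle: dict, check: CheckFn):
    """Split objects into (released, quarantined); only indicators are checked."""
    released, quarantined = [], []
    for obj in bundle.get("objects", []):
        if not isinstance(obj, dict):
            quarantined.append((obj, ["object is not a JSON object"]))
            continue
        problems = check(obj) if obj.get("type") == "indicator" else []
        if problems:
            quarantined.append((obj, problems))
        else:
            released.append(obj)
    return released, quarantined


bundle = {
    "type": "bundle",
    "objects": [
        {"type": "identity", "name": "example-feed"},
        {
            "type": "indicator",
            "x_jarm_cluster": "jarm:cluster:82f3a1c0",
            "pattern": "[jarm_cluster = 'jarm:cluster:82f3a1c0']",
        },
        {
            "type": "indicator",
            "x_jarm_cluster": "jarm:cluster:82f3a1c0",
            "pattern": "[jarm_cluster = 'jarm:cluster:DIFFERENT']",
        },
    ],
}

released, quarantined = gate_bundle(bundle, check_indicator)
print(len(released), "released;", len(quarantined), "quarantined")
for obj, problems in quarantined:
    print(" held back:", problems)
```

Quarantining per object rather than rejecting the whole bundle is a design choice: it keeps good indicators flowing, at the cost of releasing a partial bundle.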

I didn’t need a giant framework to get value; I needed a guardrail that catches the common “wrong but plausible” cases before they become production behavior.

Conclusion

I built a tiny STIX 2.1 mini-linter tailored to a very specific threat-intelligence shape: indicator objects that describe JARM-derived TLS clusters. By validating metadata/pattern consistency and checking time ranges, I prevented malformed bundles from corrupting indicator matching and risk scoring—turning threat intel ingestion into something closer to a reliable, testable step in a DevSecOps pipeline.