Sourcing Standards
SpoolBench operates under a fabrication-impossibility contract. Every fact-bearing claim is anchored to an atom: a structured record carrying claim text, source URL, content hash, captured text span, fetch timestamp, and verification timestamp. Pre-publish gates verify atom resolution, content hash freshness, and entailment between rendered claim and captured source span. Pages with unresolved or stale atoms cannot ship.
The Fabrication-Impossibility Contract
SpoolBench cannot publish a fact-bearing claim that lacks a verifiable source. The pipeline enforces this structurally: page generation receives a constrained set of atoms (sourced facts) and is forbidden from emitting prose containing claims outside that set. Pre-publish gates re-check every claim against its atom; pages with unresolved or stale atoms cannot ship.
This is the structural difference between SpoolBench and most AI-driven content operations. We do not "ask the model to be careful"; we make fabrication mechanically unable to leave the build pipeline.
The Atom Substrate
Research is decomposed into atoms — small, structured records, one per verifiable claim. Each atom carries:
- atom_id — stable identifier, niche-prefixed (e.g. AP-014)
- kind — one of existence, spec, price, benchmark, quote, comparison
- claim — the asserted fact, in canonical prose, ≤500 characters
- source.url — re-fetchable URL where the claim was captured
- source.fetched_at — ISO 8601 UTC timestamp of the original capture
- source.content_hash — SHA-256 hash of the source slice as it appeared at capture
- source.span — character offsets [start, end) within the captured page slice
- source.fetched_text — the exact text the span indexes into
- verified_at — ISO 8601 UTC timestamp of the most recent re-fetch + entailment check
Atoms live at research/atoms/{slug}.yaml, version-controlled alongside the page they support. The schema is locked by Zod validation; any atom missing a field, or whose span extends past its captured text, fails parse and the page cannot generate.
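The locked schema can be sketched in plain TypeScript. This is an illustrative approximation, not the production Zod schema: the field names mirror the list above, but the type shape and the `validateAtom` helper are assumptions.

```typescript
// Sketch of an atom record. Field names follow the schema described
// above; the concrete types are an assumption, not the real Zod schema.
type AtomKind = "existence" | "spec" | "price" | "benchmark" | "quote" | "comparison";

interface Atom {
  atom_id: string;            // e.g. "AP-014"
  kind: AtomKind;
  claim: string;              // canonical prose, ≤500 characters
  source: {
    url: string;
    fetched_at: string;       // ISO 8601 UTC
    content_hash: string;     // SHA-256 of the captured slice
    span: [number, number];   // [start, end) into fetched_text
    fetched_text: string;
  };
  verified_at: string | null;
}

// Reject any atom whose claim is too long or whose span extends past
// the captured text -- the two structural checks called out above.
function validateAtom(a: Atom): boolean {
  const [start, end] = a.source.span;
  if (a.claim.length > 500) return false;
  if (start < 0 || end <= start) return false;
  if (end > a.source.fetched_text.length) return false;
  return true;
}
```

An atom with a span running past its `fetched_text` fails this check, which is what blocks page generation downstream.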
Atom-Only Generation Discipline
Page generation runs through a headless agent with a strict prompt contract: every fact-bearing sentence in the output must be wrapped in <claim ref="ATOM_ID">...</claim> markup.
At build time, that markup expands to inline citation markers (the bracketed numbers you see in the prose) and a structured data-atom-ref attribute that AI engines extract directly.
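The build-time expansion can be sketched as a single regex pass. The `data-atom-ref` attribute is the only output detail fixed by the text above; the surrounding span markup and the first-seen citation numbering are assumptions for illustration.

```typescript
// Minimal sketch: each <claim ref="..."> element becomes an inline
// span carrying data-atom-ref plus a bracketed citation marker.
// Citation numbers follow first-seen atom order (an assumption).
function expandClaims(html: string): string {
  const order: string[] = [];   // atom ids in first-seen order
  return html.replace(
    /<claim ref="([^"]+)">([\s\S]*?)<\/claim>/g,
    (_m: string, ref: string, body: string) => {
      let n = order.indexOf(ref);
      if (n === -1) {
        order.push(ref);
        n = order.length - 1;
      }
      return `<span data-atom-ref="${ref}">${body}</span>[${n + 1}]`;
    },
  );
}
```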
The generation prompt does not let the agent invent claims. When the agent needs a fact, it must select from the atom set the orchestrator passed in. If the atoms cannot support a claim, the agent must omit the claim — not paraphrase a guess.
Post-Generation Verification
After the page draft is generated, a verifier runs locally (zero external API calls) and checks every <claim ref>:
- The referenced atom exists in the substrate.
- The atom is not stale (price atoms verified within 30 days; evergreen atoms verified within 365 days).
- For numeric kinds (spec, price, benchmark): the rendered claim's numeric tokens match the source span within precision tolerance.
- For quote kinds: the rendered claim is a substring of the captured source text after whitespace normalization.
- For all kinds: a faithfulness check (small distilled NLI model) returns ≥ 0.85 entailment between the rendered claim and the captured source span.
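Three of these checks can be sketched directly; the NLI entailment check is omitted here because it requires a model. The 30/365-day windows come from the list above, but the helper names and the exact-match stand-in for numeric tolerance are assumptions.

```typescript
// Sketches of the staleness, numeric, and quote checks described above.
// Helper names are hypothetical; the production verifier also runs an
// NLI entailment check, which is out of scope for a sketch.
const DAY_MS = 24 * 60 * 60 * 1000;

function isStale(kind: string, verifiedAt: string | null, now: Date): boolean {
  if (verifiedAt === null) return true;
  const age = now.getTime() - new Date(verifiedAt).getTime();
  const limit = kind === "price" ? 30 * DAY_MS : 365 * DAY_MS;
  return age > limit;
}

// Numeric kinds: every number in the rendered claim must appear in the
// source span (an exact-match stand-in for the precision tolerance).
function numbersMatch(claim: string, span: string): boolean {
  const nums = claim.match(/\d+(?:\.\d+)?/g) ?? [];
  return nums.every((n) => span.includes(n));
}

// Quote kinds: the claim must be a substring of the source after
// whitespace normalization.
function quoteMatches(claim: string, sourceText: string): boolean {
  const norm = (s: string) => s.replace(/\s+/g, " ").trim();
  return norm(sourceText).includes(norm(claim));
}
```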
Any failure triggers up to three regeneration attempts with the specific failure context fed back to the agent. After three failed attempts, the page is quarantined to research/quality-failures/ and the run aborts. Pages do not ship "with known defects."
Content-Hash Verification
Sources mutate. A manufacturer updates a spec page; an Amazon listing gets a new title; a Wikipedia paragraph gets edited. Without verification, pages drift quietly. Our weekly atom-refresh job re-fetches every source URL, hashes the slice the atom captured, and compares to the stored content_hash:
- Hash matches. The atom is re-stamped with a fresh verified_at.
- Hash differs but the captured text still appears in the new fetch. The atom is updated with the new fetched_text and re-stamped.
- Hash differs and the captured text is gone. The atom is marked stale (verified_at: null) with a stale_reason. Pages citing it cannot regenerate until the atom is re-extracted from the current source.
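The three-way refresh decision can be sketched with Node's built-in crypto module. The outcome names and helper signature are illustrative; only SHA-256 and the three branches come from the text above.

```typescript
// Sketch of the weekly refresh decision for one atom. Field and helper
// names are illustrative, not the pipeline's real API.
import { createHash } from "node:crypto";

const sha256Hex = (s: string) => createHash("sha256").update(s).digest("hex");

type RefreshOutcome =
  | { status: "fresh" }
  | { status: "updated"; fetched_text: string }
  | { status: "stale"; stale_reason: string };

function refreshAtom(
  storedHash: string,     // content_hash recorded at capture
  capturedText: string,   // the span's text as originally captured
  newFetch: string,       // the freshly fetched page slice
): RefreshOutcome {
  if (sha256Hex(newFetch) === storedHash) return { status: "fresh" };
  if (newFetch.includes(capturedText)) {
    return { status: "updated", fetched_text: newFetch };
  }
  return { status: "stale", stale_reason: "captured text no longer present in source" };
}
```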
Audit Log + Merkle Chain
Every page edit appends one line to research/audit-log.jsonl: trigger key, source URL, atom set fingerprint, page content hash, model version, and ISO 8601 UTC timestamp. The log is append-only — the pipeline does not delete or rewrite past entries.
A daily job hashes the day's lines into a SHA-256 Merkle root and commits it to research/audit-merkle.jsonl. Each daily root references the previous day's root, building a chain that locks history: tampering with a past entry breaks verification at the next root.
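The daily root computation can be sketched as follows. The text above fixes only SHA-256 and the day-to-day linkage; the pairing rule (duplicating the last node on odd levels) and the way the previous root is folded in (as an extra leaf) are assumptions of this sketch.

```typescript
// Sketch: Merkle root over the day's audit-log lines, chained to the
// previous day's root. Pairing and chaining details are assumptions.
import { createHash } from "node:crypto";

const sha256 = (s: string) => createHash("sha256").update(s).digest("hex");

function merkleRoot(leaves: string[]): string {
  if (leaves.length === 0) return sha256("");
  let level = leaves.map(sha256);
  while (level.length > 1) {
    const next: string[] = [];
    for (let i = 0; i < level.length; i += 2) {
      // Duplicate the last node when the level has odd length.
      next.push(sha256(level[i] + (level[i + 1] ?? level[i])));
    }
    level = next;
  }
  return level[0];
}

// Each daily root commits to the previous day's root, locking history:
// changing any past line (or any past root) changes every later root.
function dailyRoot(prevRoot: string, todaysLines: string[]): string {
  return merkleRoot([prevRoot, ...todaysLines]);
}
```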
End-to-end audit path for any claim on the site: rendered prose → data-atom-ref on the inline span → atom record in research/atoms/ → source URL + content hash → re-fetch + entailment check → audit-log entry → daily Merkle root → daily commit on the repository.
What We Don't Do
SpoolBench synthesizes aggregated user experience and primary-source documentation. It does not run a physical test lab. We do not measure decibels with a meter we own; we do not bench-test products we bought; we do not stand up a controlled environment and run controlled trials. That is real first-party value we cannot match.
The structural advantage we provide is at the synthesis layer: aggregating thousands of verified user reviews, cross-referencing manufacturer specs against independent measurements, surfacing contradictions between retailer claims and owner experience, and tracking how those signals shift over time. The provenance contract makes that synthesis verifiable rather than asserted.
Legal-Compliance Backdrop
The Federal Trade Commission's 2023 Endorsement Guides revision (16 CFR Part 255) holds publishers liable for fabricated endorsements. Mata v. Avianca (sanctions over fabricated case citations) and the Air Canada chatbot ruling (Moffatt v. Air Canada) have made clear that professionals and organizations are accountable for AI-generated fabrications that ship under their name. Our contract is designed to make those failure modes mechanically impossible — not as a regulatory checkbox, but because they're load-bearing for editorial trust.
Author Identity
Editorial work on SpoolBench is attributed to a named author with a public byline and disclosed credentials documented on the about page. The atom substrate + audit log establish what each claim is grounded in; the byline establishes who is accountable for each page's editorial choices around what to cover, how to weigh contradictions, and where to draw the line on a verdict.
The structural transparency the contract provides — every claim verifiable to its source via re-fetch + hash check — is the basis on which the site's authority rests.
Spotted a claim on the site that contradicts a primary source you can point to? Reach us via the contact form on our about page with the page URL and the conflicting source — we'll re-verify the atom and update the page on a Tier 1 surgical edit if the source confirms the contradiction.