Sourcing Standards
SpoolBench operates under a fabrication-impossibility contract. Every fact-bearing claim is anchored to an atom: a structured record carrying claim text, source URL, content hash, captured text span, fetch timestamp, and verification timestamp. Pre-publish gates verify atom resolution, content hash freshness, and entailment between rendered claim and captured source span. Pages with unresolved or stale atoms cannot ship.
The Fabrication-Impossibility Contract
SpoolBench cannot publish a fact-bearing claim that lacks a verifiable source. The pipeline enforces this structurally: page generation receives a constrained set of atoms (sourced facts) and is forbidden from emitting prose containing claims outside that set. Pre-publish gates re-check every claim against its atom; pages with unresolved or stale atoms cannot ship.
This is the structural difference between SpoolBench and most AI-driven content operations. We do not "ask the model to be careful"; we make fabrication mechanically unable to leave the build pipeline.
The Atom Substrate
Research is decomposed into atoms — small, structured records, one per verifiable claim. Each atom carries:
- atom_id — stable identifier, niche-prefixed (e.g. AP-014)
- kind — one of existence, spec, price, benchmark, quote, comparison
- claim — the asserted fact, in canonical prose, ≤500 characters
- source.url — re-fetchable URL where the claim was captured
- source.fetched_at — ISO 8601 UTC timestamp of the original capture
- source.content_hash — SHA-256 hash of the source slice as it appeared at capture
- source.span — character offsets [start, end) within the captured page slice
- source.fetched_text — the exact text the span indexes into
- verified_at — ISO 8601 UTC timestamp of the most recent re-fetch + entailment check
Atoms live at research/atoms/{slug}.yaml, version-controlled alongside the page they support. The schema is locked by Zod validation; any atom missing a field, or whose span extends past its captured text, fails parse and the page cannot generate.
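The locked schema can be sketched in plain TypeScript. This is an illustrative approximation, not the production Zod schema: the field names mirror the list above, but the type shape and the `validateAtom` helper are assumptions.

```typescript
// Sketch of an atom record. Field names follow the schema described
// above; the concrete types are an assumption, not the real Zod schema.
type AtomKind = "existence" | "spec" | "price" | "benchmark" | "quote" | "comparison";

interface Atom {
  atom_id: string;            // e.g. "AP-014"
  kind: AtomKind;
  claim: string;              // canonical prose, ≤500 characters
  source: {
    url: string;
    fetched_at: string;       // ISO 8601 UTC
    content_hash: string;     // SHA-256 of the captured slice
    span: [number, number];   // [start, end) into fetched_text
    fetched_text: string;
  };
  verified_at: string | null;
}

// Reject any atom whose claim is too long or whose span extends past
// the captured text -- the two structural checks called out above.
function validateAtom(a: Atom): boolean {
  const [start, end] = a.source.span;
  if (a.claim.length > 500) return false;
  if (start < 0 || end <= start) return false;
  if (end > a.source.fetched_text.length) return false;
  return true;
}
```

An atom with a span running past its `fetched_text` fails this check, which is what blocks page generation downstream.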
Atom-Only Generation Discipline
Page generation runs through a headless agent with a strict prompt contract: every fact-bearing sentence in the output must be wrapped in <claim ref="ATOM_ID">...</claim> markup.
At build time, that markup expands to inline citation markers (the bracketed numbers you see in the prose) and a structured data-atom-ref attribute that AI engines extract directly.
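The build-time expansion can be sketched as a single regex pass. The `data-atom-ref` attribute is the only output detail fixed by the text above; the surrounding span markup and the first-seen citation numbering are assumptions for illustration.

```typescript
// Minimal sketch: each <claim ref="..."> element becomes an inline
// span carrying data-atom-ref plus a bracketed citation marker.
// Citation numbers follow first-seen atom order (an assumption).
function expandClaims(html: string): string {
  const order: string[] = [];   // atom ids in first-seen order
  return html.replace(
    /<claim ref="([^"]+)">([\s\S]*?)<\/claim>/g,
    (_m: string, ref: string, body: string) => {
      let n = order.indexOf(ref);
      if (n === -1) {
        order.push(ref);
        n = order.length - 1;
      }
      return `<span data-atom-ref="${ref}">${body}</span>[${n + 1}]`;
    },
  );
}
```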
The generation prompt does not let the agent invent claims. When the agent needs a fact, it must select from the atom set the orchestrator passed in. If the atoms cannot support a claim, the agent must omit the claim — not paraphrase a guess.
Post-Generation Verification
After the page draft is generated, a verifier runs locally (zero external API calls) and checks every <claim ref>:
- The referenced atom exists in the substrate.
- The atom is not stale (price atoms verified within 30 days; evergreen atoms verified within 365 days).
- For numeric kinds (spec, price, benchmark): the rendered claim's numeric tokens match the source span within precision tolerance.
- For quote kinds: the rendered claim is a substring of the captured source text after whitespace normalization.
- For all kinds: a faithfulness check (small distilled NLI model) returns ≥ 0.85 entailment between the rendered claim and the captured source span.
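Three of these checks can be sketched directly; the NLI entailment check is omitted here because it requires a model. The 30/365-day windows come from the list above, but the helper names and the exact-match stand-in for numeric tolerance are assumptions.

```typescript
// Sketches of the staleness, numeric, and quote checks described above.
// Helper names are hypothetical; the production verifier also runs an
// NLI entailment check, which is out of scope for a sketch.
const DAY_MS = 24 * 60 * 60 * 1000;

function isStale(kind: string, verifiedAt: string | null, now: Date): boolean {
  if (verifiedAt === null) return true;
  const age = now.getTime() - new Date(verifiedAt).getTime();
  const limit = kind === "price" ? 30 * DAY_MS : 365 * DAY_MS;
  return age > limit;
}

// Numeric kinds: every number in the rendered claim must appear in the
// source span (an exact-match stand-in for the precision tolerance).
function numbersMatch(claim: string, span: string): boolean {
  const nums = claim.match(/\d+(?:\.\d+)?/g) ?? [];
  return nums.every((n) => span.includes(n));
}

// Quote kinds: the claim must be a substring of the source after
// whitespace normalization.
function quoteMatches(claim: string, sourceText: string): boolean {
  const norm = (s: string) => s.replace(/\s+/g, " ").trim();
  return norm(sourceText).includes(norm(claim));
}
```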
Any failure triggers up to three regeneration attempts with the specific failure context fed back to the agent. After three failed attempts, the page is quarantined to research/quality-failures/ and the run aborts. Pages do not ship "with known defects."
Content-Hash Verification
Sources mutate. A manufacturer updates a spec page; an Amazon listing gets a new title; a Wikipedia paragraph gets edited. Without verification, pages drift quietly. Our weekly atom-refresh job re-fetches every source URL, hashes the slice the atom captured, and compares to the stored content_hash:
- Hash matches. The atom is re-stamped with a fresh verified_at.
- Hash differs but the captured text still appears in the new fetch. The atom is updated with the new fetched_text and re-stamped.
- Hash differs and the captured text is gone. The atom is marked stale (verified_at: null) with a stale_reason. Pages citing it cannot regenerate until the atom is re-extracted from the current source.
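The three-way refresh decision can be sketched with Node's built-in crypto module. The outcome names and helper signature are illustrative; only SHA-256 and the three branches come from the text above.

```typescript
// Sketch of the weekly refresh decision for one atom. Field and helper
// names are illustrative, not the pipeline's real API.
import { createHash } from "node:crypto";

const sha256Hex = (s: string) => createHash("sha256").update(s).digest("hex");

type RefreshOutcome =
  | { status: "fresh" }
  | { status: "updated"; fetched_text: string }
  | { status: "stale"; stale_reason: string };

function refreshAtom(
  storedHash: string,     // content_hash recorded at capture
  capturedText: string,   // the span's text as originally captured
  newFetch: string,       // the freshly fetched page slice
): RefreshOutcome {
  if (sha256Hex(newFetch) === storedHash) return { status: "fresh" };
  if (newFetch.includes(capturedText)) {
    return { status: "updated", fetched_text: newFetch };
  }
  return { status: "stale", stale_reason: "captured text no longer present in source" };
}
```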
Audit Log + Merkle Chain
Every page edit appends one line to research/audit-log.jsonl: trigger key, source URL, atom set fingerprint, page content hash, model version, and ISO 8601 UTC timestamp. The log is append-only — the pipeline does not delete or rewrite past entries.
A daily job hashes the day's lines into a SHA-256 Merkle root and commits it to research/audit-merkle.jsonl. Each daily root references the previous day's root, building a chain that locks history: tampering with a past entry breaks verification at the next root.
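The daily root computation can be sketched as follows. The text above fixes only SHA-256 and the day-to-day linkage; the pairing rule (duplicating the last node on odd levels) and the way the previous root is folded in (as an extra leaf) are assumptions of this sketch.

```typescript
// Sketch: Merkle root over the day's audit-log lines, chained to the
// previous day's root. Pairing and chaining details are assumptions.
import { createHash } from "node:crypto";

const sha256 = (s: string) => createHash("sha256").update(s).digest("hex");

function merkleRoot(leaves: string[]): string {
  if (leaves.length === 0) return sha256("");
  let level = leaves.map(sha256);
  while (level.length > 1) {
    const next: string[] = [];
    for (let i = 0; i < level.length; i += 2) {
      // Duplicate the last node when the level has odd length.
      next.push(sha256(level[i] + (level[i + 1] ?? level[i])));
    }
    level = next;
  }
  return level[0];
}

// Each daily root commits to the previous day's root, locking history:
// changing any past line (or any past root) changes every later root.
function dailyRoot(prevRoot: string, todaysLines: string[]): string {
  return merkleRoot([prevRoot, ...todaysLines]);
}
```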
End-to-end audit path for any claim on the site: rendered prose → data-atom-ref on the inline span → atom record in research/atoms/ → source URL + content hash → re-fetch + entailment check → audit-log entry → daily Merkle root → daily commit on the repository.
What We Don't Do
SpoolBench synthesizes aggregated user experience and primary-source documentation. It does not run a physical test lab. We do not measure decibels with a meter we own; we do not bench-test products we bought; we do not stand up a controlled environment and run controlled trials. That is real first-party value we cannot match.
The structural advantage we provide is at the synthesis layer: aggregating thousands of verified user reviews, cross-referencing manufacturer specs against independent measurements, surfacing contradictions between retailer claims and owner experience, and tracking how those signals shift over time. The provenance contract makes that synthesis verifiable rather than asserted.
Legal-Compliance Backdrop
The Federal Trade Commission's 2023 Endorsement Guides revision (16 CFR Part 255) holds publishers liable for fabricated endorsements. Mata v. Avianca (sanctions over fabricated case citations) and the Air Canada chatbot ruling (Moffatt v. Air Canada) have made clear that professionals and organizations are accountable for AI-generated fabrications that ship under their name. Our contract is designed to make those failure modes mechanically impossible — not as a regulatory checkbox, but because they're load-bearing for editorial trust.
Author Identity
Editorial work on SpoolBench is attributed to a named author with a public byline and disclosed credentials documented on the about page. The atom substrate + audit log establish what each claim is grounded in; the byline establishes who is accountable for each page's editorial choices around what to cover, how to weigh contradictions, and where to draw the line on a verdict.
The structural transparency the contract provides — every claim verifiable to its source via re-fetch + hash check — is the basis on which the site's authority rests.
Spotted a claim on the site that contradicts a primary source you can point to? Reach us via the contact form on our about page with the page URL and the conflicting source — we'll re-verify the atom and update the page on a Tier 1 surgical edit if the source confirms the contradiction.