Validation & Auditability

PTEM provides a frozen, forward-only structural validation harness built on a canonical Atlantic structural spine (v1.2, 1851–2024). All validation outputs are derived from immutable run manifests and receipts to ensure reproducibility and auditability. The harness evaluates declared operating points; each run records its operating point in the run manifest and validation receipt.

Validated against Atlantic structural spine freeze-2026-01-27 (superseding freeze-2026-01-25; superseded freezes remain preserved for auditability).

Canonical Atlantic Structural Spine (Frozen)

  • Freeze stamp: freeze-2026-01-27
  • Code reference: recorded in immutable run manifests & validation receipts (evaluation access)
  • Registry: ptem_struct-v1-2/hurricanes/registries/atlantic_all/storms_6h_dtg_hurdat2_v1.csv
  • Coverage: 1851-06-25 → 2024-11-18
  • Scope: 1,991 storms • 54,203 timestamps (6-hour cadence)
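The scope and coverage figures above can, in principle, be re-derived by counting rows in the registry file. The following is a minimal sketch only: the column names `storm_id` and `dtg`, the `YYYYMMDDHH` timestamp format, and the sample rows are assumptions made for illustration and are not taken from the registry itself.

```python
import csv
import io
from datetime import datetime


def summarize_registry(csv_text, id_col="storm_id", dtg_col="dtg", fmt="%Y%m%d%H"):
    """Count storms and timestamps, report coverage, and flag off-cadence rows.

    Column names and timestamp format are hypothetical defaults.
    """
    storms = set()
    times = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        storms.add(row[id_col])
        times.append(datetime.strptime(row[dtg_col], fmt))
    # A strict 6-hour cadence implies hours of 00, 06, 12, or 18 only.
    off_cadence = sum(1 for t in times if t.hour % 6 != 0)
    return {
        "storms": len(storms),
        "timestamps": len(times),
        "first": min(times).date().isoformat(),
        "last": max(times).date().isoformat(),
        "off_cadence": off_cadence,
    }


# Illustrative rows only; not real registry content.
sample = (
    "storm_id,dtg\n"
    "AL011851,1851062500\n"
    "AL011851,1851062506\n"
    "AL021851,1851070512\n"
)
summary = summarize_registry(sample)
print(summary)
```

Run against the full registry, a check of this kind would be expected to report the storm count, timestamp count, and coverage dates listed above, with zero off-cadence rows.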

Construction integrity

  • Planned timestamps written: 54,203 / 54,203
  • Strict input enforcement: enabled
  • Missing inputs: 0
  • Ingest errors: 0

This freeze contains structural substrate only. No energy- or intensity-derived lenses are included.

Forward-Only Validation

Frozen harness

Evaluation is performed strictly forward-only against the frozen Atlantic structural spine:

  • No look-ahead or future conditioning
  • Blind to future intensity, classification, or outcomes
  • Independent of intensity-based forecast models

Structural activation timestamps are generated using pre-declared activation rules recorded at run time, with the reference operating point noted in the run manifest and validation receipt for each canonical evaluation.

Lead time is measured from the first activation to the aligned event onset marker. Scoring is retrospective, but each activation is computed without access to future data.
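The lead-time measurement can be sketched as follows. This is an illustration under stated assumptions, not PTEM's implementation: the function name `lead_time_hours` is hypothetical, activations are assumed to be plain timestamps, and only activations strictly before the onset marker are assumed to count.

```python
from datetime import datetime


def lead_time_hours(activations, onset):
    """Hours from the first activation before onset to the onset marker.

    Returns None when no activation precedes the onset (assumed convention).
    """
    prior = [t for t in activations if t < onset]
    if not prior:
        return None
    return (onset - min(prior)).total_seconds() / 3600.0


# Illustrative timestamps only, on a 6-hour cadence.
activations = [datetime(1851, 6, 25, 0), datetime(1851, 6, 25, 6)]
onset = datetime(1851, 6, 26, 12)
lead = lead_time_hours(activations, onset)
print(lead)  # 36.0
```

The onset marker is used only at scoring time; nothing in the activation list depends on it, which is the sense in which the measurement is retrospective while the activations remain forward-only.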

Internal gating (e.g., landfall proximity, post-transition regimes) is enforced during activation and evaluation to ensure structural validity. Gating logic affects which activations are admitted into evaluation but is not exposed directly in public artifacts.

Reference operating point (p80) — Atlantic v1.2 (frozen harness)

  • Median lead time (true positives): 36 hours
  • Mean lead time (true positives): 50 hours
  • Precision / Recall: 0.396 / 0.205

This operating point reflects an early-warning structural regime, prioritizing lead time and activation confidence over event coverage. Metrics are evaluated under forward-only conditions, blind to future intensity data, across 1,991 storms (1851–2024).
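For readers checking the arithmetic behind these metrics, the standard definitions are precision = TP / (TP + FP) and recall = TP / (TP + FN). The sketch below uses toy counts and toy lead times that are not taken from any PTEM scorecard; how PTEM counts true positives, false positives, and misses is not specified here and is left as an assumption.

```python
from statistics import mean, median


def precision_recall(tp, fp, fn):
    """Return (precision, recall) from true-positive, false-positive, and miss counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy counts, NOT taken from a PTEM scorecard.
precision, recall = precision_recall(tp=4, fp=6, fn=12)

# Toy true-positive lead times in hours, NOT real results.
toy_leads = [6, 18, 30, 90]
median_lead = median(toy_leads)
mean_lead = mean(toy_leads)

print(precision, recall)       # 0.4 0.25
print(median_lead, mean_lead)  # 24.0 36
```

A mean above the median, as in the reported 50 versus 36 hours, usually indicates a right tail of unusually long lead times among the true positives.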

(Full distributions, threshold sweeps, and per-advisory flags available under evaluation license.)

Read-Only Inspection (GUS)

Immutable inspection

PTEM includes a read-only inspection interface (GUS) for independent verification of:

  • Registry scope and coverage
  • Completion ledgers
  • Frozen artifact URIs
  • Correspondence between published summaries, underlying scorecard artifacts, and immutable run manifests

GUS confirms that reported scorecards and summaries are derived from declared runs and validation receipts.

It produces audit outputs only and cannot modify frozen data or activation logic.

Audit Artifacts

The following artifacts are produced for each canonical run:

Immutable outputs

  • run_manifest.json
  • validation_receipt.json
  • scorecard.json

All reported results are reproducible from these artifacts.
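One common way to confirm that a set of artifacts has not changed since it was published is to compare SHA-256 digests. The sketch below is a generic illustration: whether PTEM manifests or receipts record digests, and in what form, is not stated here, so the `expected` mapping and both function names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path

ARTIFACTS = ("run_manifest.json", "validation_receipt.json", "scorecard.json")


def sha256_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifacts(directory, expected):
    """Return names of artifacts that are missing or whose digest differs."""
    mismatched = []
    for name, want in expected.items():
        path = Path(directory) / name
        if not path.is_file() or sha256_file(path) != want:
            mismatched.append(name)
    return mismatched


# Demonstration with placeholder files; real artifacts would be used instead.
with tempfile.TemporaryDirectory() as tmp:
    for name in ARTIFACTS:
        (Path(tmp) / name).write_text("{}", encoding="utf-8")
    expected = {name: sha256_file(Path(tmp) / name) for name in ARTIFACTS}
    clean = verify_artifacts(tmp, expected)
    # Simulate a modified scorecard to show that the change is detected.
    (Path(tmp) / "scorecard.json").write_text('{"edited": true}', encoding="utf-8")
    tampered = verify_artifacts(tmp, expected)

print(clean, tampered)
```

An empty result means every listed artifact matches its recorded digest; any name returned identifies a file that is missing or has changed.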

Evaluation Access

Provided under Tier 1 Evaluation

  • Canonical run artifacts and manifests
  • Per-advisory activation flags
  • Event indices and earliest-lead tables
  • Coverage and summary statistics

Not provided (protected)

  • Internal functional forms
  • Structural operators and filters
  • Gate firing logic (e.g., landfall, regime suppression)
  • HC / DC derivation logic

All validation outputs are reproducible from provided data and manifests.

Summary

Canonical validation is performed on a frozen Atlantic structural record using pre-declared activation rules recorded at run time. Public materials document auditability, scope, and aggregate operating characteristics (activation timing, lead-time distributions, and false-activation rates) under declared operating points. Reference operating points are captured in the associated manifests and receipts.

Evaluation licenses provide advisory-level activation flags, event-aligned lead tables, and full validation receipts suitable for independent technical review.