Validation & Auditability
PTEM provides a frozen, forward-only structural validation harness built on a canonical Atlantic structural spine (v1.2, 1851–2024). All validation outputs are derived from immutable run manifests and receipts to ensure reproducibility and auditability. The harness evaluates declared operating points; each run records its operating point in the run manifest and validation receipt.
Validated against Atlantic structural spine freeze-2026-01-27 (superseding freeze-2026-01-25; superseded freezes remain preserved for auditability).
Canonical Atlantic Structural Spine (Frozen)
- Freeze stamp: freeze-2026-01-27
- Code reference: recorded in immutable run manifests & validation receipts (evaluation access)
- Registry: ptem_struct-v1-2/hurricanes/registries/atlantic_all/storms_6h_dtg_hurdat2_v1.csv
- Coverage: 1851-06-25 → 2024-11-18
- Scope: 1,991 storms • 54,203 timestamps (6-hour cadence)
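The scope and coverage figures above are the kind of facts a reviewer can recount from the registry file itself. The following is a minimal sketch of such a recount, not PTEM's own tooling. The column names `storm_id` and `dtg`, the `YYYYMMDDHH` timestamp format, and the three sample rows are assumptions made for illustration; the actual registry schema is not described here.

```python
import csv
import io
from datetime import datetime

def summarize_registry(rows):
    """Count storms and timestamps, find coverage bounds, flag off-cadence rows."""
    storms = set()
    times = []
    off_cadence = 0
    for row in rows:
        # Assumed format: YYYYMMDDHH (hypothetical; adjust to the real schema).
        t = datetime.strptime(row["dtg"], "%Y%m%d%H")
        storms.add(row["storm_id"])
        times.append(t)
        if t.hour % 6 != 0:  # 6-hour cadence means hours 00, 06, 12, 18
            off_cadence += 1
    return {
        "storms": len(storms),
        "timestamps": len(times),
        "first": min(times),
        "last": max(times),
        "off_cadence": off_cadence,
    }

# Toy rows for illustration only; not taken from the frozen registry.
SAMPLE = """storm_id,dtg
AL011851,1851062500
AL011851,1851062506
AL021851,1851070512
"""

summary = summarize_registry(csv.DictReader(io.StringIO(SAMPLE)))
print(summary)
```

Run against the real registry, the `storms` and `timestamps` counts would be compared with the published 1,991 and 54,203, and `first` and `last` with the stated coverage dates.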
Construction integrity
- Planned timestamps written: 54,203 / 54,203
- Strict input enforcement: enabled
- Missing inputs: 0
- Ingest errors: 0
This freeze contains structural substrate only. No energy- or intensity-derived lenses are included.
Forward-Only Validation
Frozen harness
Evaluation is performed strictly forward-only against the frozen Atlantic structural spine:
- No look-ahead or future conditioning
- Blind to future intensity, classification, or outcomes
- Independent of intensity-based forecast models
Structural activation timestamps are generated using pre-declared activation rules recorded at run time, with the reference operating point noted in the run manifest and validation receipt for each canonical evaluation.
Lead times are measured from the first activation to the aligned event onset marker. The measurement is made retrospectively, once the onset is known, but the activation timestamps themselves are computed without access to future data.
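The lead-time definition can be illustrated with a short sketch. It assumes that only activations at or before the onset marker count toward the lead and that timestamps are plain datetimes; the function name and the example timestamps are hypothetical and do not come from any PTEM run.

```python
from datetime import datetime

def lead_time_hours(activations, onset):
    """Hours from the first activation at or before onset to the onset marker.

    Returns None when no activation precedes the onset (assumed convention).
    """
    prior = [t for t in activations if t <= onset]
    if not prior:
        return None
    return (onset - min(prior)).total_seconds() / 3600.0

# Illustrative timestamps only.
activations = [
    datetime(2000, 1, 1, 6),
    datetime(2000, 1, 1, 12),
    datetime(2000, 1, 2, 0),
]
onset = datetime(2000, 1, 3, 0)

lead = lead_time_hours(activations, onset)
missed = lead_time_hours([datetime(2000, 1, 4, 0)], onset)
print(lead, missed)
```

Here the first activation is at 06:00 on 1 January and the onset is at 00:00 on 3 January, so the lead is 42 hours. The second call returns None because its only activation falls after the onset.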
Internal gating (e.g., landfall proximity, post-transition regimes) is enforced during activation and evaluation to ensure structural validity. Gating logic affects which activations are admitted into evaluation but is not exposed directly in public artifacts.
Reference operating point (p80) — Atlantic v1.2 (frozen harness)
- Median lead time (true positives): 36 hours
- Mean lead time (true positives): 50 hours
- Precision / Recall: 0.396 / 0.205
This operating point reflects an early-warning structural regime, prioritizing lead time and activation confidence over event coverage. Metrics are evaluated under forward-only conditions, blind to future intensity data, across 1,991 storms (1851–2024).
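For readers less familiar with these metrics, the sketch below shows how precision, recall, and the two lead-time summaries are conventionally computed. The counts and lead times are toy values chosen for easy arithmetic, not PTEM results, and the unit of evaluation (per storm or per advisory) is left open because it is not stated here.

```python
from statistics import mean, median

def precision_recall(tp, fp, fn):
    """Standard definitions: precision = TP/(TP+FP), recall = TP/(TP+FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Toy counts: 2 true positives, 3 false activations, 6 missed events.
p, r = precision_recall(tp=2, fp=3, fn=6)

# Toy lead times (hours) for true positives; one long lead pulls the mean up.
leads = [6, 24, 90]
med, avg = median(leads), mean(leads)
print(p, r, med, avg)
```

With these toy numbers, precision is 0.4, recall is 0.25, the median lead is 24 hours, and the mean is 40 hours. A mean above the median, as in the reported 50 versus 36 hours, is consistent with a lead-time distribution that has a tail of long leads.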
(Full distributions, threshold sweeps, and per-advisory flags available under evaluation license.)
Read-Only Inspection (GUS)
Immutable inspection
PTEM includes a read-only inspection interface (GUS) for independent verification of:
- registry scope and coverage,
- completion ledgers,
- frozen artifact URIs, and
- correspondence between published summaries, underlying scorecard artifacts, and immutable run manifests.
GUS confirms that reported scorecards and summaries are derived from declared runs and validation receipts.
It produces audit outputs only and cannot modify frozen data or activation logic.
Audit Artifacts
Immutable outputs
The following artifacts are produced for each canonical run:
- run_manifest.json
- validation_receipt.json
- scorecard.json
All reported results are reproducible from these artifacts.
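One way a reviewer might confirm that local copies of these three artifacts stay unchanged between reads is to record a content digest for each file and compare later. This is a sketch under stated assumptions: the use of SHA-256, the helper names, and the placeholder file contents are illustrative, and nothing here describes how PTEM itself records or enforces immutability.

```python
import hashlib
import json
import tempfile
from pathlib import Path

ARTIFACTS = ("run_manifest.json", "validation_receipt.json", "scorecard.json")

def digest_artifacts(run_dir):
    """Map each artifact file name to the SHA-256 hex digest of its bytes."""
    run_dir = Path(run_dir)
    return {
        name: hashlib.sha256((run_dir / name).read_bytes()).hexdigest()
        for name in ARTIFACTS
    }

def changed(before, after):
    """Names whose digest differs between two snapshots."""
    return sorted(n for n in before if before[n] != after.get(n))

# Demonstration on placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as d:
    for name in ARTIFACTS:
        (Path(d) / name).write_text(json.dumps({"artifact": name}))
    baseline = digest_artifacts(d)
    unchanged = changed(baseline, digest_artifacts(d))
    # Simulate an edit to one file and detect it.
    (Path(d) / "scorecard.json").write_text(json.dumps({"artifact": "edited"}))
    modified = changed(baseline, digest_artifacts(d))

print(unchanged, modified)
```

The first comparison reports no differences; after the simulated edit, only scorecard.json is flagged.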
Evaluation Access
Provided under Tier 1 Evaluation
- Canonical run artifacts and manifests
- Per-advisory activation flags
- Event indices and earliest-lead tables
- Coverage and summary statistics
Not provided (protected)
- Internal functional forms
- Structural operators and filters
- Gate firing logic (e.g., landfall, regime suppression)
- HC / DC derivation logic
All validation outputs are reproducible from provided data and manifests.
Summary
Canonical validation is performed on a frozen Atlantic structural record using pre-declared activation rules recorded at run time. Public materials document auditability, scope, and aggregate operating characteristics under declared operating points: activation timing, lead-time distributions, and false-activation rates. The reference operating point for each run is captured in its manifest and receipt.
Evaluation licenses provide advisory-level activation flags, event-aligned lead tables, and full validation receipts suitable for independent technical review.