Validation & Auditability

PTEM provides a frozen, forward-only structural validation harness built on the canonical Atlantic structural spine (v1.2, 1851–2024). All validation outputs derive from immutable run manifests and receipts to ensure reproducibility and auditability, and each run records its declared operating point in both artifacts.

Validated against Atlantic structural spine freeze-2026-01-27 (superseding freeze-2026-01-25; superseded freezes remain preserved for auditability).

Event-scale operating points and climatology-scale SRI percentiles are declared separately and are not blended.

Canonical Atlantic Structural Spine (Frozen)

  • Freeze stamp: freeze-2026-01-27
  • Code reference: recorded in immutable run manifests & validation receipts (evaluation access)
  • Registry: ptem_struct-v1-2/hurricanes/registries/atlantic_all/storms_6h_dtg_hurdat2_v1.csv
  • Coverage: 1851-06-25 → 2024-11-18
  • Scope: 1,991 storms • 54,203 timestamps (6-hour cadence)

Cross-basin transfer diagnostics (EPAC)

PTEM has been independently ported into the Eastern Pacific (EPAC) under the same governed execution contract: forward-only computation, immutable freezes, manifests, and receipts.

Transfer diagnostics are published as scorecards derived from frozen spines:

  • Basin-native percentile thresholds (computed per basin under the same declared rule)
  • Cross-threshold stress tests (Atlantic threshold applied to EPAC; EPAC threshold applied to Atlantic)
  • Diagonal transfer (early-era calibration evaluated on late-era data, applied across basins)
  • Ratios reported against basin-appropriate native baselines (late-native baselines for diagonal transfer)
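
A minimal sketch of the basin-native and cross-threshold comparisons listed above, assuming each frozen spine can be reduced to a flat list of structural scores. The function names, the synthetic scores, and the use of p80 here are illustrative assumptions; PTEM's structural operators are not public and are not reproduced.

```python
import random
import statistics

def native_threshold(scores, percentile):
    """Percentile threshold (1..99) computed from a basin's own scores."""
    cuts = statistics.quantiles(scores, n=100, method="inclusive")
    return cuts[percentile - 1]

def activation_rate(scores, threshold):
    """Fraction of timestamps whose score is at or above the threshold."""
    return sum(s >= threshold for s in scores) / len(scores)

def cross_threshold_ratio(source_scores, target_scores, percentile):
    """Target activation rate under the source basin's threshold,
    divided by the target's rate under its own native threshold."""
    native = activation_rate(
        target_scores, native_threshold(target_scores, percentile))
    crossed = activation_rate(
        target_scores, native_threshold(source_scores, percentile))
    return crossed / native

# Synthetic placeholder scores, not PTEM data.
rng = random.Random(0)
atlantic = [rng.gauss(0.0, 1.0) for _ in range(5000)]
epac = [rng.gauss(0.3, 1.2) for _ in range(5000)]

ratio_atl_to_epac = cross_threshold_ratio(atlantic, epac, 80)
ratio_epac_to_atl = cross_threshold_ratio(epac, atlantic, 80)
```

A ratio of 1.0 means the borrowed threshold admits the same share of timestamps as the native one. Values away from 1.0 show how differently the two basins' score distributions sit relative to a shared rule.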

These diagnostics measure how structural regime behavior transfers between basins under identical governance rules.

Interpretation boundary:
These are comparative diagnostics only. They do not assert basin equivalence, replace basin climatology, or modify canonical structural truth.

Construction integrity

  • Planned timestamps written: 54,203 / 54,203
  • Strict input enforcement: enabled
  • Missing inputs: 0
  • Ingest errors: 0
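
A construction-integrity check of this kind can be sketched as a comparison between planned and written timestamps. This is a toy illustration under stated assumptions: the helper names, the report keys, and the single continuous 6-hour window are hypothetical, and the real registry is organized per storm.

```python
from datetime import datetime, timedelta

def planned_timestamps(start, end, cadence_hours=6):
    """All timestamps from start to end (inclusive) at a fixed cadence."""
    step = timedelta(hours=cadence_hours)
    stamps = []
    current = start
    while current <= end:
        stamps.append(current)
        current += step
    return stamps

def integrity_report(planned, written):
    """Compare the planned timestamps with those actually written."""
    planned_set, written_set = set(planned), set(written)
    return {
        "planned": len(planned_set),
        "written": len(planned_set & written_set),
        "missing": sorted(planned_set - written_set),
        "unexpected": sorted(written_set - planned_set),
    }

# Toy window: five planned timestamps, one deliberately left unwritten.
planned = planned_timestamps(datetime(2000, 1, 1, 0), datetime(2000, 1, 2, 0))
written = [t for t in planned if t != datetime(2000, 1, 1, 12)]
report = integrity_report(planned, written)
```

In this sketch, a complete build would report equal planned and written counts with empty missing and unexpected lists.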

This freeze contains structural signals only; no energy- or intensity-derived lenses are included.

Forward-Only Validation

Frozen harness

Evaluation is performed strictly forward-only against the frozen Atlantic structural spine:

  • No look-ahead or future conditioning
  • Blind to future intensity, classification, or outcomes
  • Independent of intensity-based forecast models

Structural activation timestamps are generated using pre-declared activation rules recorded at run time, with the reference operating point noted in the run manifest and validation receipt for each canonical evaluation.

Structural lead times are measured from first activation to the aligned event onset marker, describing timing of rare structural regime entry relative to later lifecycle milestones rather than asserting a universal intensity-forecast lead for all storms.
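
The lead-time measurement described above can be sketched as follows, assuming activation flags arrive as time-ordered (timestamp, flag) pairs and that an aligned onset marker is available for the event. All names and values are hypothetical; the gating that decides which activations are admitted is not modeled.

```python
from datetime import datetime
from statistics import mean, median

def first_activation(flags):
    """Earliest timestamp whose activation flag is set, or None.
    Assumes flags are sorted by time."""
    for timestamp, active in flags:
        if active:
            return timestamp
    return None

def structural_lead_hours(activation_time, onset_time):
    """Hours from first activation to the aligned event onset marker.
    Positive values mean the activation preceded the onset."""
    return (onset_time - activation_time).total_seconds() / 3600.0

# Hypothetical 6-hourly flags for one storm, plus an onset marker.
flags = [
    (datetime(2000, 1, 1, 0), False),
    (datetime(2000, 1, 1, 6), False),
    (datetime(2000, 1, 1, 12), True),
    (datetime(2000, 1, 1, 18), True),
]
onset = datetime(2000, 1, 3, 0)
lead = structural_lead_hours(first_activation(flags), onset)

# Summary over a hypothetical set of aligned-event leads (hours).
sample_leads = [12.0, 30.0, 72.0]
summary = {"median_h": median(sample_leads), "mean_h": mean(sample_leads)}
```

Only the first activation contributes to the lead, so later re-activations within the same storm do not shorten or lengthen the reported value.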

Internal gating (e.g., landfall proximity, post-transition regimes) is enforced during activation and evaluation to ensure structural validity. Gating logic affects which activations are admitted into evaluation but is not exposed directly in public artifacts.

Event Scale (Structural Activation & Lead) — Reference Operating Point (p80)

PTEM targets rare high-organization regimes rather than classifying every intensification event; absence of activation never implies safety under the declared operating point.

The operating characteristics below are reported for transparency and orientation only. PTEM’s primary evaluation lens is forward-only lead distributions, survival-style event alignment, and activation and gating behavior, not classifier metrics such as AUC or Brier scores.

Illustrative operating characteristics at the reference threshold (p80) on the Atlantic v1.2 frozen harness:

  • Median structural lead (aligned events): 36 hours
  • Mean structural lead (aligned events): 50 hours
  • Precision / Recall (illustrative operating characteristics, not primary evaluation lens): 0.396 / 0.205

This operating point prioritizes structural activation confidence and lead coverage under forward-only scoring across 1,991 storms (1851–2024). Event-scale receipts record activation rules, gating notes, and operational behavior summaries for each canonical run.

Climatology-scale SRI v1 is published separately as a year-level regime density instrument; cross-horizon comparisons are documented in validation receipts rather than blended on this page.

(Full distributions, threshold sweeps, and per-advisory flags available under evaluation license.)

PTEM’s event-scale outputs characterize rare structural timing regimes and their alignment relative to later storm milestones; they are not presented as a broad binary forecast of Rapid Intensification (RI) across all storms.

Operational Behavior (Post-Run Analysis)

In addition to lead-time and precision/recall metrics, PTEM produces a storm-anonymous post-run operational behavior summary for each canonical evaluation.

It characterizes activation stability (episode duration, chatter rates), morphology (step-change vs. ramp vs. multi-pulse behavior), and event-aligned survival relative to Rapid Intensification (RI) onset, and each summary is frozen with cryptographic hashes for audit and publication.

These artifacts describe how the system behaves, not just how often it fires.
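
As a rough illustration of activation-stability statistics, the sketch below segments a boolean activation series into episodes and computes a simple chatter rate. Both definitions are assumptions made for the example (chatter as state changes per adjacent pair of timestamps); the definitions used in PTEM's summaries are not published here.

```python
CADENCE_HOURS = 6  # matches the spine's 6-hour cadence

def activation_episodes(flags):
    """Return (start_index, length) for each run of consecutive True flags."""
    episodes = []
    start = None
    for i, active in enumerate(flags):
        if active and start is None:
            start = i
        elif not active and start is not None:
            episodes.append((start, i - start))
            start = None
    if start is not None:  # series ended while still active
        episodes.append((start, len(flags) - start))
    return episodes

def chatter_rate(flags):
    """On/off state changes divided by the number of adjacent pairs."""
    if len(flags) < 2:
        return 0.0
    transitions = sum(a != b for a, b in zip(flags, flags[1:]))
    return transitions / (len(flags) - 1)

# Hypothetical activation series for one storm.
flags = [False, True, True, False, True, False, False, True, True, True]
episodes = activation_episodes(flags)
durations_h = [length * CADENCE_HOURS for _, length in episodes]
rate = chatter_rate(flags)
```

Long episodes with a low chatter rate would indicate a stable activation, while many one-step episodes would indicate flicker around the threshold.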

Climatology Scale (SRI v1) — Canonical Freeze

SRI v1 is the canonical climatology-scale structural regime instrument for Atlantic hurricanes. It aggregates extreme structural regimes across years using a predeclared selection rule (not performance tuning) and records robustness characterization across both percentile depth and baseline memory.

Canonical declaration (SRI v1, Atlantic):

  • Canonical percentile: p95
  • Rolling baseline: 15 years
  • Top-k years: 15
  • Min storms/year: 10
  • Canonical selection rule: predeclared_percentile

Robustness characterization (embedded in receipt):

  • Percentiles tested: p90, p95, p97, p99
  • Baseline windows tested: 10, 15, 20 years
  • Cross-configuration intersections, union sizes, and Jaccard diagnostics are frozen with explicit paths and SHA256 hashes.
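
The cross-configuration diagnostics above can be illustrated with a small sketch that compares the top-k year sets selected under two configurations. The per-year values, the tie-breaking rule, and the function names are made up for the example and do not come from any receipt.

```python
def top_k_years(index_by_year, k):
    """Years with the k largest index values (ties broken by earlier year)."""
    ranked = sorted(index_by_year, key=lambda year: (-index_by_year[year], year))
    return set(ranked[:k])

def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b|; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Made-up per-year index values under two percentile configurations.
config_p95 = {2001: 0.9, 2002: 0.1, 2003: 0.7, 2004: 0.4, 2005: 0.8}
config_p99 = {2001: 0.6, 2002: 0.2, 2003: 0.1, 2004: 0.5, 2005: 0.9}

years_p95 = top_k_years(config_p95, 3)
years_p99 = top_k_years(config_p99, 3)
overlap = years_p95 & years_p99
similarity = jaccard(years_p95, years_p99)
```

A Jaccard value near 1.0 across percentiles and baseline windows would indicate that the selected years are insensitive to the configuration. Lower values flag years that enter or leave the set as the configuration changes.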

Interpretation boundary:

SRI v1 is a year-level structural density index, not a loss model and not an intensity proxy. Its correlation with Accumulated Cyclone Energy (ACE) falls in the ~0.11–0.17 range across sweep receipts, a weak relationship that underscores its partial orthogonality to ACE. It is suitable for catalogue realism testing, non-stationarity monitoring, and regime drift analysis.

The canonical SRI v1 freeze is stored under an immutable S3 prefix with a manifest containing SHA256 hashes for all receipt artifacts. Outputs remain year-level only and do not expose storm-level time series.

Structural Regime Signals (SRS v1)

  • Derived from the canonical SRI v1 freeze (year-level regime aggregates only)
  • Deterministic outputs with manifest hashes and receipt-style provenance
  • Stored under immutable S3 prefixes for evaluation access; bundle includes MANIFEST, FREEZE_META, RPI/RVI change-point candidates, and decadal summaries
  • Designed for decade-scale risk conditioning, drift monitoring, and regime clustering
  • Non-invertible to storm-level structure (no storm-level traces; year-level only)

SRS v1 is a synthetic, non-invertible year-level signal pack that operates strictly on aggregated regime metrics and is not a loss model or intensity proxy.

SRS Derived Views v1 (optional): analyst-friendly drift/label summaries derived only from SRS v1 outputs (non-invertible).

SRS Derived Views v1

SRS Derived Views v1 produces analyst-facing diagnostics (e.g., RDI/CSS labels, derived_by_year slices) generated only from the SYNTH_SIGNALS.json payload produced by SRS v1. It inherits the same MANIFEST and FREEZE_META governance layer, including SHA256 references and git provenance.

Outputs remain year-level, non-invertible aggregates designed for catalogue realism testing, non-stationarity monitoring, and capital-surface conditioning. No storm-level structure or operator logic is exposed.

Governed artifact verification

All climatology-surface bundles (SRI v1, SRS v1, SRS Derived Views v1) include MANIFEST and FREEZE_META files with SHA256 references and git commit provenance. PTEM verifies uploads via verify_freeze_bundle_v1.sh, which can optionally enforce a clean git state using --strict-clean. Freeze helpers permit --allow-dirty overrides but default to clean-tree enforcement so downstream teams can audit every release.
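
For readers who want to see the general idea behind hash-based bundle verification, here is a minimal sketch. It is not a port of verify_freeze_bundle_v1.sh: the manifest file name MANIFEST.json, its {"files": {path: sha256}} layout, and the function names are assumptions made for the example.

```python
import hashlib
import json
import tempfile
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_bundle(bundle_dir, manifest_name="MANIFEST.json"):
    """Return relative paths that are missing or whose hash differs
    from the manifest (hypothetical manifest layout)."""
    bundle_dir = Path(bundle_dir)
    manifest = json.loads((bundle_dir / manifest_name).read_text())
    mismatches = []
    for rel_path, expected in manifest["files"].items():
        target = bundle_dir / rel_path
        if not target.is_file() or sha256_of(target) != expected.lower():
            mismatches.append(rel_path)
    return mismatches

# Demo on a throwaway bundle: verify, tamper with one file, verify again.
with tempfile.TemporaryDirectory() as tmp_name:
    root = Path(tmp_name)
    meta = root / "FREEZE_META.json"
    meta.write_text('{"freeze": "example"}')
    manifest = {"files": {"FREEZE_META.json": sha256_of(meta)}}
    (root / "MANIFEST.json").write_text(json.dumps(manifest))
    clean_result = verify_bundle(root)
    meta.write_text("tampered")
    tampered_result = verify_bundle(root)
```

Any change to a listed file after the freeze changes its digest, so the tampered file appears in the returned list.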

Read-Only Inspection (GUS)

Immutable inspection

PTEM includes a read-only inspection interface (GUS) for independent verification of:

  • Registry scope and coverage
  • Completion ledgers
  • Frozen artifact URIs
  • Correspondence between published summaries, underlying scorecard artifacts, and immutable run manifests

GUS confirms that reported scorecards and summaries are derived from declared runs and validation receipts.

It produces audit outputs only and cannot modify frozen data or activation logic.

Audit Artifacts

The following artifacts are produced for each canonical run:

Immutable outputs

  • run_manifest.json
  • validation_receipt.json
  • scorecard.json

All reported results are reproducible from these artifacts.
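
Because each run records its declared operating point in both the manifest and the receipt, one simple consistency check is to confirm that the two agree. The sketch below assumes a top-level "operating_point" key in both files; that key and the other fields shown are hypothetical, since the artifact schemas are not published on this page.

```python
import json

def operating_points_agree(manifest, receipt, key="operating_point"):
    """True only if both artifacts declare the key with the same value."""
    return key in manifest and key in receipt and manifest[key] == receipt[key]

# Hypothetical artifact contents; real schemas may differ.
run_manifest = json.loads(
    '{"freeze": "freeze-2026-01-27", "operating_point": "p80"}')
validation_receipt = json.loads(
    '{"operating_point": "p80", "timestamps": 54203}')
mismatched_receipt = json.loads('{"operating_point": "p90"}')

consistent = operating_points_agree(run_manifest, validation_receipt)
inconsistent = operating_points_agree(run_manifest, mismatched_receipt)
```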

Evaluation Access

Provided under Tier 1 Evaluation

  • Canonical run artifacts and manifests
  • Per-advisory activation flags
  • Event indices and earliest-lead tables
  • Coverage and summary statistics

Not provided (protected)

  • Internal functional forms
  • Structural operators and filters
  • Gate firing logic (e.g., landfall, regime suppression)
  • Coherence and circulation derivation logic

All validation outputs are reproducible from provided data and manifests.

Summary

Canonical validation is performed on a frozen Atlantic structural record using pre-declared activation rules recorded at run time. Public materials document auditability, scope, and aggregate operating characteristics under declared operating points, including activation timing, lead-time distributions, and false-activation rates. The reference operating point for each run is captured in the associated manifests and receipts.

Evaluation licenses provide advisory-level activation flags, event-aligned lead tables, and full validation receipts suitable for independent technical review.