Validation & Auditability

PTEM is validated as a governed structural measurement system on frozen records under forward-only computation.

Evaluation is designed for reproducibility and committee-grade review. It is not framed as a probabilistic classifier benchmark program.

Validation overview

Canonical evaluation posture:

  • Frozen Atlantic v1.2 structural spine (1851–2024)
  • Forward-only computation
  • Pre-declared operating points recorded in run artifacts
  • Deterministic replay from manifests and receipts
  • Cross-basin transfer diagnostics under the same governance contract

PTEM is evaluated for structural timing, persistence, and regime behavior under declared governance rules, not as a broad probability classifier for all storm outcomes.

Validation outputs include deterministic run manifests, validation receipts, and reproducible derived reporting artifacts from frozen datasets.

Takeaway: validation asks whether the instrument is structurally consistent, replayable, and governance-safe.

Public research artifacts supporting external evaluation are available in the Research section.

Tier-1 validation layer reference: Tier-1 Persistence Capacity Surface (public, best-track-derived, forward-only baseline).

Event-scale validation (activation and lead)

Event-scale evaluation tracks when rare high-organization structural regimes activate relative to later lifecycle milestones.

  • Reference operating point: p80 (event-scale structural activation lens)
  • Median structural lead (aligned events): 36 hours
  • Mean structural lead (aligned events): 50 hours
  • Precision / Recall (orientation only): 0.396 / 0.205

Classifier-style statistics are reported for orientation only. Primary evidence remains forward-only lead distributions, alignment behavior, and replayable activation traces.
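
The lead and orientation statistics above can be reproduced in principle from an earliest-lead table. The following is a minimal sketch, assuming each aligned event supplies an activation timestamp and a later milestone timestamp; the function names and the toy timestamps are illustrative assumptions, not PTEM code or PTEM results.

```python
from datetime import datetime
from statistics import mean, median


def lead_hours(activation: datetime, milestone: datetime) -> float:
    """Hours by which activation precedes the milestone (positive means lead)."""
    return (milestone - activation).total_seconds() / 3600.0


def summarize_leads(pairs):
    """Median and mean lead over (activation, milestone) pairs."""
    leads = [lead_hours(a, m) for a, m in pairs]
    return {"n": len(leads), "median_h": median(leads), "mean_h": mean(leads)}


def precision_recall(tp: int, fp: int, fn: int):
    """Orientation-only classifier statistics from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy values only (hypothetical events, not drawn from any PTEM receipt).
toy_pairs = [
    (datetime(2000, 1, 1, 0), datetime(2000, 1, 1, 6)),
    (datetime(2000, 1, 2, 0), datetime(2000, 1, 3, 0)),
    (datetime(2000, 1, 5, 0), datetime(2000, 1, 7, 12)),
]
summary = summarize_leads(toy_pairs)
```

A mean well above the median, as in the reported 50 versus 36 hours, is what a right-skewed lead distribution produces, which is one reason the full distribution is treated as primary evidence rather than a single summary number.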

Takeaway: event-scale outputs are structural timing evidence, not discretionary forecast calls.

Climatology-scale validation: structural regime surfaces

The legacy surfaces declared below are predecessors of Phase Atlas. They remain part of the audit trail, while Phase Atlas is the current operational structural climatology surface.

Climatology artifacts are governed separately from event artifacts and remain year-level, non-invertible measurement surfaces.

Legacy declaration (historical, pre-Phase Atlas)

  • Canonical percentile: p95
  • Rolling baseline: 15 years
  • Top-k years: 15
  • Minimum storms per year: 10
  • Canonical selection rule: predeclared_percentile

Robustness characterization

  • Percentiles tested: p90, p95, p97, p99
  • Baseline windows tested: 10, 15, 20 years
  • Cross-configuration intersections, unions, and Jaccard diagnostics recorded in receipts
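
The cross-configuration diagnostics can be illustrated with a short sketch. It assumes each sweep configuration yields a set of selected years, keyed here by a hypothetical (percentile, baseline window) tuple; the receipt schema itself is not specified on this page, and the year sets below are toy values.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_diagnostics(selections: dict) -> dict:
    """Intersection, union size, and Jaccard for every pair of configurations."""
    report = {}
    for k1, k2 in combinations(sorted(selections), 2):
        s1, s2 = selections[k1], selections[k2]
        report[(k1, k2)] = {
            "intersection": sorted(s1 & s2),
            "union_size": len(s1 | s2),
            "jaccard": jaccard(s1, s2),
        }
    return report


# Toy selections (hypothetical years, not PTEM output).
toy_selections = {
    ("p90", 15): {2005, 2017, 2020},
    ("p95", 15): {2005, 2020},
    ("p95", 10): {2005, 2010},
}
diagnostics = pairwise_diagnostics(toy_selections)
```

A Jaccard value near 1 across neighboring configurations would indicate that the selected years are stable under the sweep; values near 0 would indicate sensitivity to the declared percentile or window.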

Interpretation boundary

Legacy climatology surfaces are retained as historical audit artifacts. They are not loss models, intensity proxies, or forecasts.

Correlation against Accumulated Cyclone Energy (ACE) remains low across sweep receipts, indicating partial orthogonality: the surfaces capture structure that ACE does not fully explain.

Takeaway: climatology outputs are portfolio-facing structural priors, separate from storm-level exposure.

Phase Atlas + PAC + Capital Impact validation

Phase Atlas produces deterministic climatology surfaces under fixed thresholds and frozen inputs. PAC applies these surfaces to external event sets through deterministic transformations, producing reproducible conditioned outputs and audit receipts.

  • manifest-first loading
  • SHA256 artifact validation
  • no storm-level exposure
  • no track modification
  • no event generation
  • no model physics modification
  • non-forecasting outputs
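
Manifest-first loading with SHA256 artifact validation can be sketched as follows. The manifest layout used here (an "artifacts" list whose entries carry "path" and "sha256" keys) is an assumption made for illustration; the actual PTEM manifest schema is not described on this page.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA256 so large artifacts are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest_path) -> list:
    """Read the manifest first, then check every listed artifact against its hash.

    Returns a list of (path, reason) failures; an empty list means all artifacts match.
    The "artifacts" / "path" / "sha256" keys are a hypothetical schema.
    """
    manifest_path = Path(manifest_path)
    manifest = json.loads(manifest_path.read_text())
    root = manifest_path.parent
    failures = []
    for entry in manifest["artifacts"]:
        target = root / entry["path"]
        if not target.is_file():
            failures.append((entry["path"], "missing"))
        elif sha256_of(target) != entry["sha256"]:
            failures.append((entry["path"], "hash mismatch"))
    return failures
```

Under this pattern nothing is loaded unless the manifest names it and its hash matches, so a caller would refuse to proceed whenever the returned list is non-empty.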

Capital Impact then performs deterministic post-conditioning comparison on the same event set to translate baseline vs conditioned outputs into capital-relevant tail metrics.

Takeaway: Phase Atlas measures structural capacity, PAC conditions, and Capital Impact reports tail metric translation without changing event generation or model physics.

Evaluation Interface

PTEM provides a reproducible evaluation surface for structural conditioning workflows, including:

  • PAC test packet for one-command verification
  • Sensitivity sweep for controlled parameter-stability inspection
  • Evaluation harness for external event-set execution
  • Scale test for operational validation on large event catalogs

All evaluations are deterministic, row-preserving, and produce receipt-backed artifacts for audit.
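
An external evaluator could spot-check the determinism and row-preservation properties with a sketch like the one below. It assumes event sets are exchanged as CSV with an identifier column, here hypothetically named event_id; toy_condition is a stand-in transform, not the PAC conditioning step.

```python
import csv
import hashlib
import io


def event_ids(csv_text: str, id_column: str = "event_id") -> list:
    """Ordered list of event identifiers in a CSV document."""
    return [row[id_column] for row in csv.DictReader(io.StringIO(csv_text))]


def is_row_preserving(input_csv: str, output_csv: str, id_column: str = "event_id") -> bool:
    """True when the output has exactly the input's events, in the same order."""
    return event_ids(input_csv, id_column) == event_ids(output_csv, id_column)


def is_deterministic(run, input_csv: str, repeats: int = 2) -> bool:
    """True when repeated runs on the same input yield byte-identical output."""
    digests = {
        hashlib.sha256(run(input_csv).encode("utf-8")).hexdigest()
        for _ in range(repeats)
    }
    return len(digests) == 1


def toy_condition(csv_text: str) -> str:
    """Stand-in transform (assumption): appends a constant weight column to every row."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["event_id", "loss", "weight"], lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({**row, "weight": "1.0"})
    return out.getvalue()


toy_input = "event_id,loss\nE1,10\nE2,25\nE3,7\n"
toy_output = toy_condition(toy_input)
```

Comparing identifiers in order, rather than only row counts, also catches reordering and substitution, which a count check alone would miss.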

Capital-impact reproducibility

Capital-impact outputs are generated from the same PAC-conditioned event set and include JSON reports plus a one-page Markdown/PDF-ready sheet. The layer is deterministic, report-driven, and governed by the same manifest and receipt discipline as the conditioning layer.

  • conditioned_events.csv
  • conditioning_manifest.json
  • conditioning_receipt.json
  • capital_impact_report.json
  • capital_impact_sheet.md / PDF-ready sheet

Language boundary: capital-impact outputs are described only in terms of capital-relevant metrics, tail metric translation, and post-conditioning comparison under the same event set.

Cross-basin transfer diagnostics (EPAC)

EPAC transfer diagnostics are produced from frozen spines using the same governance envelope as Atlantic evaluation.

  • Basin-native percentile thresholds under the same declared rule
  • Cross-threshold stress tests (Atlantic threshold on EPAC and vice versa)
  • Diagonal transfer (early-era calibration evaluated on late-era data across basins)
  • Ratios reported against basin-appropriate native baselines
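
The cross-threshold stress test and native-baseline ratio can be sketched as below. The sketch assumes a scalar score per record and a nearest-rank percentile; both are illustrative assumptions, since the internal functional forms are protected, and the scores are toy values rather than basin data.

```python
import math


def percentile_threshold(values, pct: float) -> float:
    """Nearest-rank percentile of a non-empty sequence."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def exceedance_rate(values, threshold: float) -> float:
    """Fraction of values strictly above the threshold."""
    return sum(v > threshold for v in values) / len(values)


def cross_threshold_ratios(scores_by_basin: dict, pct: float) -> dict:
    """Apply every basin's native threshold to every basin.

    Each entry (source, target) is the exceedance rate of the source basin's
    threshold on the target basin, divided by the target's native rate.
    """
    thresholds = {b: percentile_threshold(v, pct) for b, v in scores_by_basin.items()}
    report = {}
    for target, values in scores_by_basin.items():
        native = exceedance_rate(values, thresholds[target])
        for source, threshold in thresholds.items():
            rate = exceedance_rate(values, threshold)
            report[(source, target)] = rate / native if native else float("nan")
    return report


# Toy scores (hypothetical; the two basins are given deliberately different scales).
toy_scores = {
    "atlantic": list(range(1, 11)),
    "epac": list(range(2, 21, 2)),
}
ratios = cross_threshold_ratios(toy_scores, pct=80)
```

Native entries equal 1 by construction, so off-diagonal ratios far from 1 indicate how much a borrowed threshold over- or under-activates relative to the basin's own baseline.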

These diagnostics are comparative portability evidence. They do not assert basin equivalence or rewrite canonical structural truth.

Takeaway: transfer is measured under identical governance constraints.

See /transfer for published transfer scorecards.

Construction integrity

Integrity checks confirm complete, deterministic construction of the frozen validation harness.

  • Planned timestamps written: 54,203 / 54,203
  • Strict input enforcement: enabled
  • Missing inputs: 0
  • Ingest errors: 0

Takeaway: the canonical spine is complete and replay-safe under strict input enforcement.
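
A completeness check of this kind can be sketched as follows. The sketch treats a "missing input" as a planned timestamp with no written record, which is an assumption about terminology; the function names and the one-day toy plan are likewise illustrative.

```python
from datetime import datetime, timedelta


def construction_report(planned, written, ingest_errors=()):
    """Compare planned timestamps with those actually written."""
    planned_set = set(planned)
    written_set = set(written) & planned_set
    return {
        "planned": len(planned_set),
        "written": len(written_set),
        "missing": sorted(planned_set - written_set),
        "ingest_errors": len(ingest_errors),
    }


def enforce_strict(report):
    """Strict mode: refuse to continue on any gap or ingest error."""
    if report["missing"] or report["ingest_errors"]:
        raise RuntimeError(
            f"incomplete construction: {len(report['missing'])} missing, "
            f"{report['ingest_errors']} ingest errors"
        )
    return report


# Toy plan: one day at 6-hour cadence (hypothetical dates).
start = datetime(2000, 1, 1)
toy_planned = [start + timedelta(hours=6 * i) for i in range(4)]
toy_report = enforce_strict(construction_report(toy_planned, toy_planned))
```

Failing loudly on the first gap, instead of writing a partial spine, is what keeps the reported written-over-planned fraction meaningful for replay.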

Audit artifacts and inspection

Immutable outputs

Each canonical run publishes reproducible audit artifacts:

  • run_manifest.json
  • validation_receipt.json
  • scorecard.json

Read-only inspection (GUS)

GUS supports independent verification of scope, coverage, artifact correspondence, and manifest-level integrity.

It is inspection-only and cannot modify frozen records or execution behavior.

Takeaway: published results remain independently inspectable and non-mutable.

Evaluation access boundaries

Provided under evaluation access

  • Canonical run artifacts and manifests
  • Per-advisory activation flags
  • Event indices and earliest-lead tables
  • Coverage and summary statistics
  • Capital-impact report artifacts and one-page decision sheet

Not provided (protected)

  • Internal functional forms
  • Internal operator implementations
  • Internal gate logic
  • Internal derivation methods

Takeaway: evaluation provides reproducible evidence without exposing proprietary internals.

Technical provenance (detailed references)

Implementation-level references are retained for institutional audit workflows. They are placed here so that the core validation interpretation above stays readable.

Reference details

  • Registry path (Atlantic canonical spine): ptem_struct-v1-2/hurricanes/registries/atlantic_all/storms_6h_dtg_hurdat2_v1.csv
  • Coverage window: 1851-06-25 → 2024-11-18 (1,991 storms; 54,203 timestamps at 6-hour cadence)
  • Canonical freeze stamp: freeze-2026-01-27 (superseding freeze-2026-01-25; superseded freezes retained for auditability)
  • Immutable storage posture: Canonical climatology bundles are stored under immutable prefixes with manifest SHA256 references (evaluation access)
  • Verification tooling: verify_freeze_bundle_v1.sh with optional --strict-clean enforcement; clean-tree discipline is default

Additional bundle verification guidance is available at /docs/freeze-verification.

Takeaway: provenance is explicit, immutable, and suitable for repeatable technical audit.