Validation & Auditability
A frozen, auditable structural engine
PTEM is validated on a frozen, 10-season Atlantic dataset with forward-only metrics and reproducible rapid intensification (RI) labels. We expose scorecards, stress tests, and structural regime summaries so reinsurers and modeling teams can audit the engine without seeing our internal code.
Frozen validation harness
PTEM’s core RI validation is anchored to a fixed, 10-season Atlantic dataset:
- Basin: North Atlantic (AL)
- Period: 2015–2024
- Cadence: 6-hour advisories
- Horizon: 24-hour intensity change
- RI definition: ≥ 30 kt increase in 24 hours
Reinsurers and modeling teams can re-score PTEM using their own RI definitions or windows. The underlying time grid and labels are stable, so skill comparisons remain apples-to-apples across model vintages.
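As a rough illustration of what re-scoring with a different RI definition could look like, the sketch below relabels a single storm track under a configurable threshold and window. It assumes RI is measured as the endpoint difference in maximum wind between an advisory and the advisory one horizon later, and that the track is a gap-free 6-hourly series; the function name `label_ri` and the sample values are hypothetical and are not part of PTEM's validation pack.

```python
from typing import List, Optional


def label_ri(vmax_kt: List[Optional[float]],
             threshold_kt: float = 30.0,
             horizon_h: int = 24,
             cadence_h: int = 6) -> List[Optional[bool]]:
    """Label each advisory by the intensity change over the next horizon_h hours.

    Assumption: RI is the endpoint difference vmax(t + horizon) - vmax(t).
    Returns None where no verifying value exists.
    """
    if horizon_h % cadence_h != 0:
        raise ValueError("horizon_h must be a multiple of cadence_h")
    steps = horizon_h // cadence_h
    labels: List[Optional[bool]] = []
    for i, v0 in enumerate(vmax_kt):
        j = i + steps
        if j >= len(vmax_kt) or v0 is None or vmax_kt[j] is None:
            labels.append(None)  # the window runs past the end of the track
        else:
            labels.append(vmax_kt[j] - v0 >= threshold_kt)
    return labels


# Synthetic 6-hourly track (kt); values are made up for illustration.
track = [35, 35, 40, 45, 60, 70, 75, 75]
default_labels = label_ri(track)                    # >= 30 kt / 24h
strict_labels = label_ri(track, threshold_kt=35.0)  # a stricter custom rule
print(default_labels)
print(strict_labels)
```

Because the time grid is held fixed, changing `threshold_kt` or `horizon_h` changes only the labels, not the set of advisories being scored.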
2015–2024 Atlantic harness
- 10 seasons, 6-hour cadence
- 5,705 24h forecast points
- 223 RI events (≥30 kt / 24h)
Structural priming vs. 24-hour RI
PTEM’s structural priming index (PPI) is scored against our 24-hour RI labels using a forward-only evaluation. We track both how early the structural signal turns on and how cleanly it separates events from background.
RI base rate
223 RI events out of 5,705 24h windows, a base rate of about 3.9% (2015–2024, ≥30 kt).
Median structural lead
Median lead time between structural priming and RI onset, computed over RI events that had a PPI signal.
PPI‑ESS
Composite structural lead score combining early detection and separation from background.
Global AUC
How well PPI rank-orders RI versus non-RI time steps across the whole dataset; included for transparency.
PPI is designed as an early-warning structural signal. We prioritize lead time and regime detection over squeezing out small AUC gains on a highly imbalanced dataset.
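To make the base rate and the global AUC concrete, here is a minimal sketch of both calculations. The base rate uses the counts quoted above. The AUC is the standard rank-based definition (the probability that a randomly chosen RI time step scores above a randomly chosen non-RI time step, with ties counted as half); whether PTEM's scorecards compute it exactly this way is an assumption, and the scores in the example are synthetic.

```python
from typing import Sequence


def rank_auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Rank-based AUC: P(score of a positive > score of a negative), ties = 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Base rate from the counts quoted in this section.
base_rate = 223 / 5705
print(round(base_rate, 4))

# Synthetic scores and labels, for illustration only.
toy_auc = rank_auc([0.1, 0.4, 0.35, 0.8], [False, False, True, True])
print(toy_auc)
```

With roughly one RI window in every 26, a global AUC is dominated by how the index behaves on the many non-RI time steps, which is one reason to read it alongside lead-time metrics rather than in isolation.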
Structural readiness regimes
PTEM divides the timeline into low-readiness (low‑R) and high-readiness (high‑R) structural regimes and compares RI rates between them.
High‑R coverage
Fraction of 6‑hour windows classified as structurally high‑readiness.
RI rate in high vs low
RI is about 15% more likely in the high‑R regime than in the low‑R regime under the 24h RI definition.
PTEM does not replace deterministic intensity forecasts; R(t) is an overlay that measures structural readiness.
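The regime comparison can be recomputed from any overlay that carries an R(t) value and an RI label per 6-hour window. The sketch below is one way to do that, assuming a simple cutoff separates high-R from low-R; the cutoff value, the function name `regime_rates`, and the data are hypothetical, since the actual regime rule is not described here. It reports both the ratio and the difference of the two RI rates so that either convention can be read off.

```python
from typing import Dict, Sequence


def regime_rates(r_values: Sequence[float],
                 ri_labels: Sequence[bool],
                 r_cutoff: float) -> Dict[str, float]:
    """Split windows at r_cutoff (hypothetical rule) and compare RI rates."""
    high = [y for r, y in zip(r_values, ri_labels) if r >= r_cutoff]
    low = [y for r, y in zip(r_values, ri_labels) if r < r_cutoff]
    if not high or not low:
        raise ValueError("both regimes must contain at least one window")
    rate_high = sum(high) / len(high)
    rate_low = sum(low) / len(low)
    if rate_low == 0:
        raise ValueError("low-R RI rate is zero; the ratio is undefined")
    return {
        "high_r_coverage": len(high) / (len(high) + len(low)),
        "ri_rate_high": rate_high,
        "ri_rate_low": rate_low,
        "rate_ratio": rate_high / rate_low,
        "rate_difference": rate_high - rate_low,
    }


# Synthetic R(t) values and RI labels, for illustration only.
r = [0.2, 0.9, 0.8, 0.1, 0.7, 0.3, 0.95, 0.4]
ri = [False, True, False, False, True, True, False, False]
summary = regime_rates(r, ri, r_cutoff=0.5)
print(summary)
```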
Stress tests: long‑lead structural priming
Across the 2015–2024 Atlantic harness, PTEM’s structural driver R(t) showed a pre-event signal in 157 RI events, including 2 cases with ≥60 hours of continuous high‑R. One 2024 storm maintained high‑R for up to 72 hours before RI onset.
- 157 RI events with a pre-event R(t) signal (24h, ≥30 kt)
- 2 events with ≥60 hours of continuous high‑R before RI onset
- Multiple additional cases with 24–48h of elevated R(t) leading into RI
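A long-lead figure like the ones above can be checked by counting the unbroken run of high-R windows that ends just before RI onset. The sketch below assumes a per-window boolean high-R flag and counts each flagged 6-hour window as six hours of lead; both the flag representation and that counting convention are assumptions, and the flags shown are synthetic rather than taken from any storm.

```python
from typing import Sequence


def continuous_high_hours(high_flags: Sequence[bool],
                          onset_index: int,
                          cadence_h: int = 6) -> int:
    """Hours of unbroken high-R immediately before onset_index.

    Forward-only: only windows strictly before the onset are inspected.
    """
    if not 0 <= onset_index <= len(high_flags):
        raise ValueError("onset_index out of range")
    steps = 0
    i = onset_index - 1
    while i >= 0 and high_flags[i]:
        steps += 1
        i -= 1
    return steps * cadence_h


# Synthetic flags: 2 low-R windows, then 11 consecutive high-R windows.
flags = [False, False] + [True] * 11
lead_h = continuous_high_hours(flags, onset_index=len(flags))
print(lead_h, lead_h >= 60)

# A break in the run resets the count to the most recent stretch.
broken = [True, True, False, True]
print(continuous_high_hours(broken, onset_index=4))
```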
Example: 2024 Gulf storm
In mid‑2024, a Gulf storm held R(t) in the high‑readiness regime continuously for up to 72 hours before a 24‑hour RI episode.
What we expose for audit
What you see
- Frozen dataset definition (basin, seasons, cadence, horizon, RI rule)
- Label tables for 24h RI (CSV) with keys (basin, season, storm_id, dtg_iso, horizon_h, dvmax_kt, ri_event)
- Structural metrics overlays (PPI/R/H metrics) as JSONL/CSV with forward‑only construction
- Scorecards for each engine snapshot (PPI‑ESS, R‑regimes, structural lead tables)
- Notebooks and examples showing how to recompute core statistics
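As an example of recomputing a core statistic from the label tables, the sketch below reads rows with the keys listed above and counts RI events under both the published flag and a custom threshold. The column names come from the list; the encoding of `ri_event` as 0/1, the timestamp format, and every row in the sample are assumptions made for illustration.

```python
import csv
import io
from typing import Dict, List, Tuple

# Synthetic sample in the documented column layout; not real storm data.
SAMPLE = """basin,season,storm_id,dtg_iso,horizon_h,dvmax_kt,ri_event
AL,2020,STORM_A,2020-08-01T00:00:00Z,24,10,0
AL,2020,STORM_A,2020-08-01T06:00:00Z,24,25,0
AL,2020,STORM_A,2020-08-01T12:00:00Z,24,30,1
AL,2020,STORM_A,2020-08-01T18:00:00Z,24,40,1
"""


def rescore(rows: List[Dict[str, str]],
            threshold_kt: float,
            horizon_h: int = 24) -> Tuple[int, int]:
    """Return (events, windows) for rows at the given horizon and threshold."""
    windows = 0
    events = 0
    for row in rows:
        if int(row["horizon_h"]) != horizon_h:
            continue
        windows += 1
        if float(row["dvmax_kt"]) >= threshold_kt:
            events += 1
    return events, windows


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
published_events = sum(int(row["ri_event"]) for row in rows)  # assumes 0/1 flags
print(published_events)
print(rescore(rows, threshold_kt=30.0))  # should agree with the published flag
print(rescore(rows, threshold_kt=25.0))  # a looser custom definition
```

Checking that the default threshold reproduces the published `ri_event` count is a quick consistency test before applying a custom definition.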
What remains protected
- The internal functional forms of HC, FME, PPI, R(t), and related operators
- Source code, weights, and hyperparameters
- Feature engineering details beyond what’s needed to interpret outputs
Together, these materials let reinsurers and modeling teams audit PTEM independently: they can re‑score the frozen dataset with their own RI definitions, with no access to internal code.
See the full scorecard
We provide partners with a detailed validation pack: full label tables, structural overlays, per‑storm RI lead summaries, and change logs for each engine snapshot. If you’d like to review PTEM’s validation materials under NDA, we can set up a brief call with your modeling or catastrophe risk team.