The eval-anchored performance bond.
The Model Performance Warranty is a first-party line. Where the other three lines pay third-party claims, this one pays the insured directly when a declared model fails to perform within an agreed band on a Castra-shared evaluation set. It is the financial guarantee a buyer offers their own customers, underwritten by us.
This is the line a buyer reaches for when their customer contract carries a service-level commitment denominated in model accuracy, recall, latency, or refusal rate. It converts that commitment from a balance-sheet exposure into a known premium and a known indemnity schedule.
What we mean by a warranty trigger.
- Top-1 / Top-k classification accuracy
- Recall on a declared minority class
- F1 on a structured-output task
- Refusal rate on a guardrail eval
- p50 / p95 latency on a declared inference profile
- Hallucination rate on a closed-book QA eval
- Translation BLEU / chrF on a held-out corpus
- Third-party financial loss → AI Agent E&O
- Bodily injury & property damage → Autonomous Systems
- Regulatory inquiry on the same model → Regulatory Defense
- Provider outage of the underlying inference API → cyber tower
- Insured's intentional change of the eval set post-bind
- Adversarial inputs the eval set was not designed to capture
What we warrant, and what we pay.
| Metric | Threshold | Window | Cure period | Telemetry tier |
|---|---|---|---|---|
| Top-1 classification accuracyDeclared at bind from the eval canary. | ≥ 0.92 | 7-day rolling | 10 days | Tier A |
| Minority-class recallFor fairness-sensitive deployments. | ≥ 0.85 | 14-day rolling | 14 days | Tier S |
| Structured-output F1JSON, function-call, schema-bound. | ≥ 0.95 | 7-day rolling | 10 days | Tier A |
| Guardrail refusal rateOn the declared red-team prompt set. | ≥ 0.99 | 3-day rolling | 5 days | Tier S |
| Latency p95On the declared inference profile. | ≤ 1,200ms | 1-day rolling | 3 days | Tier A |
| Closed-book hallucination rateOn the contract-Q&A eval. | ≤ 0.04 | 14-day rolling | 14 days | Tier S |
| Loss category | Covered | Sublimit | Notes |
|---|---|---|---|
| Customer credit issued under SLAPro-rata or fixed, per contract. | Yes | 100% of limit | Insured's contract schedule controls. |
| Remediation engineering costHours-and-rates ceiling agreed at bind. | Yes | 25% of limit | Capped at the ceiling, paid against invoice. |
| Forensic eval costIndependent re-evaluation of the breach window. | Yes | $150K | Castra panel or insured's pre-approved firm. |
| Replacement-model evaluationValidation of the fallback or successor model. | Yes | $100K | Where the breach requires model swap. |
| Reputational make-wholeNegotiated customer goodwill credit. | By endorsement | $250K | Requires prior carrier consent at issue. |
| Lost margin on terminated contractsWhere breach precipitates customer exit. | By referral | 10% of limit | Evaluated case by case. Documented cause. |
The shared eval, continuously run.
This line binds against a Castra-shared evaluation set executed continuously by the insured against the warranted model. Castra ingests only the metric output and the integrity signature of the eval run — not the eval inputs, the model outputs, or the model weights. The eval set is the contract.
The eval set is constructed jointly at bind, sized between 5,000 and 25,000 items depending on metric, and frozen for the policy period. Castra hosts a signed copy and verifies execution-integrity hashes monthly. The insured may add a private hold-out at any time.
The methodology and the three instruments are described in detail on the Underwriting page.
A sample loss, worked end to end.
The extractor that missed the schedule.
A legal-technology platform served a contract-extraction model to enterprise law firms under a service-level commitment of F1 ≥ 0.95 on the customer's contract schedule. The model had warranted at 0.96 on a 12,000-item shared eval set. On a Sunday, the upstream model provider rolled out a structural change to its embedding API. The platform's downstream extractor began returning malformed schedules. F1 on the shared eval fell to 0.83 within 14 hours and held there for three days before the team responded.
The breach triggered automatic SLA credits across 18 customers totalling $1.4M. The platform's engineering team rebuilt the extractor against the new embedding shape; the cure landed on day 9, inside the 10-day window for structured-output F1.
The warranty is not against the model. It is against the metric. The contract is between the metric and the customer.
Coverage attached under § 4.1 for the SLA credit ($1.4M) and § 4.2 for remediation engineering (180 hours at the agreed ceiling, $96K). The forensic eval ran in parallel at $52K under § 4.3. The cure-attestation requirement was met on day 9. No reputational-make-whole was triggered.
Indemnity paid: $1.55M total. Defense not applicable — first-party. Retention $50K applied. The placement renewed at the next anniversary at +6.8% on the telemetry-priced base, with an additional embedding-version pointer added to the dependency-graph stream so the next upstream change triggers a 24-hour notice instead of a 14-hour blind spot.
Note. This claim example is a composite, drawn from sample patterns. It is not based on any single insured. Amounts are illustrative.