Form 03-MPW-MMXXVI · Bind-ready
Coverage line No. III · Model Performance Warranty
Home/ Coverage/ Model Performance Warranty
Coverage line No. III · Model Performance Warranty

The eval-anchored performance bond.

The Model Performance Warranty is a first-party line. Where the other three lines pay third-party claims, this one pays the insured directly when a declared model fails to perform within an agreed band on a Castra-shared evaluation set. It is the financial guarantee a buyer offers their own customers, underwritten by us.

This is the line a buyer reaches for when their customer contract carries a service-level commitment denominated in model accuracy, recall, latency, or refusal rate. It converts that commitment from a balance-sheet exposure into a known premium and a known indemnity schedule.

§ I · Definition

What we mean by a warranty trigger.

Form § 2.1
Definition § 2.1.a
A warranty trigger is the sustained breach of a declared performance threshold on the Castra-shared evaluation set, measured over a declared rolling window, where the breach is not attributable to an excluded cause.
— Castra Form 03-MPW-MMXXVI, § 2.1.a
Warrantable metrics
  • Top-1 / Top-k classification accuracy
  • Recall on a declared minority class
  • F1 on a structured-output task
  • Refusal rate on a guardrail eval
  • p50 / p95 latency on a declared inference profile
  • Hallucination rate on a closed-book QA eval
  • Translation BLEU / chrF on a held-out corpus
Out of scope — refer to other lines
  • Third-party financial loss → AI Agent E&O
  • Bodily injury & property damage → Autonomous Systems
  • Regulatory inquiry on the same model → Regulatory Defense
  • Provider outage of the underlying inference API → cyber tower
  • Insured's intentional change of the eval set post-bind
  • Adversarial inputs the eval set was not designed to capture
§ II · Thresholds & indemnity schedule

What we warrant, and what we pay.

Bind-ready terms
Tab. 01 Threshold & window by metric. Sample — bound to specific eval
Metric Threshold Window Cure period Telemetry tier
Top-1 classification accuracyDeclared at bind from the eval canary. ≥ 0.92 7-day rolling 10 days Tier A
Minority-class recallFor fairness-sensitive deployments. ≥ 0.85 14-day rolling 14 days Tier S
Structured-output F1JSON, function-call, schema-bound. ≥ 0.95 7-day rolling 10 days Tier A
Guardrail refusal rateOn the declared red-team prompt set. ≥ 0.99 3-day rolling 5 days Tier S
Latency p95On the declared inference profile. ≤ 1,200ms 1-day rolling 3 days Tier A
Closed-book hallucination rateOn the contract-Q&A eval. ≤ 0.04 14-day rolling 14 days Tier S
Source: Castra Form 03-MPW-MMXXVI, sample thresholds. The operative threshold and window are bound at quote to the insured's specific evaluation set.
Tab. 02 Indemnity schedule. Form § 4
Loss category Covered Sublimit Notes
Customer credit issued under SLAPro-rata or fixed, per contract. Yes 100% of limit Insured's contract schedule controls.
Remediation engineering costHours-and-rates ceiling agreed at bind. Yes 25% of limit Capped at the ceiling, paid against invoice.
Forensic eval costIndependent re-evaluation of the breach window. Yes $150K Castra panel or insured's pre-approved firm.
Replacement-model evaluationValidation of the fallback or successor model. Yes $100K Where the breach requires model swap.
Reputational make-wholeNegotiated customer goodwill credit. By endorsement $250K Requires prior carrier consent at issue.
Lost margin on terminated contractsWhere breach precipitates customer exit. By referral 10% of limit Evaluated case by case. Documented cause.
Source: Castra Form 03-MPW-MMXXVI § 4. Operative schedule bound at quote.
§ III · Telemetry contract

The shared eval, continuously run.

Annex A

This line binds against a Castra-shared evaluation set executed continuously by the insured against the warranted model. Castra ingests only the metric output and the integrity signature of the eval run — not the eval inputs, the model outputs, or the model weights. The eval set is the contract.

The eval set is constructed jointly at bind, sized between 5,000 and 25,000 items depending on metric, and frozen for the policy period. Castra hosts a signed copy and verifies execution-integrity hashes monthly. The insured may add a private hold-out at any time.

The methodology and the three instruments are described in detail on the Underwriting page.

Tier A / S · Required streams
Continuous eval runSigned metric on the shared eval set.
Hourly
Eval integrity manifestSigned hashes of inputs and code.
Monthly
Model version pointerActive deployment manifest.
As-changes
Inference profile snapshotHardware, batch, quantization.
Monthly
Breach noticeThreshold breach within 24 hours.
As-occurs
Cure attestationInsured-signed cure log within window.
On cure
§ IV · Claim example

A sample loss, worked end to end.

Composite, anonymized
Claim sample No. C-03 · Structured extraction · legal-tech · Limit $3M · Retention $50K

The extractor that missed the schedule.

A legal-technology platform served a contract-extraction model to enterprise law firms under a service-level commitment of F1 ≥ 0.95 on the customer's contract schedule. The model had warranted at 0.96 on a 12,000-item shared eval set. On a Sunday, the upstream model provider rolled out a structural change to its embedding API. The platform's downstream extractor began returning malformed schedules. F1 on the shared eval fell to 0.83 within 14 hours and held there for three days before the team responded.

The breach triggered automatic SLA credits across 18 customers totalling $1.4M. The platform's engineering team rebuilt the extractor against the new embedding shape; the cure landed on day 9, inside the 10-day window for structured-output F1.

The warranty is not against the model. It is against the metric. The contract is between the metric and the customer.

Coverage attached under § 4.1 for the SLA credit ($1.4M) and § 4.2 for remediation engineering (180 hours at the agreed ceiling, $96K). The forensic eval ran in parallel at $52K under § 4.3. The cure-attestation requirement was met on day 9. No reputational-make-whole was triggered.

Indemnity paid: $1.55M total. Defense not applicable — first-party. Retention $50K applied. The placement renewed at the next anniversary at +6.8% on the telemetry-priced base, with an additional embedding-version pointer added to the dependency-graph stream so the next upstream change triggers a 24-hour notice instead of a 14-hour blind spot.

Note. This claim example is a composite, drawn from sample patterns. It is not based on any single insured. Amounts are illustrative.

Imperium per disciplinam.
Through discipline, command.

One submission per placement. Six business days to bind.