Coverage line No. III · Model Performance Warranty

The eval-anchored performance bond.

The Model Performance Warranty is a first-party line. Where the other three lines pay third-party claims, this one pays the insured directly when a declared model fails to perform within an agreed band on a Castra-shared evaluation set. It is the financial guarantee a buyer offers their own customers, underwritten by us.

This is the line a buyer reaches for when their customer contract carries a service-level commitment denominated in model accuracy, recall, latency, or refusal rate. It converts that commitment from a balance-sheet exposure into a known premium and a known indemnity schedule.

§ I · Definition

What we mean by a warranty trigger.

Form § 2.1

Definition § 2.1.a

A warranty trigger is the sustained breach of a declared performance threshold on the Castra-shared evaluation set, measured over a declared rolling window, where the breach is not attributable to an excluded cause.

— Castra Form 03-MPW-MMXXVI, § 2.1.a

Warrantable metrics

Top-1 / Top-k classification accuracy
Recall on a declared minority class
F1 on a structured-output task
Refusal rate on a guardrail eval
p50 / p95 latency on a declared inference profile
Hallucination rate on a closed-book QA eval
Translation BLEU / chrF on a held-out corpus

Out of scope — refer to other lines

Third-party financial loss → AI Agent E&O
Bodily injury & property damage → Autonomous Systems
Regulatory inquiry on the same model → Regulatory Defense
Provider outage of the underlying inference API → cyber tower
Insured's intentional change of the eval set post-bind
Adversarial inputs the eval set was not designed to capture

§ II · Thresholds & indemnity schedule

What we warrant, and what we pay.

Bind-ready terms

Tab. 01 Threshold & window by metric. Sample — bound to specific eval

Metric	Threshold	Window	Cure period	Telemetry tier
Top-1 classification accuracyDeclared at bind from the eval canary.	≥ 0.92	7-day rolling	10 days	Tier A
Minority-class recallFor fairness-sensitive deployments.	≥ 0.85	14-day rolling	14 days	Tier S
Structured-output F1JSON, function-call, schema-bound.	≥ 0.95	7-day rolling	10 days	Tier A
Guardrail refusal rateOn the declared red-team prompt set.	≥ 0.99	3-day rolling	5 days	Tier S
Latency p95On the declared inference profile.	≤ 1,200ms	1-day rolling	3 days	Tier A
Closed-book hallucination rateOn the contract-Q&A eval.	≤ 0.04	14-day rolling	14 days	Tier S

Source: Castra Form 03-MPW-MMXXVI, sample thresholds. The operative threshold and window are bound at quote to the insured's specific evaluation set.

Tab. 02 Indemnity schedule. Form § 4

Loss category	Covered	Sublimit	Notes
Customer credit issued under SLAPro-rata or fixed, per contract.	Yes	100% of limit	Insured's contract schedule controls.
Remediation engineering costHours-and-rates ceiling agreed at bind.	Yes	25% of limit	Capped at the ceiling, paid against invoice.
Forensic eval costIndependent re-evaluation of the breach window.	Yes	$150K	Castra panel or insured's pre-approved firm.
Replacement-model evaluationValidation of the fallback or successor model.	Yes	$100K	Where the breach requires model swap.
Reputational make-wholeNegotiated customer goodwill credit.	By endorsement	$250K	Requires prior carrier consent at issue.
Lost margin on terminated contractsWhere breach precipitates customer exit.	By referral	10% of limit	Evaluated case by case. Documented cause.

Source: Castra Form 03-MPW-MMXXVI § 4. Operative schedule bound at quote.

§ III · Telemetry contract

The shared eval, continuously run.

Annex A

This line binds against a Castra-shared evaluation set executed continuously by the insured against the warranted model. Castra ingests only the metric output and the integrity signature of the eval run — not the eval inputs, the model outputs, or the model weights. The eval set is the contract.

The eval set is constructed jointly at bind, sized between 5,000 and 25,000 items depending on metric, and frozen for the policy period. Castra hosts a signed copy and verifies execution-integrity hashes monthly. The insured may add a private hold-out at any time.

The methodology and the three instruments are described in detail on the Underwriting page.

Tier A / S · Required streams

Continuous eval runSigned metric on the shared eval set.

Hourly

Eval integrity manifestSigned hashes of inputs and code.

Monthly

Model version pointerActive deployment manifest.

As-changes

Inference profile snapshotHardware, batch, quantization.

Monthly

Breach noticeThreshold breach within 24 hours.

As-occurs

Cure attestationInsured-signed cure log within window.

On cure

§ IV · Claim example

A sample loss, worked end to end.

Composite, anonymized

Claim sample No. C-03 · Structured extraction · legal-tech · Limit $3M · Retention $50K

The extractor that missed the schedule.

A legal-technology platform served a contract-extraction model to enterprise law firms under a service-level commitment of F1 ≥ 0.95 on the customer's contract schedule. The model had warranted at 0.96 on a 12,000-item shared eval set. On a Sunday, the upstream model provider rolled out a structural change to its embedding API. The platform's downstream extractor began returning malformed schedules. F1 on the shared eval fell to 0.83 within 14 hours and held there for three days before the team responded.

The breach triggered automatic SLA credits across 18 customers totalling $1.4M. The platform's engineering team rebuilt the extractor against the new embedding shape; the cure landed on day 9, inside the 10-day window for structured-output F1.

The warranty is not against the model. It is against the metric. The contract is between the metric and the customer.

Coverage attached under § 4.1 for the SLA credit ($1.4M) and § 4.2 for remediation engineering (180 hours at the agreed ceiling, $96K). The forensic eval ran in parallel at $52K under § 4.3. The cure-attestation requirement was met on day 9. No reputational-make-whole was triggered.

Indemnity paid: $1.55M total. Defense not applicable — first-party. Retention $50K applied. The placement renewed at the next anniversary at +6.8% on the telemetry-priced base, with an additional embedding-version pointer added to the dependency-graph stream so the next upstream change triggers a 24-hour notice instead of a 14-hour blind spot.

Note. This claim example is a composite, drawn from sample patterns. It is not based on any single insured. Amounts are illustrative.

§ V · Other lines

The three adjoining lines.

All four available in tower

No. I · Autonomous Systems

Through discipline, command.

One submission per placement. Six business days to bind.

Request a quote Broker submission package