How much inferred-diagnosis PII actually moves through each hop, and what the read costs.

A sampling read recovers each integration hop's true high-sensitivity-PII rate inside a 95% interval for roughly 2% of the cost of building a trained classifier first. Here is the per-hop verdict and the cost model, recomputed live. The defaults are illustrative, not a quote.

Per-hop prevalence (sampled, 95% Wilson interval)

Integration hop	Sampled	Flagged	Prevalence	95% CI	vs target

Target = inferred-diagnosis PII should fall below the target share of sampled records at each hop. Adjudication legitimately holds the most; minimization should shrink the rate downstream so the partner egress carries the least.
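
To reproduce one row of the table by hand, here is a minimal Python sketch of the Wilson 95% score interval and a vs-target verdict. The function names, the three-way verdict rule, and the counts in the usage lines are assumptions for illustration; none of them come from this page's own script or data.

```python
import math


def wilson_interval(flagged: int, sampled: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion; z = 1.96 gives roughly 95%."""
    if sampled <= 0:
        raise ValueError("sampled must be positive")
    if not 0 <= flagged <= sampled:
        raise ValueError("flagged must lie between 0 and sampled")
    p = flagged / sampled
    denom = 1 + z * z / sampled
    center = (p + z * z / (2 * sampled)) / denom
    half = z * math.sqrt(p * (1 - p) / sampled + z * z / (4 * sampled * sampled)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the edges.
    return max(0.0, center - half), min(1.0, center + half)


def verdict_vs_target(flagged: int, sampled: int, target: float) -> str:
    """Assumed rule: decide only when the whole interval clears the target."""
    low, high = wilson_interval(flagged, sampled)
    if high < target:
        return "below target"
    if low > target:
        return "above target"
    return "inconclusive"


# Hypothetical hop: 12 flagged records out of 800 sampled, 5% target.
low, high = wilson_interval(flagged=12, sampled=800)
print(f"{low:.2%} to {high:.2%}:", verdict_vs_target(12, 800, target=0.05))
```

The "inconclusive" case is the one that argues for a larger sample: the interval straddles the target, so the read cannot yet say which side the hop is on.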

The economics of running the read

Sample N (records judged / hop / cycle)

$ / judged record

Classifier build $ (one-time, all hops)

The read vs. building a classifier. Sampling 2,400 records across 3 hops per cycle costs about $1.2k per cycle. A trained inferred-PII classifier amortizes to about $54k per cycle, so the sampling read is roughly 2% of that cost. The interval width shrinks with the square root of N, not with N itself: quadrupling the sample roughly halves the width. A few-thousand-dollar read is therefore enough to answer "is this hop above our minimization target?" without building anything first.
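
The arithmetic behind that comparison can be sketched in a few lines of Python, following the cost model described under Sources & method. This is an illustration under stated assumptions, not the page's own script: the class name, field names, and every dollar figure in the usage lines are hypothetical and are not the page's defaults. The last method leans on the square-root rule above, which is an approximation.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ReadCostModel:
    sample_n: int                    # records judged per hop per cycle
    hops: int                        # integration hops sampled each cycle
    dollars_per_record: float        # judging cost per record
    calibration_dollars: float       # human calibration subset, per cycle
    overhead_dollars: float          # fixed per-cycle overhead
    classifier_build_dollars: float  # one-time build, all hops
    cycles_per_year: int             # cycles the build is amortized over

    def read_cost_per_cycle(self) -> float:
        judged = self.sample_n * self.hops * self.dollars_per_record
        return judged + self.calibration_dollars + self.overhead_dollars

    def classifier_cost_per_cycle(self) -> float:
        return self.classifier_build_dollars / self.cycles_per_year

    def read_share_of_classifier(self) -> float:
        return self.read_cost_per_cycle() / self.classifier_cost_per_cycle()

    def with_narrower_interval(self, shrink: float) -> "ReadCostModel":
        """Approximate sample needed to divide the interval width by `shrink`.

        Width scales roughly with 1 / sqrt(N), so N grows by shrink squared.
        """
        if shrink <= 0:
            raise ValueError("shrink must be positive")
        return replace(self, sample_n=math.ceil(self.sample_n * shrink ** 2))


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
example = ReadCostModel(
    sample_n=500, hops=3, dollars_per_record=0.40,
    calibration_dollars=250.0, overhead_dollars=150.0,
    classifier_build_dollars=600_000.0, cycles_per_year=12,
)
print(f"read: ${example.read_cost_per_cycle():,.0f} per cycle")
print(f"classifier: ${example.classifier_cost_per_cycle():,.0f} per cycle")
print(f"share: {example.read_share_of_classifier():.1%}")
print(f"half-width read: ${example.with_narrower_interval(2).read_cost_per_cycle():,.0f}")
```

Only the per-record term scales with N; the calibration subset and overhead are treated as flat here, which is itself an assumption you may want to change.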

Sources & method

All prevalence figures are SYNTHETIC, generated deterministically in this page; no SmithRx PHI, no real claims data. High-sensitivity classes (HIV antiretrovirals, oncology infusions, behavioral-health maintenance drugs) are inferred from drug-to-condition mapping, which is the privacy point.
Method: a sampling read with a Wilson 95% score interval, chosen because it stays well behaved in the low-prevalence, small-sample regime a PII read lives in. The approach is written up at jeffpinto.com/notes/llm-as-judge-pii-economics: a real engagement turned a $10M+/yr classifier problem into a roughly $400-per-cycle prevalence answer at 95% confidence.
Cost model: per-cycle read = sample N times hops times $/judged-record, plus a human calibration subset and a fixed overhead; the classifier line amortizes a one-time build over a year of cycles. Plug in your own token price and build estimate; nothing here is a slogan.
Honest bound: this recovers a synthetic hop's rate, not your real one. The real read happens behind your firewall on your actual hops; that is the internal extension this scopes.