
Synthetic demo · the cheap PII read, per hop

How much inferred-diagnosis PII actually moves through each hop, and what the read costs.

A sampling read brackets each integration hop's true rate of high-sensitivity PII within a 95% confidence interval, for roughly 2% of the cost of building a trained classifier first. Below are the per-hop verdict and the cost model, recomputed live. The defaults are illustrative, not a quote.

Per-hop prevalence (sampled, 95% Wilson interval)

Integration hop | Sampled | Flagged | Prevalence | 95% CI | vs. target

Target: at each hop, inferred-diagnosis PII should fall below a set share of sampled records. Adjudication legitimately carries the highest rate; minimization should shrink it downstream, so partner egress carries the least.
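A minimal sketch of how each row of the table could be computed, assuming the Wilson score interval named in the heading and a simple verdict rule: a hop is "below target" only if the whole interval sits under the target, "above target" only if the whole interval sits over it, and "inconclusive" otherwise. The function names `wilson_interval` and `verdict`, the 800-per-hop split, the flagged counts, and the 5% target are hypothetical illustrations, not the demo's actual inputs.

```python
from statistics import NormalDist
import math

# Two-sided 95% interval: z is the 97.5th percentile of the standard normal (~1.96).
Z95 = NormalDist().inv_cdf(0.975)


def wilson_interval(flagged: int, sampled: int, z: float = Z95):
    """Wilson score interval for the proportion flagged / sampled."""
    if sampled <= 0:
        raise ValueError("sampled must be positive")
    p = flagged / sampled
    z2 = z * z
    denom = 1 + z2 / sampled
    center = (p + z2 / (2 * sampled)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / sampled + z2 / (4 * sampled * sampled))
    return max(0.0, center - half), min(1.0, center + half)


def verdict(flagged: int, sampled: int, target: float) -> str:
    """Hypothetical rule: call a side of the target only when the interval clears it."""
    lo, hi = wilson_interval(flagged, sampled)
    if hi < target:
        return "below target"
    if lo > target:
        return "above target"
    return "inconclusive"


if __name__ == "__main__":
    # Hypothetical counts: 2,400 records split evenly across three hops.
    for hop, flagged in [("adjudication", 96), ("minimization", 40), ("partner egress", 8)]:
        lo, hi = wilson_interval(flagged, 800)
        print(f"{hop:15s} {flagged / 800:6.1%}  [{lo:.1%}, {hi:.1%}]  {verdict(flagged, 800, 0.05)}")
```

The Wilson interval is used here rather than the plain normal approximation because it stays inside [0, 1] and behaves sensibly when a hop has very few flagged records, which is exactly the case for a well-minimized egress hop.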

The economics of running the read

The read vs. building a classifier. Sampling 2,400 records per cycle across 3 hops costs about $1.2k. A trained inferred-PII classifier amortizes to about $54k per cycle, so the sampling read costs roughly 2% as much. The interval narrows with the square root of N, not with N: quadrupling the sample roughly halves the interval width. A read costing a few thousand dollars is therefore enough to answer "is this hop above our minimization target?" without building anything first.
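The arithmetic in that paragraph can be reproduced in a few lines. This is a sketch under stated assumptions: the $0.50 per-record cost is derived from the quoted $1.2k for 2,400 records (it is not quoted directly), the even 800-per-hop split and the 5% prevalence are hypothetical, and the width check uses the normal-approximation half-width, under which quadrupling N halves the width exactly (the Wilson interval behaves approximately the same way).

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # ~1.96, two-sided 95%

# Figures from the text.
RECORDS_PER_CYCLE = 2_400
READ_COST_PER_CYCLE = 1_200      # about $1.2k
CLASSIFIER_PER_CYCLE = 54_000    # amortized classifier cost, about $54k

# Derived assumption: a flat labeling cost per sampled record (~$0.50).
COST_PER_RECORD = READ_COST_PER_CYCLE / RECORDS_PER_CYCLE


def sampling_cost(total_records: int, cost_per_record: float = COST_PER_RECORD) -> float:
    """Per-cycle cost of adjudicating a sample at a flat per-record price."""
    return total_records * cost_per_record


def half_width(p: float, n: int, z: float = Z95) -> float:
    """Normal-approximation half-width of a 95% interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)


read = sampling_cost(RECORDS_PER_CYCLE)
ratio = read / CLASSIFIER_PER_CYCLE          # ~0.022, i.e. roughly 2%

# Width scales as 1/sqrt(n): 4x the records per hop gives half the width.
w_800 = half_width(0.05, 800)                # hypothetical 800 records per hop
w_3200 = half_width(0.05, 3_200)

if __name__ == "__main__":
    print(f"read ${read:,.0f} vs classifier ${CLASSIFIER_PER_CYCLE:,} -> {ratio:.1%}")
    print(f"half-width at n=800: {w_800:.4f}, at n=3200: {w_3200:.4f}")
```

The same scaling also shows the limit of the approach: halving the width again costs another 4x in records, so the read stays cheap only while the question is a coarse "above or below target" rather than a precise rate.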

Sources & method