| Rank | Claim | Patient | Payer | State | Age | Contracted | Score | Recommended Action | |
|---|---|---|---|---|---|---|---|---|---|
| 1 | CLM25149 | Rodriguez, Sofia | Kaiser Permanente | s₄ Denied | 5d | $11,778 | $1,194 | Resubmit corrected claim | Why? |
| 2 | CLM23788 | Williams, Marcus | Anthem | s₄ Denied | 11d | $10,612 | $1,075 | Resubmit corrected claim | Why? |
| 3 | CLM24317 | Patel, Anika | United Healthcare | s₄ Denied | 9d | $10,296 | $1,043 | Resubmit corrected claim | Why? |
| 4 | CLM24513 | Johnson, Terrence | Cigna | s₄ Denied | 8d | $9,794 | $993 | Resubmit corrected claim | Why? |
| 5 | CLM25031 | Nakamura, Yuki | ACME Health Plan | s₄ Denied | 5d | $9,630 | $976 | Resubmit corrected claim | Why? |
| 6 | CLM22847 | O'Brien, Kathleen | Cigna | s₃ Pending | 14d | $9,120 | $73 | Upload requested records | Why? |
| 7 | CLM23384 | Torres, Miguel | Humana | s₄ Denied | 13d | $8,642 | $876 | Resubmit corrected claim | Why? |
| 8 | CLM25270 | Davis, Angela | Blue Shield | s₄ Denied | 4d | $8,522 | $864 | Resubmit corrected claim | Why? |
| 9 | CLM21903 | Kim, Joon-ho | Blue Shield | s₅ Adjudicated | 3d | $8,340 | $133 | Review contract terms | Why? |
| 10 | CLM21179 | Fischer, Emma | Aetna | s₄ Denied | 24d | $8,028 | $814 | Resubmit corrected claim | Why? |

| State | Payer | Reason | Touches (n) | Progressed (k) | α̂ | 95% credible interval · prior | Status |
|---|---|---|---|---|---|---|---|

| Claim | Cell | Day in holdout | Contracted | Filing deadline | Outcome |
|---|---|---|---|---|---|


For the full mathematical treatment, see thompson-sampling.md; this page covers the intuition.

Every business day we have 320 collector touches to allocate across 3,847 open claims. Some claims are slam dunks: high contracted dollars, a denial reason we know how to fix, a payer that responds well to resubmission. Others are uncertain: small sample size, unclear whether intervention pays off.
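This is the classic explore vs. exploit dilemma: spend touches where we already expect a payoff, or spend some on uncertain claims to learn whether they pay off.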
Thompson sampling resolves this tradeoff automatically and in proportion to uncertainty — without an ε knob to tune, without ignoring what we don't know.
Each business day, every open claim either makes progress or it doesn't. Some progress happens on its own — the payer adjudicates a pending claim, an accepted claim moves into payment. Other progress requires a collector to step in — a denial gets corrected and resubmitted, records get uploaded for a pending request.
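α answers one specific question: if a claim would otherwise stay stuck today, what is the probability that a collector touch moves it forward instead?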
That's it. α is a number between 0 and 1, and it depends on the cell.
Take a denied claim. On any given day, untouched, it has these natural transition probabilities (the passive kernel, hardcoded in the simulation engine):
The touch doesn't add new edges or change the paid / written-off rates. It shifts probability mass from "stuck" to "progress", by an amount equal to α times the dwell probability. Row-by-row:
The arithmetic: with α = 0.60 and dwell = 0.68, the touch redirects 0.60 × 0.68 ≈ 0.41 of probability mass. That 0.41 comes off the self-loop and lands on the progress edge. So the size of α literally is the size of the redirection (as a fraction of dwell), which is why we call it the touch transition probability.
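
The redirection can be sketched in a few lines of Python. This is an illustration only: `apply_touch` is a hypothetical name rather than the simulation engine's API, and every kernel value except the 0.68 dwell probability is a placeholder chosen so the row sums to 1.

```python
def apply_touch(passive_row, alpha, dwell_key="dwell", progress_key="progress"):
    """Return the transition row for a touched claim.

    Moves alpha * dwell of probability mass from the self-loop to the
    progress edge; every other edge is left unchanged.
    """
    shifted = alpha * passive_row[dwell_key]
    touched = dict(passive_row)
    touched[dwell_key] -= shifted
    touched[progress_key] += shifted
    return touched


# Hypothetical passive row for a denied claim. Only dwell = 0.68 comes from
# the text; the other values are placeholders, not the engine's real kernel.
passive = {"dwell": 0.68, "progress": 0.20, "paid": 0.02, "written_off": 0.10}
touched = apply_touch(passive, alpha=0.60)
print(touched)
```

With these placeholder numbers the self-loop drops from 0.68 to about 0.27 and the progress edge rises by the same 0.41, while the paid and written-off entries are untouched.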
Different claims behave differently under a touch. α depends on three things: the claim's state, the payer, and the reason code.
Each (state, payer, reason_code) combination is a cell. Collector-IQ keeps one α estimate per cell, learned from observed touch outcomes.
Today the system ships with pooled engineering priors — α = 0.10 for s₂, 0.50 for s₃, 0.60 for s₄ and s₅, identical across all payers and reason codes. That's a launch fudge. Real Cigna CO-16 denials don't behave like real Anthem CO-197 denials. Replacing pooled priors with cell-specific learned values is the entire point of the learning program.
How do you talk about a quantity you don't fully know? Not with a single number — with a probability distribution over what you think the value might be. The right distribution for binary outcomes (touch worked / touch didn't) is the Beta distribution.
A Beta distribution has two shape parameters, written Beta(a, b):
- a − 1 ≈ the count of touches you've seen succeed (claim progressed).
- b − 1 ≈ the count of touches you've seen fail (claim stayed put).
- The mean is a / (a + b): your best point estimate.
- As n = a + b grows, the distribution narrows.

Three illustrative shapes:
Wide distribution = uncertain. Narrow distribution = confident. Each cell on the Learning Dashboard's posterior table is a Beta distribution like one of these.
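
To make "wide" and "narrow" concrete, here is a minimal sketch using the standard closed-form mean and variance of a Beta distribution. The three (a, b) pairs are hypothetical and are not taken from the dashboard.

```python
import math


def beta_summary(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    n = a + b
    mean = a / n
    variance = (a * b) / (n * n * (n + 1))
    return mean, math.sqrt(variance)


# Three hypothetical cells with the same mean but growing evidence.
for a, b in [(3, 2), (30, 20), (300, 200)]:
    mean, sd = beta_summary(a, b)
    print(f"Beta({a}, {b}): mean={mean:.2f}, sd={sd:.3f}")
```

All three have a mean of 0.60, but the spread shrinks from about 0.20 to about 0.02 as the counts grow a hundredfold.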
Thompson sampling is one rule, applied each day:
1. For each cell, draw a random α̃ from its Beta(a, b) posterior.
2. For each open claim, compute Score = Contracted Amount × α̃ using its cell's draw.
3. Touch the top K claims by score.
4. Observe the outcomes: increment a by the count that progressed and increment b by the count that didn't. The Beta posterior tightens.

That's it. The exploration happens automatically: cells with wide distributions occasionally produce high random draws that push their claims into the top K, even when their mean isn't the highest. Cells with narrow distributions produce predictable draws that act like exploitation.
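
One day of the rule might look like the following sketch. It assumes one draw per cell shared by all of that cell's claims, and a plain dict of [a, b] pairs as storage; the function names, claim IDs, and numbers are hypothetical, not Collector-IQ's actual code or data.

```python
import random


def thompson_select(claims, posteriors, k, rng=random):
    """Pick the k claims to touch today.

    claims: list of dicts with "id", "cell", and "contracted" keys.
    posteriors: dict mapping cell -> [a, b] Beta parameters.
    """
    # One draw per cell, shared by every claim in that cell.
    draws = {cell: rng.betavariate(a, b) for cell, (a, b) in posteriors.items()}
    ranked = sorted(
        claims,
        key=lambda c: c["contracted"] * draws[c["cell"]],
        reverse=True,
    )
    return ranked[:k]


def update_posteriors(posteriors, outcomes):
    """outcomes: iterable of (cell, progressed) pairs for touched claims."""
    for cell, progressed in outcomes:
        if progressed:
            posteriors[cell][0] += 1  # a counts touches that progressed
        else:
            posteriors[cell][1] += 1  # b counts touches that did not


# Hypothetical cells and claims, for illustration only.
cell_a = ("s4", "Cigna", "CO-16")
cell_b = ("s4", "Anthem", "CO-197")
posteriors = {cell_a: [7.0, 5.0], cell_b: [2.0, 2.0]}
claims = [
    {"id": "C1", "cell": cell_a, "contracted": 9_000},
    {"id": "C2", "cell": cell_b, "contracted": 8_000},
    {"id": "C3", "cell": cell_a, "contracted": 1_000},
]
picked = thompson_select(claims, posteriors, k=2, rng=random.Random(0))
update_posteriors(posteriors, [(cell_a, True), (cell_b, False)])
```

Sharing one draw per cell keeps claims in the same cell ranked by contracted amount, so randomness only reorders claims across cells.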
Two cells from today's posterior table. Both are s₄ Denied with reason code CO-16. Suppose both have a $10,000 claim awaiting work and we have one open touch slot.
Kaiser's empirical mean (0.75) is higher than UHC's (0.68). A greedy policy would always pick Kaiser. Thompson doesn't.
On any given day, both cells produce a random α draw. UHC's draws cluster tightly. Kaiser's draws spread widely. Most days, UHC wins because its draws are reliably high. Some days, Kaiser draws above 0.85 and wins — in proportion to how plausible it is that Kaiser's true α really exceeds UHC's. Each Kaiser win produces a new touch outcome and tightens its posterior.
After enough touches, Kaiser's distribution narrows around its true value — whatever that turns out to be. If it really is 0.78, Thompson plays Kaiser preferentially. If it's actually 0.55, Thompson abandons Kaiser. Either way, the policy self-corrects.
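
The head-to-head can be simulated directly. In this sketch the counts are hypothetical, chosen only to match the empirical means quoted above (3 of 4 for Kaiser, 68 of 100 for UHC), and both cells are assumed to start from a uniform Beta(1, 1) prior rather than the shipped engineering priors. Because both claims are worth $10,000, comparing the α draws is the same as comparing scores.

```python
import random


def win_fraction(wide, tight, days=10_000, seed=0):
    """Fraction of simulated days on which the `wide` cell out-draws `tight`."""
    rng = random.Random(seed)
    wins = sum(
        rng.betavariate(*wide) > rng.betavariate(*tight) for _ in range(days)
    )
    return wins / days


# Hypothetical counts on an assumed Beta(1, 1) prior.
kaiser = (1 + 3, 1 + 1)    # 3 progressed, 1 did not
uhc = (1 + 68, 1 + 32)     # 68 progressed, 32 did not
print(f"Kaiser wins the slot on {win_fraction(kaiser, uhc):.0%} of simulated days")
```

The exact split depends on the real (a, b) values in the posterior table; the point of the sketch is that neither cell wins every day, and the split shifts as Kaiser's posterior tightens.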
The simplest policy — "always pick the cell with the highest empirical mean" — has a clear failure mode. Imagine a brand-new cell that has been touched twice and progressed both times. Empirical mean: 1.0. A greedy policy would pour the entire 320-touch budget into this cell. But two samples is two samples — the true α might be 0.4, and we'd waste the day.
Three approaches handle this differently:
The full pipeline as a single loop:
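1. Assign each open claim to its cell (state, payer, reason_code).
2. Draw α̃ from each cell's Beta(a, b) posterior.
3. Score each claim as Contracted Amount × α̃ and touch the highest-scoring claims, up to the day's touch budget.
4. Observe which touched claims progressed and update a and b by the counts.

The Learning Dashboard shows each cell's (a, b) pair, the convergence trajectories as they tighten over time, the coverage map showing where touches concentrate, and the cumulative dollar value the learning has produced over the day-0 prior.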

- thompson-sampling.md: full mathematical treatment, including extensions, pitfalls, and theoretical regret bounds.
- learning-dashboard.md: design rationale and implementation plan for the Learning Dashboard panel.