
Behind The Scenes #5: The Overconfidence Detector

Graveyard of Good Ideas — We knew this one was dead before we started. We tested it anyway.

Before running this experiment, the room was 90% certain it would fail. Tomas gave it 5%. Raj didn't even look up from his coffee. Sora tried to defend it and couldn't get past the first sentence.

So why bother? Because "probably doesn't work" is an opinion, and "definitely doesn't work, here's the proof" is data. We've seen too many maybe-dead ideas resurrect themselves in late-night conversations to leave doors half-closed.

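In our earlier analysis, we established that our model is worse than the market: a Brier score of 0.2222 against the market's 0.2065, where lower is better. Known fact. But when we sliced a 577-game backtest by edge size (how far our number disagrees with the market's), this pattern showed up: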

| Edge Band | Bets | Win Rate vs Market (pct. points) | ROI |
|---|---|---|---|
| 0-5% | 104 | +6.8 | +11.6% |
| 5-10% | 174 | +2.6 | +4.2% |
| 10-15% | 145 | -1.1 | -8.2% |
| 15-20% | 79 | -0.6 | -5.8% |
| 20-30% | 69 | +10.9 | +43.9% |

A U-shape. Small disagreements with the market: money. Medium disagreements: disaster. Large disagreements: money again? The idea wrote itself — if we could just filter out the 10-20% "overconfidence zone," maybe a losing model becomes a winning one.
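For readers who want to try the same slicing on their own bet log, here is a minimal sketch of the banding step. It is not our production code: the function names, the `(edge_pct, stake, profit)` tuple layout, and the half-open band boundaries are all assumptions made for illustration, and the toy bets at the bottom are invented.

```python
from collections import defaultdict

# Assumed band edges, in percentage points of disagreement with the market.
# Half-open intervals [lo, hi) are an illustrative choice, not a documented rule.
BANDS = [(0, 5), (5, 10), (10, 15), (15, 20), (20, 30)]


def band_label(edge_pct):
    """Return a label such as '10-15%' for an edge, or None if no band fits."""
    for lo, hi in BANDS:
        if lo <= edge_pct < hi:
            return f"{lo}-{hi}%"
    return None  # negative edge, or 30 points and above


def roi_by_band(bets):
    """bets: iterable of (edge_pct, stake, profit).

    Returns {band_label: (number_of_bets, roi)} where roi = profit / staked.
    """
    staked = defaultdict(float)
    profit = defaultdict(float)
    count = defaultdict(int)
    for edge_pct, stake, pnl in bets:
        label = band_label(edge_pct)
        if label is None:
            continue  # skip bets that fall outside every band
        staked[label] += stake
        profit[label] += pnl
        count[label] += 1
    return {label: (count[label], profit[label] / staked[label]) for label in count}


# Invented toy bets, only to show the call shape.
toy_bets = [(3.0, 1.0, 0.9), (4.0, 1.0, -1.0), (12.0, 1.0, -1.0), (25.0, 1.0, 2.5)]
toy_result = roi_by_band(toy_bets)
```

Note how little data each band ends up holding once you cut five ways. That turns out to matter.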

Elena drafted an architecture. Nate ran preliminary numbers. For about 45 minutes, the room was cautiously hopeful, which around here is the emotional equivalent of a parade.

Then we put it through proper testing. Four tests, and the idea has to pass all four.

Bootstrap confidence intervals

Is the 0-5% band's +11.6% ROI real or noise?

95% CI: [-11.0%, +35.2%]. That interval is so wide you could fit a career change inside it. Every band's CI included zero. Not one could prove it wasn't random.
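If you have not run a bootstrap before, the idea is simple: resample the band's bets with replacement many times, recompute ROI each time, and read the interval off the sorted results. The sketch below assumes unit stakes (so ROI is just mean profit per bet) and uses the plain percentile method; `bootstrap_roi_ci` and its parameters are hypothetical names, and the toy profits are made up rather than taken from our backtest.

```python
import random


def bootstrap_roi_ci(profits, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for ROI, assuming unit stakes.

    profits: list of per-bet profits (e.g. +0.91 for a win, -1.0 for a loss).
    Returns (lower, upper) for a (1 - alpha) interval.
    """
    rng = random.Random(seed)  # seeded so the result is reproducible
    n = len(profits)
    rois = []
    for _ in range(n_resamples):
        sample = rng.choices(profits, k=n)  # draw n bets with replacement
        rois.append(sum(sample) / n)
    rois.sort()
    lower = rois[int((alpha / 2) * n_resamples)]
    upper = rois[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Invented example: 104 unit-stake bets, 55 winners at +0.91 and 49 losers.
toy_profits = [0.91] * 55 + [-1.0] * 49
toy_low, toy_high = bootstrap_roi_ci(toy_profits)
```

With only about a hundred bets and win-or-lose payoffs, the interval comes out wide almost regardless of the point estimate. An interval that straddles zero means the data cannot rule out a band that is really break-even or worse.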

Time-split out-of-sample

Split the 577 games in half chronologically. Does the U-shape appear in both?

| Edge Band | 1st Half ROI | 2nd Half ROI | Consistent? |
|---|---|---|---|
| 0-5% | +10.4% | +12.3% | Yes |
| 5-10% | -10.3% | +16.8% | Nope |
| 10-15% | -0.5% | -14.1% | Yes |
| 15-20% | -40.3% | +50.7% | Nope |
| 20-30% | +31.7% | +81.2% | Yes |

The "bad zone" at 15-20% flipped from -40.3% to +50.7% between halves. The U-shape is a Rorschach test — you see what you want to see, and it changes depending on which half of the data you're looking at.
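The split itself is a few lines of code. This is a sketch under stated assumptions: bets arrive as `(date, band, profit)` tuples with unit stakes, the split point is the midpoint of the date-sorted list, and "consistent" means the ROI has the same sign in both halves, which is how the table above reads. `split_half_roi` is a hypothetical name and the toy bets are invented.

```python
def split_half_roi(bets):
    """Chronological split-half check.

    bets: list of (date, band, profit) with unit stakes; dates must sort
    chronologically (ISO strings such as '2024-01-31' work).
    Returns {band: (roi_first_half, roi_second_half, same_sign)}.
    """
    ordered = sorted(bets, key=lambda b: b[0])
    mid = len(ordered) // 2
    halves = (ordered[:mid], ordered[mid:])
    out = {}
    for band in sorted({b[1] for b in ordered}):
        rois = []
        for half in halves:
            pnl = [p for _, b, p in half if b == band]
            # None marks a half in which the band had no bets at all
            rois.append(sum(pnl) / len(pnl) if pnl else None)
        same_sign = None not in rois and (rois[0] > 0) == (rois[1] > 0)
        out[band] = (rois[0], rois[1], same_sign)
    return out


# Invented toy bets, two per band, one in each half.
toy_bets = [
    ("2024-01-01", "0-5%", 0.5),
    ("2024-01-02", "15-20%", -1.0),
    ("2024-03-01", "0-5%", 0.2),
    ("2024-03-02", "15-20%", 1.5),
]
toy_split = split_half_roi(toy_bets)
```

Same-sign agreement is a low bar. A band can clear it and still swing wildly in size between halves, so treat a "Yes" as "not yet disproven" rather than as confirmation.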

Permutation tests

| Edge Band | ROI | p-value |
|---|---|---|
| 0-5% | +11.6% | 0.387 |
| 5-10% | +4.2% | 0.644 |
| 10-15% | -8.2% | 0.925 |
| 15-20% | -5.8% | 0.796 |
| 20-30% | +43.9% | 0.019 |
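One way to produce p-values like these is a one-sided permutation test: ask how often a random set of bets of the same size, drawn from the whole backtest, does at least as well as the band did. The sketch below is an assumption about the setup rather than a record of our exact procedure; it uses unit stakes, and `permutation_p_value` and its arguments are hypothetical names. The toy data is invented.

```python
import random


def permutation_p_value(profits, in_band, n_perm=10_000, seed=0):
    """One-sided permutation p-value for a band's mean profit per bet.

    profits: per-bet profits for the whole backtest (unit stakes assumed).
    in_band: list of booleans, True where the bet belongs to the band.
    Returns the share of random same-size subsets whose mean profit is at
    least the band's observed mean (with the usual +1 correction).
    """
    rng = random.Random(seed)
    k = sum(in_band)  # number of bets in the band
    observed = sum(p for p, flag in zip(profits, in_band) if flag) / k
    hits = 0
    for _ in range(n_perm):
        subset = rng.sample(profits, k)  # random relabelling: any k bets
        if sum(subset) / k >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Invented toy backtest: 5 big winners and 45 losers.
toy_profits = [2.0] * 5 + [-1.0] * 45
best_band = [True] * 5 + [False] * 45               # band holds all the winners
worst_band = [False] * 5 + [True] * 5 + [False] * 40  # band holds only losers
p_best = permutation_p_value(toy_profits, best_band)
p_worst = permutation_p_value(toy_profits, worst_band)
```

Because the test is one-sided, a losing band lands near 1 rather than near 0, which matches the shape of the table above: the negative-ROI bands carry the largest p-values.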

Only the 20-30% band survived, with p = 0.019. But it rests on just 69 bets, and the top 5 winners account for 78% of the profit. Five underdog parlays carrying an entire strategy is less "proven edge" and more "lucky streak with a good story."
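Checking how much a handful of winners carries a result takes two small helpers: one for the share of total profit coming from the top few bets, one for the ROI that remains once they are removed. Both names are hypothetical, unit stakes are assumed, and the toy numbers are invented; this is a concentration check in the spirit of the paragraph above, not necessarily the exact jackknife we ran.

```python
def top_k_profit_share(profits, k=5):
    """Fraction of total profit contributed by the k most profitable bets."""
    total = sum(profits)
    if total <= 0:
        raise ValueError("share is only meaningful when total profit is positive")
    top = sorted(profits, reverse=True)[:k]
    return sum(top) / total


def roi_without_top_k(profits, k=5):
    """Mean profit per bet (unit stakes) after dropping the k best bets."""
    rest = sorted(profits, reverse=True)[k:]
    if not rest:
        raise ValueError("no bets left after removing the top k")
    return sum(rest) / len(rest)


# Invented toy band: two large winners, two small ones, two losers.
toy_profits = [4.0, 3.0, 2.0, 1.0, -1.0, -1.0]
toy_share = top_k_profit_share(toy_profits, k=2)  # 7.0 / 8.0
toy_rest = roi_without_top_k(toy_profits, k=2)    # (2 + 1 - 1 - 1) / 4
```

A share above 1.0 is possible and worth knowing about: it means the band loses money without its top bets.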

Final count

| Test | Result |
|---|---|
| Bootstrap CI | FAIL: includes zero |
| Time-Split OOS | FAIL: pattern doesn't reproduce |
| Permutation (0-5%) | FAIL: p = 0.39 |
| High-Edge Jackknife | PASS: stable |

One out of four. The U-shape is what happens when you slice data enough times — you'll always find a pattern. It just won't be there next time you look.

Full scoreboard of everything we've tried:

| Approach | Result |
|---|---|
| ELO model (base) | Brier worse than market |
| B2B fatigue correction | Made things worse |
| Parameter optimization | No improvement |
| Spread market entry | Negative ROI everywhere |
| Calibrator tuning | Data leak (oops) |
| Bradley-Terry residuals | Signal reversed at scale |
| GLM structural features | All non-significant |
| Player impact | Failed out-of-sample |
| Overconfidence detector | U-shape was a mirage |

Marcus, who argued against running this test, said afterward that he sleeps better knowing the door is locked instead of just closed. Tomas said "I told you so" with a level of satisfaction that should probably concern us. But even he admitted: knowing something doesn't work is different from assuming it doesn't work. One costs thirty minutes of compute. The other costs months of "but what if."

Every ELO-based angle has been tested. The research direction has to change now — different model, different sport, or both. We're still tracking live bets (37, hovering around break-even), and the CLV checkpoint is coming. But the flashlight we've been using is out of batteries, and the hallway is dark.


Disclaimer: This content is for informational and educational purposes only. Nothing here constitutes financial or investment advice.
