Honest Failure Lab

Our Model Is an Inverse Indicator, Apparently

We spent today testing something we should've tested a month ago: what happens if you only bet when the model is really confident?

The logic seemed bulletproof. Our Elo model's overall track record is mediocre, sure, but maybe that's because we're watering it down with lukewarm bets. Cut the noise, keep the conviction plays, watch the profits roll in.

The profits did not roll in.

| Min. Edge | Bets | ROI | Win Rate | p-value |
|---|---|---|---|---|
| 2% | 2,935 | -17.5% | 40.6% | 1.000 |
| 5% | 2,173 | -19.0% | 39.1% | 1.000 |
| 10% | 1,226 | -25.7% | 34.4% | 0.817 |

Read that table again. The more confident the model gets, the more money it loses. It's not failing randomly — it's failing with conviction.
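The filter behind that table is about as simple as it sounds. Here's a minimal sketch, assuming a hypothetical bet record with the model's estimated edge and the realized profit per unit staked (names and numbers are illustrative, not our actual pipeline):

```python
from dataclasses import dataclass

@dataclass
class Bet:
    edge: float    # model's estimated edge, e.g. 0.05 for 5%
    profit: float  # realized profit per 1 unit staked (-1.0 on a full loss)

def roi_above_edge(bets, min_edge):
    """ROI over the subset of bets whose estimated edge clears the cutoff.

    Returns (roi, count); roi is None if no bets survive the filter.
    """
    kept = [b for b in bets if b.edge >= min_edge]
    if not kept:
        return None, 0
    return sum(b.profit for b in kept) / len(kept), len(kept)

# Toy usage: raising the cutoff drops the 3% bet and keeps the other three.
bets = [Bet(0.03, -1.0), Bet(0.12, 0.91), Bet(0.07, -1.0), Bet(0.25, -1.0)]
roi, n = roi_above_edge(bets, 0.05)
```

The table is just this function swept over cutoffs of 2%, 5%, and 10%.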

Naturally, we broke it down by edge band, because when something is going horribly wrong, the instinct is to zoom in and find the one corner where it's going slightly less horribly wrong:

| Edge Band | Bets | ROI | p-value |
|---|---|---|---|
| 2–5% | 762 | -13.0% | 1.000 |
| 5–10% | 947 | -10.4% | 1.000 |
| 10–20% | 943 | -21.2% | 0.967 |
| 20%+ | 283 | -40.4% | 0.035 |

That 20%+ row is... something special. A p-value of 0.035 means results this bad are very unlikely to be chance alone. It's a statistically significant talent for picking losers. We didn't set out to build a reverse oracle, but here we are.
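The band breakdown above is a one-pass bucketing. A sketch, with the band boundaries taken from the table and the bet representation again hypothetical:

```python
from collections import defaultdict

def band(edge):
    # Buckets matching the table: 2–5%, 5–10%, 10–20%, 20%+
    if edge < 0.05:
        return "2-5%"
    if edge < 0.10:
        return "5-10%"
    if edge < 0.20:
        return "10-20%"
    return "20%+"

def roi_by_band(bets):
    """bets: iterable of (edge, profit-per-unit) pairs.

    Returns {band: (count, mean ROI)} for bets clearing the 2% floor.
    """
    groups = defaultdict(list)
    for edge, profit in bets:
        if edge >= 0.02:
            groups[band(edge)].append(profit)
    return {k: (len(v), sum(v) / len(v)) for k, v in groups.items()}
```

Zooming in this way doesn't change the totals, of course; it just tells you where inside the filtered set the bleeding is worst.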

Someone on the team suggested we just bet the opposite of whatever the model says. We laughed. Then we went quiet for a moment because honestly it's not the worst idea we've had this month.

And before anyone asks "maybe it was just a bad season" —

| Season | Bets | ROI |
|---|---|---|
| 2022-23 | 752 | -9.2% |
| 2023-24 | 684 | -30.0% |
| 2024-25 | 737 | -18.9% |

Three seasons. 3,444 games. Negative every year. Permutation tests, bootstrap intervals, out-of-sample splits — we threw the entire statistics textbook at it hoping something would stick. Nothing stuck.
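We won't reproduce the full test battery here, and the exact null we simulate is more involved than this, but one generic flavor of permutation test on per-bet profits (a sign-flip test of whether the mean profit is significantly below zero) looks roughly like:

```python
import random

def permutation_p_value(profits, n_perm=10_000, seed=0):
    """One-sided sign-flip permutation test.

    Under the null that wins and losses of each magnitude are equally
    likely, how often does a random sign assignment produce a mean
    profit at least as negative as the observed one? Small p-values
    mean the losing streak is hard to blame on symmetric chance.
    """
    rng = random.Random(seed)
    observed = sum(profits) / len(profits)
    hits = 0
    for _ in range(n_perm):
        permuted = sum(p * rng.choice((-1, 1)) for p in profits) / len(profits)
        if permuted <= observed:
            hits += 1
    # +1 smoothing keeps the estimate away from an impossible p of 0
    return (hits + 1) / (n_perm + 1)
```

For betting data a more faithful null resamples outcomes under the odds-implied probabilities rather than flipping signs, but the shape of the procedure is the same: simulate the null many times, count how often it looks as bad as reality.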

This was the last Elo angle we hadn't tried. We'd already killed off linear residuals, structural features (back-to-backs, rest, travel), player impact, and overconfidence detection. That's fourteen hypotheses, all formally tested, all formally dead. At this point our model rejection pipeline is the only thing with a proven track record.

We'd originally planned to test a consensus divergence model next — looking for disagreements between bookmakers. But that data isn't freely available, and spending on a data source when the base model is literally an inverse indicator felt like buying premium fuel for a car with no engine.

Full team meeting is next. Everything's on the table. Meanwhile, the live pipeline keeps running (38 tracked bets, closing-line value (CLV) measured daily) because at minimum it's generating excellent content for this blog.
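For readers new to CLV: it compares the odds you took against the market's closing odds, on the theory that the close is the sharpest price available. One common formulation for decimal odds (a simplification; definitions vary across the industry):

```python
def clv(bet_odds: float, closing_odds: float) -> float:
    """Closing-line value: fractional edge of the price we took over
    the closing price, in decimal odds. Positive means we beat the
    close; sustained positive CLV is usually read as a sign of skill
    even before results settle.
    """
    return bet_odds / closing_odds - 1.0

# Toy usage: we took 2.10 and the market closed at 2.00 -> +5% CLV.
edge = clv(2.10, 2.00)
```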

Separately, we're building a stock trading system on the same infrastructure. Different market, independent track. We'll share details when — if — it's less embarrassing than this.


Disclaimer: This content is for informational and educational purposes only. Nothing here constitutes financial or investment advice.
