Analysis

Behind The Scenes #1: The ELO Baseline

February 27, 2026 · 3 min read

☠ Graveyard of Good Ideas — Where our hypotheses come to rest in peace.

Let's start with the number nobody wants to talk about: our ELO model's Brier Score is 0.2222. The market's is 0.2065. Lower is better, so for the non-statisticians, that means our score is 7.6% worse than the market's at predicting NBA games. For the statisticians, yes, we've tried everything, and no, it didn't help. We'll get to that.
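For anyone who wants to check the arithmetic, here is a minimal sketch of how a Brier Score is computed and where the 7.6% comes from. The toy forecasts are invented for illustration; only the two scores 0.2222 and 0.2065 come from our results. For reference, a forecaster who says 50% on every game scores exactly 0.25.

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def relative_gap(ours, market):
    """How much higher (worse) our score is, as a fraction of the market's."""
    return (ours - market) / market


# Toy forecasts, invented for illustration only.
toy_probs = [0.7, 0.4, 0.9]
toy_outcomes = [1, 0, 1]
print(round(brier_score(toy_probs, toy_outcomes), 4))

# The two scores reported above.
print(f"{relative_gap(0.2222, 0.2065):.1%}")  # 7.6%
```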

Metric	Our ELO Model	Market (Pinnacle)
Brier Score	0.2222	0.2065
Permutation Test	p = 0.5138	—
Pinnacle CLV (3 bets)	-1.5%	—
Live ROI (33 bets)	+3.84%	—

That +3.84% live ROI is the number we clung to for about a week before the permutation test showed it's statistically indistinguishable from a coin flip. Hope is a hell of a drug.
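A permutation test can be set up in several ways, so treat the following as one plausible sketch rather than a description of our pipeline. It assumes flat stakes and a null hypothesis where which bets won is unrelated to the odds we took: shuffle the win/loss labels across the bets, recompute ROI each time, and count how often the shuffled ROI matches or beats the real one. The function names and the default of 10,000 shuffles are illustrative choices.

```python
import random


def roi(odds, wins, stake=1.0):
    """Flat-stake ROI: net profit divided by total amount staked."""
    profit = sum(stake * (o - 1.0) if w else -stake for o, w in zip(odds, wins))
    return profit / (stake * len(odds))


def permutation_p_value(odds, wins, n_perm=10_000, seed=0):
    """Share of shuffles whose ROI is at least as high as the observed ROI."""
    rng = random.Random(seed)  # seeded so repeated runs give the same answer
    observed = roi(odds, wins)
    shuffled = list(wins)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point noise in the sums.
        if roi(odds, shuffled) >= observed - 1e-12:
            hits += 1
    return hits / n_perm
```

A p-value near 0.5, like ours, says that roughly half of the random reshuffles did at least as well as the real bets.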

ELO is a chess rating system we duct-taped onto basketball. Teams start at 1500, gain or lose points after each game, and the rating difference converts to a win probability. K-factor of 30, 50-point home advantage, margin-of-victory multiplier. It's a perfectly reasonable system that loses to the market for a perfectly obvious reason: the market cheats.
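The mechanics fit in a few lines. This sketch uses the settings named above (start at 1500, K of 30, 50-point home advantage). Two pieces are assumptions, because this post doesn't spell them out: the conventional 400-point logistic scale from chess, and a log-scaled margin-of-victory multiplier standing in for whatever formula the real model uses.

```python
import math

START_RATING = 1500  # every team's initial rating, from the post
K = 30               # K-factor, from the post
HOME_ADVANTAGE = 50  # rating points added to the home side, from the post


def win_probability(home_rating, away_rating):
    """Elo expectation for the home team (ASSUMPTION: 400-point scale)."""
    diff = home_rating + HOME_ADVANTAGE - away_rating
    return 1.0 / (1.0 + 10 ** (-diff / 400.0))


def mov_multiplier(margin):
    """ASSUMPTION: log-scaled margin of victory; the real formula is not given."""
    return math.log(abs(margin) + 1.0)


def update(home_rating, away_rating, home_points, away_points):
    """Return the new (home, away) ratings after one game."""
    expected_home = win_probability(home_rating, away_rating)
    actual_home = 1.0 if home_points > away_points else 0.0
    margin = home_points - away_points
    delta = K * mov_multiplier(margin) * (actual_home - expected_home)
    # Whatever one team gains, the other loses.
    return home_rating + delta, away_rating - delta
```

Under these assumptions, two teams both rated 1500 give the home side a win probability of about 57%, purely from the 50-point home bump.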

Not literally. But ELO knows exactly one thing: who won and by how much. The market knows that, plus injury reports, travel schedules, back-to-back fatigue, lineup changes, coaching adjustments, and the combined wisdom of every sharp bettor with a bankroll. ELO is a solo guitarist competing against a full orchestra. The solo might be technically impressive, but it's still losing.

There's also the K-factor problem. K=30 for every single game. Season opener after a blockbuster trade? K=30. Mid-February Tuesday between two teams that haven't changed since October? K=30. It's like setting your thermostat to 72° and never touching it again regardless of whether it's July or January. Systems like Glicko-2 track a rating deviation and a volatility for each competitor, so the size of each update adapts to how uncertain the rating is. Ours just... doesn't.
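To make the contrast concrete, here is a deliberately simple, hypothetical alternative to a fixed K. It is not Glicko-2 and not something we run: K starts high when a rating is stale (season opener, big trade) and decays toward a floor as games accumulate. Every number in it is an illustrative assumption.

```python
def dynamic_k(games_since_reset, k_start=60.0, k_floor=20.0, half_life=10.0):
    """Hypothetical schedule: K decays from k_start toward k_floor.

    games_since_reset counts games since the last event that made the rating
    unreliable (new season, major roster change). All defaults are made up.
    """
    if games_since_reset < 0:
        raise ValueError("games_since_reset must be non-negative")
    decay = 0.5 ** (games_since_reset / half_life)
    return k_floor + (k_start - k_floor) * decay


print(dynamic_k(0))   # 60.0
print(dynamic_k(10))  # 40.0
```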

But the number that really kept us up at night was the edge distribution:

Edge Band	Bets	Win Rate	ROI
10-15%	7	57.1%	+15.5%
15-20%	15	40.0%	-6.8%
20-25%	11	36.4%	-21.2%
25%+	6	33.3%	-13.6%

The more confident the model is, the worse it does. When ELO screams "THIS IS A SURE THING," the correct response is apparently to bet the other side. The model doesn't just lack confidence calibration — its confidence is inversely correlated with reality.
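A breakdown like the edge table can be rebuilt from raw bets in a few lines. This sketch assumes "edge" means expected profit per unit staked (model probability times decimal odds, minus one) and flat one-unit stakes; the post doesn't pin down either, so read both as assumptions. The band boundaries mirror the table.

```python
BANDS = [
    (0.10, 0.15, "10-15%"),
    (0.15, 0.20, "15-20%"),
    (0.20, 0.25, "20-25%"),
    (0.25, float("inf"), "25%+"),
]


def edge(model_prob, decimal_odds):
    """ASSUMPTION: edge is expected profit per unit staked at the offered odds."""
    return model_prob * decimal_odds - 1.0


def band_label(e):
    """Return the band an edge falls in, or None if it is below 10%."""
    for low, high, label in BANDS:
        if low <= e < high:
            return label
    return None


def summarize(bets):
    """bets: iterable of (model_prob, decimal_odds, won) with flat 1-unit stakes."""
    stats = {}
    for prob, odds, won in bets:
        label = band_label(edge(prob, odds))
        if label is None:
            continue
        s = stats.setdefault(label, {"bets": 0, "wins": 0, "profit": 0.0})
        s["bets"] += 1
        s["wins"] += int(won)
        s["profit"] += (odds - 1.0) if won else -1.0
    return {
        label: {
            "bets": s["bets"],
            "win_rate": s["wins"] / s["bets"],
            "roi": s["profit"] / s["bets"],
        }
        for label, s in stats.items()
    }
```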

Underdogs were even more brutal:

Odds Range (decimal)	Bets	Win Rate	ROI
Favorites (< 2.0)	23	47.8%	+12.4%
Underdogs (2.0-3.0)	11	18.2%	-52.1%
Big Underdogs (3.0+)	8	12.5%	-56.3%

Favorites: decent. Underdogs: a bloodbath. If the model says "this underdog has value," the correct translation is "this team is about to lose by 20."
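To see how far off the underdog bets were, compare the win rates with the break-even rate at each price. With flat stakes (an assumption here), a bet at decimal odds d breaks even when it wins 1/d of the time. A quick check:

```python
def break_even_win_rate(decimal_odds):
    """Win rate at which a flat-stake bet at these odds neither wins nor loses."""
    if decimal_odds <= 1.0:
        raise ValueError("decimal odds must be greater than 1.0")
    return 1.0 / decimal_odds


for odds in (2.0, 3.0):
    print(f"odds {odds}: break-even at {break_even_win_rate(odds):.1%}")
# odds 2.0: break-even at 50.0%
# odds 3.0: break-even at 33.3%
```

So underdogs priced between 2.0 and 3.0 needed to win somewhere between 33.3% and 50% of the time, and ours won 18.2%.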

Oh, and we also found three bugs in the pipeline that were actively hiding how bad things were:

- The books_count filter was letting single-bookmaker bets through (the equivalent of asking one person for directions in a foreign country and calling it a consensus).
- CLV matching had dropped to 50% because of a timezone bug. Half our measurements were just... missing.
- Result matching was failing 38% of the time on multi-game files. We were grading ourselves on an incomplete exam.
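The actual fixes aren't reproduced here, but the first two bugs belong to a familiar family, and the guards look roughly like this. Everything below is hypothetical: the minimum of two books, the dict-shaped bet record, and the five-minute matching tolerance are illustrative assumptions. Only the books_count name comes from our pipeline.

```python
from datetime import datetime, timedelta, timezone

MIN_BOOKS = 2  # ASSUMPTION: a "consensus" needs at least two bookmakers


def has_consensus(bet):
    """Reject bets priced off a single bookmaker."""
    return bet["books_count"] >= MIN_BOOKS


def to_utc(ts):
    """Normalize a timestamp to UTC, refusing naive (timezone-less) values."""
    if ts.tzinfo is None:
        raise ValueError("naive timestamp: attach a timezone before matching")
    return ts.astimezone(timezone.utc)


def same_tipoff(a, b, tolerance=timedelta(minutes=5)):
    """True if two timestamps describe the same game start, whatever the zone."""
    return abs(to_utc(a) - to_utc(b)) <= tolerance


# The same tip-off expressed in US Eastern (UTC-5) and in UTC.
eastern = timezone(timedelta(hours=-5))
local = datetime(2026, 2, 27, 19, 30, tzinfo=eastern)
utc = datetime(2026, 2, 28, 0, 30, tzinfo=timezone.utc)
print(same_tipoff(local, utc))  # True
```

Refusing naive timestamps outright is a design choice: a loud failure is easier to spot than a match rate that quietly drops by half.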

All fixed now. Which is great, except fixing the thermometer just confirmed the patient's temperature is worse than we thought.

Next up: testing whether ELO's disagreements with the market contain any residual signal. The plan is a Bradley-Terry model with market odds as the baseline. If there's a pocket where ELO sees something the market doesn't, we'll find it.
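One way to read that plan, sketched under assumptions since the model isn't built yet: treat the market's log-odds as a fixed offset and fit a single coefficient beta on ELO's disagreement with the market, so that logit(p) = logit(market) + beta * (logit(elo) - logit(market)). A beta near zero would mean ELO adds nothing once the market price is known. The function names and the one-parameter Newton fit are illustrative choices.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fit_beta(market_probs, elo_probs, outcomes, max_iter=50):
    """Maximum-likelihood beta for logit(p) = logit(market) + beta * disagreement."""
    offsets = [logit(m) for m in market_probs]
    gaps = [logit(e) - off for e, off in zip(elo_probs, offsets)]
    beta = 0.0
    for _ in range(max_iter):
        grad = 0.0  # first derivative of the log-likelihood
        curv = 0.0  # negative second derivative
        for off, gap, y in zip(offsets, gaps, outcomes):
            p = sigmoid(off + beta * gap)
            grad += (y - p) * gap
            curv += p * (1.0 - p) * gap * gap
        if curv == 0.0:
            break  # ELO never disagrees with the market: nothing to fit
        step = grad / curv
        beta += step
        if abs(step) < 1e-12:
            break
    return beta
```

All probabilities must sit strictly between 0 and 1, and market odds would need the bookmaker margin removed before being converted to probabilities. How best to do that is a separate question.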

Strict exit criteria: Pinnacle CLV below +1.0% after 30 clean bets = ELO betting stops. No appeals.
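The rule is simple enough to write down as code. This sketch assumes CLV is measured as the odds we took divided by Pinnacle's closing odds, minus one, averaged over bets; whether the closing line should first have the margin removed is left open here.

```python
MIN_BETS = 30     # clean bets required before the rule applies
CLV_FLOOR = 0.01  # +1.0%


def clv(odds_taken, closing_odds):
    """ASSUMPTION: CLV is the price taken relative to the closing price."""
    return odds_taken / closing_odds - 1.0


def should_stop(bets):
    """bets: list of (odds_taken, pinnacle_closing_odds) for clean bets only."""
    if len(bets) < MIN_BETS:
        return False  # not enough clean bets yet to judge
    mean_clv = sum(clv(taken, close) for taken, close in bets) / len(bets)
    return mean_clv < CLV_FLOOR
```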

Disclaimer: This content is for informational and educational purposes only. Nothing here constitutes financial or investment advice.
