BTS #11: Nine Models, Zero Survivors
When the team voted to give soccer one last shot—an Over/Under test using the same Poisson models that had already failed at predicting match outcomes—the optimists in the room were outnumbered roughly seven to zero. But we did it anyway, because "we tried everything" hits different than "we tried almost everything."
The models had already gone 0-for-5 on 1X2 predictions across the Bundesliga and Serie A. Dixon-Coles, xG Skellam, a Wilkens-style hybrid: all dead on arrival. The market's Brier score beat every single one of them, and not by a little. Resolution, the component of the Brier score that measures how well a forecaster separates outcomes that happen from outcomes that don't, came in at 0.060 for the market against 0.009 for our xG model, nearly seven times better. That's not a gap you close with better calibration. That's a gap that says "you fundamentally don't know what you don't know."
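For readers who want the exact definition (our addition here, not from the original test writeup): resolution is the middle term of Murphy's decomposition of the Brier score, stated below for a binary outcome with forecasts grouped into $K$ bins of size $n_k$:

$$
\mathrm{BS} \;=\; \underbrace{\frac{1}{N}\sum_{k=1}^{K} n_k\,(\bar f_k - \bar o_k)^2}_{\text{reliability}} \;-\; \underbrace{\frac{1}{N}\sum_{k=1}^{K} n_k\,(\bar o_k - \bar o)^2}_{\text{resolution}} \;+\; \underbrace{\bar o\,(1 - \bar o)}_{\text{uncertainty}}
$$

Here $\bar f_k$ is the mean forecast in bin $k$, $\bar o_k$ the observed frequency in that bin, and $\bar o$ the overall base rate. Uncertainty is fixed by the outcomes themselves and reliability you want at zero, so resolution, which enters with a minus sign, is the only term that rewards actually telling outcomes apart. A market at 0.060 against a model at 0.009 isn't calibrated versus miscalibrated; it's informed versus uninformed.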
The Last Shot: Over/Under
The logic was simple enough. Our Poisson models produce expected goals for each team (λ and μ). From those, you can calculate P(total goals > 2.5) directly from the probability matrix. No new model needed—just a different question asked of the same math.
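A minimal sketch of that calculation, assuming plain independent Poisson scoring (no Dixon-Coles low-score correction); the function and parameter names are ours, not the production code's:

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def prob_over_2_5(lam: float, mu: float) -> float:
    """P(home + away goals > 2.5) under independent Poisson scoring.

    lam = expected home goals, mu = expected away goals. Only the six
    scorelines with i + j <= 2 feed the Under side, so sum those and
    take the complement.
    """
    p_under = sum(
        poisson_pmf(i, lam) * poisson_pmf(j, mu)
        for i in range(3)
        for j in range(3 - i)  # all (i, j) with i + j <= 2
    )
    return 1.0 - p_under

# Example: prob_over_2_5(1.6, 1.2) ≈ 0.53
```

Under independence the total is itself Poisson(λ + μ), so this collapses to a one-dimensional tail; the full score matrix only starts to matter once something like the Dixon-Coles adjustment reweights the low-scoring cells.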
Tomas called it "the last bullet in the chamber" and insisted on three locks: half a session maximum, pass/fail criteria defined upfront (ROI > +3%, p < 0.10), and absolutely no follow-up tests regardless of outcome.
Results across four model-league combinations:
| Model | League | ROI | p-value |
|---|---|---|---|
| Dixon-Coles | Bundesliga | -3.97% | 0.915 |
| xG Skellam | Bundesliga | -10.50% | 1.000 |
| Dixon-Coles | Serie A | -3.46% | 0.880 |
| xG Skellam | Serie A | -2.87% | 0.882 |
The xG Skellam result deserves special mention for being spectacularly wrong. It placed 99.9% of all bets on Under. Every match, almost without exception, it thought there would be fewer goals than the market expected. The model wasn't just inaccurate—it was systematically biased in one direction, like a broken compass that always points south.
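One quick way to surface that kind of bias (a hypothetical diagnostic; the data shape and field names are ours): compare the model's expected total, λ + μ, against the market's implied total for each match. A mean gap sitting firmly below zero produces exactly this all-Under behavior.

```python
import statistics

def total_goals_gap(matches: list[dict]) -> tuple[float, float]:
    """Mean and stdev of (model expected total - market implied total).

    Each match dict is assumed to carry the model's lam and mu plus a
    market_total implied by the O/U prices. A mean well below zero
    means the model prices the Under almost regardless of the matchup.
    """
    gaps = [m["lam"] + m["mu"] - m["market_total"] for m in matches]
    return statistics.mean(gaps), statistics.stdev(gaps)
```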
The Final Count
Nine model-league-market combinations tested across two leagues. Nine failures. Not a single one passed Gate 2. The Brier scores told the same story every time: the market knows more than we do.
We held a meeting afterward—seven people, four rounds of debate—and the conclusion was unanimous: soccer is done. The models go into cold storage. The data stays (free data is still free data), but no more development.
The uncomfortable realization, the one Elena articulated most clearly, is that this isn't a soccer problem. It's not even a sports problem. It's an information problem. Public data plus standard statistical models versus a market that aggregates private information from thousands of sharp bettors with real money on the line. The structural disadvantage doesn't change when you switch sports.
What Happens Now
NBA keeps running on autopilot. The ELO model is still making picks, still tracking results. Thirty-nine bets in, ROI sitting at -2.9%. Not dead yet, but not exactly thriving. We set hard exit criteria: if ROI drops below -5%, or Pinnacle CLV goes negative after 50+ bets, or the permutation test comes back above p=0.30 at 100 bets—any one of those triggers, and we pull the plug. No exceptions, no "one more month."
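Those rules are mechanical on purpose. A minimal sketch of the kill switch, using the thresholds stated above (the function itself is ours, not the team's actual tooling):

```python
def should_retire(roi: float, n_bets: int,
                  pinnacle_clv: float | None = None,
                  perm_p: float | None = None) -> bool:
    """Hard exit criteria for the NBA ELO model; any one trigger retires it.

    roi: running return on investment (e.g. -0.029 for -2.9%)
    pinnacle_clv: average closing-line value against Pinnacle, if tracked
    perm_p: permutation-test p-value, checked from the 100-bet mark
    """
    if roi < -0.05:
        return True
    if n_bets >= 50 and pinnacle_clv is not None and pinnacle_clv < 0:
        return True
    if n_bets >= 100 and perm_p is not None and perm_p > 0.30:
        return True
    return False
```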
Someone floated the idea of using the models as reverse indicators: if they're consistently wrong, bet the other way. Nate killed that in about thirty seconds with a Bonferroni correction and a reminder about sample sizes. Fading the models means testing nine new hypotheses, so the per-test significance bar drops to 0.05/9 ≈ 0.0056, and at these bet counts nothing comes close to clearing it. The room moved on.
The honest answer to "what happens now" is: we wait. We watch the numbers. And if the numbers say stop, we stop.