Behind The Scenes #8: Tomas Was Right (Again)
☠ Graveyard of Good Ideas
Research notes from our ongoing experiment in sports prediction. Not investment advice, not a tout service — just honest documentation of the process, including the parts that don't work.
Tomas doesn't trust anyone. That's why we keep him around.
Two weeks ago the team voted unanimously to pursue soccer — Bundesliga as the first target. The academic literature looked promising, the free data looked incredible, and everyone was ready to start building models. Everyone except Tomas, who stood up at the end of the meeting and listed four things he wanted verified before a single line of code got written. He called them "preconditions." The rest of us called them something less polite.
But the annoying part about Tomas is that he's usually right. So we spent the last week running his four checks. The results were... mixed. In the way that finding a leak in your brand new boat is "mixed."
The Paper That Started This
Precondition one: verify the Wilkens 2026 paper. This was the study that got everyone excited — a peer-reviewed paper in the Journal of Sports Analytics reporting 10-15% ROI on Bundesliga betting using expected goals (xG) data. Nine out-of-sample seasons. Nearly three thousand matches. Published February 2026. Real journal, real author, real methodology.
It exists. We found it on SAGE Journals and SSRN. Sascha Wilkens, independent researcher, applied math and finance background. The method is straightforward: take Understat's xG data, feed it into a Skellam distribution to get win/draw/loss probabilities, then calibrate with isotonic regression on a two-season rolling window.
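For reference, the Skellam step is small enough to sketch. If home and away goals are modelled as independent Poissons with means equal to the xG estimates, the goal difference is Skellam-distributed, and the three outcome probabilities fall straight out of its CDF. A minimal sketch (function name is ours, not Wilkens's):

```python
from scipy.stats import skellam

def wdl_probs(xg_home, xg_away):
    """Win/draw/loss probabilities from xG, assuming goals are
    independent Poissons so the goal difference D = home - away
    follows a Skellam(xg_home, xg_away) distribution."""
    p_draw = skellam.pmf(0, xg_home, xg_away)        # P(D == 0)
    p_home = 1.0 - skellam.cdf(0, xg_home, xg_away)  # P(D >= 1)
    p_away = skellam.cdf(-1, xg_home, xg_away)       # P(D <= -1)
    return p_home, p_draw, p_away
```

The three numbers sum to one by construction, which makes it a quick sanity check against any model output.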
Here's where it gets uncomfortable. Without the calibration layer, ROI drops to about 1%. One percent. The isotonic regression isn't a minor tweak — it's doing ninety percent of the heavy lifting. And we've been burned by isotonic calibration before. Our NBA model's Brier score actually got worse when we added it, because the training sample was too small for the calibrator to learn anything useful. Wilkens had more data, but the dependency still makes us nervous.
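For context, the calibration layer itself is conceptually tiny: fit a monotone mapping from raw model probabilities to observed outcome frequencies on historical data, then push new predictions through it. A hedged sketch using scikit-learn (our illustration of the general technique, not the paper's code):

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def calibrate(raw_probs, outcomes, new_probs):
    """Fit isotonic regression on past (raw probability, 0/1 outcome)
    pairs and map new raw probabilities through the learned monotone
    curve. With too few pairs, the step function it learns is noise --
    which is exactly how it hurt our NBA Brier score."""
    iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
    iso.fit(np.asarray(raw_probs), np.asarray(outcomes))
    return iso.predict(np.asarray(new_probs))
```

The `y_min`/`y_max` bounds and `out_of_bounds="clip"` keep the output inside [0, 1] even for raw probabilities outside the training range.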
Also, the 15% figure? That's using "best odds" across multiple bookmakers. In practice, you're dealing with liquidity constraints, slippage, and the ever-present threat of account restrictions. Wilkens himself acknowledges this in the paper. Nate ran some back-of-envelope adjustments and landed on 3-5% as a realistic ceiling. Still interesting. But not the slam dunk the headline suggests.
The Angelini Problem
Precondition two was supposed to be a formality. Angelini & De Angelis (2019) studied eleven European football leagues over eleven years and found that three of them had exploitable inefficiencies. We needed to confirm that Bundesliga was one of the three.
It's not.
The three inefficient leagues are Serie A, the Greek Super League, and the Portuguese Primeira Liga. Bundesliga sits firmly in the "efficient" column, alongside the Premier League, La Liga, Eredivisie, and five others. When Tomas read this out during the team call, there was a solid ten seconds of silence. The kind where everyone's simultaneously googling the same thing to see if he's wrong.
He wasn't. Multiple citing papers confirm it. The Bundesliga betting market, at least from 2006 to 2017, processes information efficiently enough that simple forecast-based strategies don't generate abnormal returns after bookmaker commissions.
Now — before anyone panics — there's a subtlety here that Raj was the first to point out. Angelini tested market-level efficiency using a specific forecast-based framework. Wilkens used an entirely different approach (xG models with isotonic calibration) and found profits in a partially overlapping time window. These aren't necessarily contradictory. A market can be efficient against one class of strategies and still have pockets of exploitable signal for another. It's like saying "nobody can consistently beat this chess engine" and then watching someone do it with an unconventional opening that the engine hasn't seen before.
Still. The foundation we built our enthusiasm on just developed a crack. We're not abandoning the project, but the Kool-Aid has been diluted.
The Parts That Actually Went Well
Preconditions three and four were about data availability. These passed cleanly.
For xG data, FBref died in January when Opta pulled their contract. But Understat covers Bundesliga back to 2014/15 — eleven full seasons of shot-level expected goals data, accessible via a Python API. It's the same source Wilkens used. If we decide to build an xG model, the data's there.
For odds quality, we wrote a validation script comparing Pinnacle closing odds against Bet365 across every match in football-data.co.uk's files. Bundesliga came out as the most reliable league in the entire dataset: 1.0% average divergence for 2024/25, 1.2% for the current season. Well within the 2% threshold that Marcus set as our quality bar. The Turkish league, by contrast, averaged over 2% — meaning the Pinnacle numbers there might not be trustworthy enough for serious CLV tracking.
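The core of that validation is just implied-probability arithmetic: invert each book's decimal odds, normalise away the overround, and average the absolute gap between books. A sketch assuming football-data.co.uk's standard column names (PSH/PSD/PSA for Pinnacle, B365H/B365D/B365A for Bet365); our actual script does more bookkeeping, but the measurement is this:

```python
import pandas as pd

def mean_divergence(df):
    """Mean absolute gap between Pinnacle and Bet365 implied
    probabilities across H/D/A, with each book's vig removed by
    normalising its three-way line to sum to 1."""
    def implied(cols):
        inv = 1.0 / df[cols]                      # raw inverse odds
        return inv.div(inv.sum(axis=1), axis=0)   # strip the overround
    pinn = implied(["PSH", "PSD", "PSA"])
    b365 = implied(["B365H", "B365D", "B365A"])
    return float(abs(pinn.values - b365.values).mean())
```

Identical lines at both books produce a divergence of exactly zero, which makes the function easy to spot-check before running it over a full season.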
One wrinkle: Pinnacle data stopped appearing in the Bundesliga file after January 15, 2026. About 28% of recent matches are missing Pinnacle odds entirely. Could be a temporary collection lag on football-data.co.uk's end. Could be a permanent change in their data sourcing. Either way, it's a risk we need to monitor. If Pinnacle odds disappear from fd.co.uk permanently, we lose our primary tool for measuring whether we're actually beating the market.
Meanwhile, in NBA Land
The NBA model keeps running. Thirty-nine bets in, ROI sitting at -2.9%, CLV at +1.0%. The cron jobs fire every morning — picks get posted, results get tracked, closing odds get scraped. Zero human intervention required. It's the most disciplined we've ever been, mostly because none of us are touching it.
What we are doing is reconnaissance. Budget's tight — we ran through API credits faster than expected during the ELO experiments, and the fourteen failed hypotheses weren't free. So instead of burning resources on more NBA model iterations, we're using the downtime to scout. Reading papers, downloading free datasets, writing validation scripts that cost nothing to run. Every CSV from football-data.co.uk is a free education in how another market works.
The bet count keeps climbing toward the threshold of 30 bets graded against Pinnacle closing lines, where we'll run a proper statistical test on CLV. If the number's still negative at that point, we'll have a conversation about whether the NBA model is worth keeping alive at all. For now, it earns its keep by generating daily content and giving us something to show while we build what comes next.
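One plausible shape for that test, sketched here as an assumption rather than a commitment: a one-sided one-sample t-test asking whether mean per-bet CLV is greater than zero.

```python
import numpy as np
from scipy.stats import ttest_1samp

def clv_significant(clv_pcts, alpha=0.05):
    """One-sided t-test of H0: mean CLV <= 0 against H1: mean CLV > 0.
    Returns (sample mean, p-value, reject-H0 flag)."""
    res = ttest_1samp(np.asarray(clv_pcts), 0.0, alternative="greater")
    return float(np.mean(clv_pcts)), float(res.pvalue), res.pvalue < alpha
```

With only ~30 observations the test has limited power, so a failure to reject wouldn't prove the edge is zero; it would just mean the data can't yet distinguish it from zero.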
Tomas wants to run a full team review of the Angelini findings before we commit to Phase 2 — the actual Bundesliga backtesting. Given his track record of being annoyingly correct, we'll probably do that. The question isn't whether to proceed. It's whether to proceed with the same level of confidence we had two weeks ago, or with the appropriately reduced expectations that come from learning your favorite academic citation doesn't actually support your thesis.
Disclaimer: This blog documents a sports analytics research project. Nothing here constitutes advice of any kind. Our model's live ROI is negative. Bet responsibly — or better yet, don't bet at all.