
Behind The Scenes #7: Fourteen Failures and a Pivot

☠ Graveyard of Good Ideas

This is my personal archive of failed experiments and rejected hypotheses. A history of setbacks, documented with full transparency so you don't have to repeat them. Feel free to use this data as a reference — or as a cautionary tale.

Note: This article documents a specific phase of our research. The project has since evolved through many iterations — enjoy this as a historical record of the process.

So here we are. Fourteen hypotheses tested on the NBA. Fourteen hypotheses rejected. Our ELO model doesn't just fail to beat the market — at higher confidence thresholds, it's a statistically significant reverse indicator. If you'd bet against our model's strongest picks, you'd have made money. That's not just bad. That's impressively bad.

The team sat down last week and made a call: stop throwing good effort after bad. NBA stays on autopilot — the cron jobs keep running, the daily picks keep publishing, the results keep getting tracked. But zero development hours. We're not optimizing a model that's been proven inferior to the market across 3,444 games.

Instead, we're looking at soccer.

Why Soccer, Specifically

This wasn't a random dart throw. Nate pulled together the academic literature, and the picture is genuinely different from NBA. Multiple peer-reviewed papers — Angelini & De Angelis (2019), Koopman & Lit (2015, 2019, 2022) — report exploitable inefficiencies in European football markets. Not theoretical. Actual positive returns in out-of-sample tests.

There's a catch, of course. Winkelmann et al. (2024) found these inefficiencies are short-lived and non-persistent. They exist in individual seasons but don't survive cross-validation across time. Sound familiar? That's basically what happened to every promising signal we found in NBA — looked great in-sample, evaporated out-of-sample.

But there's a crucial difference in the data environment. Our NBA research was crippled from day one by data access. Historical Pinnacle closing odds? Behind a paywall. Multi-bookmaker comparison data? 664 games from Kaggle covering four months. That's not a dataset, that's a napkin.

Soccer has football-data.co.uk.

The Data Environment That Made Us Jealous

Twenty years of CSV files. Pinnacle opening and closing odds. Eight to ten bookmakers per match. Match statistics — shots, corners, fouls, cards. Over/Under, Both Teams To Score, Asian Handicap lines. All of it. Free. Updated weekly. No API credits, no rate limits, no $30/month subscriptions to find out if historical data even exists.

I spent an afternoon downloading CSVs for ten European leagues. Every single one had 100% Pinnacle closing odds coverage. The Turkish Süper Lig file alone has 119 columns per match — more data points per game than our entire NBA pipeline processes.
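The coverage check described above amounts to a few lines of Python. This is a sketch, not our actual pipeline code: the column names `PSCH`/`PSCD`/`PSCA` are football-data.co.uk's fields for Pinnacle closing odds (home/draw/away), and the inline sample stands in for a downloaded league CSV such as `D1.csv` (Bundesliga).

```python
import csv
import io

# Illustrative rows in football-data.co.uk's column format; in practice you
# would open a downloaded league file (e.g. "D1.csv") instead of this sample.
SAMPLE = """Div,Date,HomeTeam,AwayTeam,PSCH,PSCD,PSCA
D1,16/08/24,Gladbach,Leverkusen,4.50,4.10,1.75
D1,24/08/24,Dortmund,Frankfurt,1.55,4.60,5.80
D1,31/08/24,Bayern,Freiburg,1.20,7.50,13.00
"""

def pinnacle_closing_coverage(handle) -> float:
    """Fraction of matches with all three Pinnacle closing odds present."""
    rows = list(csv.DictReader(handle))
    covered = sum(
        1 for row in rows
        if all(row.get(col, "").strip() for col in ("PSCH", "PSCD", "PSCA"))
    )
    return covered / len(rows) if rows else 0.0

coverage = pinnacle_closing_coverage(io.StringIO(SAMPLE))
print(f"Pinnacle closing odds coverage: {coverage:.0%}")
```

Running this over each league's full twenty-year CSV is how you'd verify the "100% coverage" claim for yourself — and, later, how you'd quantify the post-2025 data quality concern Tomas raised.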

On top of that, The Odds API (which we already use for NBA) covers 50+ soccer leagues on the free tier. FBref has detailed match statistics. Transfermarkt has squad data. The soccerdata Python package wraps multiple sources into a unified API.

This is what "sufficient data for backtesting" actually looks like. We just didn't know because we'd been staring at NBA's data desert.

The Meeting That Almost Went Too Fast

We ran the Gate 1 evaluation with seven people. The desk research checked all four boxes — academic evidence, free data, match frequency (via multi-league), and infrastructure reuse. On paper, an easy pass.

Then Tomas did what Tomas does. He started asking uncomfortable questions.

"Has anyone actually read the Wilkens 2026 paper? Or are we citing a search engine summary?" Fair point. We hadn't read it. "Angelini says three of eleven leagues are inefficient. Which three? Because if it's the Premier League, Serie A, and La Liga, those are the most efficient markets — and that would contradict the paper's own conclusion." Another fair point.

He had two more. FBref lost its xG data in January 2026 when the Opta contract ended, so half our planned feature engineering disappeared overnight. And there are reports that football-data.co.uk's Pinnacle data quality has been declining since July 2025 — which would undermine the single biggest advantage of the soccer pivot.

The room got quieter. Raj backed him up on the data quality concern. Marcus, who'd been cautiously optimistic, shifted to "let's verify before we commit." Even Elena, who had a full Dixon-Coles model architecture sketched out, agreed to pump the brakes.

What We Actually Decided

Not "let's build a soccer model." Not yet. The unanimous decision was a four-phase plan that doesn't touch a single API credit until June:

March-April: Resolve Tomas's four prerequisites. Get the actual papers. Verify the claims. Quantify the data quality. Find xG alternatives.

April-May: Design the backtest using football-data.co.uk CSVs. Dixon-Coles base model. Walk-forward validation with rolling three-season windows. All offline, all free.

June-August: Formal Gate 2 backtest. This is where we spend API credits (capped at 50). Raw edge > 2% and permutation p < 0.10, or we kill it.

September onwards: If — and only if — the backtest survives, we go live with 30 bets on the Bundesliga. CLV > 0% or it's dead.

Primary target: Bundesliga. Secondary: Eredivisie. Serial execution — we finish one before starting the other. No more parallel track sprawl.
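The walk-forward design from the April-May phase can be sketched as a simple split generator. This is an illustration of the validation scheme, not our harness; the season labels are placeholders.

```python
from typing import Iterator

def walk_forward_splits(
    seasons: list[str], window: int = 3
) -> Iterator[tuple[list[str], str]]:
    """Yield (training seasons, test season) pairs with a rolling window.

    Each test season is predicted only from the `window` seasons that
    precede it, so no information leaks backward in time — the property
    our in-sample-only NBA signals never had to survive.
    """
    for i in range(window, len(seasons)):
        yield seasons[i - window:i], seasons[i]

seasons = ["2019-20", "2020-21", "2021-22", "2022-23", "2023-24", "2024-25"]
for train, test in walk_forward_splits(seasons):
    print(train, "->", test)
```

With six seasons and a three-season window, this yields three out-of-sample test seasons — each scored by a model that has never seen it.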

Ada pointed out that Phases 1 and 2 cost literally nothing. Worst case, we spend two months reading papers and crunching CSVs, learn that soccer markets are just as efficient as NBA, and move on. That's research, not waste. The expensive commitment doesn't happen until we've verified the foundations.

Nate ran the numbers on statistical power: 306 Bundesliga matches per season, three-season rolling window gives ~918 matches for training. That's enough for meaningful permutation tests — unlike our NBA situation where 664 Kaggle games was all we had.
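For the "permutation p < 0.10" gate, one simple variant is a sign-flip permutation test on per-bet returns: under the null of no skill, each bet's profit is as likely to be a loss, so randomly flipping signs many times tells you how often chance alone produces an edge as large as the observed one. A sketch with made-up numbers — the real input would be bet-level P&L from the backtest, and our actual test statistic may differ:

```python
import random

def permutation_p_value(
    returns: list[float], n_perm: int = 10_000, seed: int = 0
) -> float:
    """One-sided sign-flip permutation test on per-bet returns.

    Counts how often randomly negating each return yields a mean at
    least as large as the observed mean. A small p suggests the edge
    is unlikely under the null of symmetric, skill-free returns.
    """
    rng = random.Random(seed)
    observed = sum(returns) / len(returns)
    hits = 0
    for _ in range(n_perm):
        permuted = [r if rng.random() < 0.5 else -r for r in returns]
        if sum(permuted) / len(permuted) >= observed:
            hits += 1
    return hits / n_perm

# Illustrative stake-normalised per-bet returns (+0.8 = won at odds 1.80,
# -1.0 = lost the stake). Real input: ~918 matches of backtest P&L.
returns = [0.8, -1.0, 0.9, -1.0, 0.85, 0.7, -1.0, 0.95, -1.0, 0.75]
p = permutation_p_value(returns)
print(f"mean return: {sum(returns) / len(returns):+.3f}, permutation p = {p:.3f}")
```

Ten bets, as in the toy data, will almost never clear p < 0.10 — which is exactly Nate's point about why ~918 matches matters.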

The ELO chapter taught us one expensive lesson: don't build on unverified assumptions. This time, we verify first.


Disclaimer: This is research documentation, not financial advice. Our track record so far is 38 bets at -2.9% ROI. We are not profitable. We may never be profitable. Bet responsibly.
