Half-Frozen: What Do You Do When the Model Needs Time?
Thirty-nine bets in. ROI sitting at -2.9%. And the only honest thing to do is... wait.
We had the meeting today. Seven of us around the table, and the question was simple enough: the NBA model is on autopilot collecting data toward the 100-bet evaluation mark. Soccer is dead — nine models, zero survivors, as documented in our last post. So what do we do with ourselves for the next six weeks?
The Temptation to Do Something
Marcus — our resident "if it ain't broke, don't touch it" engineer — opened with the obvious: we already decided to freeze all development until 100 bets. Why are we having this meeting?
Good question. The answer came from a less comfortable place: because if we just sit here for six weeks and confirm what we already suspect (that the model doesn't beat the market), we'll have wasted six weeks. The 100-bet milestone isn't a destination. It's a checkpoint. And showing up to a checkpoint with nothing prepared for what comes after is just... watching the clock with extra steps.
Tomas, predictably, shot everything down. Every single proposal. His argument: all fourteen hypotheses we've tested have failed. Elo residuals, GLM structural features, player impact, overconfidence detector, nine soccer models. "The common thread," he said, "is that we tried to do something." Brutal, and annoyingly hard to argue with.
Where We Actually Landed
The debate came down to a surprisingly fine distinction: what counts as "development" versus "measurement improvement"?
Elena defined the line. Development means adding new features or new models — things that change how the system behaves. Measurement improvement means recording what's already happening with better fidelity. The system stays exactly the same. You just take better notes.
That framing unlocked the conversation. Because right now, our CLV (closing line value) tracker fetches odds four times a day, but each fetch overwrites the previous one. We're literally throwing away intermediate data points. The market moves from the moment we place our bet to game time, and we only see the final frame. It's like watching a movie but only seeing the last scene.
Nate ran the numbers. If we log each fetch as a snapshot instead of overwriting, we'd have four data points per game per day. By the time we hit 100 bets, that's enough data (n≈84) to detect whether there's a meaningful correlation between odds movement patterns and our CLV outcomes. No new API calls. No changes to the betting pipeline. Just... saving what we already have.
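Where does "enough" come from? A quick sanity check, assuming the standard α = 0.05 (two-sided), 80%-power setup and Fisher's z approximation for correlation tests; those choices are our assumption here, not something Nate specified:

```python
import math

def min_detectable_r(n: int) -> float:
    """Smallest Pearson |r| detectable with n paired observations,
    at alpha = 0.05 (two-sided) and 80% power, via Fisher's z.

    Solves n = ((z_alpha + z_beta) / atanh(r))^2 + 3 for r, with
    the standard normal quantiles hard-coded to avoid a scipy
    dependency: z_{0.975} = 1.960, z_{0.80} = 0.842.
    """
    c = (1.960 + 0.842) / math.sqrt(n - 3)  # required atanh(r)
    return math.tanh(c)                      # invert the Fisher transform

print(f"n = 84 -> minimum detectable |r| ~ {min_detectable_r(84):.2f}")
# n = 84 -> minimum detectable |r| ~ 0.30
```

At n ≈ 84 the floor sits around |r| = 0.30: a real-but-weaker pattern in the odds movements would still slip past us. The snapshots don't guarantee an answer; they just make the question askable.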
Raj was suspicious. "What if the snapshot logging breaks something?" Fair concern from the guy whose job is to worry. The solution: the snapshots go to a completely separate file. The main pipeline doesn't know they exist. If the snapshot save fails, it logs a warning and moves on. Production is untouched.
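For anyone building something similar, the whole thing fits in one function. A minimal sketch; the file path, field names, and save_snapshot itself are illustrative stand-ins, not our actual code:

```python
import json
import logging
import time

logger = logging.getLogger("odds_snapshots")

# Hypothetical path: append-only, and nothing in the betting
# pipeline ever reads from it.
SNAPSHOT_FILE = "data/odds_snapshots.jsonl"

def save_snapshot(game_id: str, book: str, odds: float) -> None:
    """Append one fetched odds quote as a JSON line.

    Append-only, so intermediate fetches are preserved instead of
    overwritten. Failures are logged and swallowed: the main
    pipeline must never notice this function exists.
    """
    record = {
        "ts": time.time(),  # when we fetched, not game time
        "game_id": game_id,
        "book": book,
        "odds": odds,
    }
    try:
        with open(SNAPSHOT_FILE, "a") as f:
            f.write(json.dumps(record) + "\n")
    except OSError as e:
        # Warn and move on. Production is untouched either way.
        logger.warning("snapshot save failed: %s", e)
```

The except clause is the entire safety argument: if the disk fills or the path is wrong, we lose one snapshot, a warning lands in the log, and the bet goes out exactly as it would have anyway.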
The Other Thing
Nate pushed for something else: pre-registering the statistical variables we'll use at the 100-bet evaluation. Five variables, locked in now, before we see the results. Home/away splits, back-to-back games, edge band analysis, weekday vs weekend, and odds movement direction.
The point isn't to analyze our current 39 bets; that sample is too small to find anything real. The point is to prevent future-us from data-mining our way to a flattering conclusion. If we don't decide what to measure now, we'll inevitably find the one variable that makes the numbers look good and pretend we planned it all along. Nate called it "HARKing prevention," after HARKing: Hypothesizing After the Results are Known. Tomas called it "the only honest thing anyone's proposed today."
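"Locked in" can be stronger than a promise. A minimal sketch of one way to make the pre-registration tamper-evident; the field names and the hashing scheme are our illustration, not a formal protocol:

```python
import hashlib
import json

# The five pre-registered variables, written down before the
# 100-bet evaluation. Editing any of them changes the hash.
PREREGISTERED = {
    "registered_at_bet": 39,
    "evaluation_at_bet": 100,
    "variables": [
        "home_away_split",
        "back_to_back_games",
        "edge_band",
        "weekday_vs_weekend",
        "odds_movement_direction",
    ],
}

# Canonical serialization (sorted keys) so the hash is stable.
spec = json.dumps(PREREGISTERED, sort_keys=True)
print(hashlib.sha256(spec.encode()).hexdigest())
```

Drop the hash into a commit message, or into this post, and it's timestamped: future-us can't quietly swap "weekday vs weekend" for whichever variable happens to flatter the results.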
What Tomas Made Us Promise
Tomas agreed to the plan on one condition: if any of this ever gets integrated into the production pipeline, it has to go through another full team review. No scope creep. No "well, we already have the data, might as well use it." The data collection stays in its own lane, and any promotion to production is a separate decision with separate approval.
That's fair. It's the kind of guardrail that sounds paranoid until you remember that every bad decision in this project started with "well, since we're already here..."
So that's where we are. Thirty-nine bets deep, collecting snapshots, pre-registering our homework, and waiting for the math to either save us or bury us. The model doesn't know it's being watched more carefully now. It just keeps doing its thing, one bet at a time.
We'll know more at 100.
Disclaimer: This documents an ongoing experiment. Nothing here is financial or betting advice. Our model is currently losing money. Make of that what you will.