Behind The Scenes #3: The .shift(1) Incident
☠ Graveyard of Good Ideas — Today's cause of death: forgetting one line of code.
You know that feeling when you've been driving for an hour, feeling great about yourself, and then you realize the GPS has been set to "walking" mode the whole time? That was our Wednesday.
After burying the Bradley-Terry residual model, we needed a new angle. Elo residuals were dead. The market knew everything our model knew, and more. Time to get structural.
NBA teams play 82 games crammed into 170 days. Back-to-back games, cross-country red-eyes, five-game road trips. Surely — surely — the betting market can't perfectly account for every scheduling detail?
It's not a crazy idea. There's actual academic literature supporting schedule-related inefficiencies. And these are computable factors from free, public data. No proprietary feeds, no paid APIs. Just calendars.
We pre-registered six features before touching the data (because previous installments of this blog taught us about p-hacking):
- Rest Days Difference — home minus away
- Home Back-to-Back — zero rest for the home team?
- Away Back-to-Back — same for visitors
- Home Streak — homestand length
- Away Streak — road trip length
- Scoring Trend Difference — 10-game EWMA of points, home minus away
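For the curious, here is a minimal sketch of how the rest-based features might be derived from nothing but a calendar. The `log` table, its column names, and the dates are hypothetical, made up for illustration. Streak features are left out to keep it short.

```python
import pandas as pd

# Hypothetical team-level game log: one row per team per game played.
log = pd.DataFrame({
    "team": ["BOS", "BOS", "BOS", "LAL", "LAL", "LAL"],
    "date": pd.to_datetime([
        "2024-01-01", "2024-01-02", "2024-01-05",
        "2024-01-01", "2024-01-04", "2024-01-05",
    ]),
}).sort_values(["team", "date"]).reset_index(drop=True)

# Calendar gap to the team's previous game, minus one, gives full rest days.
# A team's first game has no previous game, so its value is NaN.
log["rest_days"] = log.groupby("team")["date"].diff().dt.days - 1

# Back-to-back: the team also played yesterday (zero rest days).
log["b2b"] = log["rest_days"] == 0

# Rest days difference for a hypothetical BOS (home) vs LAL (away) game.
day = pd.Timestamp("2024-01-05")
home_rest = log.loc[(log["team"] == "BOS") & (log["date"] == day), "rest_days"].iloc[0]
away_rest = log.loc[(log["team"] == "LAL") & (log["date"] == day), "rest_days"].iloc[0]
rest_diff = home_rest - away_rest  # home minus away
```

Note that both features only look backward in the calendar, which is the property the scoring trend feature turned out to be missing.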
Logistic GLM, market probability as offset, Bonferroni-corrected at p < 0.00833 (0.05 / 6). Six tests, strict threshold, no fishing allowed.
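To make the setup concrete, here is a rough sketch of a logistic model with an offset, written in plain NumPy rather than whatever GLM library the real analysis used. It assumes the offset is the logit of the market probability, so the fitted coefficients measure only what a feature adds on top of the market. The data is synthetic and the function names are ours, for illustration only.

```python
import numpy as np

def logit(p):
    """Map a probability to the log-odds scale used by the linear predictor."""
    return np.log(p / (1.0 - p))

def fit_offset_logit(X, y, offset, n_iter=25):
    """Newton-Raphson fit of P(y=1) = sigmoid(offset + X @ beta).

    The offset enters the linear predictor with its coefficient fixed at 1.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(offset + X @ beta)))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

# Six pre-registered tests share a 0.05 family-wise error budget.
alpha_bonferroni = 0.05 / 6

# Synthetic data: outcomes are drawn from the market probability itself,
# so the extra feature carries no information by construction.
rng = np.random.default_rng(0)
n = 3444
p_market = rng.uniform(0.2, 0.8, size=n)
feature = rng.normal(size=n)
y = (rng.uniform(size=n) < p_market).astype(float)

X = np.column_stack([np.ones(n), feature])  # intercept + one feature
beta = fit_offset_logit(X, y, logit(p_market))
```

In this synthetic setup both coefficients land near zero, which is what "the market already knows" looks like. A library such as statsmodels accepts an `offset` argument for the same purpose and also reports the z-statistics and p-values.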
With 3,444 games, the exploratory numbers came in and they looked… really promising. Home team on a back-to-back: 49.7% win rate vs. 55.2% normally. Away team on a back-to-back: home wins 60.8%. Both significant at p < 0.01.
And then the scoring trend feature happened.
Correlation with home wins: r = 0.457. For context, in sports prediction, anything above 0.15 is interesting and anything above 0.30 is suspicious. 0.457 is a miracle. The GLM was glorious — Bonferroni significant, massive Brier improvement, walk-forward validation showing gains in 9 out of 10 windows.
For about thirty minutes, we were genuinely excited. Champagne was not popped, but it was considered.
Then we looked at the EWMA calculation, because when something looks this good in sports betting, there's always a reason it looks this good.
Our exponential moving average was including the current game's score. pandas' .ewm().mean() includes the current row in each average. We forgot to add .shift(1) to exclude it.
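A minimal reconstruction of the bug and the fix, assuming `span=10` for the "10-game" EWMA (the exact smoothing parameter is our guess) and a made-up `games` table already sorted by date within each team:

```python
import pandas as pd

# Hypothetical data: rows are assumed sorted by date within each team.
games = pd.DataFrame({
    "team":   ["BOS", "BOS", "BOS", "LAL", "LAL", "LAL"],
    "points": [100.0, 110.0, 130.0, 95.0, 105.0, 90.0],
})

by_team = games.groupby("team")["points"]

# Leaky: each row's average already contains that row's own score.
games["trend_leaky"] = by_team.transform(lambda s: s.ewm(span=10).mean())

# Safe: shift inside each team so a row only sees that team's earlier games.
games["trend_safe"] = by_team.transform(lambda s: s.ewm(span=10).mean().shift(1))
```

The shift happens inside the per-team transform on purpose. Shifting the whole column after the fact would hand each team's first game the previous team's last average, which is a different leak with the same smell.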
Let that sink in. We built a feature that essentially said "this team will win because they scored a lot of points in the game we're trying to predict." A team that scores 130 tends to win? No kidding. Our model wasn't predicting the future. It was reading the box score.
This is the analytics equivalent of a student who gets a perfect score on every test, and it turns out they've had the answer key in front of them the whole time. Genius? No. Just cheating. Accidentally, in our case, but the result is the same.
After adding .shift(1):
- Correlation: 0.457 → 0.189. Miracle → merely "interesting"
- GLM coefficient: β = −0.001, z = −0.14, p = 0.89. Dead
With the accidental time-travel removed, here's what every schedule model actually showed:
| Model | Features | Best p-value | BIC vs Market-Only |
|---|---|---|---|
| H1 | Rest days diff | 0.40 | Worse |
| H2 | + Both B2Bs | 0.26 | Worse |
| H3 | + Both streaks | 0.11 | Worse |
| H4 | + Scoring trend | 0.89 | Worse |
Not a single feature reached nominal significance. BIC said the market-only model wins every time. Walk-forward confirmed: adding schedule features made predictions worse in 8 out of 10 windows. The market wasn't approximately right about schedule fatigue. It was nailing it.
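For a sense of why BIC is so unforgiving here, a small sketch using the standard definition, BIC = k·ln(n) − 2·ln(L). The log-likelihood values below are hypothetical, chosen only to show the arithmetic. The only number taken from the analysis is the 3,444-game sample size.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion; lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

n_games = 3444

# Each extra coefficient costs ln(n) BIC points, about 8.14 at this sample size.
penalty_per_param = math.log(n_games)

# Hypothetical log-likelihoods: the feature improves the fit by 2 units.
bic_market_only = bic(-2302.0, n_params=0, n_obs=n_games)
bic_with_feature = bic(-2300.0, n_params=1, n_obs=n_games)
```

A feature has to raise the log-likelihood by more than half the penalty, roughly 4.07 here, just to break even. In this made-up case a gain of 2 is not enough, so the market-only model still wins.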
The back-to-back effect is real — 5.5 percentage points in raw win rates. But the bookmakers know about it. Turns out, "teams are tired after playing yesterday" is not the groundbreaking insight we thought it was. The market has been pricing this in since before we were born.
Lessons: (1) Always .shift(1) your time-series features. We added it to the pre-flight checklist, right between "check your imports" and "question your life choices." (2) If the market has known about something for twenty years, it's priced in. Our edge, if it exists, isn't hiding where everyone's already looked.
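One way the checklist item could be automated: change the score of the game being predicted and confirm the feature for that game does not move. This is a sketch of the idea, not the check we actually run. `feature_is_leak_free` and the two builder functions are hypothetical names, and `span=10` is again an assumption.

```python
import pandas as pd

def feature_is_leak_free(build_feature, points):
    """True if changing the final score leaves the final feature value unchanged."""
    bumped = points.copy()
    bumped.iloc[-1] += 50.0  # perturb only the game being predicted
    before = build_feature(points).iloc[-1]
    after = build_feature(bumped).iloc[-1]
    both_missing = pd.isna(before) and pd.isna(after)
    return bool(both_missing or before == after)

def leaky_trend(points):
    # Includes the current game's score in its own average.
    return points.ewm(span=10).mean()

def safe_trend(points):
    # Only uses games strictly before the current one.
    return points.ewm(span=10).mean().shift(1)

scores = pd.Series([100.0, 110.0, 130.0])
leaky_ok = feature_is_leak_free(leaky_trend, scores)
safe_ok = feature_is_leak_free(safe_trend, scores)
```

Here `leaky_ok` comes out `False` and `safe_ok` comes out `True`. The check only probes the last row, so it catches this particular mistake rather than every possible leak.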
Next: player-level impact modeling. 146,000 player-game records. Maybe the market reacts slowly to individual absences, especially non-star players who quietly change a team's chemistry. Or maybe we're about to find out the market is good at that too.
Disclaimer: This content is for informational and educational purposes only. Nothing here constitutes financial or investment advice.