Spend Your Time on Your Dataset, not Your Model

Most people focus on model architecture and hyperparameter tuning. They'll spend weeks testing different neural nets (quick aside: don't use neural nets for sports betting - they'll waste your time and you won't get better results than gradient-boosted trees on tabular data like this; if you're skeptical, go ask ChatGPT), XGBoost configurations, or ensemble methods. But here's the truth: that's the easy part. The real time sink and complexity come from what you feed into these models: the features.

Why Features Matter More Than Models

In sports betting, where markets are efficient, most basic model architectures will perform similarly when given the same inputs. The key differentiator isn't whether you're using LightGBM or some ridiculously fancy ensemble you spent three weeks tuning - it's how you've engineered your features.

Think about it:

  • Basic stats are available to everyone
  • Standard ML implementations are well documented
  • The math behind common models is widely understood
  • But feature engineering? It's still complicated. Even with the help of GPT and Claude, it took me months to crack and code our Elo (strength) metric.

An Overly Simplistic Example: Optimizing Team Shooting

    Let's walk through optimizing a team's shooting efficiency - effective field goal percentage (eFG%). Most people just calculate a 10-game average and call it a day. Then they'll blame their model when they're not beating the market. Let's do better.
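
For concreteness, here's roughly what that baseline looks like. This is a minimal sketch, not production code - the game-log columns (fgm, fg3m, fga) are made up for illustration:

```python
import pandas as pd

# Toy game log: one row per team-game, in chronological order.
# Column names here are illustrative, not from any real data feed.
games = pd.DataFrame({
    "team": ["BOS"] * 12,
    "fgm":  [42, 38, 45, 40, 39, 44, 41, 37, 46, 43, 40, 42],
    "fg3m": [14, 10, 16, 12, 11, 15, 13,  9, 17, 14, 12, 13],
    "fga":  [88, 90, 86, 89, 91, 87, 85, 92, 88, 86, 90, 89],
})

# eFG% = (FGM + 0.5 * 3PM) / FGA  -- made threes get extra credit.
games["efg"] = (games["fgm"] + 0.5 * games["fg3m"]) / games["fga"]

# The "call it a day" baseline: trailing 10-game simple average, shifted one
# game so each row only uses information available before tip-off.
games["efg_10g_avg"] = (
    games.groupby("team")["efg"]
         .transform(lambda s: s.shift(1).rolling(10, min_periods=5).mean())
)
```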

    Step 1: Testing Different Calculations

  • Basic average (what everyone does)
  • Median (better with outliers)
  • Exponential decay (recent games matter more)
  • Any other weighting scheme worth testing (a quick sketch comparing these follows this list)
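
Here's a rough sketch of what those variations look like in code, continuing the toy game log above. The window and halflife values are just starting points to test, not recommendations:

```python
import pandas as pd

def trailing_efg(efg: pd.Series, window: int = 10, method: str = "mean",
                 halflife: float = 5.0) -> pd.Series:
    """Trailing eFG% feature for one team's games in chronological order."""
    past = efg.shift(1)  # only games completed before the one we're predicting
    if method == "mean":    # the basic average everyone starts with
        return past.rolling(window, min_periods=5).mean()
    if method == "median":  # shrugs off a single blowout or dud game
        return past.rolling(window, min_periods=5).median()
    if method == "ewm":     # exponential decay: recent games count more
        return past.ewm(halflife=halflife, min_periods=5).mean()
    raise ValueError(f"unknown method: {method}")

# One candidate column per method; the model comparison in Step 2 decides.
for m in ["mean", "median", "ewm"]:
    games[f"efg_{m}"] = games.groupby("team")["efg"].transform(
        lambda s: trailing_efg(s, method=m)
    )
```
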
    Step 2: Building Test Models

  • Create models with each calculation method
  • Compare prediction accuracy (a rough comparison loop is sketched after this list)
  • Check feature importance
  • Verify stability across seasons
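
A rough version of that comparison loop, assuming you already have a modeling table with your other features plus a target column. The column names "rest_days" and "margin" in the example call are placeholders, not a recommendation:

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

def compare_feature_variants(df: pd.DataFrame, base_cols: list[str],
                             candidate_cols: list[str], target_col: str) -> pd.Series:
    """Swap each candidate column into the same base feature set and score it
    with time-ordered cross-validation. Returns MAE per candidate (lower wins)."""
    cv = TimeSeriesSplit(n_splits=5)  # keep games in time order, never shuffle
    results = {}
    for col in candidate_cols:
        data = df[base_cols + [col, target_col]].dropna()
        X, y = data[base_cols + [col]], data[target_col]
        model = GradientBoostingRegressor(random_state=0)
        scores = cross_val_score(model, X, y, cv=cv,
                                 scoring="neg_mean_absolute_error")
        results[col] = -scores.mean()
    return pd.Series(results).sort_values()

# e.g. compare_feature_variants(games, ["rest_days"],
#                               ["efg_mean", "efg_median", "efg_ewm"], "margin")
```
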
    Step 3: Window Size Optimization

  • Test 9-game windows
  • Test 11-game windows
  • Test 12-game windows
  • Let the data tell us what works (a window-sweep sketch follows this list)
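
The window sweep is the same pattern - build one column per window, score them with the comparison helper from Step 2, and believe the cross-validation numbers over your gut. This continues the earlier sketches, and "rest_days" / "margin" are still placeholder names:

```python
# Build one trailing-average column per window length (10 is the baseline).
windows = [9, 10, 11, 12]
for w in windows:
    games[f"efg_mean_{w}"] = games.groupby("team")["efg"].transform(
        lambda s: trailing_efg(s, window=w, method="mean")
    )

# Score them exactly like Step 2.
window_cols = [f"efg_mean_{w}" for w in windows]
results = compare_feature_variants(games, ["rest_days"], window_cols, "margin")
print(results)
```
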
    Step 4: Further Refinements

  • Adjust for opponent strength (sketched after this list)
  • Split home/away performance
  • Consider rest days
  • Add season phase context
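
As an example of the first refinement, here's one simple way to adjust eFG% for opponent strength. It's a sketch, and it assumes the game log also has an `opponent` column (one row per team per game, sorted by date):

```python
import pandas as pd

def opponent_adjusted_efg(games: pd.DataFrame, window: int = 10) -> pd.Series:
    """Credit teams for shooting well against good defenses.
    Assumes `games` has `team`, `opponent`, and `efg` columns and is
    sorted chronologically, one row per team per game."""
    league_avg = games["efg"].mean()
    # Trailing eFG% each defense has allowed, known before tip-off.
    opp_allowed = (
        games.groupby("opponent")["efg"]
             .transform(lambda s: s.shift(1).rolling(window, min_periods=5).mean())
    )
    # Add back the gap between an average defense and the one actually faced,
    # so a brutal stretch of schedule stops dragging the number down.
    return games["efg"] + (league_avg - opp_allowed)
```
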
    Making It Work

    The key to making this work is staying systematic. Start simple and make one change at a time. Document everything. Understand why changes help or hurt. Test across different seasons and look for consistent improvements.
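
"Document everything" can be as low-tech as a CSV you append to after every run. Something like this is plenty (the numbers in the example call are made up):

```python
import csv
import datetime

def log_experiment(path, feature, change, cv_mae, seasons, notes=""):
    """Append one row per feature change so you can reconstruct what helped."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([
            datetime.date.today().isoformat(),
            feature, change, f"{cv_mae:.4f}", seasons, notes,
        ])

# Made-up example entry:
log_experiment("experiments.csv", "efg", "rolling mean -> ewm(halflife=5)",
               cv_mae=0.0312, seasons="2019-2024",
               notes="small but consistent gain in every season tested")
```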

    Think of feature engineering like composing music. You don't write an entire symphony at once. You work on one element at a time, refining and tweaking until it's right. Then you move on to the next.

    Beyond Single Features

    Once you understand this process, you can apply it to anything. Rest days impact. Home/away performance. Opponent adjustments. Momentum indicators. The process remains the same: start basic, test variations, optimize parameters, validate thoroughly.
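
For instance, rest days can start as nothing more than "days since the last game," then go through the same loop of variations (cap long breaks, flag back-to-backs, and so on). A minimal sketch, assuming the game log has a datetime `date` column:

```python
import pandas as pd

def rest_days(games: pd.DataFrame, cap: int = 10) -> pd.Series:
    """Days since each team's previous game.
    Assumes a datetime `date` column and chronological order per team."""
    days = games.groupby("team")["date"].diff().dt.days
    # Cap long layoffs: a three-week break isn't three times a one-week break.
    return days.clip(upper=cap)
```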

    Iteration

    Every feature can be improved through iteration. Take something basic like points scored. You could evolve it into points per possession, which also strips out pace. Then adjust for opponent strength. Then add a recency weight. Each step makes the feature slightly more valuable.
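
In code, that evolution is just a chain of small upgrades to one column. A sketch, assuming `points`, `possessions`, and `opponent` columns exist in the game log:

```python
# v1: raw points scored (the starting point)
games["pts_raw"] = games["points"]

# v2: points per possession - strips out pace, so fast teams don't look
#     "better" just because they get more chances
games["ppp"] = games["points"] / games["possessions"]

# v3: opponent-adjusted, same trick as the eFG% adjustment above
league_avg_ppp = games["ppp"].mean()
opp_ppp_allowed = (
    games.groupby("opponent")["ppp"]
         .transform(lambda s: s.shift(1).rolling(10, min_periods=5).mean())
)
games["ppp_adj"] = games["ppp"] + (league_avg_ppp - opp_ppp_allowed)

# v4: recency-weighted trailing version - the thing that actually goes
#     into the model
games["ppp_feature"] = games.groupby("team")["ppp_adj"].transform(
    lambda s: s.shift(1).ewm(halflife=5, min_periods=5).mean()
)
```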

    Conclusion

    While everyone else is arguing about model architecture or trying to tune the perfect neural network, focus on your features. That's where the money is - everyone and their grandma can use an LLM to code a model today.

    The market doesn't care how complex your model is. It only cares if you can predict better than everyone else. And that starts with better features.