Tuesday, 9 July 2013

Using Probability Theory & Poisson Distribution to win money!

I’ve been gambling casually on football for the past 8 years or so, and not making a great job of it! I’ve had a few decent returns but I’m almost certainly quite a bit down over the total time I’ve been betting.

Most gamblers fall into one of a few camps. There are those like me, who do it to make a Saturday afternoon watching Soccer Saturday that bit more interesting (who was it that said it matters more if there’s money on it?!); there are those who hope to win big; and there are those who claim to have all the answers. I’ve seen a lot of the latter on Twitter claiming xx% win rates, but a 90% win rate on individual picks doesn’t really help when you’ve combined them into a 10-team accumulator, does it?!
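
To put a rough number on that, assume (hypothetically) that each leg of a 10-team accumulator wins independently with probability 0.9. The whole bet only lands if every leg does, so the probabilities multiply:

```python
# Hypothetical figures: each leg assumed to win independently 90% of the time.
leg_win_prob = 0.9
legs = 10

# An accumulator only pays out if every single leg wins.
acca_win_prob = leg_win_prob ** legs
print(f"{acca_win_prob:.3f}")  # → 0.349
```

Even a genuine 90% strike rate on each pick leaves the accumulator losing roughly two times in three.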

Football is always a game of randomness, and it’s very hard to predict with any great accuracy. My current approach to betting is partly based on my own knowledge; partly on looking at the odds and picking out teams at lower odds that seem good to include in an accumulator (not a good way of doing it, as the bookies have full control over the odds); and partly on statistics such as form, league position and goals scored.

Why I never looked more closely at the statistical side, I don’t know. Given that it’s the area I’m involved in, I should have looked sooner, but I never really brought performance analysis and gambling together until the past year or so.

After a bit of research online I settled on using the Poisson distribution to model previous scores, primarily the home and away goals scored by each team (Chris Anderson and David Sally touch on this in their new book, The Numbers Game).
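
For anyone unfamiliar with it, the Poisson distribution gives the probability of a team scoring exactly k goals in a match when its expected number of goals is λ: P(k) = λ^k e^(-λ) / k!. The minimal Python sketch below uses a made-up λ of 1.5 purely for illustration.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals when the expected number of goals is lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


# Hypothetical expected-goals figure, for illustration only.
lam = 1.5
for goals in range(5):
    print(goals, round(poisson_pmf(goals, lam), 4))
```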

So, using the data from Football Data, I have built a statistical model that works out the probabilities of a range of outcomes across the top leagues in Europe. The models cover 22 different divisions:

  • England (Premier League, Championship, League One, League Two & Conference)
  • Scotland (Premier League, Division 1, Division 2 & Division 3)
  • France (Ligue 1 & Ligue 2)
  • Germany (Bundesliga & Bundesliga 2)
  • Spain (Primera Liga & Segunda Division)
  • Italy (Serie A & Serie B)
  • Holland (Eredivisie)
  • Belgium (Jupiler Liga)
  • Portugal (Primeira Liga)
  • Turkey (Super Liga)
  • Greece (Super League)

For each of these divisions, the model takes into account home goals scored and conceded and away goals scored and conceded (depending on where each team is playing). Using the Poisson distribution and probability theory, it then gives the probability of each of the following:

  • Home Win
  • Draw
  • Away Win

  • Home Win or Draw (Double chance results)
  • Away Win or Draw (Double chance results)

  • Predicted Score

  • Both Teams to Score
  • Both Teams NOT to Score

  • Expected Goals Under/Over
      • 0.5
      • 1.5
      • 2.5
      • 3.5
      • 4.5
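
The sketch below shows how outputs like these can be derived. It is a hedged Python example of one common way to set up such a model, not necessarily exactly how my sheet does it. Each side's attack and defence strengths are taken relative to league averages to get expected goals. Home and away goals are then assumed to be independent Poisson variables, which gives a grid of scoreline probabilities that can be summed for each market. The function names, the 10-goal cut-off and the input averages are all assumptions for illustration.

```python
import math

MAX_GOALS = 10  # truncate the score grid; more than 10 goals per side is negligible


def poisson_pmf(k: int, lam: float) -> float:
    """P(exactly k goals) for a Poisson distribution with mean lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def expected_goals(home_scored, home_conceded, away_scored, away_conceded,
                   league_home_avg, league_away_avg):
    """Expected goals for each side from per-match averages (hypothetical helper).

    home_scored/home_conceded: the home team's averages in its home games.
    away_scored/away_conceded: the away team's averages in its away games.
    league_home_avg/league_away_avg: league goals per match by home/away sides.
    """
    home_attack = home_scored / league_home_avg      # relative to a typical home side
    away_defence = away_conceded / league_home_avg   # away sides concede the home average
    away_attack = away_scored / league_away_avg      # relative to a typical away side
    home_defence = home_conceded / league_away_avg   # home sides concede the away average
    lam_home = home_attack * away_defence * league_home_avg
    lam_away = away_attack * home_defence * league_away_avg
    return lam_home, lam_away


def market_probabilities(lam_home, lam_away):
    """Turn two expected-goals figures into the markets listed above."""
    # Assume home and away goals are independent Poisson variables.
    grid = {(h, a): poisson_pmf(h, lam_home) * poisson_pmf(a, lam_away)
            for h in range(MAX_GOALS + 1) for a in range(MAX_GOALS + 1)}

    def prob(condition):
        return sum(p for (h, a), p in grid.items() if condition(h, a))

    home = prob(lambda h, a: h > a)
    draw = prob(lambda h, a: h == a)
    away = prob(lambda h, a: h < a)
    btts = prob(lambda h, a: h > 0 and a > 0)
    result = {
        "home_win": home, "draw": draw, "away_win": away,
        "home_or_draw": home + draw, "away_or_draw": away + draw,
        "most_likely_score": max(grid, key=grid.get),
        "btts_yes": btts, "btts_no": 1 - btts,
    }
    for line in (0.5, 1.5, 2.5, 3.5, 4.5):
        over = prob(lambda h, a: h + a > line)
        result[f"over_{line}"] = over
        result[f"under_{line}"] = 1 - over
    return result


# Hypothetical per-match averages, for illustration only.
lam_h, lam_a = expected_goals(home_scored=1.6, home_conceded=1.2,
                              away_scored=1.1, away_conceded=1.7,
                              league_home_avg=1.5, league_away_avg=1.1)
probs = market_probabilities(lam_h, lam_a)
print(f"Expected goals: {lam_h:.2f} - {lam_a:.2f}")
for market, value in probs.items():
    print(market, value if market == "most_likely_score" else f"{value:.2%}")
```

With these made-up numbers the expected goals come out at about 1.81 to 1.20. The single most likely scoreline, however, is 1-1. A sheet that reports a predicted score by rounding each side's expected goals would show 2-1 instead, so the two definitions can disagree.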

Am I expecting to become a millionaire? No. But I am hoping that the model will greatly help me in my casual betting, and so far I’m quite happy with it. (I only began using it at the very end of last season and it needs a lot more testing; the hardest part has been waiting for the leagues to start back up again!)

I aim to use this blog to post updates on how it’s going and to look at integrating other things into the model. I’m mostly interested in how accurately it predicts home/draw/away (H/D/A) results and scores, the outcomes with the highest odds and the hardest to predict. For example, take a random game from the first week of the Premier League season, West Brom vs Southampton:

  • West Brom to Win – 50.28%
  • Draw – 23.98%
  • Southampton to Win – 25.74%

  • Expected Score - West Brom 2 Southampton 1

Not clear-cut by any means, and just one very small example of what the sheet provides.
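
As a quick sanity check on the West Brom vs Southampton figures, the three outcomes should sum to 100%. The double chance probabilities are then just sums of two of them:

```python
# Home/draw/away percentages quoted above for West Brom vs Southampton.
home_win, draw, away_win = 50.28, 23.98, 25.74

total = home_win + draw + away_win
home_or_draw = home_win + draw   # double chance: West Brom or draw
away_or_draw = away_win + draw   # double chance: Southampton or draw

print(f"Total: {total:.2f}%")                       # → Total: 100.00%
print(f"West Brom or draw: {home_or_draw:.2f}%")    # → West Brom or draw: 74.26%
print(f"Southampton or draw: {away_or_draw:.2f}%")  # → Southampton or draw: 49.72%
```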

Hopefully I’ll post updates regularly enough to be interesting, without this blog turning into the story of how I lost all my money gambling!


  1. Hello! Nice post, and a nice blog overall! I am also using Poisson to predict outcomes, but I use overall stats as well, not only home vs away stats.
    My problem is that I don't know how much data I should use for predicting those outcomes. Now I am calculating average goals from current and last season's stats, but I don't know whether I should use more data or less.
    What do you think?

    Thank you,
    Kind regards,

    1. Hi John

      Thanks for your comments! I think the thing with most models is that they are very much trial and error. I actually tested the Spanish leagues on my model with 3 years' worth of data and with 1 year of data, and surprisingly the 1-year data gave more accurate predictions. I think it's possible to use additional statistics, but the problems will be finding ones that affect the outcome and being able to gather them. I use the info from football-data, which gives some additional stats, mainly for the top divisions, but I've got 23 leagues running and wouldn't even be able to find the shot data for most of them!
      As I said, very much trial and error, but I'm interested to see how your model turns out! Which divisions/countries have you used?


  2. Hi, have you done any backtesting, i.e. using previous season data to seed the averages and then subsequent season data to test the model? Football Data also has bookmaker odds, so you could then assess whether the model is generating value or not.

    1. Unfortunately I haven't had time to do any backtesting. Ideally that's what I'm using this season for: just to test the robustness of the model. I have the odds from football-data as part of the background data for my sheet, so I can use them to check whether it's generating value, but at the moment I'm just seeing whether it actually predicts things correctly!
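
On the question of value, one common check against bookmaker decimal odds is expected value per unit staked: EV = p × odds - 1, where p is the model's probability. A bet only has positive EV when p is greater than 1/odds.

The hedged Python sketch below applies this to the West Brom home-win probability from the post. The bookmaker price is made up, not a real quote. It also includes a hypothetical `flat_stake_profit` helper showing how a simple backtest could total results over past matches; the match records are invented for illustration.

```python
def fair_odds(prob: float) -> float:
    """Decimal odds at which a bet with this win probability breaks even."""
    return 1 / prob


def expected_value(prob: float, decimal_odds: float) -> float:
    """Expected profit per 1-unit stake (equivalent to prob * decimal_odds - 1)."""
    return prob * (decimal_odds - 1) - (1 - prob)


def flat_stake_profit(records):
    """Hypothetical backtest: stake 1 unit only where the model sees positive EV.

    records: iterable of (model_prob, decimal_odds, won) tuples.
    """
    profit = 0.0
    for prob, odds, won in records:
        if expected_value(prob, odds) > 0:
            profit += (odds - 1) if won else -1.0
    return profit


model_prob = 0.5028      # West Brom win probability from the post
bookmaker_odds = 2.10    # hypothetical price, not a real quote
print(f"Fair odds: {fair_odds(model_prob):.2f}")
print(f"EV per unit staked: {expected_value(model_prob, bookmaker_odds):+.3f}")

# Invented records, for illustration only.
history = [(0.5028, 2.10, True), (0.45, 2.50, False), (0.30, 3.00, False)]
print(f"Flat-stake profit: {flat_stake_profit(history):+.2f} units")
```

The third invented record is skipped because 0.30 × 3.00 is less than 1. Only bets the model rates as value count towards the backtest total.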