Monday, January 9, 2012

Lies, Damned Lies and Statistics: Cricket's Moneyball effect

Ben Roberts

It's my turn to throw my voice into the vortex of recent opinion and desire born of the 2011 release of Moneyball. I only recently watched this film, and enjoyed it immensely. Its release has awoken the rest of the sporting world to a concept that was not new even in the period portrayed in the movie, and that has been around for over 30 years. Suddenly everyone wants a piece of the action, and to find the killer measurable statistic for their sport of choice that separates the wheat from the chaff.

Baseball is a sport made for such a concept. Without going into much detail, in my opinion baseball lends itself so easily to this analysis because: 1) its gameplay is very static, with players not moving around the field randomly but in a definite order; 2) its gameplay has direct cause-and-effect relationships (for example, an out always equals a run saved); and 3) Major League Baseball is a market-based sport (as many others are, though not cricket), meaning that value is more easily determined because it comes in dollars and cents.

Cricket, by contrast, has neither such a static nature nor such direct cause-and-effect relationships. While commentators always say that the best way to restrict scoring is to take wickets, this is not absolute (as it is in baseball) until the 10th wicket falls. Nor is the sport market-based: although players move between first-class teams more often today, at the highest level the game is still played by regional representatives.

The key premise of the theory is stated early in the movie when Jonah Hill's character tells Brad Pitt's character that for years they have been asking the wrong question: they should be trying to buy wins (a direct result of runs scored and runs restricted), not players. The improved statistics themselves had been around for many years; the trouble was the ignorance of those using them.

Cricket already has a multitude of data at its disposal. Former Australian coach John Buchanan was known for recording extensive data, and this became the norm for most first-class teams. The difficulty is that, unlike in baseball, where you can name exactly what you want, in cricket you cannot be as sure. Yes, more runs are important, but in Test matches you also need to take wickets.

So what if we use such analysis only for limited-overs matches, where it's all about runs? Good idea, except that last night I saw a rain-interrupted T20 match decided by the Duckworth-Lewis method, which relies on wickets in its calculation of a par score. We also still seem to value bowling in limited-overs games: if we are truly only after more runs, why not simply stock the team with 11 batsmen who can nominally roll their arm over and field well?

The difficulty is that we do not know what question to ask; that is, what constitutes total value in a game of cricket? The entire premise of using such statistics is to narrow down the questions you want them to answer; otherwise your statistics can be made to prove any and all manner of things.

To give an extreme example: you have two batsmen, 1 and 2. On traditional statistics both average 36 and have a strike rate of 72 runs per 100 balls, so a typical innings for either is 36 runs off 50 deliveries. We have a dilemma: if we need to choose between them, both look equal on traditional measures. Turning to more detailed statistical analysis, we find that Batsman 1 gets those runs in 36 singles, whereas Batsman 2 usually hits 6 sixes (I told you the example was extreme). Which batsman is more valuable?

My initial reaction is to say Batsman 1 is more valuable, because they rotate the strike and give the team a greater chance of scoring while they are at the crease, whereas Batsman 2 faces a stack of dot balls. But what is the effect on the bowlers? Does Batsman 2's potentially greater return per scoring shot make them more valuable? Unless you know what you really want, statistics can tell you anything.
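The arithmetic behind the two batsmen can be laid out in a few lines. What follows is a minimal Python sketch, not an established cricket metric: it assumes each typical innings is exactly 50 balls, with every ball that is not a single (Batsman 1) or a six (Batsman 2) being a dot, and it treats any odd number of runs off a ball as handing the strike to the partner, ignoring extras and end-of-over changes. The `Innings` class and its property names are hypothetical, made up for this illustration. Batting average is not computed, since it also depends on dismissals.

```python
from dataclasses import dataclass


@dataclass
class Innings:
    """Hypothetical helper: one innings described as runs scored off each ball faced."""
    runs_per_ball: list[int]

    @property
    def runs(self) -> int:
        return sum(self.runs_per_ball)

    @property
    def balls(self) -> int:
        return len(self.runs_per_ball)

    @property
    def strike_rate(self) -> float:
        # Traditional strike rate: runs per 100 balls faced.
        return 100 * self.runs / self.balls

    @property
    def dot_balls(self) -> int:
        return sum(1 for r in self.runs_per_ball if r == 0)

    @property
    def strike_changes(self) -> int:
        # Simplifying assumption: an odd number of runs off a ball passes the
        # strike to the other batsman (extras and end-of-over changes ignored).
        return sum(1 for r in self.runs_per_ball if r % 2 == 1)

    @property
    def runs_per_scoring_ball(self) -> float:
        scoring_balls = self.balls - self.dot_balls
        return self.runs / scoring_balls if scoring_balls else 0.0


# The two hypothetical batsmen from the example: 36 runs off 50 balls each.
batsman_1 = Innings([1] * 36 + [0] * 14)  # 36 singles, 14 dots
batsman_2 = Innings([6] * 6 + [0] * 44)   # 6 sixes, 44 dots

for name, inn in [("Batsman 1", batsman_1), ("Batsman 2", batsman_2)]:
    print(f"{name}: {inn.runs} off {inn.balls} (SR {inn.strike_rate:.0f}), "
          f"{inn.dot_balls} dots, {inn.strike_changes} strike changes, "
          f"{inn.runs_per_scoring_ball:.1f} runs per scoring ball")
```

Both innings come out identical on runs, balls and strike rate (72); they diverge only on dot balls (14 versus 44), strike changes (36 versus 0) and runs per scoring ball (1.0 versus 6.0). The numbers describe the difference, but they cannot say which profile is worth more; that still depends on the question being asked.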

Don't get me wrong: such analysis has every place in the game, but it requires a liberal amount of common sense. You can easily measure the worth of two players with the same skill set, as above, and you may weigh that analysis against what the team needs, but you cannot make the clear-cut decisions that baseball allows, because there is no single measure of value.

Ed Cowan at his best; (c) Balanced Sports
How would you statistically decide (as was needed for the recent Melbourne Test) whether to play opening batsman Ed Cowan or all-rounder Daniel Christian? To do so is to compare apples with oranges. In baseball, you can use a standard measure of total value to the team to cut through inconsistencies; in cricket, understanding and intuition must still be applied.

I have only a rudimentary understanding of statistics, and someone more esteemed than I may be able to show that there is a methodology that elevates statistical analysis beyond a supporting role in cricket decision-making. Until then, remain wary of its limitations when applying it to cricket. Mark Twain attributed to Benjamin Disraeli the saying "There are three kinds of lies: lies, damned lies, and statistics." Though the attribution remains historically unsourced, there is still much truth in it.


  1. Suspicion of stats: the very thing Billy Beane et al. exploited so successfully.

    The mistake would appear to be getting ahead of oneself when considering Moneyball. The first thing that those before Beane and DePodesta (the likes of Bill James) noted, following extensive statistical analysis, was a disparity between certain valued stats and valued outcomes (wins). They looked for connections between player stats and team effectiveness. What actually gets wins?

    If one applies this to cricket, one needs to look at issues such as the relative importance of batting, bowling and fielding in getting wins. In the long term, is a team with great bowling stats more successful than one with great batting stats, or vice versa? How big an impact does fielding have? Do we have sufficient tools to measure it?

    Within each discipline, what attributes (as reflected by stats) affect outcomes most? For example, statistical analysis of thousands of games might reveal that a bowler's strike rate was more important in getting wins than bowling average (I use this just as an example). That small insight might lead to a slight re-evaluation of what one looks for in a bowler and, indeed, of what one coaches for.

    The main issue is that until vast amounts of numbers are crunched, no one can know. We can all opine, but the conversation will be little more than a pub debate about the relative merits of x or y, or whether a rat would beat a squirrel in a fight...!

    1. I don't think this post actually promotes suspicion of statistics; rather, it reflects the fact that in cricket there aren't (yet) any defining statistics that stand out as the ultimate measure of value for money, and also that you don't have to pay for those players.

      I completely agree with your premise that analysing reams of data to come up with "moneyball" stats is plausible. But given the propensity in the game for key moments (e.g. missed decisions, dropped catches) and the sheer number of variables involved (pitch conditions, weather, how much the ball spins/swings/cuts...), the possibilities are endless. So my personal view, backed up in a few posts on this blog (particularly the piece on scoring-stat leaders in European football), is that numbers are good and a useful tool, but they require context to be fully effective, and they rarely tell the full story.

      With regards to the relative importance of batting/bowling/fielding in obtaining wins, please stay tuned as there will be a piece coming up in the next week (or two) which defines measures of success and attempts to "bust" statistical myths based around the Australian First Class cricket season.