This original paper was published in the ‘International Journal of Statistics and Applied Mathematics’

Yash Mantri

20th May 2021

Note: IPL 2021 has been suspended during the time of writing this paper due to Covid-19 and is expected to resume when conditions are safer.

Abstract

Founded in 2007, the Indian Premier League (IPL) is one of the most watched sports leagues in the world. The event is followed and enjoyed by millions of Indians and cricket lovers across the globe.

This study determines the quantitative factors that have played a role in winning matches in the last three seasons of the IPL. This is done using statistical tools such as Karl Pearson’s and Multiple Correlation along with regression analysis and the t-test. The qualifiers and finalists of the IPL 2021 are predicted using the binomial distribution with the help of past winning percentages. Cricket is not only governed by numerical factors; therefore, the predicted qualifiers are further investigated for a qualitative factor analysis using the chi-square test and rank correlation. This overall analysis successfully predicts the team that has the highest chance of winning the IPL 2021 considering both quantitative and qualitative factors.

Key Terms

IPL, Bowling average, Net Run Rate, Coefficient of correlation, Rank correlation, Binomial distribution, qualitative and quantitative factors of a team.

1.1 Introduction

Cricket, also known as the Gentleman’s Game, is currently the world’s second most popular sport and is loved by people around the globe.

The first reference to cricket can be traced back to the early 1600s in England when it was played in grammar schools, villages, and farm communities. The first official international match was played in 1877 between England and Australia.

Test cricket is the traditional form of the game and has been played ever since. Each team bats for two innings, and a match is played over five days. It is regarded as the pinnacle of the game because it tests teams over a long duration.

About a century later, in the 1980s, One Day Internationals (ODIs) gained popularity. This is a quicker format in which each side bats for a single innings of 50 overs. The well-known International Cricket Council (ICC) Cricket World Cup is contested every four years in this format.

Twenty20 cricket, the newest format, revolutionized the game when it was introduced in 2003. It brought with it rule changes and fewer overs, which ushered in power hitting and a whole new audience. It also pushed both batsmen and bowlers to adopt new skill sets and innovations. A typical T20 match takes three hours to complete and includes creative batting, skillful bowling, and brilliant fielding. Other than the ICC Cricket Twenty20 championship, many T20 tournaments have emerged over the years, including the Big Bash League, Caribbean Premier League, Super Smash and, of course, the Indian Premier League.

The Indian Premier League (IPL) was founded by the Board of Control for Cricket in India in 2007. The league is contested by eight teams based in eight different Indian cities. Each team fields four foreign players and seven Indian players, and the squads are reshuffled through an auction that takes place every year before the start of the IPL. The IPL is the most-attended cricket league in the world and was ranked sixth by average attendance among all sports leagues.

1.2 Aim

To determine the most effective quantitative factors that have played a role in winning matches in the last three seasons of the Indian Premier League and to further analyze the qualitative factors of teams.

To make predictions about the qualifiers and winner of the 2021 season using various statistical tools and probability distributions.

2.1 Data Collection (IPL 2020)

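| Team | Matches Won | Average of top 5 run scorers | Average of top 3 Strike Rates | Total number of 6s hit | Total number of 4s hit | Dot balls per match | Batsmen with strike rate greater than 140 | Total 50s per team | Bowling Average of top 2 opening pacers from each team | Total number of wickets taken per match |
|---|---|---|---|---|---|---|---|---|---|---|
| Mumbai Indians | 9 | 422.4 | 172.05 | 137 | 222 | 46 | 6 | 17 | 16.62 | 5.9 |
| Delhi Capitals | 8 | 412 | 144.66 | 88 | 236 | 40 | 4 | 14 | 20.01 | 5.7 |
| Sunrisers Hyderabad | 7 | 369.8 | 148.42 | 79 | 213 | 42 | 3 | 17 | 18.9 | 5.62 |
| Royal Challengers Bangalore | 7 | 358 | 135.46 | 43 | 176 | 36 | 2 | 14 | 18.7 | 5 |
| Kolkata Knight Riders | 7 | 321.8 | 141.79 | 86 | 197 | 37 | 2 | 11 | 26.3 | 5.2 |
| Punjab Kings | 6 | 373 | 156.13 | 98 | 179 | 43 | 2 | 14 | 24 | 5.07 |
| Chennai Super Kings | 6 | 308.6 | 148.17 | 75 | 188 | 38 | 2 | 12 | 28 | 5 |
| Rajasthan Royals | 6 | 310.8 | 160.913 | 105 | 171 | 42 | 3 | 11 | 30 | 4.3 |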

2.2 Karl Pearson’s Coefficient of Correlation:

The Pearson product-moment correlation coefficient measures the strength of the linear relationship between an independent variable $x$ and a dependent variable $y$. It is used to find the association between the two variables:

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$

Here, (i) n is the sample size of the dataset and equals 8, (ii) x is the data set of the quantitative values of the factor being analyzed (independent variable), and (iii) y is the data set of the number of matches won (dependent variable).

Note: The data from the IPL 2020 season has been analyzed.

First, the factor of the mean of the bowling average of the top 2 opening pacers from each team has been taken. A player’s bowling average is the number of runs they have conceded per wicket taken. Hence, the lower this value is for a bowler, the better it is considered. The correlation coefficient between the bowling average of top 2 opening pacers from each team and the matches won by that team in that season has been calculated:

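| Team | Bowling Average of top 2 opening pacers from each team (X) | Matches won (Y) | XY | X² | Y² |
|---|---|---|---|---|---|
| Mumbai Indians | 16.62 | 9 | 149.58 | 276.2 | 81 |
| Delhi Capitals | 20 | 8 | 160 | 400 | 64 |
| Sunrisers Hyderabad | 19 | 7 | 133 | 361 | 49 |
| Royal Challengers Bangalore | 19 | 7 | 133 | 361 | 49 |
| Kolkata Knight Riders | 26.3 | 7 | 184.1 | 691.7 | 49 |
| Punjab Kings | 24 | 6 | 144 | 576 | 36 |
| Chennai Super Kings | 28 | 6 | 168 | 784 | 36 |
| Rajasthan Royals | 30 | 6 | 180 | 900 | 36 |
| Total | ΣX = 182.92 | ΣY = 56 | ΣXY = 1251.68 | ΣX² = 4349.9 | ΣY² = 400 |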

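Calculating r,

$$r = \frac{8(1251.68) - (182.92)(56)}{\sqrt{\left[8(4349.9) - (182.92)^2\right]\left[8(400) - (56)^2\right]}} \approx -0.786$$

(High Inverse Correlation)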

Thus, the two variables have a strong inverse relationship. The association is negative because a lower bowling average is better for a bowler. As the magnitude of the coefficient of correlation is high (r is lower than -0.75), it can be assumed that this factor contributes to winning matches in the Indian Premier League.
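The calculation above can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function name `pearson_r` is an illustrative choice, and the data are the X and Y columns of the table above.

```python
from math import sqrt

# Bowling average of the top 2 opening pacers (x) and matches won (y),
# IPL 2020, copied from the X and Y columns of the table above.
x = [16.62, 20, 19, 19, 26.3, 24, 28, 30]
y = [9, 8, 7, 7, 7, 6, 6, 6]

def pearson_r(xs, ys):
    """Pearson's coefficient of correlation from the raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_x2 = sum(a * a for a in xs)
    sum_y2 = sum(b * b for b in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

r = pearson_r(x, y)
print(round(r, 3))  # -0.786
```

The same function can be applied to any other column of the IPL 2020 data table by swapping in a different list for `x`.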

The same formula has been applied to a number of possible factors and the coefficient of correlation between them has been obtained.

Table

| S.No. | Factors | Coefficient of Correlation | Remark |
|---|---|---|---|
| 1. | Average runs scored by top 5 run scorers from each team | +0.8073 | Strong Correlation |
| 2. | Average of Top 3 Strike Rates from each team | +0.2887 | Weak Correlation |
| 3. | Average number of 6s hit per match | +0.4174 | Weak Correlation |
| 4. | Average number of 4s hit per match | +0.6405 | Moderate Correlation |
| 5. | Total number of 50s scored by each team | +0.6188 | Moderate Correlation |
| 6. | Batsmen in each team with a strike rate greater than 140 | +0.8504 | Strong Correlation |
| 7. | Average number of dots per match | +0.3558 | Weak Correlation |
| 8. | Bowling Average of top 2 opening pacers from each team | -0.786 | Strong Inverse Correlation |
| 9. | Bowling economy of top two spinners | -0.092 | Weak Inverse Correlation |
| 10. | Total number of wickets taken per match | +0.826 | Strong Correlation |

It can be observed from the table that the factors which contribute most to winning matches are:

  1. Average runs scored by top 5 run scorers from each team
  2. Batsmen in each team with a strike rate greater than 140
  3. Bowling Average of top 2 opening pacers from each team
  4. Total number of wickets taken per match

Taken individually, each of these four factors accounts for roughly 61% to 72.5% of the variation in matches won (coefficient of determination, $r^2$).
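As a rough check of that range, the coefficient of determination is simply the square of each correlation coefficient. The short sketch below squares the four values from the correlation table; the dictionary keys are abbreviated labels chosen here for illustration.

```python
# Correlation coefficients of the four strongest factors, taken from the
# correlation table above (labels shortened for readability).
strong_factors = {
    "Average runs of top 5 run scorers": 0.8073,
    "Batsmen with strike rate > 140": 0.8504,
    "Bowling average of top 2 opening pacers": -0.786,
    "Wickets taken per match": 0.826,
}

# Coefficient of determination = r squared, expressed as a percentage.
determination = {name: r ** 2 * 100 for name, r in strong_factors.items()}

for name, pct in determination.items():
    print(f"{name}: {pct:.1f}%")
```

The printed percentages fall between about 61% and 73%, in line with the range quoted above.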

2.3 t-test

The widely used t-test is applied next to establish whether this correlation also holds for the population data, which includes all T20 cricket matches and other Twenty20 leagues. The t-test value is calculated using the following formula:

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$

After substituting the respective values for $r$ and $n$, a t-test value of 3.959 was obtained for the relationship between matches won and batsmen in each team with a strike rate greater than 140. The significance level has been taken as $\alpha = 0.05$.

Null Hypothesis (Ho): There is no correlation between matches won and batsmen with a strike rate greater than 140 in the population data.

Alternate Hypothesis (Ha): The correlation between matches won and batsmen with a strike rate greater than 140 is significant in the population data as well.

Degrees of Freedom: The formula for obtaining the degrees of freedom is $n - 2$. With n = 8, a value of 6 is obtained, which gives a critical t-test value of 2.447.

Hence, as the value of 3.959 exceeds the critical value, the null hypothesis is rejected, indicating that the correlation observed in the sample data is statistically significant for the population data as well. The same was observed for the rest of the factors after repeating the procedure. This suggests that the same factors are likely to contribute to winning in other Twenty20 leagues, and perhaps even in everyday gully cricket T20 tournaments!
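The t-value quoted above can be reproduced with the standard test statistic for a correlation coefficient, $t = r\sqrt{n-2}/\sqrt{1-r^2}$. The sketch below is only a check of the arithmetic; the critical value 2.447 is taken from the text rather than computed, and the function name is illustrative.

```python
from math import sqrt

def correlation_t_statistic(r, n):
    """t statistic for testing a correlation coefficient with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)

r = 0.8504          # matches won vs. batsmen with strike rate > 140
n = 8               # number of teams
T_CRITICAL = 2.447  # critical value for 6 degrees of freedom, as quoted in the text

t = correlation_t_statistic(r, n)
print(round(t, 3))            # 3.959
print(abs(t) > T_CRITICAL)    # True: the correlation is statistically significant
```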

2.4 Regression Analysis

A linear regression model assesses the relationship between a dependent variable and an independent variable. Here, regression analysis is used to predict the matches won by a team from an independent variable such as the total number of wickets taken by the team per match. The following equation is used to calculate the regression coefficient for the linear regression line of y on x:

$$b_{yx} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}$$

A value of 1.789 is obtained for $b_{yx}$, using which the linear regression model is expressed by the following equation:

$$y - \bar{y} = b_{yx}\,(x - \bar{x})$$

Finally, the equation can be re-arranged to form a simplified equation for y:

$$y = \bar{y} + b_{yx}\,(x - \bar{x}) = 7 + 1.789\,(x - 5.23) \approx 1.789x - 2.36$$

Over here: (i) $\bar{y}$, having a value of 7, represents the mean value of $y$ (number of matches won) and (ii) $\bar{x}$, having a value of 5.23, represents the mean value of $x$ (number of wickets taken per match).
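To illustrate how the fitted line is used for prediction, the sketch below plugs the reported regression coefficient (1.789) and the two means (7 and 5.23) into the point-slope form of the regression line. The function name and the sample input are illustrative assumptions, not part of the original analysis.

```python
# Values reported in the text for the regression of matches won (y)
# on wickets taken per match (x).
SLOPE = 1.789   # regression coefficient of y on x
MEAN_Y = 7.0    # mean number of matches won
MEAN_X = 5.23   # mean number of wickets taken per match

def predict_matches_won(wickets_per_match):
    """Predict matches won from wickets per match: y = mean_y + slope * (x - mean_x)."""
    return MEAN_Y + SLOPE * (wickets_per_match - MEAN_X)

# A team taking 5.9 wickets per match (Mumbai Indians' figure in IPL 2020).
print(round(predict_matches_won(5.9), 2))  # 8.2
```

By construction, a team with exactly the mean number of wickets per match is predicted to win the mean number of matches.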

Similarly, the regression equations were found for the other three factors as well:

Average runs scored by top 5 run scorers from each team:

Batsmen in each team with a strike rate greater than 140: 

Bowling Average of top 2 opening pacers from each team:  

2.5 Multi-Variable Correlation

So far, correlation has been used to measure the relationship between two variables, and regression has been used as a tool to predict matches won in a season from a single factor. Multi-variable correlation is a measure of how well a given variable (in this case, the matches won by a team in a season of the IPL) can be predicted using a linear function of a set of other variables. Here, the multi-variable correlation of matches won by a team (variable 1) with the total number of wickets taken per match (variable 2) and the bowling average of the top two opening pacers (variable 3) is calculated, which is given by the equation:

$$R_{1.23} = \sqrt{\frac{r_{12}^2 + r_{13}^2 - 2\,r_{12}\,r_{13}\,r_{23}}{1 - r_{23}^2}}$$

  where

  • $r_{12}$ = Correlation between matches won and total wickets taken per match.
  • $r_{23}$ = Correlation between total wickets taken per match and bowling average of top 2 opening pacers from each team.
  • $r_{13}$ = Correlation between matches won and bowling average of top 2 opening pacers from each team.

The values of the Pearson’s Co-efficient of Correlation have been tabulated in the table below.

| Factors | Pearson’s Co-efficient of Correlation |
|---|---|
| Matches won and Total wickets taken per match ($r_{12}$) | +0.826 |
| Matches won and Bowling average of top 2 opening pacers from each team ($r_{13}$) | -0.786 |
| Total wickets taken per match and Bowling average of top 2 opening pacers from each team ($r_{23}$) | -0.802 |

Therefore, in the Indian Premier League, total wickets taken per match and the bowling average of the top 2 opening pacers together account for 72.51% of the variation in the total number of matches won by a team in a season (the square of the multiple correlation coefficient is 0.7251).
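The 72.51% figure can be reproduced from the three pairwise coefficients in the table using the standard two-predictor formula for the squared multiple correlation. In the sketch below the numbering is an assumption made for illustration: 1 is matches won, 2 is wickets taken per match, and 3 is the bowling average of the top 2 opening pacers.

```python
def multiple_r_squared(r12, r13, r23):
    """Squared multiple correlation of variable 1 on variables 2 and 3."""
    return (r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2)

r12 = 0.826    # matches won vs. wickets taken per match
r13 = -0.786   # matches won vs. bowling average of top 2 opening pacers
r23 = -0.802   # wickets taken per match vs. bowling average

r_squared = multiple_r_squared(r12, r13, r23)
print(round(r_squared * 100, 2))  # 72.51
```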

Similarly, we can take multiple factors into account and calculate how much they affect winning.

3.1 Net Run Rate

Teams that have a higher overall net run rate in the season tend to be the teams that end up qualifying for the playoffs and the winning teams usually have one of the highest net run rates in the season.

So, what is net run rate? One hears of this terminology many times when it comes down to choosing between two teams that have won the same number of games. The team with a higher net run rate is always ranked higher than a team that has a lower net run rate given that the matches won by both teams are same.

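Net run rate is a cricket statistic that compares the runs a team scores per over faced with the runs it concedes per over bowled.

$$\text{Net Run Rate} = \frac{\text{Total runs scored}}{\text{Total overs faced}} - \frac{\text{Total runs conceded}}{\text{Total overs bowled}}$$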
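The definition above can be sketched in a few lines of Python. The season totals used here are hypothetical numbers chosen only to illustrate the arithmetic, and the helper `overs_to_decimal` is an illustrative addition that converts scorecard notation (overs and balls) into decimal overs.

```python
def overs_to_decimal(overs, balls=0):
    """Convert completed overs plus extra balls into decimal overs (6 balls per over)."""
    return overs + balls / 6

def net_run_rate(runs_scored, overs_faced, runs_conceded, overs_bowled):
    """Runs scored per over faced minus runs conceded per over bowled."""
    return runs_scored / overs_faced - runs_conceded / overs_bowled

# Hypothetical season totals for one team (not real IPL data).
nrr = net_run_rate(
    runs_scored=2400, overs_faced=overs_to_decimal(280),
    runs_conceded=2300, overs_bowled=overs_to_decimal(280),
)
print(round(nrr, 3))  # 0.357
```

A positive value means the team scores faster than its opponents do against it, which is why it works as a tie-breaker between teams with the same number of wins.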

The average net run rate of each team over the last three seasons is shown below:

| Team | Average Net Run Rate between 2018-20 |
|---|---|
| Mumbai Indians | 0.615 |
| Delhi Capitals | -0.083 |
| Sunrisers Hyderabad | 0.490 |
| Royal Challengers Bangalore | -0.216 |
| Kolkata Knight Riders | -0.085 |
| Punjab Kings | -0.305 |
| Chennai Super Kings | -0.008 |
| Rajasthan Royals | -0.427 |

Thus, it can be observed that the top four teams with the best average net run rate over the last 3 seasons are:

  1. Mumbai Indians
  2. Sunrisers Hyderabad
  3. Chennai Super Kings
  4. Delhi Capitals
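The ranking above follows directly from sorting the table of average net run rates. A minimal sketch, with the values copied from that table:

```python
# Average net run rate between 2018 and 2020, copied from the table above.
average_nrr = {
    "Mumbai Indians": 0.615,
    "Delhi Capitals": -0.083,
    "Sunrisers Hyderabad": 0.490,
    "Royal Challengers Bangalore": -0.216,
    "Kolkata Knight Riders": -0.085,
    "Punjab Kings": -0.305,
    "Chennai Super Kings": -0.008,
    "Rajasthan Royals": -0.427,
}

# Sort team names by net run rate, highest first, and keep the top four.
top_four = sorted(average_nrr, key=average_nrr.get, reverse=True)[:4]
print(top_four)
```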

IPL 2021 was unfortunately postponed midway due to COVID-19. Most teams have already played 7 out of 14 matches. Looking at the teams above, three out of the four teams with highest net run rates are in the top four teams of the current points table so far.

Thus, the teams that we will further investigate are:

  1. Mumbai Indians
  2. Chennai Super Kings
  3. Delhi Capitals

The aim is to make predictions about the qualifiers and winner of the IPL 2021 once it resumes.

3.2 Binomial Distribution

Cricket is a game of glorious uncertainties. Anything can change at any point in time and that is what makes it so exciting. The IPL has witnessed the breaking of numerous T20 records and many nail-biting matches. An example of this is the match played between Mumbai Indians and Rajasthan Royals in IPL 2014. This is considered to be one of the most dramatic thrillers in IPL history. All the hard work of the season came down to one ball. Mumbai Indians had to score a boundary off the last ball and Aditya Tare helped Mumbai clinch the game with a six, advancing Mumbai Indians to the playoffs leaving Wankhede Stadium in wild celebrations.

In mathematics, uncertainty is measured by probability. The binomial distribution is a common discrete probability distribution used in statistics. Using the winning probabilities of our three shortlisted teams, it is possible to use binomial distribution to calculate the probability of these teams making it to the playoffs and winning. The binomial distribution only counts two states, typically representing success or failure (in this case win or loss) given a fixed number of trials in the data.

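The first step is to calculate the winning probabilities of the three teams based on the last three years:

$$P(\text{win}) = \frac{\text{Matches won in the last three seasons}}{\text{Matches played in the last three seasons}}$$

Using this we get,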

Mumbai Indians= 59.1 %

Chennai Super Kings= 57.2 %

Delhi Capitals= 52.38 %

Before using the binomial distribution, we need to take into account that when the IPL was postponed, Delhi Capitals had played 8 games, while Mumbai Indians and Chennai Super Kings had played 7 games each.

Observing the last three seasons, a team that wins 7 or more matches (given a high net run rate) makes it to the playoffs. In most cases, a team that wins more than 8 matches goes on to make it to the final.

Now we can calculate the following probabilities for each team using the binomial distribution:

Mumbai Indians

Games left=7

Games required to win to reach playoffs= 3 or more

Games required to win to play the final= more than 4

Calculating,

Probability (Mumbai Indians will reach the playoffs):

P (Winning 3 matches) + P (Winning 4 matches) + P (Winning 5 matches) + P (Winning 6 matches) + P (Winning 7 matches)

$$= \sum_{k=3}^{7} \binom{7}{k} (0.591)^k (0.409)^{7-k}$$

= 0.89477 = 89.477%

Using the same procedure,

Probability (Mumbai Indians will reach the final) =0.4004= 40.04%

Figure: Binomial probability distribution of Mumbai Indians’ next seven games.

Chennai Super Kings

Games left=7

Games required to win to reach playoffs= 2 or more

Games required to win to play the final= more than 3

Calculating,

Probability (Chennai Super Kings will reach the playoffs) =0.97275= 97.275%

Probability (Chennai Super Kings reach the final) =0.65427= 65.427%

Delhi Capitals

Games left=6

Games required to win to reach playoffs= 1 or more

Games required to win to play the final= more than 2

Calculating,

Probability (Delhi Capitals will reach the playoffs) =0.98833=98.83%

Probability (Delhi Capitals make the final) =0.6997=69.97%

It can be observed that all three of these teams are very likely to make the playoffs. The teams most likely to win 9 or more matches and reach the final are Delhi Capitals, with a chance of 69.97%, and Chennai Super Kings, with 65.427%. Thus, as things stand, Delhi Capitals and Chennai Super Kings are predicted to have the highest chance of reaching the final in IPL 2021.
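The probabilities quoted above can be reproduced with the binomial formula $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$, summed over the required number of wins. The sketch below uses only the standard library; the function name and the layout of the `teams` dictionary are illustrative choices, while the inputs (games left, win probability, wins needed) come from the text.

```python
from math import comb

def prob_at_least(k_min, n, p):
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(k_min, n + 1))

# team: (games left, win probability, wins needed for playoffs, wins needed for final)
teams = {
    "Mumbai Indians": (7, 0.591, 3, 5),        # final: more than 4 wins
    "Chennai Super Kings": (7, 0.572, 2, 4),   # final: more than 3 wins
    "Delhi Capitals": (6, 0.5238, 1, 3),       # final: more than 2 wins
}

results = {}
for team, (games_left, p_win, playoff_wins, final_wins) in teams.items():
    playoffs = prob_at_least(playoff_wins, games_left, p_win)
    final = prob_at_least(final_wins, games_left, p_win)
    results[team] = (playoffs, final)
    print(f"{team}: playoffs {playoffs:.4f}, final {final:.4f}")
```

Treating each remaining match as an independent trial with a fixed win probability is the simplifying assumption behind the binomial model.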

3.3 Qualitative Factor Analysis using Chi-Square Test

Cricket is certainly not governed by quantitative factors alone. Numerous qualitative factors also contribute to the overall performance of a team. These include fielding, fitness, team motivation, home ground advantage and many more.

To investigate whether teams rank differently based on the qualitative factors, it was decided to use a chi-square test. A questionnaire was sent out to a random sample of 100 cricket enthusiasts who had been following the IPL 2021. They were asked to rate teams on a scale of 1 to 5 (5 being best), keeping in mind their overall fielding, fitness, and team spirit.

The three teams we will apply the test on are Mumbai Indians, Delhi Capitals and Chennai Super Kings.

The table below shows the responses to the fielding rating of the three teams.

| Team | Rating of 1, 2 or 3 (Low) | Rating of 4 (Medium) | Rating of 5 (High) |
|---|---|---|---|
| Mumbai Indians | 26 | 63 | 11 |
| Chennai Super Kings | 28 | 36 | 36 |
| Delhi Capitals | 23 | 56 | 21 |

Ho (Null Hypothesis): The rating of IPL teams is independent of fielding performance.

Ha (Alternative Hypothesis): The rating of IPL teams is dependent on fielding performance.

Degrees of Freedom = (Number of rows – 1)(Number of columns – 1) = (3 – 1)(3 – 1) = 4

Level of Significance ($\alpha$) = 5%

The critical value of $\chi^2$ for 4 degrees of freedom at the 5% level of significance is 9.488.

Calculating the Chi-square statistic, the value we get is 21.5554.

Figure: The chi-square distribution at 4 degrees of freedom.

Conclusion: Since the calculated value is more than the critical value, we can reject the null hypothesis, which indicates that the rating of teams is dependent on fielding performance.

The same procedure was carried out for overall fitness and team spirit. The results obtained were the same and there was statistically significant evidence that the rating of teams is dependent on the qualitative factors.
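The chi-square procedure for the fielding ratings can be sketched as follows, computing each expected count as (row total × column total) / grand total. This is a plain-Python illustration rather than the original working; the counts are as read from the survey table, and the statistic it produces (about 22) may differ somewhat from the reported 21.5554, for example through rounding of intermediate values, while leading to the same conclusion.

```python
# Fielding ratings as read from the survey table: [low, medium, high] per team.
observed = [
    [26, 63, 11],  # Mumbai Indians
    [28, 36, 36],  # Chennai Super Kings
    [23, 56, 21],  # Delhi Capitals
]

def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (obs - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof

CRITICAL_VALUE = 9.488  # chi-square critical value for 4 d.f. at the 5% level, from the text

chi2, dof = chi_square_independence(observed)
print(round(chi2, 2), dof, chi2 > CRITICAL_VALUE)
```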

3.4 Spearman’s Rank Correlation Coefficient

Finally, Spearman’s rank correlation was used to investigate seven qualitative factors for the two predicted finalists, Delhi Capitals and Chennai Super Kings. To make the research as effective as possible, two senior cricket experts and analysts, Mr. Gaurav Kalra (Senior Editor at ESPN) and S. Ganesh (Sports consultant and ex-manager of Punjab Kings), graciously provided their scores out of 10 for each of these factors. The seven factors were Fielding, Fitness, Popularity, Team Energy, Team Balance, Hitting Power, and Death-Over Bowling.

The formula for Spearman’s Rank Correlation is:

$$\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

As ranks are repeated, the rank correlation formula with the repeated ranks is as follows:

$$\rho = 1 - \frac{6\left[\sum d^2 + \sum \frac{m^3 - m}{12}\right]}{n(n^2 - 1)}$$

Where $d$ is the rank difference, $n$ is the total number of samples, and $m$ is the number of times a rank is repeated.

Table of Ranks for Both Teams:

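| Factor | Delhi Capitals: Expert 1 | Delhi Capitals: Expert 2 | Chennai Super Kings: Expert 1 | Chennai Super Kings: Expert 2 |
|---|---|---|---|---|
| Fielding | 6 | 7 | 7 | 6.5 |
| Fitness | 4 | 5 | 6 | 7 |
| Popularity | 6 | 6 | 8 | 8 |
| Team Energy | 7 | 6 | 8 | 8.5 |
| Team Balance | 7 | 8 | 6 | 5.5 |
| Hitting Power | 8 | 6.5 | 7 | 8.5 |
| Death Over Bowling | 7 | 6 | 7 | 6.5 |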

After carrying out the full calculation using the formula, the rank correlation between the ratings given by the two experts is 0.712 for Chennai Super Kings and 0.411 for Delhi Capitals. These values indicate that the two senior analysts have moderately similar views regarding the qualitative aspects of both teams, which supports their ratings. It was also observed that Chennai Super Kings was rated higher than Delhi Capitals by the two experts for most of the qualitative factors for the year 2021.
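The rank correlation with repeated ranks can be sketched as below. This is an illustrative implementation of the textbook tie-corrected formula, in which tied scores share their average rank and each group of $m$ tied values adds $(m^3 - m)/12$ to $\sum d^2$. The function names are assumptions made here, and because the result depends on how ties are ranked and corrected, it may not reproduce the reported coefficients exactly.

```python
from collections import Counter

def average_ranks(scores):
    """Rank scores from highest (rank 1) to lowest; tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next score equals the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks

def tie_correction(scores):
    """Sum of (m^3 - m) / 12 over every group of m tied scores."""
    return sum((m ** 3 - m) / 12 for m in Counter(scores).values() if m > 1)

def spearman_rho(scores_a, scores_b):
    """Spearman's rank correlation with the correction for repeated ranks."""
    n = len(scores_a)
    ranks_a, ranks_b = average_ranks(scores_a), average_ranks(scores_b)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    correction = tie_correction(scores_a) + tie_correction(scores_b)
    return 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1))

# Chennai Super Kings scores from the table above (Expert 1, Expert 2).
csk_expert_1 = [7, 6, 8, 8, 6, 7, 7]
csk_expert_2 = [6.5, 7, 8, 8.5, 5.5, 8.5, 6.5]
print(round(spearman_rho(csk_expert_1, csk_expert_2), 3))
```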

Based on the overall analysis, it can be predicted that Chennai Super Kings has the highest chance of winning the IPL 2021, having a high probability of winning more than 8 matches along with high ratings in their qualitative aspects-fielding, fitness, death-over bowling etc.

Conclusion

In accordance with the aim, the quantitative factors that have played a role in winning matches in the last three seasons have been obtained. These include the total number of wickets taken per match, the bowling average of the top two opening bowlers, the average of the top five run scorers, and the number of batsmen with a strike rate greater than 140 in each team. The qualifiers and finalists of the IPL 2021 have been predicted using the binomial distribution. The study predicts that three teams, Mumbai Indians, Chennai Super Kings, and Delhi Capitals, have the highest probability (roughly 89% or more) of making it to the playoffs. Delhi Capitals and Chennai Super Kings are the two predicted finalists. Using the Chi-square test and Rank Correlation, a qualitative factor analysis was successfully carried out. This gave the result that the ratings of IPL teams are dependent on qualitative factors. The Rank Correlation showed a moderate positive correlation between the ratings given to both teams by the two experts. Since Chennai Super Kings was rated higher than Delhi Capitals by both experts on most qualitative aspects, it has been predicted as the winner of the IPL 2021.

It must be noted that predicting cricket has numerous limitations. Probability and data analysis can only make predictions about the game. The fact that we cannot fully predict anything in the game of cricket until the match is over is what makes it so exciting.

References

https://www.mathsjournal.com/pdf/2021/vol6issue4/PartA/6-4-1-292.pdf