This is an academic study that applies R statistical analysis to agency A of an existing betting consultancy, firm A. Following Dixon and Pope (2004) (the 24th paper in the industry knowledge and academic research portion of 7.4 References), and for reasons of business confidentiality and privacy, I also refer to them as agency A and firm A in this paper. The purpose of the analysis is to measure the staking model of firm A. For more examples of using R for soccer betting, see http://rpubs.com/englianhu. For references on R Markdown, see rmarkdown and An Introduction to R Markdown. If you are interested in writing a data analysis on sportsbooks, you are welcome to read Tony Hirst (2014) (the 1st paper in the technical research on programming and coding portion of 7.4 References).
Before we start modelling, we look at the summary of investment return rates.
table 4.1.1 : 5 x 5 : Annual investment return summary table. (The table is styled in Dark yellow, hexadecimal color code #9B870C; see the list of colors.)
\[\Re = \sum_{i=1}^{n}\rho_{i}^{EM}/\sum_{i=1}^{n}\rho_{i}^{BK} \cdots equation 4.1.1\]
\(\Re\) is the return rate of investment. \(\rho_i^{EM}\) is the estimated probability calculated by firm A for matches \(i = 1, 2, \ldots, n\), while \(\rho_{i}^{BK}\) is the net (pure) probability, i.e. the real odds, offered by bookmakers, obtained from equation 4.1.2 and substituted into equation 4.1.1.
\[\rho_i = P_i^{Lay} / (P_i^{Back} + P_i^{Lay}) \cdots equation 4.1.2\]
\(P_i^{Back}\) and \(P_i^{Lay}\) are the back and lay prices offered by bookmakers.
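Equation 4.1.2 is the usual way of removing the bookmaker's margin from a two-way market. The implied probabilities \(1/P_i^{Back}\) and \(1/P_i^{Lay}\) are rescaled so that they sum to one, and \((1/P_i^{Back}) / (1/P_i^{Back} + 1/P_i^{Lay})\) simplifies to \(P_i^{Lay} / (P_i^{Back} + P_i^{Lay})\). The R sketch below computes equations 4.1.2 and 4.1.1 on a few made-up prices. It assumes the prices are decimal (European) odds. The function names `net_prob()` and `return_rate()` and all numbers are hypothetical, not firm A's data or code.

```r
# Sketch only: prices and estimates below are made-up, not firm A's data.

# Equation 4.1.2: net (vig-free) probability of the backed side, assuming
# p_back and p_lay are decimal prices of the two sides of the same market.
net_prob <- function(p_back, p_lay) {
  p_lay / (p_back + p_lay)
}

# Equation 4.1.1: return rate = sum of firm A's estimated probabilities
# divided by the sum of the bookmakers' net probabilities.
return_rate <- function(rho_em, rho_bk) {
  sum(rho_em) / sum(rho_bk)
}

p_back <- c(1.90, 2.05, 1.80)   # hypothetical back prices
p_lay  <- c(1.95, 1.80, 2.10)   # hypothetical lay prices
rho_bk <- net_prob(p_back, p_lay)
rho_em <- c(0.55, 0.50, 0.58)   # hypothetical firm A estimates (cf. rEMProbB)

round(rho_bk, 4)
return_rate(rho_em, rho_bk)     # above 1: firm A's estimates exceed the net probabilities overall
```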
We can simply apply the equations above to get the value of \(\Re\). The table above shows that firm A invested on its EMPrice at threshold edges of 1.0769894, 1.1072203, 1.0781056, 1.1148426 and 1.0671108 (the values of \(\Re\)), that is, at prices greater than those offered by bookmakers. Dixon and Coles (1996) (the 25th paper in the industry knowledge and academic research portion of 7.4 References) give some description of \(\Re\). The optimal value of \(\rho_{i}\) (rEMProbB) will be calculated with a bootstrapping/resampling method in section 4.3 Kelly Ⓜodel.
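The resampling idea mentioned above can be illustrated in a few lines of R. This sketch is not the section 4.3 procedure. It only resamples matches with replacement to obtain a bootstrap distribution and a percentile interval for \(\Re\) from equation 4.1.1. The simulated probabilities, the seed and the choice of `B = 2000` replicates are assumptions made for illustration.

```r
set.seed(2016)
# Hypothetical per-match probabilities (simulated, not firm A's data).
n <- 200
rho_bk <- runif(n, 0.40, 0.60)                         # bookmakers' net probabilities
rho_em <- pmin(rho_bk * runif(n, 1.00, 1.15), 0.99)    # firm A's estimates, never below rho_bk

# Return rate from equation 4.1.1.
return_rate <- function(rho_em, rho_bk) sum(rho_em) / sum(rho_bk)

# Nonparametric bootstrap: resample matches with replacement and
# recompute the return rate on each resample.
B <- 2000
boot_R <- replicate(B, {
  idx <- sample.int(n, n, replace = TRUE)
  return_rate(rho_em[idx], rho_bk[idx])
})

# Point estimate plus a 95% percentile interval.
c(estimate = return_rate(rho_em, rho_bk),
  quantile(boot_R, c(0.025, 0.975)))
```

Resampling whole matches keeps each \(\rho_i^{EM}\) paired with its own \(\rho_i^{BK}\), which is why the indices are drawn once and applied to both vectors.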
table 4.1.2 : 48640 x 45 : Odds price and probabilities sample table.
The table above lists a sample of the odds prices and probabilities for soccer match \(i\), where \(n\) is the number of soccer matches. From it we can read values such as rEMProbB, netProbB and so forth.
graph 4.1.1 : A sample graph of the relationship between the investment probabilities and the bookmakers' probabilities.
The graph above plots the probabilities that firm A calculated for the backed side against the real (net) probabilities offered by bookmakers over 48640 soccer matches.
Now we look at the result of the soccer matches.
table 4.1.3 : 7 x 8 : Summary of betting results.
The table above summarises the stakes and returns by soccer match result. The table below lists the handicaps that firm A placed with agency A. I list the handicaps first so that the coefficients can be tested by handicap in the next section, 4.2 Linear Ⓜodel.
table 4.1.4 : 6 x 8 : The handicap in sample data.
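A summary such as table 4.1.4 can be produced by grouping bet-level records by handicap. The R sketch below assumes a hypothetical data frame `bets` with columns `HCap`, `Stakes` and `Return`. These are illustrative names with made-up numbers, not the columns of firm A's data. It uses base R `aggregate()` to total stakes and returns, then computes profit and return rate for each handicap.

```r
# Hypothetical bet-level records; column names and values are illustrative only.
bets <- data.frame(
  HCap   = c(-0.5, -0.5, 0, 0.25, 0, -0.5),
  Stakes = c(100, 50, 80, 40, 60, 30),
  Return = c(190, 0, 80, 0, 117, 57)
)

# Total stakes and total returns for each handicap.
by_hcap <- aggregate(cbind(Stakes, Return) ~ HCap, data = bets, FUN = sum)

# Profit-and-loss and return rate for each handicap.
by_hcap$PL   <- by_hcap$Return - by_hcap$Stakes
by_hcap$Rate <- by_hcap$Return / by_hcap$Stakes
by_hcap
```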
From our understanding of staking, the only covariate we need to consider is the odds price, because the handicap effect is already settled by the different handicaps of EMOdds.
Again, I do not claim to know the correct Ⓜodel. Here I simply apply a linear model to retrieve the value of EMOdds derived from the stakes. The purpose of measuring the edge that overcomes the bookmakers' vigorish is to find the leverage of firm A's staking activities with agency A per 1 unit of edge in the odds price. Referring to figure 4.4.1, I include models that split pre-match and in-play bets for comparison.
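Below is a minimal R sketch of that comparison. It assumes stakes are regressed on firm A's odds price `EMOdds`, and that each bet carries a hypothetical pre-match/in-play flag `InPlay`. The data are simulated with a known slope of 15, so the fitted slopes should land near it. None of the names or numbers come from firm A.

```r
set.seed(1)
# Simulated bets: the true slope of Stakes on EMOdds is 15 in both groups,
# and in-play bets receive 10 extra units of stake on average.
n <- 2000
sim <- data.frame(
  EMOdds = runif(n, 1.5, 3.0),
  InPlay = factor(sample(c("PreMatch", "InPlay"), n, replace = TRUE),
                  levels = c("PreMatch", "InPlay"))
)
sim$Stakes <- 20 + 15 * sim$EMOdds + 10 * (sim$InPlay == "InPlay") +
  rnorm(n, sd = 5)

# One pooled model, then a separate model for each of pre-match and in-play.
fit_all <- lm(Stakes ~ EMOdds, data = sim)
fits    <- lapply(split(sim, sim$InPlay),
                  function(d) lm(Stakes ~ EMOdds, data = d))

coef(fit_all)
sapply(fits, coef)   # columns: PreMatch, InPlay; rows: (Intercept), EMOdds
```

Fitting the two subsets separately makes the pre-match and in-play slopes directly comparable side by side. A single model with an `EMOdds * InPlay` interaction would give the same slopes while also allowing a formal test of their difference.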
When I worked at 188Bet, Singbet and AS3388, we knew from experience that the odds price for the favorite team to win serves as the standard reference. The draw odds are adjusted slightly, while the underdog's price is largely ignored.
Steven Xu (2013) (the 16th paper in the industry knowledge and academic research portion of 7.4 References) conducted a case study comparing the efficiency of opening and closing prices in the NFL and college American football. He found that closing prices are nowadays more efficient and accurate than opening prices, which was not the case in 1980~1990. This may be due to multi-million-dollar stakes from informed traders or smart punters, which move the closing price towards the true likelihood.
To test these empirical clichés, I conducted a thorough study, ®γσ, Eng Lian Hu (2016) (the 3rd paper in the industry knowledge and academic research portion of 7.4 References; I completed the research in 2010 but wrote the thesis in 2016). That study concluded that the opening Asian Handicap and Goal Line prices of 29 bookmakers are more efficient than mine. However, my later study, ®γσ, Eng Lian Hu (2014) (the 4th paper in the same portion of 7.4 References), applied a Kelly staking model that made a return of more than 30% per season. Meanwhile, Dixon and Coles (1996) and Crowder, Dixon, Ledford and Robinson (2001) (the 27th paper in the same portion of 7.4 References) built two models that compare the accuracy of home win, draw and away win predictions. They reported that a standard Poisson model predicts the home win most accurately, so an ad hoc inflation parameter is required to improve prediction accuracy. You are welcome to read about Dixon and Coles (1996) in section 4.4 Poisson Ⓜodel.
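The adjustment that Dixon and Coles add to the independent Poisson model is commonly written as a factor \(\tau\). This factor rescales only the 0-0, 0-1, 1-0 and 1-1 scorelines through a dependence parameter \(\rho\), which is unrelated to the \(\rho_i\) of equation 4.1.2. A negative \(\rho\) shifts probability towards 0-0 and 1-1, that is, towards draws. The R sketch below is my own illustration of that factor, not firm A's model or the fitting procedure of section 4.4. The goal means \(\lambda = 1.5\) and \(\mu = 1.1\), the value \(\rho = -0.1\) and the 10-goal truncation are assumptions.

```r
# Dixon-Coles low-score factor tau(x, y) for home goals x and away goals y,
# home goal mean lambda, away goal mean mu and dependence parameter rho.
dc_tau <- function(x, y, lambda, mu, rho) {
  ifelse(x == 0 & y == 0, 1 - lambda * mu * rho,
  ifelse(x == 0 & y == 1, 1 + lambda * rho,
  ifelse(x == 1 & y == 0, 1 + mu * rho,
  ifelse(x == 1 & y == 1, 1 - rho, 1))))
}

# Home-win / draw / away-win probabilities from a truncated score grid.
match_probs <- function(lambda, mu, rho = 0, max_goals = 10) {
  g   <- 0:max_goals
  p   <- outer(dpois(g, lambda), dpois(g, mu))   # rows: home goals, cols: away goals
  tau <- outer(g, g, dc_tau, lambda = lambda, mu = mu, rho = rho)
  p   <- p * tau
  p   <- p / sum(p)                              # renormalise the truncated grid
  c(home = sum(p[lower.tri(p)]),                 # home goals > away goals
    draw = sum(diag(p)),
    away = sum(p[upper.tri(p)]))
}

match_probs(1.5, 1.1)               # plain independent Poisson
match_probs(1.5, 1.1, rho = -0.1)   # with the Dixon-Coles adjustment
```

The four adjusted cells leave the total probability unchanged, so the renormalisation step only corrects for truncating the score grid at `max_goals`.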
Based on table 2.2.1 we know the net bookmakers' probabilities and the EM probabilities. Here I simply apply a linear regression model, and also ANOVA to compare the models.
For further reading, you can learn from Linear Regression in R (R Tutorial 5.1 to 5.11). You can also refer to Getting Started with Mixed Effect Models in R, A very basic tutorial for performing linear mixed effects analyses and Fitting Linear Mixed-Effects Models using lme4. Otherwise you can read Linear Models with R, and for more details on regression models, Extending the Linear Model with R : Generalized Linear, Mixed Effects and Nonparametric Regression Models. In addition, What statistical analysis should I use? summarises the choice of tests for analysis and data validation in a table. Fit models to data provides examples of linear regression and model selection. The main model-fitting commands it covers are lm (linear models for fixed effects), lme (linear models for mixed effects), glm (generalized linear models), nls (nonlinear least squares) and gam (generalized additive models), along with visreg (to visualize model fits). The answer to How to use R anova() results to select best model? elaborates on using ANOVA and the AIC criterion to choose the best-fitting model. How to Choose the Best Regression Model describes how to find the best regression model that fits the data and applies to the real world. ANOVA - Model Selection summarises lecture notes in a slideshow, while Model Selection in R studies model selection for non-nested linear and polynomial models.
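The ANOVA and AIC comparison mentioned above can be sketched in R as follows. The data are simulated. `rEMProbB` and `netProbB` are borrowed from table 4.1.2 only as column names, and `noise` is a deliberately irrelevant hypothetical covariate, so in most runs the F-test should not favour the larger model. None of these numbers are firm A's.

```r
set.seed(42)
# Simulated data: rEMProbB depends linearly on netProbB; `noise` is unrelated.
n <- 1000
d <- data.frame(netProbB = runif(n, 0.35, 0.65))
d$rEMProbB <- 0.02 + 1.05 * d$netProbB + rnorm(n, sd = 0.02)
d$noise    <- rnorm(n)

m1 <- lm(rEMProbB ~ netProbB, data = d)          # smaller (nested) model
m2 <- lm(rEMProbB ~ netProbB + noise, data = d)  # adds the irrelevant covariate

anova(m1, m2)   # F-test of whether `noise` improves the fit
AIC(m1, m2)     # lower AIC is preferred
```

`anova()` applies only to nested models like `m1` and `m2`. AIC can also rank non-nested candidates fitted to the same response and data.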