Predicting Tennis Match Winners and Winning Probabilities from ATP Rank Differences: Three Prediction Methods and Their Comparison

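Predicting the winner of a singles tennis match, and each player's probability of winning, is interesting to tennis fans and is also very important to sports lottery bettors and operators, because the two players' winning probabilities are a key reference for setting odds. This paper uses three methods to predict the winner of a men's singles match and to estimate each player's winning probability from the difference between the two players' official Association of Tennis Professionals (ATP) rankings. The three methods are logistic regression, binary ordered probit regression, and multilevel ordered probit regression. The third method first uses the rank difference to predict the match score in sets and then infers the winner from the predicted set score. Of all 1,992 ATP tour matches played in 2019, 996 are used to build the prediction models and the remaining 996 are used to evaluate their accuracy. The results show that the multilevel ordered probit regression model yields better predictions. However, the match data show that roughly one third of matches were won by the lower-ranked player, and that this happens more often the closer the two rankings are, so the prediction accuracy of all three models is roughly two thirds. Prediction based purely on rank difference therefore has its limits. The paper suggests taking other ranking schemes into account, or using indicators of the players' individual abilities, to improve prediction accuracy.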
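The first of the three methods can be illustrated with a minimal sketch. It is not the paper's implementation: the match records below are synthetic stand-ins rather than real ATP data, `TRUE_BETA` is a hypothetical value used only to simulate outcomes, and the no-intercept form P(A wins) = sigmoid(beta * diff) is an assumption chosen so that swapping the two players flips the probability. Only the 996/996 split mirrors the design described above.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_beta(diffs, wins, iters=50, tol=1e-10):
    """Fit P(A wins) = sigmoid(beta * diff) by Newton-Raphson (no intercept)."""
    beta = 0.0
    for _ in range(iters):
        grad = hess = 0.0
        for d, y in zip(diffs, wins):
            p = sigmoid(beta * d)
            grad += (y - p) * d              # derivative of the log-likelihood
            hess += p * (1.0 - p) * d * d    # negative second derivative
        if hess == 0.0:
            break
        step = grad / hess
        beta += step
        if abs(step) < tol:
            break
    return beta

# Synthetic stand-in for match records (NOT real ATP data).
rng = random.Random(0)
TRUE_BETA = 0.015  # hypothetical value, used only to simulate outcomes
matches = []
for _ in range(1992):
    rank_a, rank_b = rng.sample(range(1, 151), 2)  # two distinct ranks
    diff = rank_b - rank_a      # positive when player A is ranked higher
    a_wins = 1 if rng.random() < sigmoid(TRUE_BETA * diff) else 0
    matches.append((diff, a_wins))

# Half of the records estimate the model, the other half evaluate it.
train, test = matches[:996], matches[996:]
beta_hat = fit_beta([d for d, _ in train], [y for _, y in train])

# Predict that A wins whenever the fitted probability exceeds 0.5.
correct = sum((sigmoid(beta_hat * d) > 0.5) == bool(y) for d, y in test)
accuracy = correct / len(test)
print(f"beta_hat={beta_hat:.4f}, hold-out accuracy={accuracy:.3f}")
```

With a single predictor and no intercept, the predicted winner is always the higher-ranked player whenever `beta_hat` is positive, so hold-out accuracy in such a model is capped by how often the higher-ranked player actually wins.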

Keywords

rank difference; logistic regression; ordered probit regression; winning probability; winner prediction

Parallel Abstract

The prediction of the winner of a singles tennis match and the estimation of each player's winning probability are interesting to tennis fans and important to sports lottery bettors and bookmakers, since the betting odds for a player are proportional to the reciprocal of her or his winning probability. In this paper, we use three models to predict the winner of a men's singles tennis match and to estimate each player's winning probability, using the difference between the two players' official ATP (Association of Tennis Professionals) ranks as the predictor variable. The three models are logistic regression, ordered probit regression with a binary response, and ordered probit regression with a multilevel response. The third model first predicts the match score in sets (0:2, 1:2, 2:1, or 2:0 for a best-of-three match) and then determines the winner from that prediction. We split the 1,992 match records from the 2019 ATP tournaments into two datasets of 996 records each: the first half is used to estimate the three models and the second half to evaluate their accuracy. The results show that the third model, the multilevel ordered probit model, performs best in predicting the winner. However, the historical match results show that in roughly one third of matches the winner was the lower-ranked player, and that such counterintuitive results occurred more often when the two players' ranks were closer, which holds the prediction accuracy of all three models to roughly two thirds. This suggests that the rank difference alone has its limitations as a predictor, that alternative mechanisms for ranking tennis players are worth considering, and that the models should incorporate more explanatory variables based on tennis skills to achieve more accurate predictions.
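The two-step logic of the third model can be sketched as a standard ordered probit with four ordered set scores. The abstract does not give the fitted coefficients or the exact rule for turning a predicted set score into a winner, so everything numeric here is hypothetical: `BETA`, `CUTS`, and the rank difference of 40 are illustrative values, and summing the 2:1 and 2:0 probabilities is one natural way to obtain a winning probability. The helper `fair_decimal_odds` shows the reciprocal relation between odds and winning probability, assuming no bookmaker margin.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF

# Ordered set scores for a best-of-three match, seen from player A's side.
SCORES = ("0:2", "1:2", "2:1", "2:0")

def set_score_probs(diff, beta, cuts):
    """Ordered probit: P(score k) = Phi(c_k - beta*diff) - Phi(c_{k-1} - beta*diff)."""
    c1, c2, c3 = cuts
    mean = beta * diff
    cdf = [0.0, PHI(c1 - mean), PHI(c2 - mean), PHI(c3 - mean), 1.0]
    return {score: cdf[i + 1] - cdf[i] for i, score in enumerate(SCORES)}

def win_prob(probs):
    """Player A wins the match when the set score is 2:1 or 2:0."""
    return probs["2:1"] + probs["2:0"]

def fair_decimal_odds(p):
    """Decimal odds with no bookmaker margin: the reciprocal of the win probability."""
    return 1.0 / p

BETA = 0.01               # hypothetical slope on the rank difference
CUTS = (-0.6, 0.0, 0.6)   # hypothetical cutpoints, c1 < c2 < c3

# diff = rank_B - rank_A, so a positive value means A is ranked higher.
probs = set_score_probs(40, BETA, CUTS)
predicted_score = max(probs, key=probs.get)   # most likely set score
p_a = win_prob(probs)
print(predicted_score, round(p_a, 3), round(fair_decimal_odds(p_a), 2))
```

Keeping the cutpoints symmetric around zero makes two equally ranked players equally likely to win, which is a modeling convenience in this sketch and not a property claimed by the paper.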

Parallel Keywords

rank difference; logistic regression; ordered probit regression model; winning probability; winner prediction
