
A Study of Pitch Prediction Using Temporal Pitching and Batting Features

Advisor: 余執彰

Abstract


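As technology advances, the way intelligence is gathered has changed. Scouting work that once demanded a great deal of manpower and time can now be done efficiently with the help of modern technology. This study aims to learn which pitch type a pitcher tends to throw, given the current batter and the situation on the field.

The data used in this study are Major League Baseball (MLB) data from 2015 to 2021. We exclude the 2020 data because that season was shortened and the pandemic affected players' availability. The pandemic also affected teams' strategies for that season. We divide the features into three categories: on-field situation features, pitcher features, and batter features. The experimental results examine whether the features we selected help the models learn.

We use machine learning and deep learning to analyze and learn from the baseball data, and the final output is a predicted pitch type. We frame pitch prediction as a classification problem. For the machine learning model, we choose eXtreme Gradient Boosting (XGBoost), which performs very well on classification problems. For the deep learning models, we first choose a Deep Neural Network (DNN). DNNs can outperform many traditional machine learning models on classification problems, so we believe they are useful for this thesis. Second, we choose Long Short-Term Memory (LSTM). Because baseball data are sequential, we believe the data may contain meaningful information along the time axis, so we use the LSTM model, which can retain information over time.

In this study, we build three general models to predict a pitcher's next pitch and train them with reference to temporal features. Experiments show that the average prediction accuracy of the three models reaches 66.33% on the binary classification problem of fastball versus non-fastball, which is more predictive than the 63.5% average accuracy of naive guessing. On the multi-class pitch type prediction problem, the average accuracy reaches 50.13%, which is also more predictive than the 46.8% average accuracy of naive guessing.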

Parallel Abstract


With recent advances in technology, information is acquired differently than in the past. Information acquisition used to require a great deal of human labor, but large amounts of statistics can now be collected easily with mounted sensors. The purpose of this research is to learn which pitches a pitcher tends to throw, given the batter he faces and the on-field situation.

The data used in this thesis are Major League Baseball data from 2015 to 2021. We do not use the 2020 data, because that season was shortened and the batters' conditions were not the same as in other years. Players may have been absent due to COVID-19, and COVID-19 also affected the teams' policies for that season. This study classifies the collected features into three categories: on-field situation characteristics, pitcher characteristics, and batter characteristics. Through experiments, we explore whether the features we choose benefit the models.

We use machine learning and deep learning models to predict a pitcher's next pitch, and we treat pitch prediction as a classification problem. For the machine learning model, we choose eXtreme Gradient Boosting (XGBoost) because it performs very well on many classification problems. For the deep learning models, we first choose a Deep Neural Network (DNN). DNNs can outperform many traditional machine learning models on many classification problems, so we think a DNN may be helpful for our problem. The second is Long Short-Term Memory (LSTM). Since pitches are thrown in sequence, we thought the data might contain meaningful information along the timeline, so we use an LSTM model, which has the ability to remember information.
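One way to expose sequential information to a model such as an LSTM is to pair each pitch with the pitches thrown just before it. The following is a minimal sketch of that idea only; the function name `make_windows`, the window length, and the pitch codes are illustrative assumptions and are not taken from the thesis.

```python
from typing import List, Sequence, Tuple


def make_windows(pitches: Sequence[str], window: int) -> List[Tuple[List[str], str]]:
    """Pair each pitch with the `window` pitches thrown immediately before it."""
    if window < 1:
        raise ValueError("window must be at least 1")
    samples = []
    for i in range(window, len(pitches)):
        history = list(pitches[i - window:i])  # the preceding `window` pitches
        samples.append((history, pitches[i]))  # the pitch to be predicted
    return samples


# Hypothetical pitch sequence (FF = four-seam fastball, SL = slider, CH = changeup).
sequence = ["FF", "FF", "SL", "FF", "CH", "SL"]
samples = make_windows(sequence, window=3)
for history, target in samples:
    print(history, "->", target)
```

Each `(history, target)` pair could then be encoded numerically and combined with situation, pitcher, and batter features before training.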
This study hopes that a team's coaching staff can use the pitch types predicted by our models, so that the team's batters have more reference information and more complete ways to respond. This study combines time-series and time-invariant features to train the XGBoost and LSTM models. We devise three combinations and test the performance of the models. In predicting whether the next pitch is a fastball, the average prediction accuracy of the three models reaches 66.33%, which is better than the 63.5% average accuracy of naive prediction. In the multi-class problem, the average accuracy reaches 50.13%, which is also better than the 46.8% average accuracy of naive prediction.
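The comparison against naive prediction can be illustrated with a small sketch. It assumes that naive prediction means always guessing the most frequent pitch type, which the abstract does not state explicitly. The labels and predictions below are toy data, not results from the thesis, and the helper names are hypothetical.

```python
from collections import Counter
from typing import Sequence


def naive_accuracy(labels: Sequence[str]) -> float:
    """Accuracy obtained by always guessing the most frequent label."""
    if not labels:
        raise ValueError("labels must not be empty")
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


def accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal in length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Toy data: true pitch types and a hypothetical model's predictions.
labels = ["FB", "FB", "OTHER", "FB", "OTHER", "FB", "FB", "OTHER"]
predictions = ["FB", "FB", "OTHER", "FB", "FB", "FB", "FB", "FB"]

baseline = naive_accuracy(labels)
model = accuracy(predictions, labels)
print(f"naive baseline: {baseline:.3f}, model: {model:.3f}")
```

A model is only informative to the extent that its accuracy exceeds this baseline, which is why the abstract reports both figures side by side.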

Keywords

XGBoost, DNN, LSTM, pitch type prediction, information acquisition
