
對非職業棒球員之表現預測平台與訓練數據視覺化

Performance Prediction Platform and Training Data Visualization for Non-Professional Baseball Players

Advisor: 余執彰

Abstract


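Over the past few years, performance analysis of sports teams has grown rapidly in research and experimentation, and in recent years baseball data in particular has drawn attention from fields such as medicine and science. Because computer science makes such data easier to obtain, process, and analyze, its applications have become even more widespread.

This study develops a platform for recording players' training data. Besides visualizing the training data with several kinds of charts, it also proposes a mechanism for predicting players' batting performance. The study analyzes training data from the baseball team of a sports university in northern Taiwan, collecting batting data for multiple players (for example, exit velocity, launch angle, hit distance, and hit direction) and predicting the players' performance. Because the amount of non-professional player data we collected is insufficient to train a prediction model, we instead take the batting data of Major League Baseball (MLB) batters and use clustering to separate them into several types with similar performance trends. The exit velocity and launch angle of players of the same type are used to build prediction models. These models predict the next year's exit velocity and launch angle and are then applied to predicting the batting performance trends of non-professional players. The clustered MLB data thus compensates for the fact that there is too little non-professional data to train a prediction model for each individual player.

This study uses two correlation coefficients, the Pearson correlation coefficient and Spearman's rank correlation coefficient, to measure how strongly the players' data are correlated, and groups the players with hierarchical clustering and DBSCAN. The root-mean-square error (RMSE) and the symmetric mean absolute percentage error (SMAPE) serve as the basis for comparing prediction models. A long short-term memory (LSTM) model and a one-dimensional convolutional neural network (1D CNN) model are then used to predict player performance within each cluster. Three input/output combinations were tested: one-to-one, two-to-one, and two-to-two prediction. Across multiple sets of experiments, the one-to-one LSTM model gave the best predictions for the trends of both exit velocity and launch angle. For exit velocity, the single-input, single-output LSTM model achieved an average RMSE of 1.468 and a SMAPE of 0.838% on the 2017 to 2019 test set.

For data visualization, this study concentrates on the key points of sports training. For example, drawing the body skeleton onto batting practice videos makes the rotation of the batting posture clearer, so that it is easier than with the raw video to check whether the posture is correct. This not only helps improve batting skill but is also effective in preventing sports injuries. Beyond batting posture, whether the batting data are improving is what players and coaches care about most. The study presents the data with suitable charts (for example, bar charts, scatter charts, line charts, and hit spray charts) so that a player's performance can be monitored at any time. Analyzing performance through data reduces the errors of judgments made from human memory. Combining statistical analysis of real data with charts and an interactive interface lets coaches plan training programs more efficiently and makes players' own training results more visible to them.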
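The grouping step described in the abstract (correlation between players followed by hierarchical clustering) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the thesis implementation: the function names, the average-linkage rule, the `1 - correlation` distance, the `max_distance` threshold, and the sample exit-velocity series are all assumptions made for the example.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length series (undefined for constant series)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def cluster(series, corr=pearson, max_distance=0.5):
    """Average-linkage agglomerative clustering on the distance 1 - corr.

    `series` maps a player label to that player's yearly values.
    Merging stops once the closest pair of clusters is farther apart
    than `max_distance` (an arbitrary cut-off chosen for this sketch).
    """
    names = list(series)
    dist = {
        frozenset((a, b)): 1 - corr(series[a], series[b])
        for a, b in combinations(names, 2)
    }
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        if linkage(clusters[i], clusters[j]) > max_distance:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical yearly average exit velocities (mph) for four players.
players = {
    "A": [88.0, 89.0, 90.5, 91.0],  # rising trend
    "B": [85.0, 86.2, 87.0, 88.1],  # rising trend
    "C": [92.0, 91.0, 89.5, 88.0],  # falling trend
    "D": [90.0, 89.2, 88.9, 87.5],  # falling trend
}
print(cluster(players, corr=pearson))
print(cluster(players, corr=spearman))
```

Using a correlation-based distance groups players by the shape of their trend rather than by absolute level, which matches the abstract's goal of finding "similar performance trend" types. In practice a library routine such as SciPy's hierarchical clustering or scikit-learn's DBSCAN would likely replace the hand-written loop.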

Parallel Abstract


Performance analysis of sports teams has grown rapidly in research and experiments. Because baseball data is easy to obtain, process, and analyze, it has attracted increasing attention from the fields of medicine and science in recent years.

This study develops a platform that collects and analyzes players' training data. In addition to visualizing the training data with several kinds of charts, such as scatter charts, line charts, bar charts, and hit spray charts, we propose a mechanism for predicting players' batting performance. The training data of the baseball team of a sports university in northern Taiwan was analyzed. Statistics on the players' batting results (for example, exit velocity, launch angle, distance, and direction) were collected and used as features to train the model, and the players' performance was then predicted with the trained model. Since the amount of non-professional player data we collected was insufficient to train a robust model, we adopted an alternative approach: clustering methods are applied to the batting data of Major League Baseball batters to find batters with similar performance trends. The exit velocity and launch angle of these professional players are used to build a prediction model, which is then used to predict the batting performance trend of non-professional players. With this design, we can obtain predictions for non-professional athletes even without sufficient data.

This study uses the Pearson correlation coefficient and Spearman's rank correlation coefficient to measure the degree of correlation between players' data, followed by hierarchical clustering or DBSCAN to group the players. Player performance is then predicted with a long short-term memory (LSTM) model or a one-dimensional convolutional neural network (1D CNN) model for each group of players. The root-mean-square error (RMSE) and the symmetric mean absolute percentage error (SMAPE) are used to compare the performance of the prediction models. This study tests three input/output combinations: one-to-one, two-to-one, and two-to-two. Across multiple sets of experiments, the one-to-one LSTM model gives the best prediction results for the trends of both exit velocity and launch angle. For exit velocity, the one-to-one LSTM model achieves an RMSE of 1.468 and a SMAPE of 0.838% on the test set.

In terms of data visualization, this study focuses on core information in sports training, such as extracting the body skeleton from batting stances in videos. With this information it is easier to check whether the posture is correct, which can both help improve batters' hitting skills and prevent sports injuries. Besides batting stance, the trend of hitting statistics is also a major concern for players and coaches. In the study, the statistics are presented with appropriate graphs (for example, bar charts, scatter charts, line charts, and hit spray charts) so that a player's performance can be tracked at any time. The platform can not only reduce the bias of human judgment but also make coaches' planning of training programs more efficient.
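For readers unfamiliar with the two error measures named in the abstract, the following is a minimal sketch of how RMSE and SMAPE are commonly computed. The abstract does not state which SMAPE variant the thesis uses; the version below (absolute error divided by the mean of the absolute actual and predicted values, reported in percent) is an assumption, and the function names and sample values are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def smape(actual, predicted):
    """Symmetric mean absolute percentage error, in percent.

    Assumed variant: |a - p| / ((|a| + |p|) / 2), averaged over all points.
    A pair where both values are zero contributes zero error.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = (abs(a) + abs(p)) / 2
        total += 0.0 if denom == 0 else abs(a - p) / denom
    return 100.0 * total / len(actual)


# Hypothetical exit velocities (mph): observed vs. model prediction.
observed = [88.4, 90.1, 89.7]
forecast = [87.9, 91.0, 89.2]
print(f"RMSE = {rmse(observed, forecast):.3f}, SMAPE = {smape(observed, forecast):.3f}%")
```

RMSE is expressed in the units of the predicted quantity, while SMAPE is scale-free, which is presumably why the abstract reports both.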

