A Study of the Accuracy of Machine Learning in Predicting Match Outcomes in Team Sports

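Introduction: Match outcome prediction is one of the indicators that most interests sports spectators. As technology advances and data science gains momentum, sports of all kinds are paying increasing attention to data-driven prediction. Machine learning models can be evaluated with many metrics; this paper takes accuracy as the main dependent variable in examining how well machine learning predicts match outcomes in team sports. Methods: This paper reviews the literature on predicting team sports published between 1990 and 2022. A total of 534 papers were screened, and 15 were retained after successive screening of their content. The abstracts of the selected papers are presented as a word cloud, and the reported results are analyzed. Results: The accuracy of machine learning models varies with the nature of the sport and the number of features. Low-scoring sports are difficult to predict effectively, whereas high-scoring sports can be predicted effectively with relatively few features. The literature shows that basketball can reach high accuracy with few features, mainly because the predictive features often include two-point and three-point field goal percentages, which relate directly to the score and therefore raise overall accuracy. Artificial neural network and decision tree models achieve higher accuracy than other models. Conclusion: Data dimensionality is one factor affecting accuracy; when dimensionality is sufficiently large, accuracy tends to improve. This study also finds that feature engineering can raise overall accuracy. However, when a sport involves many uncontrollable factors that are not included in the model, high accuracy is difficult to achieve. As machine learning develops, more empirical research is needed to strengthen the acceptance of data in competitive sport, and accumulated competition data should be put to full use as training data for machine learning so that new models can be validated and refined. Future research could extend to other specific sports.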

Keywords

Prediction model; shooting percentage; sport attributes

Parallel Abstract

Introduction: Predicting the outcomes of sports competitions is an area of strong interest for sports enthusiasts. With technological advances and the rise of data science, many sports increasingly emphasize data-driven prediction. Machine learning models can be evaluated with numerous metrics; this study focuses on accuracy as the primary dependent variable in assessing how well machine learning predicts team sports outcomes. Methods: This paper reviews the literature on predicting team sports outcomes published from 1990 to 2022. A total of 534 papers were screened, and 15 were retained after successive screening of their content. The abstracts of the selected papers were visualized as word clouds, and their results were analyzed. Results: The accuracy of machine learning models varies with the nature of the sport and the number of features used. Low-scoring sports are harder to predict accurately, while high-scoring sports can be predicted accurately with fewer features. Overall, basketball shows higher accuracy with fewer features, whereas soccer requires more features but does not yield significantly higher accuracy. Artificial neural networks and decision trees demonstrated higher accuracy than other models. Conclusion: Data dimensionality is one factor influencing accuracy, with higher-dimensional data often leading to improved accuracy. This study also observed that feature engineering enhances overall accuracy. However, high accuracy remains difficult to achieve in sports with numerous uncontrollable factors that are not included in the models. As machine learning develops, more empirical research is needed to increase the acceptance of data-driven approaches in sports. Using accumulated historical competition data as training data for machine learning models can facilitate model validation and refinement. Future research could extend the investigation to other specific sports.