The Role of Randomness in Ensemble Learning Models

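As information technology has advanced, machine learning has become an increasingly important part of it and has gradually taken over prediction and classification work that once demanded substantial labor and time. A machine learning model can perform numerical prediction or data classification once it has been trained on sufficient data, but its accuracy is sometimes unsatisfactory because of how the target data are distributed. In real-world settings, the members of a group often approach a problem from different starting points and propose different solutions, which makes the problem easier to solve. In pursuit of higher accuracy, this line of thinking gave rise to a machine learning method called Ensemble Learning. Ensemble learning combines multiple weak learners into a strong learner to obtain better results. The approach is related to decision trees: the decision tree family has a more advanced method, Random Forest, which combines multiple decision tree models and, for each tree's training data, randomly selects features and randomly samples the training examples so that the combination forms a strong learner. We also experiment with linear regression and KNN models to see whether they can achieve a similar ensemble effect. In this thesis, we randomly adjust parameters when constructing ensemble learning models and discuss whether randomly adjusting the parameters of the weak learners affects the construction of the ensemble learning model.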

Keywords

Machine learning; ensemble learning; weak learner; strong learner; decision tree model; Random Forest; linear regression model

Parallel Abstract

With advances in information technology, machine learning has become an important part of the field and has gradually replaced prediction and classification work that used to be labor-intensive and time-consuming. A machine learning model can perform numerical prediction or data classification once it has been trained on sufficient data, but its accuracy is sometimes not ideal because of the distribution of the target data. In real-world settings, the members of a group often propose different solutions because each starts from a different point of view, which makes the problem easier to solve. In pursuit of higher accuracy, this way of thinking gave rise to a machine learning method called Ensemble Learning. Ensemble learning obtains better results by combining multiple weak learners into a strong learner. The approach is related to decision trees: the decision tree family includes a more advanced method, Random Forest, which combines multiple decision tree models and applies random feature selection and random data sampling to the training data of each decision tree model to form a strong learner. We also use linear regression and KNN models to test whether they can achieve a similar effect when used to construct ensemble learning models. In this thesis, parameters are randomly adjusted during the construction of the ensemble learning models, and we focus on whether randomizing the parameters of the weak learners affects the construction of the ensemble learning model.
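
As an illustration only, the idea of combining randomized weak learners can be sketched in a few lines of Python. The sketch below builds an ensemble of simple KNN regressors, each trained on a bootstrap sample with a random feature subset and a randomly drawn number of neighbors, and averages their predictions. The function names (`knn_predict`, `build_ensemble`, `ensemble_predict`), the candidate values in `k_choices`, the number of learners, and the toy data are all assumptions made for this example; they are not the configuration or the implementation used in this thesis.

```python
import random
from statistics import mean


def knn_predict(train_X, train_y, x, k, features):
    """Average the targets of the k nearest training points, measuring
    squared Euclidean distance on the chosen feature subset only."""
    dists = sorted(
        ((sum((row[f] - x[f]) ** 2 for f in features), target)
         for row, target in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    return mean(target for _, target in dists[:k])


def build_ensemble(X, y, n_learners=25, k_choices=(1, 3, 5), seed=0):
    """Create weak KNN learners; each gets its own bootstrap sample,
    random feature subset, and randomly drawn k (illustrative choices)."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    learners = []
    for _ in range(n_learners):
        idx = [rng.randrange(n) for _ in range(n)]           # sample rows with replacement
        features = rng.sample(range(d), rng.randint(1, d))   # random feature subset
        k = rng.choice(k_choices)                            # randomized parameter
        learners.append(([X[i] for i in idx], [y[i] for i in idx], k, features))
    return learners


def ensemble_predict(learners, x):
    """Combine the weak learners by averaging their individual predictions."""
    return mean(knn_predict(tx, ty, x, k, feats) for tx, ty, k, feats in learners)


# Toy data (hypothetical): the target depends on the first feature only.
data_rng = random.Random(42)
X = [[data_rng.uniform(0, 10), data_rng.uniform(0, 10)] for _ in range(40)]
y = [2.0 * a + data_rng.gauss(0, 1) for a, _ in X]

learners = build_ensemble(X, y)
prediction = ensemble_predict(learners, [5.0, 5.0])
print(prediction)
```

Fixing `seed` makes the randomization reproducible, so ensembles built with different parameter ranges can be compared on the same data.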