
Why 70/30 or 80/20 Relation Between Training and Testing Sets: A Pedagogical Explanation

Abstract


When learning a dependence from data, it is important, to avoid overfitting, to divide the data into a training set and a testing set. We first train the model on the training set, and then use the data from the testing set to gauge the accuracy of the resulting model. Empirical studies show that the best results are obtained when 20-30% of the data is used for testing and the remaining 70-80% for training. In this paper, we provide a possible explanation for this empirical result.
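
As an illustration of the split described above, here is a minimal sketch in Python using only the standard library. The function name `train_test_split`, its parameters, and the toy data are assumptions made for this example; they do not come from the paper, and the sketch shows only the partitioning step, not the explanation the paper offers.

```python
import random


def train_test_split(data, test_fraction=0.3, seed=0):
    """Shuffle a copy of `data` and return (train, test) lists.

    Hypothetical helper for illustration; `test_fraction` is the share
    of the data held out for testing (0.2 to 0.3 in the abstract).
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    items = list(data)                   # copy so the caller's data is untouched
    random.Random(seed).shuffle(items)   # seeded shuffle for reproducibility
    n_test = round(len(items) * test_fraction)
    return items[n_test:], items[:n_test]


# Toy data: 100 sample indices, split 70/30.
samples = list(range(100))
train, test = train_test_split(samples, test_fraction=0.3)
print(len(train), len(test))  # prints "70 30"
```

The model would then be fitted on `train` only, and its accuracy measured on `test`, which it has not seen during fitting. Setting `test_fraction=0.2` gives the 80/20 variant.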
