機器學習於物種數估計之應用 (Application of Machine Learning to Species Richness Estimation)

Species richness estimation by machine learning

Advisor: 邱春火

Abstract


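Accurately estimating the number of species in an area has long been a challenge in ecological statistics. Many statistical methods for estimating species richness have been developed in the literature, and they fall into two groups: parametric and nonparametric. Parametric methods generally assume that species relative abundances follow a specific probability distribution and obtain its parameters by traditional statistical inference. When the true relative-abundance composition is close to the assumed distribution, parametric estimates are very accurate; when it departs substantially from the assumption, accuracy is hard to guarantee. Nonparametric methods require no distributional assumption on the relative-abundance composition and give robust estimates across diverse ecological data. They include the Chao1 and Chao2 estimators and the first- and second-order jackknife estimators derived with the jackknife method. However, when the heterogeneity of a community's species abundance composition increases, or when the sample size is small, the underestimation of nonparametric estimators can no longer be ignored. This thesis proposes using the Chao1 richness estimate and its confidence interval to construct plausible population distributions of species relative abundance, and then using machine learning techniques to predict species richness, in order to address the underestimation of nonparametric estimators in small samples. Four common machine learning techniques are used: ridge regression, K nearest neighbors, random forest, and boosting. Variables are selected through simulation experiments, and the statistical performance of the machine learning models is compared with that of Chao1 and the jackknife estimators. The simulations show that, under different assumed species abundance distributions, the machine learning techniques mitigate the small-sample underestimation of the nonparametric estimators and also lower the RMSE. The machine learning models do not differ clearly in predictive performance, so ridge regression or random forest is recommended on the basis of prediction speed. Finally, diversity data on weeds of cultivated land in Taiwan and on beetles in the Bavarian Forest National Park are analyzed to compare the statistical performance of the machine learning models and the nonparametric estimators.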
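To make the nonparametric baselines concrete, the following is a minimal Python sketch of the abundance-based Chao1 estimator, a log-normal confidence interval for it, and the first- and second-order jackknife estimators. The abstract does not state which variants the thesis uses, so the formulas here are the commonly published forms (including the (n-1)/n correction factor) and should be read as an assumption. All function names are hypothetical, and the interval is only handled for the case where doubletons are present.

```python
import math

def frequency_counts(abundances):
    """Return (S_obs, n, f1, f2) for a list of per-species abundance counts."""
    present = [x for x in abundances if x > 0]
    return len(present), sum(present), present.count(1), present.count(2)

def chao1(abundances):
    """Chao1 lower-bound estimate of richness (assumed (n-1)/n corrected form)."""
    s_obs, n, f1, f2 = frequency_counts(abundances)
    a = (n - 1) / n
    if f2 > 0:
        return s_obs + a * f1 ** 2 / (2 * f2)
    # Assumed fallback when no doubletons are observed.
    return s_obs + a * f1 * (f1 - 1) / 2

def chao1_interval(abundances, z=1.96):
    """Log-normal confidence interval for Chao1; sketch for f1 > 0 and f2 > 0 only."""
    s_obs, n, f1, f2 = frequency_counts(abundances)
    if f1 == 0 or f2 == 0:
        raise ValueError("this sketch only covers samples with singletons and doubletons")
    a = (n - 1) / n
    r = f1 / f2
    undetected = a * f1 ** 2 / (2 * f2)  # estimated number of unseen species
    variance = f2 * (a / 2 * r ** 2 + a ** 2 * r ** 3 + a ** 2 / 4 * r ** 4)
    k = math.exp(z * math.sqrt(math.log(1 + variance / undetected ** 2)))
    return s_obs + undetected / k, s_obs + undetected * k

def jackknife1(abundances):
    """First-order jackknife estimate (abundance-based form)."""
    s_obs, n, f1, _ = frequency_counts(abundances)
    return s_obs + (n - 1) / n * f1

def jackknife2(abundances):
    """Second-order jackknife estimate (abundance-based form)."""
    s_obs, n, f1, f2 = frequency_counts(abundances)
    return s_obs + (2 * n - 3) / n * f1 - (n - 2) ** 2 / (n * (n - 1)) * f2

# Illustrative (made-up) sample: 11 observed species, 28 individuals.
sample = [10, 5, 3, 2, 2, 1, 1, 1, 1, 1, 1]
estimate = chao1(sample)
lower, upper = chao1_interval(sample)
print(estimate, (lower, upper), jackknife1(sample), jackknife2(sample))
```

The lower confidence limit never falls below the observed richness, which is why the log-normal form is often preferred over a symmetric interval for lower-bound estimators such as Chao1.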

Parallel Abstract


Accurate estimation of species richness remains a challenge in ecological statistics. Previous research has produced several methods for richness estimation, which fall into two principal types: parametric and nonparametric estimators. Parametric methods assume that species relative abundances follow a given probability distribution and derive the parameters of that distribution by traditional statistical inference. If the true abundance composition is close to the assumed distribution, the parametric estimator is accurate; if it is far from the assumption, the parametric estimator is not robust. In contrast, nonparametric estimators make no assumption about the relative-abundance distribution and can be applied robustly to a wide range of ecological data. The most common nonparametric estimators are Chao1, Chao2, and the first- and second-order jackknife estimators. When the heterogeneity of species abundances increases or the sample size is small, the underestimation of the nonparametric estimators is not negligible. This study uses the Chao1 richness estimate and its confidence interval to construct plausible compositions of species relative abundance, and predicts richness with machine learning techniques. The machine learning methods used are ridge regression, K nearest neighbors, random forest, and boosting. In simulation experiments, the statistical performance of the nonparametric estimators is compared with that of the machine learning methods. The simulation results show that the machine learning methods outperform the nonparametric estimators when the sample size is small. There is no clear difference in predictive performance among the machine learning techniques, so the faster algorithms, ridge regression and random forest, are recommended. The final chapter analyzes diversity data on weeds of cultivated land in Taiwan and on beetles in the Bavarian Forest National Park, and compares the performance of the different estimation methods.
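The general idea of learning richness from simulated communities can be sketched as follows. This is not the thesis's procedure: the abstract does not specify the simulation design, the predictor variables, or the tuning. The sketch assumes a log-normal community, a fixed grid of candidate richness values (which in practice might be informed by the Chao1 confidence interval), three simple predictors (observed richness, singletons, doubletons), and a plain K-nearest-neighbors regressor written with the standard library. All names are hypothetical.

```python
import random
from collections import Counter

def simulate_counts(true_richness, sample_size, rng):
    """Draw one abundance sample from a hypothetical log-normal community."""
    weights = [rng.lognormvariate(0.0, 1.0) for _ in range(true_richness)]
    draws = rng.choices(range(true_richness), weights=weights, k=sample_size)
    return list(Counter(draws).values())  # counts of the species actually observed

def features(counts):
    """Assumed predictors: observed richness, singletons, doubletons."""
    return (len(counts), counts.count(1), counts.count(2))

def build_training_set(richness_grid, sample_size, reps, rng):
    """Simulate `reps` samples per candidate richness; pair features with the truth."""
    rows = []
    for richness in richness_grid:
        for _ in range(reps):
            rows.append((features(simulate_counts(richness, sample_size, rng)), richness))
    return rows

def knn_predict(training_rows, x, k=5):
    """Predict richness as the mean target of the k nearest training rows."""
    def squared_distance(row):
        return sum((a - b) ** 2 for a, b in zip(row[0], x))
    nearest = sorted(training_rows, key=squared_distance)[:k]
    return sum(target for _, target in nearest) / len(nearest)

rng = random.Random(0)                 # fixed seed so the run is reproducible
richness_grid = range(20, 101, 10)     # assumed candidate richness values
training_rows = build_training_set(richness_grid, sample_size=50, reps=20, rng=rng)

observed = simulate_counts(60, 50, rng)  # stand-in for a field sample
predicted = knn_predict(training_rows, features(observed), k=5)
print(features(observed), predicted)
```

A K-nearest-neighbors prediction is always an average of training targets, so it cannot leave the range of the candidate grid; how that grid is chosen therefore matters as much as the choice of learner.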

References


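Department of Agronomy, National Taiwan University (臺灣大學農藝系). 1968. 臺灣耕地之雜草 [Weeds of cultivated land in Taiwan], Vol. 1. Department of Agronomy, National Taiwan University.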
Breiman, L. 2001. Random forests. Machine Learning 45:5-32.
Bulmer, M. 1974. On fitting the Poisson lognormal distribution to species-abundance data. Biometrics 30:101-110.
Chiu, C. H., Y. T. Wang, B. A. Walther, and A. Chao. 2014. An improved nonparametric lower bound of species richness via a modified Good–Turing frequency formula. Biometrics 70:671-682.
Fisher, R. A., A. S. Corbet, and C. B. Williams. 1943. The relation between the number of species and the number of individuals in a random sample of an animal population. Journal of Animal Ecology 12:42-58.
