Applying Data Mining Approach to Variable Selection for Maxent: Taiwan Hemlock Case Study

Advisor: 邱祈榮


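Excess greenhouse gases from increased human activity are driving climate change, under which the environment may also change. Understanding how plant communities adapt to the natural environment, and how they will cope with the impacts of climate change under such uncertain shifts, is therefore the first task. In recent years, species distribution models (SDMs) have been widely used to understand the relationship between species and their environment, and have been applied to biodiversity conservation and management. This study uses occurrence plots of the target species, Taiwan hemlock (Tsuga chinensis var. formosana Li and Keng), and 16 environmental factors (large-scale climatic factors and meso-scale topographic factors) as input to the Maxent species distribution model, and tests how three kinds of input affect model performance: (1) using principal component analysis (PCA), classification and regression tree (CART), and conditional inference tree (CIT) to analyze the relationship between the environmental factors and Taiwan hemlock, as the basis for selecting environmental factors for the model; (2) comparing the number of all Taiwan hemlock occurrence plots with the number of plots in the Taiwan hemlock sub-vegetation-type units classified by cluster analysis; and (3) different resolutions of the environmental factors. The study also analyzes the distribution of the vegetation and dominant species and the contribution of each environmental factor to the model, and then uses Maxent to predict probability distribution maps. The predictions are evaluated with the area under the receiver operating characteristic curve (AUC) to assess the accuracy of the Taiwan hemlock vegetation-type distribution models. Two model combination methods are applied to combine the probability model results with threshold screening and produce a potential vegetation map of Taiwan hemlock, whose accuracy is evaluated with an error matrix (confusion matrix). The vegetation analysis yielded four Taiwan hemlock sub-vegetation types. The environmental analysis and model predictions show that the main environmental factor affecting the spatial distribution of Taiwan hemlock is elevation, followed by rainfall, both classed as climatic factors; topographic factors made no major contribution to the model but still made it more accurate. Models of vegetation-type units, which have fewer and more homogeneous plots, had higher predictive ability than species-unit models with more plots. In this study the resolution of the environmental layers had no particularly significant effect on predictive ability; because the sample size changed along with the predicted area, it was not possible to show whether the size of the prediction extent affects model performance. Synthesizing a potential vegetation map supports decision-making in applications and makes the use of species distribution models more flexible. Finally, the applicability of the predicted vegetation map needs further experiments on whether the predicted environmental conditions are truly suitable for the survival of the target species, in order to support the predicted spatial distribution.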


Understanding how plant communities adapt to the impacts of climate change depends on knowing the potential distribution of vegetation. Vegetation is a community of plant species, so building potential vegetation maps (PVMs) from combined species distribution model (SDM) results requires a strategy for deciding how to combine them. This study first analyzes the relationship between Taiwan Hemlock (Tsuga chinensis var. formosana Li and Keng) and 16 topographic and climatic variables, and then generates probability maps with Maxent to test how three aspects of model input affect model performance: (i) the selection and analysis of suitable environmental variables by principal component analysis (PCA), classification and regression tree (CART), and conditional inference tree (CIT) methods; (ii) the sample size and homogeneity of species and vegetation sub-unit occurrence data; and (iii) the resolution of the environmental layers. Models were evaluated by the area under the receiver operating characteristic (ROC) curve (AUC) and the Kappa statistic. Two model combination approaches were also applied to help generate the PVM of Taiwan Hemlock, which was evaluated by an error matrix and its derived indices. Cluster analysis classified Taiwan Hemlock vegetation into four sub-unit vegetation types. Environmental analysis and modeling showed that the variable that most affects the spatial distribution of Taiwan Hemlock is elevation, followed by precipitation; both are classed as climatic variables. Topographic variables made only a minor contribution to the model. The sample size test showed that models were more accurate when the input samples were fewer and more homogeneous. The resolution of the environmental layers had no significant effect on model performance in this case.
Overlaying the Taiwan Hemlock vegetation sub-unit probability maps with two deterministic combination approaches synthesizes a potential vegetation map of Taiwan Hemlock. The strategy for predicting PVMs should be adapted according to local ecological theory, and further study is needed to test whether the predicted environmental conditions are really suitable for the target species.
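The abstract does not say which two combination approaches were used, so the following Python sketch is an illustration only. It assumes one plausible deterministic rule: each cell is assigned to the sub-unit with the highest probability, provided that probability reaches the sub-unit's threshold. The function names, the threshold of 0.5, and the six-cell toy data are all hypothetical. The sketch then scores the resulting map with an error matrix, overall accuracy, and the Kappa statistic.

```python
from typing import Dict, List, Sequence


def combine_max_probability(prob_maps: Dict[int, Sequence[float]],
                            thresholds: Dict[int, float]) -> List[int]:
    """Assumed rule: give each cell the sub-unit with the highest probability,
    but only if that probability reaches the sub-unit's threshold.
    Label 0 means no sub-unit is predicted for the cell."""
    labels = sorted(prob_maps)
    n_cells = len(prob_maps[labels[0]])
    result = []
    for i in range(n_cells):
        best_label, best_p = 0, 0.0
        for label in labels:
            p = prob_maps[label][i]
            if p >= thresholds[label] and p > best_p:
                best_label, best_p = label, p
        result.append(best_label)
    return result


def error_matrix(observed: Sequence[int], predicted: Sequence[int],
                 n_classes: int) -> List[List[int]]:
    """Rows are observed classes, columns are predicted classes."""
    m = [[0] * n_classes for _ in range(n_classes)]
    for o, p in zip(observed, predicted):
        m[o][p] += 1
    return m


def overall_accuracy(m: List[List[int]]) -> float:
    """Share of cells on the diagonal of the error matrix."""
    total = sum(sum(row) for row in m)
    return sum(m[i][i] for i in range(len(m))) / total


def cohen_kappa(m: List[List[int]]) -> float:
    """Agreement corrected for the agreement expected by chance."""
    total = sum(sum(row) for row in m)
    p_observed = overall_accuracy(m)
    p_expected = sum(
        sum(m[i]) * sum(row[i] for row in m) for i in range(len(m))
    ) / total ** 2
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical toy data: two sub-units (labels 1 and 2) over six cells.
prob_maps = {
    1: [0.9, 0.2, 0.6, 0.1, 0.4, 0.7],
    2: [0.3, 0.8, 0.5, 0.2, 0.6, 0.1],
}
thresholds = {1: 0.5, 2: 0.5}   # assumed values, not from the study
observed = [1, 2, 2, 0, 2, 1]   # made-up reference labels

predicted = combine_max_probability(prob_maps, thresholds)
matrix = error_matrix(observed, predicted, n_classes=3)
accuracy = overall_accuracy(matrix)
kappa = cohen_kappa(matrix)
print(predicted, matrix, round(accuracy, 3), round(kappa, 3))
```

Other deterministic rules, such as applying each threshold first and resolving overlaps by a fixed priority order, would slot into the same evaluation functions unchanged.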


Keywords: Tsuga chinensis, CART, CIT, Maxent, AUC, confusion matrix, PVM


