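This study draws on historical river water quality monitoring data from 2020 to 2022, dividing the data by type of measured substance into a non-metal group and a metal group. It applies several data mining techniques (decision trees, random forests, support vector machines, XGBoost, and association analysis) to analyze the data, compare the predictive accuracy of the models, and examine the main factors driving river pollution, in order to propose recommendations for improving water quality. The results show that XGBoost achieved the highest predictive accuracy on both groups. For the non-metal data, the decision tree, random forest, and XGBoost models indicate that the main factors affecting pollution level are suspended solids, ammonia nitrogen, biochemical oxygen demand, and chemical oxygen demand. For the metal data, total phosphorus is the main pollutant. Association analysis identifies the key pollution factors for the non-metal group as dissolved oxygen (electrode method), conductivity, coliform group, biochemical oxygen demand, dissolved oxygen saturation, chemical oxygen demand, and ammonia nitrogen; for the metal group, they are total organic carbon, mercury, and nitrite nitrogen. Taken together, the analyses indicate that ammonia nitrogen and total phosphorus are the principal indicators of river pollution, and that control of, and public outreach on, wastewater discharge from industrial zones and livestock operations should be strengthened. Areas with especially high concentrations of suspended solids, biochemical oxygen demand, and chemical oxygen demand also warrant attention in order to improve river water quality effectively.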
This study analyzes river water quality monitoring data from 2020 to 2022, divided into non-metallic and metallic groups according to the substances measured. Using decision trees, random forests, support vector machines (SVM), XGBoost, and association analysis, it compares the predictive accuracy of the models and identifies the key factors influencing pollution. XGBoost achieved the highest predictive accuracy in both groups. For the non-metallic data, the major pollution factors are suspended solids, ammonia nitrogen, biochemical oxygen demand (BOD), and chemical oxygen demand (COD). For the metallic data, total phosphorus is the primary pollutant. Association analysis highlights dissolved oxygen, conductivity, coliform bacteria, BOD, dissolved oxygen saturation, COD, and ammonia nitrogen as key factors in the non-metallic group, and total organic carbon, mercury, and nitrite nitrogen in the metallic group. Overall, the findings point to ammonia nitrogen and total phosphorus as the critical indicators of river pollution. Recommendations include stricter control of wastewater discharge from industrial zones and livestock operations, and targeted attention to areas with especially high concentrations of suspended solids, BOD, and COD, in order to improve river water quality.
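The abstract does not state which association-mining algorithm or thresholds were used, so the following is only a minimal sketch of the general idea, assuming each sampling event is reduced to the set of parameters flagged as high. The records, the parameter labels, the `min_support` value, and the helper names `support` and `pair_rules` are all hypothetical and are not taken from the study.

```python
from itertools import combinations

# Hypothetical toy records: each set lists the parameters flagged as "high"
# at one sampling event. Made up for illustration, not data from the study.
records = [
    {"NH3-N", "BOD", "COD"},
    {"NH3-N", "BOD"},
    {"NH3-N", "COD", "SS"},
    {"BOD", "COD"},
    {"SS"},
]

def support(itemset, records):
    """Fraction of records that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= r for r in records) / len(records)

def pair_rules(records, min_support=0.4):
    """Return (antecedent, consequent, support, confidence, lift) for item pairs."""
    items = sorted(set().union(*records))
    rules = []
    for a, b in combinations(items, 2):
        s_ab = support({a, b}, records)
        if s_ab < min_support:
            continue  # pair co-occurs too rarely to be of interest
        for x, y in ((a, b), (b, a)):
            conf = s_ab / support({x}, records)   # estimate of P(y | x)
            lift = conf / support({y}, records)   # above 1 means positive association
            rules.append((x, y, s_ab, conf, lift))
    return rules

rules = pair_rules(records)
for x, y, s, c, l in rules:
    print(f"{x} -> {y}: support={s:.2f}, confidence={c:.2f}, lift={l:.2f}")
```

A rule with lift above 1 indicates that the two parameters are flagged together more often than would be expected if they were independent, which is the sense in which association analysis can surface co-occurring pollution factors.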