Analysis and Study of Feature Processing Techniques Based on Artificial Intelligence

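With the spread of the Internet, mobile Internet access, and social applications such as Facebook and Instagram, data is being generated at an unprecedented rate. The resulting data, however, is large in volume, varied in format, and high in dimensionality (number of variables), all of which work against machine learning: too many variables keep a model from finding the expected patterns, while the heavy computation and long training times mean the trained model often performs worse than expected. For this reason, feature processing is usually the first preprocessing step in a machine learning project. This thesis analyzes and compares existing feature processing techniques. These include feature extraction methods, which construct new features from the original ones, such as principal component analysis (PCA) and linear discriminant analysis (LDA), and feature selection methods, which retain the information in the original data while filtering it, such as filter methods and wrapper methods. The aim is to use feature processing effectively to obtain high-performance learning algorithms. The feature processing methods analyzed and organized in this thesis give a clearer picture of the feature processing workflow and provide users with clear parameter settings and modes of operation, further improving the usability of the data.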

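Keywords

Feature processing; Feature extraction; Feature selection; Principal component analysis (PCA); Linear discriminant analysis (LDA); Pearson correlation coefficient (PCC); Sequential forward selection (SFS); Sequential backward selection (SBS)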

Parallel Abstract

With the popularity of the Internet, mobile Internet access, and social software (such as Facebook and Instagram), data is being generated at an unprecedented rate. However, the large volume of data, the diversity of formats, and the excessive number of dimensions (variables) are all disadvantageous for machine learning. Too many variables hinder the model from finding the expected patterns, and the heavy computation and long training time lead to results that fall short of expectations. Feature processing is therefore usually the first preprocessing step in a machine learning project. This paper analyzes and compares existing feature processing techniques, including feature extraction methods that construct new features from the original ones, such as principal component analysis (PCA) and linear discriminant analysis (LDA), and feature selection methods that preserve the information in the original data while filtering it, such as filter and wrapper methods, in order to use feature processing effectively to achieve high-performance learning algorithms. The feature processing methods analyzed and organized in this paper give a better understanding of the feature processing workflow and provide users with clear parameter settings and modes of operation, further improving the usability of the data.

Parallel Keywords

Feature processing; Feature extraction; Feature selection; Principal Component Analysis (PCA); Linear Discriminant Analysis (LDA); Pearson correlation coefficient (PCC); Sequential forward selection (SFS); Sequential backward selection (SBS)
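
The abstract distinguishes feature extraction, which builds new features from the original ones, from feature selection, which keeps a subset of the original columns. The sketch below illustrates that contrast with PCA and a Pearson-correlation filter. It is a minimal illustration and not the thesis's own implementation: the function names `pca_extract` and `pearson_filter`, the synthetic data, and the choice of NumPy are all assumptions made for this example, and the filter assumes no feature column is constant.

```python
import numpy as np


def pca_extract(X, n_components):
    """Feature extraction: project centered data onto the top principal directions."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def pearson_filter(X, y, k):
    """Feature selection (filter): keep the k columns with the largest |Pearson r| to y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Pearson r per column; assumes no column of X is constant (nonzero variance).
    r = (Xc.T @ yc) / np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    keep = np.sort(np.argsort(-np.abs(r))[:k])
    return keep, r


# Hypothetical data: 5 features, of which only columns 1 and 3 drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 1] - 2.0 * X[:, 3] + 0.1 * rng.normal(size=200)

Z = pca_extract(X, n_components=2)      # new features: mixtures of all 5 columns
keep, r = pearson_filter(X, y, k=2)     # original features: a subset of columns

print("extracted shape:", Z.shape)
print("selected columns:", keep)
```

The extracted features in `Z` are linear combinations of every original column and so lose their original meaning, while the filter returns indices into the original columns, which keeps the retained features interpretable. A wrapper method such as SFS or SBS would instead score candidate subsets by training a model on each one, which is more expensive than this single correlation pass.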
