
Learning with Missing Data: Attention, Not Imputation, Is All You Need

Advisor: 林守德


Abstract


Machine learning on data with missing values has become a common challenge in many real-world applications. Existing approaches to handling data incompleteness suffer from several limitations and drawbacks: in two-stage methods, imputation errors can propagate to the downstream model and degrade prediction accuracy, and some models cannot accommodate data with mixed feature types because of assumptions built into their design. This work proposes a flexible framework for end-to-end downstream label prediction that directly takes data samples with missing values as input, thereby preventing imputation error from affecting the downstream task, and supports both continuous and categorical features through a feature embedding layer. Using a transformer encoder with the self-attention mechanism, the framework improves the hidden representation of each feature (especially the missing ones) by capturing inter-feature information and relationships. Several effective settings of the framework's components are identified through empirical analysis and ablation studies of performance changes. Experiments show promising results: the proposed framework achieves better downstream label prediction performance than state-of-the-art end-to-end solutions and imputation-based methods.
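To make the described architecture concrete, the following is a minimal sketch of the idea the abstract outlines: tokenize each feature (continuous or categorical) into an embedding, substitute a learned per-feature "missing" embedding for absent values instead of imputing them, and let a transformer encoder's self-attention propagate information from observed features into the representations of missing ones. Every name, dimension, and design detail below is an illustrative assumption, not the thesis's actual implementation.

# Minimal PyTorch sketch of an attention-based, imputation-free model.
# All names and design choices here are assumptions for illustration.
import torch
import torch.nn as nn

class AttentionOverFeatures(nn.Module):
    def __init__(self, n_num, cat_cardinalities, d_model=64, n_heads=4,
                 n_layers=2, n_classes=2):
        super().__init__()
        # Per-feature linear "tokenizer" for continuous features: scalar -> d_model.
        self.num_weight = nn.Parameter(torch.randn(n_num, d_model) * 0.02)
        self.num_bias = nn.Parameter(torch.zeros(n_num, d_model))
        # One embedding table per categorical feature.
        self.cat_embeds = nn.ModuleList(
            [nn.Embedding(card, d_model) for card in cat_cardinalities])
        # Learned per-feature embedding that stands in for any missing value.
        n_feat = n_num + len(cat_cardinalities)
        self.missing_embed = nn.Parameter(torch.randn(n_feat, d_model) * 0.02)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x_num, num_mask, x_cat, cat_mask):
        # x_num: (B, n_num) floats; num_mask: (B, n_num) bool, True = observed.
        # x_cat: (B, n_cat) int64;  cat_mask: (B, n_cat) bool, True = observed.
        # Zero out unobserved continuous values so NaNs never enter the graph;
        # their tokens are overwritten with the missing embedding below anyway.
        x_num = torch.where(num_mask, x_num, torch.zeros_like(x_num))
        num_tok = x_num.unsqueeze(-1) * self.num_weight + self.num_bias
        # Missing categorical entries must be encoded as some valid index
        # (e.g. 0); their tokens are likewise overwritten below.
        cat_tok = torch.stack(
            [emb(x_cat[:, j]) for j, emb in enumerate(self.cat_embeds)], dim=1)
        tokens = torch.cat([num_tok, cat_tok], dim=1)           # (B, n_feat, d)
        observed = torch.cat([num_mask, cat_mask], dim=1).unsqueeze(-1)
        # No imputation: missing slots receive a learned embedding, and
        # self-attention lets observed features refine those representations.
        tokens = torch.where(observed, tokens,
                             self.missing_embed.expand_as(tokens))
        h = self.encoder(tokens)                                # self-attention
        return self.head(h.mean(dim=1))                         # class logits

Given batches x_num, x_cat and their observation masks, logits = model(x_num, num_mask, x_cat, cat_mask) trains end to end with an ordinary cross-entropy loss, so no separate imputation stage ever enters the pipeline; this matches the single-stage, mixed-feature-type design the abstract claims, though the pooling, tokenization, and missing-token scheme shown here are only one plausible realization.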

