
提升不平衡輸入資料集下分類器之召回率

Boosting recall of data classifiers with imbalanced input datasets

Advisor: 歐陽彥正

Abstract


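Deep learning plays a pivotal role in computer vision and speech recognition, and a growing number of deep learning applications are achieving breakthroughs across many fields. To obtain sufficiently good results, a dataset with good properties matters most. Properties such as large volume, consistent quality, low noise, and a balanced class distribution help a deep learning model perform well on a learning problem. Real-world data, however, is often mislabeled because it is full of noise, or is too large to label item by item, and correct labels are extremely expensive and time-consuming to obtain; the result is a large amount of imbalanced data. With the arrival of the big data era, applications have diverse requirements beyond accuracy alone, such as recall and precision. This thesis discusses the properties of imbalanced datasets, examines whether recall and precision can be controlled through the objective function, and finally uses cross entropy and relative entropy (Kullback–Leibler divergence) to boost recall. The resulting classifier improves recall by 5% on sequential data, and the thesis discusses why the same effect was not obtained on image data.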

Parallel Abstract


Deep learning has become important in speech recognition and computer vision, and it has been deployed in many applications. A deep learning model typically needs a large, high-quality, clean, and balanced dataset to perform well. However, labeling a large amount of data is expensive and time-consuming, so large real-world datasets are generally extremely imbalanced and noisy. Moreover, researchers and users often target statistics such as recall and precision rather than accuracy. This study discusses the properties of imbalanced datasets and investigates whether recall and precision can be controlled through the objective function. It proposes using cross entropy and Kullback–Leibler divergence to boost recall. The resulting classifier usually gains 5% in recall on sequential datasets but fails to do so on image classification, and the study discusses why.
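The abstract does not give the thesis's objective function, so the following is only a minimal sketch of the standard quantities it names, not the author's method. It shows why accuracy can mislead on an imbalanced dataset (a classifier that always predicts the majority class scores high accuracy with zero recall) and how cross entropy relates to Kullback–Leibler divergence. All function names, the 95/5 class split, and the example distributions are assumptions made for illustration.

```python
import math


def precision_recall_accuracy(y_true, y_pred, positive=1):
    """Return (precision, recall, accuracy) for a binary labeling."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, correct / len(pairs)


def cross_entropy(p, q, eps=1e-12):
    """H(p, q) = -sum_i p_i * log(q_i); eps guards against log(0)."""
    return -sum(pi * math.log(qi + eps) for pi, qi in zip(p, q))


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) = sum_i p_i * log(p_i / q_i), skipping terms with p_i == 0."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0
    )


# Hypothetical imbalanced dataset: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5

# A classifier that always predicts the majority class.
y_majority = [0] * 100
majority_metrics = precision_recall_accuracy(y_true, y_majority)

# A classifier that finds 4 of the 5 positives at the cost of 10 false alarms.
y_recall = [1] * 10 + [0] * 85 + [1] * 4 + [0] * 1
recall_metrics = precision_recall_accuracy(y_true, y_recall)

# Cross entropy decomposes as H(p, q) = H(p) + KL(p || q).
p = [0.9, 0.1]
q = [0.6, 0.4]
decomposition_gap = cross_entropy(p, q) - (cross_entropy(p, p) + kl_divergence(p, q))

print("majority-class classifier:", majority_metrics)
print("recall-oriented classifier:", recall_metrics)
```

In this toy setting the second classifier has lower accuracy than the first but far higher recall, which is the kind of trade-off the abstract says an application may prefer over accuracy alone.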

Keywords

Deep Learning; Classifier; Imbalanced Dataset

