
Abstract


This work presents an in-depth study of regional accent classification in Mandarin speech. The input features are 13-dimensional Mel-Frequency Cepstral Coefficients (MFCCs). The dataset was recorded by speakers from 96 cities, covering 13 major dialect areas in China; this work uses the eight dialect areas that together account for about 75% of the dataset. A 1-dimensional Convolutional Neural Network (1-D CNN), trained with Stochastic Gradient Descent (SGD) as the optimizer, extracts acoustic features and achieves a relatively high accuracy of 67.15%. Accent classification is an important step toward reducing the character error rate of automatic speech recognition (ASR) models. It can also help narrow down the region where a criminal suspect lives.
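To make the pipeline in the abstract concrete, the following is a minimal sketch of a 1-D CNN forward pass over 13-dimensional MFCC frames with an eight-way softmax output, plus a plain SGD update rule. Only the 13 features, the eight classes, the 1-D convolution, and the use of SGD come from the abstract. The filter count, kernel size, ReLU activation, global average pooling, learning rate, and all function names are assumptions made for illustration, and the input is random data rather than real speech; this is not the authors' architecture.

```python
import math
import random

N_MFCC = 13     # from the abstract: 13 MFCC features per frame
N_CLASSES = 8   # from the abstract: eight dialect areas
N_FILTERS = 4   # assumption: filter count is not given in the abstract
KERNEL = 3      # assumption: kernel size is not given in the abstract


def conv1d(frames, weights, biases):
    """Valid 1-D convolution over time followed by ReLU.

    frames:  T x N_MFCC list of MFCC frames
    weights: F x K x N_MFCC list of filters
    returns: (T - K + 1) x F feature map
    """
    kernel = len(weights[0])
    out = []
    for t in range(len(frames) - kernel + 1):
        row = []
        for w, b in zip(weights, biases):
            s = b
            for k in range(kernel):
                s += sum(wi * xi for wi, xi in zip(w[k], frames[t + k]))
            row.append(max(0.0, s))  # ReLU (assumed activation)
        out.append(row)
    return out


def global_avg_pool(feature_map):
    """Average each filter's response over time, so any utterance length works."""
    n = len(feature_map)
    return [sum(col) / n for col in zip(*feature_map)]


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(frames, params):
    """Return one probability per dialect area for a sequence of MFCC frames."""
    pooled = global_avg_pool(conv1d(frames, params["conv_w"], params["conv_b"]))
    logits = [
        sum(w * x for w, x in zip(row, pooled)) + b
        for row, b in zip(params["fc_w"], params["fc_b"])
    ]
    return softmax(logits)


def sgd_step(values, grads, lr=0.01):
    """Plain SGD update: value <- value - lr * gradient."""
    return [v - lr * g for v, g in zip(values, grads)]


def init_params(seed=0):
    """Small random weights; the initialization scheme is an assumption."""
    rng = random.Random(seed)

    def rand():
        return rng.uniform(-0.1, 0.1)

    return {
        "conv_w": [[[rand() for _ in range(N_MFCC)] for _ in range(KERNEL)]
                   for _ in range(N_FILTERS)],
        "conv_b": [0.0] * N_FILTERS,
        "fc_w": [[rand() for _ in range(N_FILTERS)] for _ in range(N_CLASSES)],
        "fc_b": [0.0] * N_CLASSES,
    }


if __name__ == "__main__":
    rng = random.Random(1)
    # Stand-in for real MFCCs: 20 random frames of 13 coefficients each.
    frames = [[rng.gauss(0.0, 1.0) for _ in range(N_MFCC)] for _ in range(20)]
    print(classify(frames, init_params()))
```

In practice the MFCC frames would come from a feature extractor run on recorded speech, and the gradients passed to `sgd_step` would come from backpropagating a cross-entropy loss; both are omitted here to keep the sketch dependency-free.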

