
Learning Lexical and Speech Coherence Representation by Using LSTM Forget Gate

Advisor: 李祈均 (Chi-Chun Lee)

Abstract


Numerous studies of autism spectrum disorder have confirmed that, compared with typically developing children, children with autism generally lag in verbal ability, and reduced fluency in oral narration has become an important indicator when diagnosing autism. Previously, measuring fluency in either speech or text relied on time-consuming manual annotation, or on indicator features designed by trained experts. This thesis proposes a directly data-driven way to learn fluency features: using the forget gate of the long short-term memory (LSTM) architecture, we derive an embedded lexical feature that encodes semantic coherence from text, and an embedded speech feature that encodes vocal coherence from audio. On text, this novel coherence feature reaches 92% accuracy on the task of distinguishing typically developing children from children with autism; on speech, the coherence feature reaches 75% accuracy on the task of distinguishing children rated fluent from those rated disfluent. A traditional baseline that uses syntax, word-usage frequency, and latent semantic analysis (LSA) as fluency features reaches 73%, so the proposed features yield a significant improvement in accuracy. This thesis further examines the meaning carried by the proposed feature. By randomly shuffling the word order and sentence order within coherent story narrations produced by typically developing children, we create disfluent utterances, and we find that, under our feature-extraction model, the feature distribution of these artificially disfluent sentences shifts toward the distribution extracted from the disfluent utterances of children with autism. This verifies that the derived feature indeed carries the notion of coherence.
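The core idea above, reading out the forget-gate activations of an LSTM as it processes a sequence and summarizing them into a fixed-length coherence representation, can be sketched as follows. This is a minimal NumPy illustration, not the thesis implementation: the weights here are random placeholders, whereas in the thesis they would come from a trained model, and the summary statistics (mean and standard deviation over time) are one plausible pooling choice.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forget_trace(x_seq, params):
    """Run a single-layer LSTM over x_seq and record the
    forget-gate activation vector f_t at every time step."""
    W, U, b = params              # W: (4h, d), U: (4h, h), b: (4h,)
    h_dim = U.shape[1]
    h = np.zeros(h_dim)
    c = np.zeros(h_dim)
    trace = []
    for x in x_seq:
        z = W @ x + U @ h + b
        i = sigmoid(z[:h_dim])             # input gate
        f = sigmoid(z[h_dim:2 * h_dim])    # forget gate (the quantity of interest)
        g = np.tanh(z[2 * h_dim:3 * h_dim])  # candidate cell state
        o = sigmoid(z[3 * h_dim:])         # output gate
        c = f * c + i * g                  # cell-state update
        h = o * np.tanh(c)                 # hidden-state update
        trace.append(f)
    return np.stack(trace)                 # shape: (T, h_dim)

# Toy sequence of word embeddings; weights are random stand-ins
# for a trained LSTM's parameters.
d, h_dim, T = 8, 16, 10
params = (rng.standard_normal((4 * h_dim, d)) * 0.1,
          rng.standard_normal((4 * h_dim, h_dim)) * 0.1,
          np.zeros(4 * h_dim))
x_seq = rng.standard_normal((T, d))
F = lstm_forget_trace(x_seq, params)
# Pool the per-step gate activations into one fixed-length feature vector.
rep = np.concatenate([F.mean(axis=0), F.std(axis=0)])
print(F.shape, rep.shape)   # (10, 16) (32,)
```

The gate values lie in (0, 1) by construction, so how strongly (and how erratically) the model "forgets" across a narration is what the pooled vector captures.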

Parallel Abstract (English)


Since children with autism are less able to tell a fluent story than typically developing children, measuring verbal fluency has become an important indicator when diagnosing autism. Fluency assessment, however, requires time-consuming manual tagging or expert-designed characteristics as indicators. This study therefore proposes a coherence representation learned by a directly data-driven architecture, using the forget gate of a long short-term memory (LSTM) model to export a coherence representation from text and audio; at the same time, we use the ADOS codes related to the evaluation of narration to test the proposed representation. Our lexical coherence representation achieves a high accuracy of 92% on the task of identifying children with autism from typically developing children in the text modality, and an accuracy of 83% on the task of identifying disfluent autistic children's speech from relatively fluent speech. Compared with traditional measurements of text and audio, this is a significant improvement. This thesis also introduces incoherency into coherent samples by randomly shuffling the word order and sentence order in text, and by adding pulse or repetitive signals to speech; these processes make coherent children's story content incoherent. By visualizing the data samples after dimensionality reduction, we observe the distributions of the coherent, incoherent, and artificially incoherent samples. We find that the artificially incoherent typical samples move closer to the incoherent autistic samples, which shows that the proposed representation contains the concept of coherency.
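The text-side perturbation described above, shuffling word order within sentences and then shuffling sentence order to synthesize incoherent narrations, can be sketched as follows. This is an illustrative reconstruction of the procedure, not the thesis code; the story text and seeded `random.Random` are placeholder choices.

```python
import random

def shuffle_words(sentence, rng):
    """Randomly permute the words within one sentence."""
    words = sentence.split()
    rng.shuffle(words)
    return " ".join(words)

def make_incoherent(story, rng=None, shuffle_sentences=True):
    """Perturb a coherent story (a list of sentences): shuffle the
    word order inside each sentence, then shuffle sentence order."""
    rng = rng or random.Random(0)   # seeded for reproducibility
    sentences = [shuffle_words(s, rng) for s in story]
    if shuffle_sentences:
        rng.shuffle(sentences)
    return sentences

# Toy coherent narration (placeholder content).
story = ["the boy found a frog in a jar",
         "the frog escaped during the night",
         "the boy searched the forest with his dog"]
incoherent = make_incoherent(story)
print(incoherent)
```

The perturbation preserves every word (only order changes), so any shift the feature extractor registers on these samples must come from lost coherence rather than changed vocabulary.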

