
An Automated Scoring System for Couple-Interaction Behavioral Codes in Marital Therapy Based on a Stacked Sparse Autoencoder Algorithm Using Speech-Acoustic Features

Automating Behavior Coding for Distressed Couples Interactions Based on Stacked Sparse Autoencoder Framework using Speech-acoustic Features

Abstract


In the past, analysis of human behavior relied on traditional manual observation. In marital therapy, for example, raters watch recorded sessions and score the behaviors each spouse exhibits over an entire conversation. Quantifying the degree to which various behaviors are expressed in this way supports further study of therapy outcomes, but the approach is extremely time-consuming, and the raters' subjective biases can compromise the accuracy of the final scores. Automating this recognition with machine learning would save a great deal of manual effort and improve objectivity. Deep learning is currently a very active topic in machine learning. This thesis proposes using a stacked sparse autoencoder (SSAE) to reduce the dimensionality of speech-acoustic features and extract the key higher-level features, followed by logistic regression (LR) for classification. The method achieves an overall accuracy of 75% (average recognition accuracy of 74.9% for husband behaviors and 75% for wife behaviors), a 0.9% improvement over the 74.1% reported in previous work (75% for husbands, 73.2% for wives) (Black et al., 2013). Our proposed method effectively improves behavior-recognition accuracy while using much lower-dimensional acoustic features.
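For readers unfamiliar with the sparsity mechanism mentioned above, the standard sparse autoencoder objective combines a reconstruction term, a weight-decay term, and a KL-divergence penalty that pushes the average hidden activation toward a small target value. The abstract does not state the thesis's exact formulation, so the equation below is the commonly used form rather than the authors' own:

\[
J(W,b) = \frac{1}{m}\sum_{i=1}^{m}\bigl\lVert x^{(i)} - \hat{x}^{(i)} \bigr\rVert^{2}
+ \lambda \lVert W \rVert_{2}^{2}
+ \beta \sum_{j=1}^{s} \mathrm{KL}\!\left(\rho \,\middle\|\, \hat{\rho}_{j}\right),
\qquad
\mathrm{KL}\!\left(\rho \,\middle\|\, \hat{\rho}_{j}\right)
= \rho \log\frac{\rho}{\hat{\rho}_{j}} + (1-\rho)\log\frac{1-\rho}{1-\hat{\rho}_{j}},
\]

where \(\hat{\rho}_{j}\) is the average activation of hidden unit \(j\) over the training data and \(\rho\) is the target sparsity. Stacking proceeds layer by layer: the hidden representation of one trained autoencoder becomes the input to the next, yielding progressively lower-dimensional, higher-level features.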

Parallel Abstract (English)


The traditional way of analyzing human behavior is through manual observation. For example, in couple therapy studies, human raters observe sessions of interaction between distressed couples and manually annotate the behaviors of each spouse using established coding manuals. Clinicians then analyze these annotated behaviors to understand the effectiveness of the treatment each couple receives. However, this manual approach is very time-consuming, and the subjective nature of the annotation process can make the annotations unreliable. Our work aims to automate this process with machine learning and, through signal processing techniques, to bring quantitative evidence to the study of human behavior. Deep learning is the current state-of-the-art machine learning technique. This paper proposes using a stacked sparse autoencoder (SSAE) to reduce the dimensionality of the acoustic-prosodic features and identify the key higher-level features. Finally, we use logistic regression (LR) to classify high versus low ratings of six different behavioral codes. The method achieves an overall accuracy of 75% over the six codes (husband's average accuracy of 74.9%, wife's average accuracy of 75%), compared to 74.1% in the previously published study (husband's average accuracy of 75%, wife's average accuracy of 73.2%) (Black et al., 2013), an overall improvement of 0.9%. Our proposed method achieves a higher classification rate while using far fewer features (roughly one tenth as many as the previous work (Black et al., 2013)).
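As a concrete illustration of the pipeline described above, the sketch below greedily pretrains two sparse autoencoder layers and then fits a logistic-regression classifier on the resulting low-dimensional representation. This is a minimal sketch, not the thesis's implementation: PyTorch and scikit-learn, the layer sizes (380 -> 100 -> 30), the sparsity target, and the random X, y data are all illustrative assumptions, and only one of the six behavioral codes is classified (as a binary high/low decision).

```python
# Minimal SSAE + logistic-regression sketch (not the authors' code).
# All shapes, hyperparameters, and data below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.linear_model import LogisticRegression

class SparseAutoencoder(nn.Module):
    def __init__(self, n_in, n_hidden, rho=0.05, beta=3.0):
        super().__init__()
        self.encoder = nn.Linear(n_in, n_hidden)
        self.decoder = nn.Linear(n_hidden, n_in)
        self.rho, self.beta = rho, beta

    def forward(self, x):
        h = torch.sigmoid(self.encoder(x))
        return self.decoder(h), h

    def loss(self, x):
        # Reconstruction error plus KL-divergence sparsity penalty on the
        # average hidden activation.
        x_hat, h = self(x)
        rho_hat = h.mean(dim=0).clamp(1e-6, 1 - 1e-6)
        kl = (self.rho * torch.log(self.rho / rho_hat)
              + (1 - self.rho) * torch.log((1 - self.rho) / (1 - rho_hat))).sum()
        return F.mse_loss(x_hat, x) + self.beta * kl

def pretrain_layer(ae, x, epochs=200, lr=1e-3):
    """Greedy layer-wise pretraining of one sparse autoencoder."""
    opt = torch.optim.Adam(ae.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        ae.loss(x).backward()
        opt.step()
    with torch.no_grad():
        _, h = ae(x)
    return h  # hidden codes feed the next layer

# Illustrative data: 100 sessions, 380 acoustic-prosodic features,
# binary high/low rating of one behavioral code.
X = torch.rand(100, 380)
y = torch.randint(0, 2, (100,))

ae1 = SparseAutoencoder(380, 100)
h1 = pretrain_layer(ae1, X)
ae2 = SparseAutoencoder(100, 30)
h2 = pretrain_layer(ae2, h1)

# Logistic regression on the learned low-dimensional representation.
clf = LogisticRegression(max_iter=1000).fit(h2.numpy(), y.numpy())
print("training accuracy:", clf.score(h2.numpy(), y.numpy()))
```

In practice the learned encoders would be applied to held-out sessions and the classifier evaluated with cross-validation; the greedy layer-wise structure shown here is what makes the representation progressively lower-dimensional before the final linear classifier.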

References


Heavey, C., Gill, D., & Christensen, A. (2002). Couples interaction rating system 2 (CIRS2). University of California, Los Angeles, Los Angeles, CA, USA.
Jones, J., & Christensen, A. (1998). Couples interaction study: Social support interaction rating system. University of California, Los Angeles, Los Angeles, CA, USA.
Andrew, G., & Gao, J. (2007). Scalable training of L1-regularized log-linear models. Proceedings of the 24th International Conference on Machine Learning.
Black, M., Katsamanis, A., Baucom, B., Lee, C., Lammert, A., Christensen, A., Georgiou, P., & Narayanan, S. (2013). Toward automating a human behavioral coding system for married couples' interactions using speech acoustic features. Speech Communication, 55(1), 1-21.
Burkhardt, F., Polzehl, T., Stegmann, J., Metze, F., & Huber, R. (2009). Detecting real life anger. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing.
