
Interactive Spoken Content Retrieval with Deep Reinforcement Learning

Advisor: Lin-shan Lee

Abstract


This thesis focuses on interactive retrieval of spoken content. In recent years, multimedia content such as online courses, broadcast programs, and meeting recordings has grown rapidly, so retrieval of spoken content has attracted increasing attention. This thesis targets interactive retrieval: spoken or multimedia documents are hard to display on a screen and therefore time-consuming to browse, and poor speech recognition accuracy can further degrade the retrieval results, so letting the system interact with the user to learn more about what the user is looking for is an effective way to mitigate these problems. In this thesis, we not only follow previous work in modeling interactive retrieval as a Markov Decision Process (MDP) and applying reinforcement learning algorithms to learn the best system decisions, but also take the technique a significant step forward by solving the problem with deep reinforcement learning. Experiments show that the proposed approach substantially improves the retrieval process and helps users find the desired information more effectively.
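For context, modeling interactive retrieval as an MDP means the system observes a state summarizing the dialogue so far and the current retrieval results, chooses an action (for example, returning the results or asking the user for more information), and receives a reward reflecting retrieval quality; reinforcement learning then estimates the value of each action in each state. A minimal sketch of the idea is the standard Q-learning update below, written with generic symbols (state s_t, action a_t, reward r_t, learning rate \alpha, discount \gamma) rather than the exact state, action, and reward definitions used in this thesis:

Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_t + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]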

Abstract (English)


Interactive retrieval is important for spoken content. When looking for text documents, a user can easily scan through and select results on a search engine result page, whereas no such convenience exists when searching for spoken content. Moreover, it is hard for users to find the desired spoken content when the search results are noisy, which often happens because of the imperfect speech recognition components in spoken content retrieval. One way to counter these difficulties is human-machine interaction, in which the machine takes different actions to request additional information from the user in order to obtain better retrieval results. The most suitable action depends on the situation, so previous work used hand-crafted states estimated from the current search results to determine the actions; however, these hand-crafted states are not necessarily the best indicators for choosing actions. In this thesis, we applied deep Q-learning to interactive retrieval of spoken content. Deep Q-learning sidesteps the estimation of hand-crafted states and can determine the action directly from the retrieval results without any human knowledge. It achieved discernible improvements compared with the hand-crafted states.
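As an illustration of the approach described above, the following is a minimal sketch, in PyTorch, of a deep Q-network that maps a feature vector summarizing the current retrieval results to Q-values over interactive actions, together with epsilon-greedy action selection and a single-transition Q-learning update. The action names, feature dimension, network size, and hyperparameters here are illustrative assumptions, not the exact configuration used in this thesis.

# Minimal deep Q-learning sketch for choosing interactive retrieval actions.
# Assumptions (not from the thesis): 4 example actions, a 49-dimensional
# feature vector summarizing the retrieval results, and a small MLP.
import random

import torch
import torch.nn as nn

ACTIONS = ["return_results", "ask_key_term", "ask_new_query", "show_list_for_feedback"]


class QNetwork(nn.Module):
    """Maps retrieval-result features to one Q-value per interactive action."""

    def __init__(self, feature_dim: int = 49, hidden_dim: int = 64, num_actions: int = len(ACTIONS)):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_actions),
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.net(features)


def select_action(q_net: QNetwork, features: torch.Tensor, epsilon: float = 0.1) -> int:
    """Epsilon-greedy choice among the interactive actions."""
    if random.random() < epsilon:
        return random.randrange(len(ACTIONS))
    with torch.no_grad():
        return int(q_net(features).argmax().item())


def q_learning_step(q_net, target_net, optimizer, s, a, r, s_next, done, gamma=0.99):
    """One Q-learning update on a single transition (s, a, r, s_next, done)."""
    q_sa = q_net(s)[a]                      # Q(s, a) from the online network
    with torch.no_grad():                   # bootstrap target from the target network
        target = r + (0.0 if done else gamma * target_net(s_next).max().item())
    loss = nn.functional.mse_loss(q_sa, torch.tensor(target, dtype=torch.float32))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# Usage example: pick an action for a random retrieval-state feature vector.
if __name__ == "__main__":
    q_net = QNetwork()
    features = torch.randn(49)
    print(ACTIONS[select_action(q_net, features)])

In a full system, transitions would be collected from simulated or real user interactions, stored in a replay buffer, and the target network periodically synchronized with the online network, as is standard for deep Q-learning.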

