
語音文件檢索的進一步研究:基於次詞成分的技術及使用者與系統的互動

Improved Approaches of Spoken Document Retrieval – Subword-based Techniques and User/System Interaction

Advisor: Lin-Shan Lee (李琳山)

Abstract


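This thesis has two main parts. The first part proposes two subword-based indexing structures, S-PSPL and S-CN; the second part proposes a new dialogue framework to facilitate spoken document retrieval. S-PSPL is a data structure that efficiently records the posterior probabilities and position information of all subword units in a word lattice. We first propose a simple estimation rule that quickly approximates the subword posterior probabilities (SPP) of all subword units in the lattice; with this SPP and a dynamic programming algorithm that we derive, the complete S-PSPL structure can be constructed. Because the SPP is only an estimate, in Section 3.3 we use S-CN to examine its quality and find that the estimated SPP does reduce subword recognition errors within the S-CN framework, from which we judge the proposed SPP estimation to be reasonable. Moreover, because it closely resembles S-PSPL, the S-CN used for this verification can itself serve as an indexing data structure. In Chapters 4 and 5 we carry out a series of rigorous analyses and experiments comparing PSPL, CN, S-PSPL, and S-CN, and conclude that S-PSPL is the better indexing structure. In Chapter 6 we propose a possible way to further improve the performance of S-PSPL and S-CN. In the second part, we examine the interaction between the user and the system. Because of the particular uncertainty of spoken documents, compounded by the unavoidable ambiguity of user queries, we believe that improving retrieval efficiency requires not only a better indexing structure but also attention to the user's role: bringing the user into the system flow to delimit the target information being sought, which can be regarded as a standard dialogue process. However, existing dialogue system architectures cannot simply be adopted, because they all rely on a well-organized database as the system's knowledge base, which is difficult to obtain for spoken document retrieval. The new framework proposed here has three important components: speech-based information retrieval, a multi-modal user interface, and a dialogue model. Speech-based information retrieval uses the indexing structures discussed in the first part. The multi-modal user interface requires a variety of knowledge extraction techniques, and the dialogue model involves user modelling and the problem of insufficient training data; all of these are discussed in detail. Finally, experiments show that the proposed dialogue framework does help users obtain the information they want more accurately and more quickly.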
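To make the idea of a position-specific subword posterior concrete, the following is a minimal sketch that is not taken from the thesis. It assumes a tiny acyclic word lattice with made-up edge probabilities, treats each character of a word as one subword unit, and computes the posteriors by brute-force path enumeration. The thesis instead uses an approximation combined with dynamic programming, which this sketch does not reproduce. All names (`EDGES`, `enumerate_paths`, `subword_position_posteriors`) are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy word lattice: (start_node, end_node, word, probability).
# Assumption: the lattice is acyclic and each character is one subword unit.
EDGES = [
    (0, 1, "ab", 0.6),
    (0, 1, "a", 0.4),
    (1, 2, "c", 1.0),
]
START, END = 0, 2


def enumerate_paths(edges, start, end):
    """Yield (word_sequence, path_probability) for every start-to-end path."""
    outgoing = defaultdict(list)
    for s, e, word, prob in edges:
        outgoing[s].append((e, word, prob))
    stack = [(start, [], 1.0)]
    while stack:
        node, words, prob = stack.pop()
        if node == end:
            yield words, prob
            continue
        for nxt, word, p in outgoing[node]:
            stack.append((nxt, words + [word], prob * p))


def subword_position_posteriors(edges, start, end):
    """Map each subword position to {subword unit: posterior probability}."""
    paths = list(enumerate_paths(edges, start, end))
    total = sum(prob for _, prob in paths)
    table = defaultdict(lambda: defaultdict(float))
    for words, prob in paths:
        # Expand the word sequence into its subword (character) sequence.
        subwords = [ch for word in words for ch in word]
        for pos, unit in enumerate(subwords):
            table[pos][unit] += prob / total
    return {pos: dict(units) for pos, units in table.items()}


posteriors = subword_position_posteriors(EDGES, START, END)
```

Exhaustive enumeration grows exponentially with lattice size, which is why an efficient estimate of the SPP matters for real lattices; the sketch only shows what quantity is being stored per position.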

Parallel Abstract


This thesis consists of two parts. In the first part, we propose two new subword-based approaches for Spoken Document Retrieval (SDR): Subword-based Position Specific Posterior Lattices (S-PSPL) and Subword-based Confusion Networks (S-CN). These approaches are motivated by PSPL and CN, respectively, but are based on subword units instead of words. We introduce S-PSPL first. In the S-PSPL approach, we encode the posterior probabilities and proximity information of the subword units in a word lattice. A critical issue in S-PSPL is calculating the subword posterior probabilities (SPP) in a word lattice, which cannot be done directly by simple dynamic programming. We solve this problem with a simple approximation. To verify that this SPP approximation is accurate enough, we introduce the S-CN. Just as the original goal of the Confusion Network (CN) is to construct a decoding structure that meets the minimum word error rate criterion, S-CN can be used to minimize the subword error rate. We embed the SPP approximation in the S-CN structure and achieve a significant reduction in subword error rate, which implicitly verifies the feasibility of the SPP approximation. Moreover, although introduced as a decoding structure, S-CN can also be used as an efficient and compact indexing structure; it is the second subword-based approach for SDR proposed in this thesis. Extensive evaluations of S-PSPL and S-CN are then carried out to verify their advantages. Further discussion and analysis compare the two pairs of very similar data structures, PSPL/S-PSPL and CN/S-CN. In the evaluation and analysis, S-PSPL proves very attractive and even better than S-CN, since it requires fewer or roughly equal resources while offering better accuracy under most circumstances. There are some possibilities for improving the S-PSPL/S-CN system.
In the thesis we propose an algorithm, Lexicon Adaptation with Reduced Character Errors (LARCE), which adapts the lexicon of the large vocabulary continuous speech recognition (LVCSR) system to improve character recognition accuracy. In the evaluation, LARCE gives significant improvements in character accuracy. With improved subword recognition, S-PSPL and S-CN can be expected to improve accordingly. In the second part, we present a formulation and a framework for a new type of dialogue system, referred to as type-II dialogue systems, which evolves from SDR systems but has a wholly new definition and formulation. Type-II dialogue systems are proposed to address difficulties that traditional SDR systems cannot solve. The new definition and formulation emphasize the interaction between the user and the system, hence the term "dialogue systems". However, such systems differ significantly from conventional spoken dialogue systems, which is why we refer to them as type-II. Their distinguishing feature is that the task is information access from unstructured knowledge sources, that is, there is no well-organized back-end database offering the information to the user. Typical example tasks for this type of dialogue system include information retrieval/browsing and question answering. The functionality of each module in type-II dialogue systems is analyzed, presented, and compared with the corresponding module in type-I dialogue systems. A series of novel technologies helpful in constructing type-II dialogue systems is then proposed. In addition to the new SDR technologies presented in part one, Named Entity Recognition (NER) from text and spoken documents, topic hierarchy construction for spoken documents, and dialogue modelling for information access are discussed here. For NER, two novel approaches are proposed, one for text documents and one for spoken documents.
For text documents, we propose using global information in addition to the local information (internal and external) widely used in the NER community. For spoken documents, we propose using relevant documents retrieved from the Internet to add new named entities (NEs) to the recognized lattice, compensating for the defects of the automatic speech recognition (ASR) system, since many NEs are out-of-vocabulary (OOV) words. For topic hierarchy construction, the recently proposed HAC+P approach [1] is used. We use the NEs extracted from the spoken documents to construct balanced tree structures with HAC+P, which serve as a convenient system output for user interaction. For dialogue modelling, a Markov Decision Process (MDP) based method is proposed to learn the best path for guiding the user during the retrieval process. In many cases, the user's initial query returns too many retrieval results, and the system guides the user through query expansion to specify the user's information need more clearly. In the proposed approach, the system learns to predict the user's information need so that it can recommend the most discriminative and informative terms for query expansion with an MDP-based method. There is still a long way to go in the research and development of SDR technologies. It is hoped that the work in this thesis will be helpful in this research topic.
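The query-expansion dialogue described above can be illustrated with a small sketch. This is not the method of the thesis, which learns its policy; here a toy deterministic MDP is solved exactly by memoised backward induction. The assumptions are: a state is the set of documents still matching, an action is a term offered to the user, every turn costs 1, the user's target is uniformly distributed over the remaining documents, and the user answers truthfully. The collection `DOCS` and the function `best_recommendation` are hypothetical.

```python
from functools import lru_cache

# Hypothetical toy collection: document id -> set of index terms.
DOCS = {
    "d1": frozenset({"election", "taipei"}),
    "d2": frozenset({"election", "budget"}),
    "d3": frozenset({"typhoon", "taipei"}),
    "d4": frozenset({"typhoon", "budget"}),
}
TERMS = sorted(set().union(*DOCS.values()))


@lru_cache(maxsize=None)
def best_recommendation(candidates):
    """Return (expected_turns, term) for a state of remaining documents.

    Assumed user model: the target document is uniform over `candidates`
    and the user truthfully says whether the offered term applies.
    """
    if len(candidates) <= 1:
        return 0.0, None  # terminal state: nothing left to disambiguate
    best_cost, best_term = float("inf"), None
    for term in TERMS:
        yes = frozenset(d for d in candidates if term in DOCS[d])
        no = candidates - yes
        if not yes or not no:
            continue  # the term does not split the candidates
        p_yes = len(yes) / len(candidates)
        cost = (1.0
                + p_yes * best_recommendation(yes)[0]
                + (1.0 - p_yes) * best_recommendation(no)[0])
        if cost < best_cost:
            best_cost, best_term = cost, term
    if best_term is None:
        return 0.0, None  # remaining documents are indistinguishable
    return best_cost, best_term


turns, term = best_recommendation(frozenset(DOCS))
```

In this toy setting the most useful term is the one that splits the remaining documents most evenly, which is one simple reading of "discriminative and informative"; a learned policy would replace the uniform user model with estimates from dialogue data.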

References


[1] S.-L. Chuang and L.-F. Chien, “Taxonomy generation for text segments: A practical web-based approach,” ACM Trans. Inf. Syst., vol. 23, no. 4, pp. 363–396, 2005.
[2] C. Chelba, J. Silva, and A. Acero, “Soft indexing of speech content for search in spoken documents,” Computer Speech and Language, vol. 21, no. 3, pp. 458–478, July 2007.
[3] Z.-Y. Zhou, P. Yu, C. Chelba, and F. Seide, “Towards spoken-document retrieval for the internet: Lattice indexing for large-scale web-search architectures,” in HLT, 2006, pp. 415–422.
[4] J. Mamou, D. Carmel, and R. Hoory, “Spoken document retrieval from call-center conversations,” in SIGIR, 2006, pp. 51–58.
[5] P. Yu, K. Chen, L. Lu, and F. Seide, “Searching the audio notebook: Keyword search in recorded conversations,” in HLT, 2005, pp. 947–954.
