
Graduate student: 陳珮寧
Thesis title: 查詢模型化於語音文件檢索之研究
A Study of Query Modeling for Spoken Document Retrieval
Advisor: 陳柏琳
Degree: Master
Department: Department of Computer Science and Information Engineering
Year of publication: 2011
Academic year of graduation: 99 (ROC calendar)
Language: Chinese
Number of pages: 61
Keywords (Chinese): 語音文件檢索, 關聯性語言模型, 查詢模型化, 主題資訊, 非關聯性資訊
Keywords (English): Spoken document retrieval, relevance language model, query modeling, topic, non-relevance information
Thesis type: Academic thesis
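  • Spoken document retrieval (SDR) has long been a topic of interest in speech processing research. The problems it commonly faces fall into three broad categories: (1) a query is usually only a vague expression of the user's information need and cannot fully convey the meaning that the need is meant to express; (2) spoken documents and user queries often use different words to express the same topic or concept; (3) when spoken documents are transcribed into text by automatic speech recognition (ASR), the limited recognition accuracy degrades retrieval performance. Based on these observations, this thesis proposes several improvements to query modeling to alleviate the problems facing spoken document retrieval. To this end, we explore the use of the relevance language model for spoken document retrieval; we also incorporate document-level topic information and query non-relevance information into this modeling framework to make query modeling more effective. The experiments were conducted on the internationally widely used Topic Detection and Tracking (TDT) corpus; the results show that the proposed retrieval methods achieve better retrieval performance than several existing retrieval methods.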

    Spoken document retrieval (SDR) has recently become a more active research avenue owing to the increasing volume of publicly available multimedia content that carries speech information. The fundamental problems facing SDR are generally three-fold: 1) a query is often only a vague expression of an underlying information need; 2) there is likely to be a word-usage mismatch between a query and a spoken document even if they are topically related; and 3) an imperfect speech recognition transcript carries wrong information and thus deviates somewhat from the true theme of the spoken document. Many efforts have been devoted to developing elaborate indexing and modeling techniques for representing spoken documents, but few to improving query formulations so that they better represent users' information needs. In view of this, we present a novel language modeling framework that exploits both lexical- and topic-based relevance information to improve query effectiveness. We further explore various ways to glean both relevance and non-relevance information from the document collection so as to enhance the modeling of a given query in an unsupervised fashion. Experiments conducted on the TDT (Topic Detection and Tracking) SDR task demonstrate the performance merits of the methods derived from our retrieval framework when compared with other existing retrieval methods.
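To make the role of non-relevance information concrete, here is a minimal sketch of one score-combination form used in the negative-feedback literature for language-model retrieval: a document is rewarded for being close to the query model and penalized for being close to a non-relevance model, both measured with KL divergence. Whether the thesis uses this exact form is not stated in this record; the weight `beta`, the smoothing weight `lam`, the function names, and the toy documents are assumptions for illustration.

```python
import math
from collections import Counter


def unigram(tokens):
    """Maximum-likelihood unigram model: word -> relative frequency."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def smooth(doc_model, background, lam=0.5):
    """Jelinek-Mercer smoothing so every collection word has non-zero probability."""
    return {w: lam * doc_model.get(w, 0.0) + (1.0 - lam) * bg for w, bg in background.items()}


def kl(p, q):
    """KL(p || q); assumes every word with p(w) > 0 also has q(w) > 0."""
    return sum(pw * math.log(pw / q[w]) for w, pw in p.items() if pw > 0)


def score(query_model, neg_model, doc_model, beta=0.3):
    """Higher is better: near the query model, far from the non-relevance model."""
    return -kl(query_model, doc_model) + beta * kl(neg_model, doc_model)


# Toy collection with an ambiguous query term (hypothetical data).
docs = {
    "d1": ["apple", "iphone", "launch", "price"],
    "d2": ["apple", "fruit", "orchard", "harvest"],
}
background = unigram([w for d in docs.values() for w in d])
doc_models = {name: smooth(unigram(d), background) for name, d in docs.items()}

query_model = unigram(["apple"])
# Assumed here to come from text treated as non-relevant; in an unsupervised
# setting it would have to be gleaned automatically from the collection.
neg_model = unigram(["fruit", "orchard"])

for name, model in doc_models.items():
    print(name, round(score(query_model, neg_model, model), 3))
```

With `beta=0` the two toy documents tie, because both mention the query term equally often; with a positive `beta`, the document that resembles the non-relevance model drops below the other one.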

    Table of Contents
    Abstract (Chinese)
    Abstract (English)
    Acknowledgments
    List of Figures
    List of Tables
    Chapter 1 Introduction
      1.1 Research Motivation
      1.2 Statistical Language Model
      1.3 Spoken Document Retrieval
      1.4 Document Modeling and Query Modeling
      1.5 Contributions of the Thesis
      1.6 Organization of the Thesis
    Chapter 2 Literature Review
      2.1 Types and Extensions of Language Models
      2.2 Relevance-based Language Model
      2.3 Concept Language Model
      2.4 Topic Models
        2.4.1 Latent Semantic Analysis
        2.4.2 Word Topic Model
        2.4.3 Probabilistic Latent Semantic Analysis (PLSA)
      2.5 Non-relevance Feedback
    Chapter 3 Improvements to Language Models
      3.1 Language Model Adaptation
      3.2 Improving the Query Model
        3.2.1 Topic Relevance Model
        3.2.2 Pairwise-Word Relevance Model (PRM) and Topic-based Pairwise-Word Relevance Model (TPRM)
        3.2.3 Non-Relevance Feedback Model
        3.2.4 Word-level Conceptual Relevance Model
    Chapter 4 Experimental Setup and Results
      4.1 Experimental Settings for Information Retrieval
        4.1.1 Experimental Data
        4.1.2 Evaluation Method
      4.2 Experiments and Results on Spoken Document Retrieval
    Chapter 5 Conclusions and Future Work
      5.1 Conclusions
      5.2 Future Work
    References

