Title

語音查詢檢索語音文件之初步研究

Translated Titles

A Study for Spoken Document Retrieval by Spoken Queries

DOI

10.6342/NTU.2010.01954

Authors

杞俊賢

Key Words

Spoken query ; Spoken document retrieval ; Probabilistic Latent Semantic Analysis

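Publication Name

Degree theses of the Graduate Institute of Computer Science and Information Engineering, National Taiwan University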

Volume or Term/Year and Month of Publication

2010

Academic Degree Category

Master's

Advisor

李琳山

Content Language

Traditional Chinese

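Abstract (translated from Chinese)

In this era of information explosion, finding the documents a user needs within large amounts of data has become an important problem, and issuing queries by voice is a natural and convenient way to do so.

This thesis focuses on spoken document retrieval using spoken queries. Unlike text document retrieval, spoken document retrieval must also cope with the high recognition error rates that can arise from different acoustic environments or speaking styles. The spoken queries are therefore indexed with confusion networks, and the spoken documents to be searched are indexed with confusion networks and position specific posterior lattices. Subword units are used to handle the out-of-vocabulary problem, and retrieval is carried out while both the query and the documents retain their speech recognition information.

Because query terms frequently fail to match the documents, the thesis finally uses a latent semantic analysis model to retrieve documents that do not literally match the query terms but are related to them in meaning.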
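The abstract describes matching a query and documents that are both kept as confusion networks rather than single best transcripts. The sketch below is only an illustration of that general idea and is not taken from the thesis: the data layout (a list of slots, each mapping a subword unit to a posterior), the function names, the slot-independence assumption, and the scoring rule are all assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical representation: a confusion network is a list of slots, and
# each slot maps a candidate subword unit to its posterior probability.
ConfusionNetwork = List[Dict[str, float]]


def slot_overlap(q_slot: Dict[str, float], d_slot: Dict[str, float]) -> float:
    """Probability that the query slot and the document slot hold the same unit."""
    return sum(p * d_slot.get(unit, 0.0) for unit, p in q_slot.items())


def expected_match_score(query: ConfusionNetwork, doc: ConfusionNetwork) -> float:
    """Expected number of positions where the query occurs in the document.

    For every alignment of the query inside the document, multiply the
    per-slot overlap probabilities (slots are assumed independent), then
    sum over all alignments.
    """
    if not query:
        return 0.0
    total = 0.0
    for start in range(len(doc) - len(query) + 1):
        prob = 1.0
        for offset, q_slot in enumerate(query):
            prob *= slot_overlap(q_slot, doc[start + offset])
            if prob == 0.0:
                break  # this alignment cannot match
        total += prob
    return total


# Toy example with made-up syllable posteriors.
query = [{"tai": 0.6, "dai": 0.4}, {"wan": 1.0}]
doc = [{"zai": 1.0}, {"tai": 0.5, "tei": 0.5}, {"wan": 0.8, "wang": 0.2}]
score = expected_match_score(query, doc)
```

Because every candidate unit contributes in proportion to its posterior, a document can still receive a non-zero score when the top recognition hypothesis of either the query or the document is wrong, which is the motivation for keeping the recognition alternatives on both sides.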

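Topic Category

Basic and Applied Sciences > Information Science
College of Electrical Engineering and Computer Science > Graduate Institute of Computer Science and Information Engineering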