Title

用機器學習整合索引資訊之中文語音文件檢索

Translated Titles

Integrating Indexing Information by Machine Learning for Chinese Spoken Document Retrieval

DOI

10.6342/NTU.2011.01582

Authors

楊家銘

Key Words

Speech ; Spoken Document ; Spoken Document Retrieval ; Machine Learning ; Spoken Document Indexing

PublicationName

Degree thesis, Graduate Institute of Computer Science and Information Engineering, National Taiwan University

Volume or Term/Year and Month of Publication

2011

Academic Degree Category

Master's

Advisor

李琳山 (Lin-shan Lee)

Content Language

Traditional Chinese

Chinese Abstract

Spoken document retrieval has become increasingly important in the multimedia era of information explosion. Most spoken document retrieval techniques involve two major steps: first, automatic speech recognition, and second, retrieval based on the indexing information produced by recognition. The first step faces potentially high recognition error rates, which degrade the correctness of the information carried by the resulting spoken document indices; the second step is to exploit the information carried by these indices to the fullest. This thesis addresses the second step: it studies how the indexing information produced from different linguistic units of Chinese speech (e.g., words, characters, syllables, initial-finals) can be integrated by learning-to-rank methods.

Two learning-to-rank methods are studied: AdaRank and the Support Vector Machine for Optimizing Mean Average Precision (SVM-map).

Experimental results show that SVM-map performs better. Compared with AdaRank, the best mean average precision improves by 4.70%; compared with the individually best-performing (oracle) index, performance on the overall query set improves by 8.67%, with in-vocabulary queries improving by 6.30% and out-of-vocabulary queries showing the most pronounced gain of about 11.63%. These results also verify that combining spoken document indices produced from different linguistic units, weighted appropriately by learning to rank, can further improve both the performance and the robustness of spoken document retrieval.
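The integration described above can be sketched as a weighted linear fusion of per-index relevance scores, evaluated by average precision. This is an illustrative sketch only, not the thesis implementation: all document IDs, scores, and weights below are hypothetical, and the weights stand in for whatever AdaRank or SVM-map would learn.

```python
def fuse_scores(per_index_scores, weights):
    """Weighted linear combination of per-index relevance scores.

    per_index_scores: {unit: {doc_id: score}} — one score table per
                      linguistic-unit index (word, character, syllable, ...)
    weights:          {unit: weight} — e.g., learned by a learning-to-rank method
    Returns doc_ids ranked by fused score, best first.
    """
    fused = {}
    for unit, scores in per_index_scores.items():
        w = weights[unit]
        for doc, s in scores.items():
            fused[doc] = fused.get(doc, 0.0) + w * s
    return sorted(fused, key=fused.get, reverse=True)

def average_precision(ranking, relevant):
    """AP for one query: mean of precision at each relevant hit."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

# Hypothetical scores from two unit-based indices for one query.
scores = {
    "word":     {"d1": 0.9, "d2": 0.2, "d3": 0.4},
    "syllable": {"d1": 0.3, "d2": 0.8, "d3": 0.1},
}
weights = {"word": 0.6, "syllable": 0.4}  # placeholder for learned weights
ranking = fuse_scores(scores, weights)
print(ranking)                                      # ['d1', 'd2', 'd3']
print(average_precision(ranking, {"d1", "d2"}))     # 1.0
```

Mean average precision (MAP), the measure SVM-map optimizes directly, is simply `average_precision` averaged over all queries; the learned weights are those that maximize it on training queries.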

Topic Category Basic and Applied Sciences > Information Science
College of Electrical Engineering and Computer Science > Graduate Institute of Computer Science and Information Engineering