Using Machine Translation Systems to Assess Interpreting Performance: A Corpus-Based Feasibility Study

Advisor: 高照明

Abstract


Human assessment of interpretation quality faces many theoretical challenges and tends to be laborious and expensive in practice. This study proposes a method that applies machine translation (MT) and highly successful MT quality metrics to assess interpretation quality automatically, and tests its feasibility in a use case of large-scale, high-stakes short consecutive interpretation tests. Based on a public corpus, the baseline metric (F1) shows consistently high correlations with human assessments, and its two component metrics (Recall and Precision) have intuitive interpretations that offer insights into how a student might improve. The statistics also show that human post-editing is counterproductive for the reliability of the baseline metric when unedited MT output is already fit for purpose. The method will have practical value once automatic speech recognition (ASR) is sufficiently accurate and readily available. On the theoretical side, the baseline metric shows potential as a yardstick of interpretation quality, and its component metrics offer a potential alternative to the fuzzy criteria of fidelity and fluency, which calls for verification on data from trained interpreters.
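As a minimal sketch of the idea (not the study's actual scoring pipeline), the snippet below computes token-level Precision, Recall, and their harmonic mean F1 = 2PR / (P + R) between a student's rendition and an MT output of the same source segment. The whitespace tokenizer, the lowercasing, and the example sentences are assumptions made here for illustration; full metrics such as METEOR add stemming and synonym matching on top of such overlap counts.

```python
from collections import Counter

def token_prf(hypothesis: str, reference: str):
    """Token-overlap Precision, Recall, and F1 between a student's
    rendition (hypothesis) and an MT rendering of the same source
    segment (reference). Counts are clipped, so a repeated token is
    credited only as often as it appears in the reference."""
    hyp = Counter(hypothesis.lower().split())
    ref = Counter(reference.lower().split())
    if not hyp or not ref:
        return 0.0, 0.0, 0.0
    overlap = sum((hyp & ref).values())       # clipped token matches
    precision = overlap / sum(hyp.values())   # matched / tokens produced
    recall = overlap / sum(ref.values())      # matched / tokens expected
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1

# Invented example sentences, purely for illustration.
reference = "the committee approved the budget for the coming year"
student = "the committee approved the budget for next year"
p, r, f1 = token_prf(student, reference)
print(f"Precision={p:.2f}  Recall={r:.2f}  F1={f1:.2f}")
```

Read this way, Recall rewards covering the content of the reference (an analogue of fidelity), while Precision penalizes padding and unsupported additions, which is what gives the two component metrics their intuitive diagnostic value for students.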

