
Construction and Applications of a Modern Chinese Parallel Corpus (現代漢語平行語料庫建構及其應用)

Advisor: 謝舒凱

Abstract


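As the number of Chinese speakers grows, linguistic variation arises, and it may stem from external or internal factors. Although corpora for studying variation in Chinese already exist, these resources are not suited to studying more colloquial registers. Subtitles in videos are one source that reflects informal language. The aim of this thesis is to build a parallel corpus based on movie subtitles and TED Talks subtitles, so that scholars can more easily study variation between Taiwan Mandarin and Mainland Mandarin.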

Parallel Abstract


As the number of Mandarin Chinese speakers continues to increase, variation inevitably emerges, since not all speakers reside in one place. This variation can stem from internal factors or from external ones such as culture or location. While corpora exist that can be used to study variation in Mandarin Chinese, they offer little insight into more colloquial registers. Subtitles for TV shows, movies, and videos in general are a source of material that reflects everyday speech more reliably: because subtitles are meant to render the dialogue heard on screen, they better capture colloquial usage. The goal of this thesis is to create a parallel corpus based on movie subtitles and TED Talks subtitles that allows researchers to study language variation between Taiwan Mandarin and Mainland Mandarin.

