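Mainstream corpus systems today already provide many frequency-based quantitative measures, as well as filtering by metadata. Most of these tools, however, focus on non-interactive texts or text collections. The core of this thesis is therefore to extend the functionality of a corpus system for computer-mediated discourse (that is, web language) so that the system becomes more transparent with respect to discourse and constructions, two areas for which current corpus systems do not yet offer search functions. In building the corpus, a web forum is used as an instance of web language for developing and exploring these functions. Because of media properties such as asynchronicity, the quoting mechanism, and anonymity, language use in web forums is characterized by abundant language games, puns, and interleaved turns. On the discourse side, I add filtering by the position of the search term within a turn and within an utterance, together with the handling of adjacency-pair data. On the construction side, I integrate results from distributional semantics in computational linguistics into the corpus system, offering a new way of envisioning corpus retrieval in which the system performs semantically similar retrieval based on a construction query supplied by the user. By developing and exploring these new corpus functions, this study aims to provide a more practical resource for research on web language and to enable deeper research on discourse and constructions within corpus linguistics.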
Current corpus systems already offer a wide variety of tools for frequency-based quantitative measurement at different levels, as well as metadata filtering mechanisms. However, most of these systems focus mainly on non-interactional textual data. Building on the corpora we have already constructed, the main goal of this thesis is to implement new discourse-aware and construction-aware functions in the PTT Corpus, a corpus of computer-mediated communication. PTT, a web forum, serves as the exemplar for exploring new functions in corpus compilation. Owing to technological affordances of web forums such as asynchronicity, the quoting mechanism, and anonymity, language use there is characterized by language games, wordplay, and interleaved turn exchanges. For the discourse-aware aspect, we have implemented a function for searching and filtering by the position of the search term within a turn and within an utterance, along with a more transparent layout for interactional pairs. For the construction-aware aspect, distributional semantic models are integrated into the corpus system to provide a more intuitive, semantics-driven way of extracting constructions: given a construction query supplied by the user, the system retrieves semantically similar instances. In sum, our study not only offers a practical language resource for research on web language, but also expands the role the corpus can play in discourse and construction studies.