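Many digital humanities studies rely on tagged terms within texts, and a number of text-tagging tools are already available. Because each tool is strong at tagging different kinds of terms, this thesis aims to use several tools together. The tools use different formats, however, so they cannot be combined directly; the formats must first be converted into one another. To that end, this thesis analyzes what information text-tagging formats contain, classifies that information, and defines a new text-tagging format, STAML, to store it. STAML serves as the intermediary language for conversion between the different text-tagging formats, and the conversion program is implemented on a web platform. With the STAML format and its conversion program, this thesis makes it possible to use these text-tagging tools together, which we hope will help digital humanities research proceed more smoothly.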
Tagging named entities in a text is often an essential part of preparing the text for use in digital humanities research. Although several text-tagging tools are available to researchers, each tool is designed for a specific purpose, and the tagging formats they use often differ. Consequently, text tagged with one tool cannot be reused by someone working with a different tool. In this thesis we propose an approach for integrating the different text-tagging formats produced by different tools. We introduce the Simple Text-Annotation Markup Language (STAML), which serves as an intermediary representation between tagging formats. Through STAML, texts tagged in one format can be used in another tagging tool without disrupting the existing annotations. We implement STAML and web-based conversion programs for several common Chinese-language tagging formats, such as those used by MARKUS (a popular tagging tool), THDL, and TEI.
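The role of an intermediary representation can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Pivot` and `Annotation` classes are hypothetical stand-ins and do not reflect the actual STAML schema, and the two toy formats (inline tags and bracket notation) are not the real MARKUS, THDL, or TEI formats. The sketch also assumes annotations are neither nested nor overlapping. The point is only that each format needs one reader and one writer against the shared representation, rather than a dedicated converter for every pair of formats.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    start: int  # offset into the plain text (inclusive)
    end: int    # offset into the plain text (exclusive)
    label: str  # annotation type, e.g. "person"


@dataclass
class Pivot:
    """Hypothetical intermediary form: plain text plus standoff annotations."""
    text: str
    annotations: list = field(default_factory=list)


# Toy format A (assumed): inline tags such as <person>Su Shi</person>
TAG_A = re.compile(r"<(\w+)>(.*?)</\1>")
# Toy format B (assumed): bracket notation such as [[Su Shi|person]]
TAG_B = re.compile(r"\[\[(.*?)\|(\w+)\]\]")


def _parse(source, pattern, text_group, label_group):
    """Strip the markup and record where each annotated span lands."""
    pieces, annotations = [], []
    pos = 0     # read position in the tagged source
    length = 0  # length of the plain text collected so far
    for m in pattern.finditer(source):
        before = source[pos:m.start()]
        content = m.group(text_group)
        start = length + len(before)
        annotations.append(
            Annotation(start, start + len(content), m.group(label_group)))
        pieces += [before, content]
        length = start + len(content)
        pos = m.end()
    pieces.append(source[pos:])
    return Pivot("".join(pieces), annotations)


def _render(doc, template):
    """Re-insert markup around each annotated span, left to right."""
    out, pos = [], 0
    for a in sorted(doc.annotations, key=lambda a: a.start):
        out.append(doc.text[pos:a.start])
        out.append(template.format(text=doc.text[a.start:a.end], label=a.label))
        pos = a.end
    out.append(doc.text[pos:])
    return "".join(out)


def from_a(source):
    return _parse(source, TAG_A, text_group=2, label_group=1)


def from_b(source):
    return _parse(source, TAG_B, text_group=1, label_group=2)


def to_a(doc):
    return _render(doc, "<{label}>{text}</{label}>")


def to_b(doc):
    return _render(doc, "[[{text}|{label}]]")
```

Converting from toy format A to toy format B is then `to_b(from_a(source))`. Under this design, supporting an additional format would mean adding one reader and one writer for it, with no changes to the existing converters.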