數位典藏與數位人文/Journal of Digital Archives and Digital Humanities

  • OpenAccess

Taiwanese Association for Digital Humanities (臺灣數位人文學會) & Ainosco Press, currently published

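Five-year impact factor: 0.087
0.087 (2023)
Discipline: rank within field
Library and Information Science: 9
History: 20
Data provided by the ACI Academic Citation Index database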

  • Journal
  • OpenAccess

In this work, we present methods to obtain a neural optical character recognition (OCR) tool for article blocks in a Republican Chinese newspaper. Our basis is a small fraction of the image corpus for which text ground truth exists. We introduce a character segmentation method which produces over 90,000 labeled images of single characters and train a GoogLeNet classifier as an OCR model. In addition, we create synthetic training data from character images extracted from Song-Ti fonts. Randomly augmented on the fly and used for pre-training, they increase OCR accuracy from 95.49% to 96.95% on our test set. Finally, we employ post-OCR correction based on a pre-trained masked language model and present heuristics to select the required hyperparameters, by which we are able to correct 16% of remaining classification errors, increasing accuracy on the test set to 97.44%.
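
The accuracy figures above can be cross-checked with a little arithmetic. The sketch below is only a consistency check on the reported numbers, not part of the authors' pipeline; the function and constant names are our own. It converts each accuracy into an error rate and computes what share of the remaining errors each step removes.

```python
def relative_error_reduction(acc_before: float, acc_after: float) -> float:
    """Return the fraction of remaining errors removed when accuracy rises."""
    err_before = 1.0 - acc_before
    err_after = 1.0 - acc_after
    return (err_before - err_after) / err_before


# Test-set accuracies reported in the abstract, written as fractions.
ACC_BASELINE = 0.9549    # without pre-training on synthetic font images
ACC_PRETRAINED = 0.9695  # with pre-training on synthetic font images
ACC_CORRECTED = 0.9744   # after masked-language-model post-OCR correction

pretraining_gain = relative_error_reduction(ACC_BASELINE, ACC_PRETRAINED)
correction_gain = relative_error_reduction(ACC_PRETRAINED, ACC_CORRECTED)

print(f"pre-training removes {pretraining_gain:.1%} of baseline errors")
print(f"post-OCR correction removes {correction_gain:.1%} of remaining errors")
```

The second figure comes out at roughly 16%, which agrees with the share of corrected errors stated in the abstract.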

  • Journal
  • OpenAccess

Purpose: This paper studies the relationships found in Chinese rubbings in order to supplement the Relationship Taxonomy for Linked Data Models of Chinese Resources (Relationship Taxonomy), with the aim of supporting Linked Data modeling and further enhancing the discoverability and visibility of Chinese library resources.
Design/methodology/approach: This study adopts a qualitative content analysis method and a purposive sampling strategy. Two datasets, the Chinese Rubbings Collection from the Fine Arts Library and Harvard-Yenching Library and the rubbings records from the Palace Museum (Beijing), with around 5,500 records in total, were used to analyze the relationships. The Chinese rubbings collection of the Smithsonian National Museum of Asian Art, with 524 rubbing records, was used to evaluate the Relationship Taxonomy.
Findings: A four-layer relationship model of Chinese rubbings was created. Supplementary resource-resource, person-resource, and person-person relationships were added to the existing Relationship Taxonomy. Resource-event was added as a fourth top-level category. Chinese rubbings collections were a good supplement to the Relationship Taxonomy.
Originality: This study examines the relationships in Chinese rubbings in a Linked Data context. Relationships were added to improve the Relationship Taxonomy. The Relationship Taxonomy is new and culturally specific.
Research limitations/implications: The proposed rubbings relationship model is a good reference for examining relationships in other artworks and artifacts. It can also be extended to rubbings created in other cultures and countries.
Practical implications: The Relationship Taxonomy can be used as predicates (linkages) to create RDF triples. It can also be used to improve resource cataloging, especially the cataloging of rubbings and the objects they reproduce. A richer set of relationships also helps improve information retrieval and knowledge discovery.
Social implications: This study provides a window to help readers become familiar with a specific type of Chinese historical resource and its contributions to the digital age.
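
To make the "predicates to create RDF triples" point concrete, here is a minimal sketch that writes one triple for each of the four top-level categories named in the findings and serializes them as N-Triples. The namespaces, identifiers, and predicate names (`isRubbingOf`, `isCalligrapherOf`, and so on) are hypothetical placeholders chosen for illustration; they are not taken from the Relationship Taxonomy itself.

```python
# Hypothetical namespaces; the real taxonomy's IRIs are not given in the abstract.
EX = "http://example.org/rubbings/"
REL = "http://example.org/relationship-taxonomy/"


def iri(base: str, local: str) -> str:
    """Wrap a namespace plus local name in N-Triples angle brackets."""
    return f"<{base}{local}>"


def to_ntriples(triples) -> str:
    """Serialize (subject, predicate, object) IRI tuples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# One illustrative triple per top-level category; predicate names are made up.
triples = [
    (iri(EX, "rubbing/1"), iri(REL, "isRubbingOf"), iri(EX, "stele/1")),      # resource-resource
    (iri(EX, "person/1"), iri(REL, "isCalligrapherOf"), iri(EX, "stele/1")),  # person-resource
    (iri(EX, "person/1"), iri(REL, "isTeacherOf"), iri(EX, "person/2")),      # person-person
    (iri(EX, "rubbing/1"), iri(REL, "isExhibitedAt"), iri(EX, "event/1")),    # resource-event
]

output = to_ntriples(triples)
print(output)
```

In this form the taxonomy terms occupy the predicate position, so adding a new relationship type means adding a new predicate IRI rather than changing the record structure.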

  • Journal
  • OpenAccess

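Word embedding is a method for automatically generating semantic vectors from a corpus. This paper explores possible applications of word embeddings to the Chinese Buddhist texts in the Comprehensive Buddhist Electronic Text Archive (CBETA). To obtain the word embedding model best suited to Buddhist studies, we built experimental datasets from the Zhuang Chunjiang dictionary (莊春江辭典), the Ding Fubao dictionary (丁福保辭典), and the Digital Dictionary of Buddhism, and designed two evaluation experiments, synonym detection and distractor detection, to establish a baseline for model optimization. The best hyperparameter combination was Word2Vec CBOW (continuous bag-of-words) with dimension 400, window 10, and 10 epochs, giving a validation accuracy of 0.87 and a test accuracy of 0.86. On this basis, we partitioned the CBETA corpus and trained separate word embedding models, produced comparative word lists for sub-corpora defined by period, translator, and canonical division, and carried out applied analyses. The paper makes three main contributions: (1) a word embedding synonym dataset suited to research on Chinese Buddhist texts; (2) word embedding hyperparameters suited to Chinese Buddhist texts; and (3) a discussion and analysis of case studies of word embeddings in research on Chinese Buddhist texts, showing that they can be used to trace changes in the semantic core of translated terms, to delimit unclear meanings, to find related concepts through semantic analogy, to identify the core concepts of each canonical division, to broaden and deepen research, and to verify the results of traditional scholarship.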
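
The sketch below shows one plausible reading of a synonym-detection evaluation: for each target word, pick the candidate whose vector is closest by cosine similarity and count how often that is the dictionary synonym. The abstract does not describe the exact protocol, so the scoring scheme, the function names, and the toy three-dimensional vectors with placeholder words are all assumptions made for illustration. Only the values in `BEST_CONFIG` come from the abstract.

```python
import math

# Best hyperparameters reported in the abstract (the key names are our own).
BEST_CONFIG = {"algorithm": "Word2Vec CBOW", "dimension": 400, "window": 10, "epochs": 10}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pick_synonym(vectors, target, candidates):
    """Return the candidate whose vector is most similar to the target's."""
    return max(candidates, key=lambda word: cosine(vectors[target], vectors[word]))


def accuracy(vectors, items):
    """Share of (target, gold synonym, candidates) items answered correctly."""
    correct = sum(pick_synonym(vectors, t, cands) == gold for t, gold, cands in items)
    return correct / len(items)


# Made-up toy vectors and placeholder words, NOT data from the paper.
TOY_VECTORS = {
    "term_a": [1.0, 0.1, 0.0],
    "term_a_syn": [0.9, 0.2, 0.1],
    "distractor_1": [0.0, 1.0, 0.0],
    "distractor_2": [0.0, 0.0, 1.0],
}
TOY_ITEMS = [("term_a", "term_a_syn", ["term_a_syn", "distractor_1", "distractor_2"])]

print(accuracy(TOY_VECTORS, TOY_ITEMS))
```

With real embeddings, `TOY_VECTORS` would be replaced by vectors from a trained model and `TOY_ITEMS` by entries drawn from the dictionary-based dataset.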

  • Journal
  • OpenAccess

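In 1922, the Chinese Communist organizations in Europe (旅歐中國共產組織, abbreviated 「中共歐」) were established in France. They comprised two bodies: the European Branch of the Chinese Communist Party and the Chinese Communist Youth League in Europe. Their members included some of the best-known leaders of the Chinese revolution, such as Zhou Enlai, Zhao Shiyan, Deng Xiaoping, Nie Rongzhen, Zhu De, Cai Chang, and Li Fuchun. This article studies the organization and its leading figures through integrative historical analysis. The new sources used include the records of two rounds of oral history interviews conducted in mainland China in 1985 and 1990 and the China Biographical Database (CBD) created by the author. The analysis of 188 members of the organization is based on this database. The approach has three parts: (1) an introduction to the underlying sources, namely the two sets of interview records and the biographical database; (2) a brief account of how the organization took shape and how its leaders viewed its activities, with particular attention to the roles and standing of its first three secretaries, Zhao Shiyan, Zhou Enlai, and Ren Zhuoxuan; and (3) quantitative methods, such as dendrograms and network analysis, used to examine the relationships between groups and individuals in history and their significance. The combination of these three methods demonstrates the potential of an integrative approach to historical analysis.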
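
As a rough illustration of the network analysis mentioned in the third part, the sketch below builds a co-membership network from a group-to-members table and counts each person's connections. It is not the author's method or data: the table uses placeholder names only, and tie weight is defined here, as an assumption, as the number of groups two people share.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical membership table with placeholder names (not from the database).
MEMBERSHIP = {
    "group_x": ["person_a", "person_b", "person_c"],
    "group_y": ["person_b", "person_d"],
    "group_z": ["person_a", "person_b"],
}


def co_membership_edges(membership):
    """Map each pair of people to the number of groups they share."""
    weights = defaultdict(int)
    for members in membership.values():
        for a, b in combinations(sorted(set(members)), 2):
            weights[(a, b)] += 1
    return dict(weights)


def degree(edges):
    """Count how many distinct people each person is tied to."""
    counts = defaultdict(int)
    for a, b in edges:
        counts[a] += 1
        counts[b] += 1
    return dict(counts)


edges = co_membership_edges(MEMBERSHIP)
degrees = degree(edges)

for person, count in sorted(degrees.items(), key=lambda item: -item[1]):
    print(person, count)
```

The same edge weights could serve as a similarity measure for hierarchical clustering, which is one common way to produce a dendrogram of the kind the abstract mentions.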

  • Journal
  • OpenAccess

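Traditional British and American literature courses mainly select well-known works for guided reading, use close reading to analyze their writing techniques and rhetorical features, and introduce literary terms and styles. Students, however, lack the ability to read independently and in depth. In class they can only passively receive the information the teacher conveys, and they are left with a partial understanding of the works discussed, let alone a grasp of literary terms and concepts or the ability to interpret and analyze literary texts, so poor learning outcomes are hard to avoid. Taking the course Approaches to Literature (「文學作品讀法」) as an example, this article proposes combining Text Encoding Initiative (TEI)/Extensible Markup Language (XML) encoding with digital humanities tools to annotate, process, present, and analyze literary text data. The approach improves students' information literacy and critical thinking, and students' hands-on work shows progress in their ability to interpret and analyze literary works. On this basis, the article discusses the advantages and challenges of TEI as a teaching tool for close reading and offers recommendations for the instructional design of digital humanities courses.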
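
A small example may help show what TEI/XML annotation for close reading can look like in practice. The sketch below is our own illustration, not material from the course: it encodes the first two lines of Shakespeare's Sonnet 18 with the TEI elements `<lg>`, `<l>`, and `<seg>`, marks one phrase with a hypothetical annotation scheme (`type="figure"`, `subtype="comparison"`), and then reads the markup back with Python's standard library. A complete TEI document also needs a `<teiHeader>`, which is omitted here for brevity.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI_URI = "http://www.tei-c.org/ns/1.0"
TEI_NS = {"tei": TEI_URI}

# Illustrative fragment; the type/subtype values are a made-up annotation scheme.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <lg>
        <l n="1">Shall I compare thee to <seg type="figure" subtype="comparison">a summer's day</seg>?</l>
        <l n="2">Thou art more lovely and more temperate:</l>
      </lg>
    </body>
  </text>
</TEI>"""

root = ET.fromstring(SAMPLE)

# Collect every verse line and tally the annotated figures by subtype.
lines = root.findall(".//tei:l", TEI_NS)
figures = Counter(seg.get("subtype") for seg in root.iter(f"{{{TEI_URI}}}seg"))

for line in lines:
    # itertext() joins the line's own text with the text inside nested <seg> tags.
    print(line.get("n"), "".join(line.itertext()))
print(dict(figures))
```

Because each annotation is explicit markup, a class's encoded files can be counted and compared in this way, which is one reason structured encoding lends itself to discussing interpretive choices.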