
結合Tesseract與LSTM之篆字辨識系統之研究

Research on the Integration of Tesseract and LSTM for Seal Character Recognition System

Advisor: 鄭立德

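Abstract

Small seal script, one of the treasures of ancient Chinese calligraphy, carries the accumulated legacy of Chinese civilization's long history. As one of China's earliest unified scripts, small seal script has deep historical origins and cultural foundations, and its distinctive writing style and forms have been esteemed and handed down for centuries. Today, however, the general public writes in regular script, in traditional or simplified characters, and in vernacular Chinese, so small seal script texts are difficult for most people to read. This study recognizes characters by combining the optical character recognition engine Tesseract with a long short-term memory (LSTM) network. Tesseract converts the text in an image into editable text data that a computer can process and analyze further; an LSTM algorithm is then applied to raise Tesseract's recognition accuracy; finally, each small seal script character is recognized as its corresponding modern character.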

Keywords

seal characters, small seal script

Parallel Abstract


Seal script, one of the treasures of ancient Chinese calligraphy, embodies the long and rich history of Chinese civilization. As one of China's earliest unified scripts, it has deep historical roots and cultural significance, and its distinctive style and forms have been admired and passed down through generations. Today, however, everyday writing uses regular script, in traditional or simplified characters, together with vernacular Chinese, so most readers find seal script difficult to read. This study combines the Tesseract optical character recognition (OCR) engine with a Long Short-Term Memory (LSTM) network to recognize seal characters. Tesseract first converts the text in an image into editable text data that a computer can process and analyze further. An LSTM algorithm is then used to improve Tesseract's recognition accuracy, so that seal script characters are ultimately recognized as their modern equivalents.

Parallel Keywords

LSTM, Tesseract, seal characters, small seals
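
The abstract does not say how Tesseract and the LSTM are connected. One plausible reading, offered here only as a sketch, is that Tesseract's own LSTM recognition engine is run with a model trained on small seal script whose ground-truth labels are modern characters. Under that assumption, the recognition step might look like the Python below. The model name `seal_lstm` and both function names are hypothetical and do not come from the thesis; `--oem 1` (LSTM engine only) and `--psm 10` (treat the image as a single character) are standard Tesseract command-line options.

```python
import shutil
import subprocess

# Hypothetical name of a traineddata model trained on small seal script.
# The abstract does not name any model, so this is an assumption.
SEAL_MODEL = "seal_lstm"


def build_tesseract_command(image_path, lang=SEAL_MODEL):
    """Build a Tesseract CLI call that uses only the LSTM engine."""
    return [
        "tesseract",
        image_path,
        "stdout",        # write the recognized text to standard output
        "-l", lang,      # language/model to load
        "--oem", "1",    # OCR engine mode 1: neural-net LSTM engine only
        "--psm", "10",   # page segmentation mode 10: image is one character
    ]


def recognize_character(image_path, lang=SEAL_MODEL):
    """Run Tesseract on one cropped character image and return its text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Calling `recognize_character("char.png")` would require a Tesseract installation and a trained model file with the assumed name; without them, only `build_tesseract_command` can be exercised.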

