
Handwritten Chinese Character Extraction from Document Images (文件影像之手寫中文字擷取技術)

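Abstract

Optically captured handwritten characters are less stable than printed characters, so extracting text from handwritten manuscripts calls for methods different from those used for printed documents. Traditional text extraction relies on either connected component detection or projection profile analysis to locate candidate text blocks. Because skew correction is difficult for handwritten documents, the former usually yields better extraction results than the latter. In this study, we first detect all connected components with a stack-based method. We then propose a three-stage component merging process that uses the overlap between components, the size of the gap between components, and the difference between the gaps separating a component from its neighbors on either side to decide whether two components should be merged. Finally, character-block projection and a check of position differences between neighboring components determine the reading order of all characters. In the experiment, we asked 15 people to write 57 handwritten manuscripts, totaling 1148 characters. After the manuscripts were digitized with a scanner into TIF images, the system achieved a correct extraction rate of 98.43%.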
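The stack-based detection of connected components mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the authors' exact procedure, so several details here are assumptions: the image is already binarized (1 = ink), pixels are grouped with 8-connectivity, and each component is summarized by its bounding box. The function name `connected_components` is hypothetical.

```python
def connected_components(image):
    """Find 8-connected ink regions with an explicit stack (no recursion).

    `image` is a list of rows holding 0/1 values, where 1 marks ink.
    Returns one inclusive bounding box (top, left, bottom, right) per
    component, in raster-scan order of each component's first pixel.
    The 8-connectivity and the box output are illustrative assumptions.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for r in range(height):
        for c in range(width):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Start a new component and grow it from this seed pixel.
            stack = [(r, c)]
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while stack:
                y, x = stack.pop()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                # Push every unvisited ink pixel among the 8 neighbors.
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == 1
                                and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes
```

An explicit stack avoids the recursion-depth limits that a recursive flood fill would hit on large scanned pages, which is one plausible reason to prefer it.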

Parallel Abstract

Handwritten characters in document images are less regular than printed characters, so their extraction requires a different approach. Traditionally, connected component detection and projection profile analysis have been the two common approaches for determining character positions in document images. Because skew detection is unreliable for handwritten document images, connected component detection generally gives better results than projection profile analysis. In this study, connected components are detected with a stack-based approach, and a three-stage process is proposed to merge components that belong to the same character. Three measures determine whether two components should be merged: the overlap between components, the gap between components, and the opposite-gap ratio of components. Finally, character-block mapping and the position differences among neighboring components are used to determine the reading order of the extracted characters. In the experiment, the proposed technique was tested on 1148 characters in 57 manuscript documents written by 15 persons. The extraction rate is 98.43%.
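To make the merging idea concrete, the following is a simplified sketch of two of the three criteria named above: overlap and gap. It is not the authors' three-stage process, and it omits the opposite-gap ratio, which the abstract does not define in enough detail to reproduce. It assumes a single horizontal text line, bounding boxes given as inclusive (top, left, bottom, right) tuples, and a hypothetical threshold `max_gap` measured in pixels.

```python
def horizontal_overlap(a, b):
    """Columns shared by boxes a and b; zero or negative means none."""
    return min(a[3], b[3]) - max(a[1], b[1]) + 1


def horizontal_gap(a, b):
    """Empty columns between boxes a and b (0 if they touch or overlap)."""
    return max(0, -horizontal_overlap(a, b))


def union(a, b):
    """Smallest box (top, left, bottom, right) covering both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]),
            max(a[2], b[2]), max(a[3], b[3]))


def merge_components(boxes, max_gap=1):
    """Greedily merge boxes left to right along one text line.

    A box joins the current group when it overlaps the group
    horizontally (for example, strokes stacked vertically) or when the
    gap to the group is at most `max_gap`. The threshold value is an
    assumption for illustration, not a figure from the paper.
    """
    merged = []
    for box in sorted(boxes, key=lambda b: b[1]):
        if merged and (horizontal_overlap(merged[-1], box) > 0
                       or horizontal_gap(merged[-1], box) <= max_gap):
            merged[-1] = union(merged[-1], box)
        else:
            merged.append(box)
    return merged
```

In practice a fixed pixel threshold is fragile for handwriting, where spacing varies between writers; a relative measure that compares the gaps on both sides of a component, as the abstract describes, addresses that weakness.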

