  • Thesis

一個簡單相連中文手寫工整字的新切割法

A New Segmentation Technique for Simply Connected Handprinted Chinese Characters

Advisor: 林啟芳

Abstract


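Optical character recognition (OCR) has always been the most important part of automated document processing systems. The recognition rate, however, often depends on how correctly and completely the characters are segmented, so character segmentation is one of the key technologies of document processing automation. Although research in this area began quite early, it has mainly targeted English letters and digits (such as postal codes). For Chinese characters, with their complex structure and very large character set, applying those methods to the segmentation problem is very difficult, and the segmentation of handwritten Chinese characters is harder still. In this thesis, we propose a new character segmentation method that can also solve the segmentation problem of handprinted Chinese characters. The proposed method consists of two units. The first is the stroke extraction unit, which converts a character image represented in run-length codes into a set whose basic elements are strokes. It comprises region extraction, noise removal, variance checking, horizontal stroke extraction, and vertical stroke extraction. The second is the classification unit, for which we adopt hierarchical clustering. Its main purpose is to regroup the strokes extracted by the first unit so that strokes of the same character fall into the same class, which achieves the segmentation. Finally, we use the contents of the 四庫全書 (Siku Quanshu) as the experimental subject to verify the feasibility of the proposed method.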
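The abstract does not spell out its run-length representation, so the following is only a minimal sketch of one common form, assuming a binary image stored as rows of 0/1 pixels and a run recorded as (row, first column, last column). The names `Run` and `run_length_encode` are hypothetical and are not taken from the thesis.

```python
from typing import List, Tuple

# Hypothetical run record: (row, start_col, end_col), with end_col inclusive.
Run = Tuple[int, int, int]


def run_length_encode(image: List[List[int]]) -> List[Run]:
    """Encode each row of a binary image as runs of foreground (non-zero) pixels."""
    runs: List[Run] = []
    for r, row in enumerate(image):
        start = None  # column where the current foreground run began, if any
        for c, pixel in enumerate(row):
            if pixel and start is None:
                start = c  # a new run begins here
            elif not pixel and start is not None:
                runs.append((r, start, c - 1))  # the run ended at the previous column
                start = None
        if start is not None:
            runs.append((r, start, len(row) - 1))  # the run reaches the row's right edge
    return runs


if __name__ == "__main__":
    sample = [
        [0, 1, 1, 0, 1],
        [1, 1, 1, 1, 1],
        [0, 0, 0, 0, 0],
    ]
    for run in run_length_encode(sample):
        print(run)
```

In a representation like this, a long run on a single row hints at a horizontal stroke, while short runs stacked over the same columns in consecutive rows hint at a vertical one. How the thesis actually separates the two is not stated in the abstract.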

Parallel Abstract


Optical character recognition is one of the most important processes in automatic document processing systems, and the recognition rate depends heavily on the correctness and completeness of character segmentation. A powerful segmentation technique is therefore necessary for such systems. Many techniques have been proposed in the past few years, but they concentrate on the segmentation of alphabetic characters and numerical digits (e.g., postal codes). They do not apply well to Chinese characters because of the huge character set and the complicated topology of the shapes. In this thesis, we present a new method to solve the segmentation problem of simply connected handprinted Chinese characters. The method consists of two units. In the stroke extraction unit, characters are scanned and represented by run-length codes, and the strokes are extracted by applying the following five processes: (1) region extraction; (2) noise removal; (3) variance check; (4) horizontal extraction; and finally (5) vertical extraction. In the classification unit, a hierarchical clustering method groups the strokes of the same character together according to a certain criterion. Finally, experiments are conducted to demonstrate the superiority and correctness of the method.
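The grouping criterion used in the classification unit is not given in the abstract. As an illustration only, the sketch below runs single-linkage agglomerative clustering over stroke bounding boxes and stops merging once the closest pair of clusters is farther apart than a threshold. The `Box` layout, the gap measure, and the `max_gap` parameter are all assumptions made for this example, not the thesis's criterion.

```python
from typing import List, Tuple

# Hypothetical stroke bounding box: (left, top, right, bottom) in pixels.
Box = Tuple[int, int, int, int]


def box_gap(a: Box, b: Box) -> float:
    """Euclidean gap between two boxes; 0 when they touch or overlap."""
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    return (dx * dx + dy * dy) ** 0.5


def cluster_strokes(boxes: List[Box], max_gap: float) -> List[List[int]]:
    """Single-linkage agglomerative clustering; returns groups of stroke indices."""
    clusters = [[i] for i in range(len(boxes))]  # start with one cluster per stroke
    while len(clusters) > 1:
        best = None  # (distance, i, j) of the closest pair of clusters
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(box_gap(boxes[p], boxes[q])
                        for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > max_gap:  # assumed stopping rule: remaining clusters are separate characters
            break
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]
    return sorted(clusters)


if __name__ == "__main__":
    strokes = [(0, 0, 10, 2), (4, 0, 6, 12), (30, 0, 40, 2), (34, 3, 36, 12)]
    print(cluster_strokes(strokes, max_gap=3))
```

Each resulting group of stroke indices stands for one candidate character. A fixed distance threshold is the simplest stopping rule. The thesis may use a different one.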
