  • Degree thesis

期刊文件之掃描影像分析

Document Image Analysis on a Scanned Journal Page

Advisor: 陳永盛

Abstract


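Document image analysis is an important part of image processing and pattern recognition. To recognize every object in a document and understand its content, effective methods for object segmentation and localization are needed; developing them is the main goal of this study. In recent years, digital collections and digital publishing have grown rapidly, and very large numbers of documents have been scanned, digitized, and archived on websites around the world. Recognizing document content and converting it into a new form is also a highly challenging problem. Although many studies on document image analysis have been carried out, many difficult problems remain worth studying. This thesis studies pages of common academic journals, whose layouts come in three types: single-column, two-column, and two-column mixed with single-column. We call the scanned image of such a journal page a J-image. We study the problems of image skew, incorrect page orientation, and upside-down images that occur in J-images, and we propose effective correction methods that bring these cases to a normal J-image so that useful objects can be located. Here, the normal case means a J-image with no skew, no incorrect orientation, and no inversion. Because a J-image contains different kinds of information lines, such as ordinary text lines, table lines, chart lines, figure lines, embedded mathematical formula lines, and isolated mathematical formula lines, locating information lines effectively is challenging. Based on contour and bounding-box analysis, we propose an effective object detection method that divides information lines into four categories: ordinary text, figures (including tables, charts, and figures), isolated mathematical formulas, and embedded mathematical formulas. This result will help further analysis and recognition. Without using OCR, our experimental results show that the text line detection method achieves an accuracy above 90%, which confirms the feasibility of the proposed method.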
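The abstract does not say how skew is measured, so the following is only an illustrative sketch of one common technique, projection-profile skew estimation, and not the method of the thesis. It assumes the page is already binarized and given as a list of foreground pixel coordinates, and that the skew is small enough to be approximated by a vertical shear. The names `profile_energy`, `estimate_skew`, and `synthetic_page`, and the parameters `max_angle` and `step`, are hypothetical.

```python
import math
from collections import Counter


def profile_energy(pixels, angle_deg, center_x):
    """Sum of squared row counts after undoing a candidate skew by a vertical shear."""
    slope = math.tan(math.radians(angle_deg))
    # Shift every pixel vertically in proportion to its distance from the page center.
    rows = Counter(round(y - (x - center_x) * slope) for x, y in pixels)
    # Text lines that become horizontal pile up in few rows, which raises this sum.
    return sum(count * count for count in rows.values())


def estimate_skew(pixels, width, max_angle=5.0, step=0.25):
    """Return the candidate angle (degrees) whose sheared row profile is sharpest."""
    center_x = width / 2
    n_steps = int(round(2 * max_angle / step))
    candidates = [-max_angle + i * step for i in range(n_steps + 1)]
    return max(candidates, key=lambda a: profile_energy(pixels, a, center_x))


def synthetic_page(width=300, baselines=(40, 80, 120, 160), thickness=3, skew_deg=2.0):
    """Hypothetical test input: a few thick 'text lines' drawn with a known skew."""
    slope = math.tan(math.radians(skew_deg))
    return [
        (x, round(y0 + (x - width / 2) * slope) + t)
        for y0 in baselines
        for x in range(width)
        for t in range(thickness)
    ]


pixels = synthetic_page()
estimated = estimate_skew(pixels, width=300)
print(estimated)
```

In this sketch a positive angle means that lines drift downward toward the right in image coordinates. Page orientation (90-degree rotation) and upside-down detection need separate cues and are not covered here.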

Parallel Abstract


Document image analysis is of great importance in the fields of image processing and pattern recognition. To recognize each object in a document and understand the document as a whole, an effective approach to object segmentation and positioning must be developed first; this is the main goal of this study. In recent years, digital libraries and digital publishing have grown rapidly, and there is a strong need to digitize documents for inclusion in digital libraries. Moreover, recognizing the information in a document and converting it into a new format is a challenge. Although much research on document image analysis has been carried out, some issues are still worth studying. In this thesis, three types of scanned journal page are considered: one-column, two-column, and two-column mixed with one-column. The image of a scanned journal page is called a J-image here. The issues of skew, page orientation, and inversion in a J-image are investigated. To position all useful objects in a J-image, an effective algorithm is developed that adjusts any case resulting from these issues to the normal case. Here the normal case means that the adjusted J-image is not skewed, is correctly oriented, and is not inverted. A J-image contains a variety of lines, which may be normal text lines, tables, graphics, figures, embedded mathematical expressions, or isolated mathematical expressions (all of these are called lines in this thesis). Identifying them is a challenge because their positions vary. Based on the analysis of bounding boxes and contours, an effective object detection algorithm is also developed to classify lines into four categories: normal text, figures (including tables, graphics, and figures), isolated mathematical expressions, and embedded mathematical expressions. This classification will be helpful for further analysis and recognition. Without the assistance of OCR, experiments show a line detection accuracy above 90%, which confirms the feasibility of the proposed approach.
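As an illustration of what bounding-box analysis can look like, the sketch below groups component bounding boxes into lines by vertical overlap. It is a minimal example under stated assumptions, not the algorithm of the thesis: the boxes are assumed to come from some earlier contour or connected-component step on a deskewed page, and the names `group_into_lines`, `line_bounding_box`, and the threshold `min_overlap` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height); y grows downward


def group_into_lines(boxes: List[Box], min_overlap: float = 0.5) -> List[List[Box]]:
    """Group boxes whose vertical extents overlap into lines, ordered top to bottom."""
    lines: List[List[Box]] = []
    spans: List[Tuple[int, int]] = []  # (top, bottom) of each line found so far
    for box in sorted(boxes, key=lambda b: b[1]):
        _, y, _, h = box
        top, bottom = y, y + h
        for i, (line_top, line_bottom) in enumerate(spans):
            overlap = min(bottom, line_bottom) - max(top, line_top)
            # Join the line if the shared height covers enough of the shorter extent.
            if overlap >= min_overlap * min(h, line_bottom - line_top):
                lines[i].append(box)
                spans[i] = (min(line_top, top), max(line_bottom, bottom))
                break
        else:
            lines.append([box])
            spans.append((top, bottom))
    # Within each line, order boxes left to right.
    return [sorted(line, key=lambda b: b[0]) for line in lines]


def line_bounding_box(line: List[Box]) -> Box:
    """Smallest box that encloses every box in one line."""
    left = min(x for x, _, _, _ in line)
    top = min(y for _, y, _, _ in line)
    right = max(x + w for x, _, w, _ in line)
    bottom = max(y + h for _, y, _, h in line)
    return (left, top, right - left, bottom - top)


# Hypothetical input: five character-sized boxes forming two text lines.
boxes = [(22, 41, 8, 11), (10, 10, 8, 12), (32, 11, 8, 11), (10, 40, 8, 12), (20, 12, 8, 10)]
lines = group_into_lines(boxes)
line_boxes = [line_bounding_box(line) for line in lines]
print(line_boxes)
```

Features of each line box, such as its height relative to the typical text line or the density of components inside it, are the kind of cue a later step could use to separate text from figures or isolated formulas. The abstract does not state which features the thesis actually uses.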

