This paper presents a method for automatically extracting mathematical equations from document images. The method applies document analysis procedures to separate the equations from the other content of a mixed-content document, without performing character recognition. Extracting equations efficiently from scientific documents is a key step for an OCR system that recognizes mathematical equations. The proposed method locates each equation in the document image and determines the area it occupies, which is essential information for a mathematical OCR system. Because the equations are separated from the ordinary text, a commercial, general-purpose OCR system needs to process only the ordinary text, which improves its recognition rate on documents that contain mathematical equations. To evaluate the method, experiments were conducted on scientific document images selected from the IEEE and ACM digital libraries. The experimental results show that the proposed method separates the mathematical equations from the selected documents with an accuracy of more than 94%. Accuracy evaluations and comparisons of the results are also provided and discussed.