
A semantic segmentation model for table detection and recognition from images

Advisor: 黃乾綱
This thesis will be available for download on 2027/07/04.

Abstract


Electronic documents have replaced paper as a means of preserving information, improving convenience, and are now an important way to store data. Compared with text paragraphs, information presented in tables is more structured and easier to understand. With the popularity of mobile phones and advances in phone cameras, more and more people photograph the tables they encounter in daily life in order to store them. However, most existing table recognition methods are designed for scanned document images or Portable Document Format (PDF) files, and little research has addressed the recognition of photographed tables. Capturing a table with a camera can introduce several problems, including a skewed shooting angle, surface distortion and tilt, interference from lighting and shadows, and poor print quality or color, all of which make table recognition difficult.

This thesis studies the recognition of fully ruled (full-grid-line) tables. The recognition model is the deep learning model MobileNetV2_U-Net, and the training data consist of photographed full-grid-line tables. To save the large amount of time and labor required to annotate photographed tables, this study uses computer vision methods to quickly label electronic document tables in bulk and applies data augmentation to simulate photographed tables. The recognized table lines are then combined with computer vision methods and an existing text recognition (OCR) API to convert a photographed table into an editable electronic document file. Experimental results show a cell recognition precision of 95.2%, a recall of 94.9%, and an F1-score of 95%.
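The abstract names MobileNetV2_U-Net as the segmentation model but gives no architectural details. Below is a minimal, hypothetical sketch of a U-Net-style decoder on a MobileNetV2 encoder in Keras, assuming the commonly used skip-connection layers and a single-channel sigmoid output for the table-line mask; the layer choices, filter counts, input size, and loss are assumptions rather than the thesis configuration.

```python
# Hypothetical MobileNetV2-backbone U-Net for table-line segmentation.
# Skip-connection layers, filter counts, and loss are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_mobilenetv2_unet(input_shape=(512, 512, 3)):
    # Encoder: MobileNetV2 pretrained on ImageNet, used as a feature extractor.
    backbone = tf.keras.applications.MobileNetV2(
        input_shape=input_shape, include_top=False, weights="imagenet")
    # Feature maps commonly used as skip connections (assumed layer names).
    skip_names = ["block_1_expand_relu",   # 1/2 resolution
                  "block_3_expand_relu",   # 1/4
                  "block_6_expand_relu",   # 1/8
                  "block_13_expand_relu"]  # 1/16
    skips = [backbone.get_layer(name).output for name in skip_names]
    x = backbone.get_layer("block_16_project").output  # 1/32 bottleneck

    # Decoder: upsample and fuse with the corresponding encoder feature map.
    for skip, filters in zip(reversed(skips), [512, 256, 128, 64]):
        x = layers.Conv2DTranspose(filters, 3, strides=2, padding="same")(x)
        x = layers.Concatenate()([x, skip])
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)

    # Final upsample back to input resolution; 1-channel line-mask output.
    x = layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu")(x)
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(x)
    return Model(backbone.input, outputs)

model = build_mobilenetv2_unet()
model.compile(optimizer="adam", loss="binary_crossentropy")
```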

Parallel Abstract


Electronic documents have replaced paper as a way of preserving data, improving convenience, and are now an important means of storing information. Tabular information is more structured and understandable than text paragraphs. With the popularity of mobile phones and advances in camera technology, more and more people use their phones to capture tables by photographing them. However, most published table recognition methods are designed for scanned document images or Portable Document Format (PDF) files, and there is little research on recognizing tables in photographs. When a table is photographed in the real world, distortion and tilt caused by the shooting angle or external forces, interference from environmental shadows, and the print quality or color of the table lines themselves can prevent existing table recognition methods from working.

This thesis studies the recognition of fully ruled (full-grid-line) tables. The recognition model is the deep learning model MobileNetV2_U-Net, and the training data consist of photographed full-grid-line tables. To save the considerable time and manpower required to annotate photographed tables, this study uses computer vision to quickly label electronic document tables in bulk and applies data augmentation to simulate photographed tables. The recognized table lines are then processed with computer vision methods, coordinate classification, and an existing OCR API to convert the photographed table into an editable electronic document. The experimental results show a cell recognition precision of 95.2%, a recall of 94.9%, and an F1-score of 95%.
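As a rough illustration of the post-processing step described above (turning recognized table lines into cells that can be sent to an OCR API), the sketch below recovers cell bounding boxes from a predicted binary line mask using OpenCV connected components. The function cells_from_line_mask, the morphology kernel, and the area thresholds are hypothetical and not taken from the thesis.

```python
# Hypothetical post-processing sketch: recover cell bounding boxes from a
# predicted binary table-line mask with OpenCV. The kernel size and area
# thresholds below are illustrative assumptions, not values from the thesis.
import cv2
import numpy as np

def cells_from_line_mask(line_mask: np.ndarray) -> list[tuple[int, int, int, int]]:
    """line_mask: uint8 image, 255 where the model predicts a table line."""
    # Close small gaps in the predicted lines so neighboring cells do not merge.
    closed = cv2.morphologyEx(line_mask, cv2.MORPH_CLOSE, np.ones((3, 3), np.uint8))
    # Cells are the connected non-line regions enclosed by the line grid.
    inverted = cv2.bitwise_not(closed)
    n, _, stats, _ = cv2.connectedComponentsWithStats(inverted, connectivity=4)
    boxes = []
    for i in range(1, n):  # label 0 covers the line pixels themselves
        x, y, w, h, area = stats[i]
        # Skip speckle noise and the large region outside the table grid.
        if area < 100 or area > 0.5 * line_mask.size:
            continue
        boxes.append((int(x), int(y), int(w), int(h)))
    return boxes
```

Each box can then be cropped from the photograph, sorted into rows and columns by its coordinates, and passed to an OCR API to rebuild an editable table. For the reported metrics, F1 = 2 × precision × recall / (precision + recall) = 2 × 0.952 × 0.949 / (0.952 + 0.949) ≈ 0.950, consistent with the stated 95%.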

