
低成本高效能單本書頁面邊界偵測模型與攤平演算法設計

Design of Low-cost and High-performance Book Page Edge Detection Model and Flattening Algorithm for Single Book

Advisors: 張時中, 李長鴻, 王富平
This thesis will be available for download on 2025/08/14.

Abstract


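When instructors teach remotely from a book, they use a “remote teaching video system” and point one of its cameras at the book on the desk. The curved shape of the opened book and the non-page clutter sharing the desk both reduce the readability of the page images that the camera captures and transmits to learners. The system therefore needs fast and effective page detection and flattening so that learners receive high-quality, readable page images within a tolerable waiting time (about 2 seconds).

A “remote teaching video system” (hereafter “the system”) usually consists of two cameras and a microphone connected to a host computer and used with a remote teaching platform. To flatten curved pages on the desk, commercial setups add an upright scanner costing up to NT$23900 to the system, which completes page detection and flattening together within 4 seconds. Another approach uses only one of the system's cameras to capture a desktop image, detects the curvature features of the page in that image, and flattens the page according to those features. This approach saves the cost of extra hardware, but its computation time averages more than 4 seconds. Moreover, both kinds of methods require the book to be placed on a plain black background and the image to be taken from directly above the page. Their environmental requirements and costs are too high for ordinary individual instructors to use in a remote teaching environment. The system therefore needs lower-cost, more flexible page detection and a more efficient flattening method.

To provide low-cost, flexible, and efficient page detection and flattening for ordinary individual remote teaching systems, this thesis addresses the following research problems (P), corresponding challenges (C), and proposed methods (M):

P1) Page detection for a single book on a cluttered desktop: rapidly detect the page boundary features of a single book on a desktop with varied backgrounds and clutter.
C1) Current page detection relies on “basic image object contour detection techniques,” which are easily affected by background color and clutter and perform poorly on a cluttered desktop. In addition, the combined time budget for detection and flattening is at most 2 seconds. Detecting page boundary features quickly and effectively on a cluttered desktop is therefore a challenge.
M1) A newly designed and constructed Page Edge Detection Model (PEDM): a deep learning model based on convolutional neural networks and multi-scale networks is trained to detect page boundary features. The method effectively avoids interference from non-page clutter and background color, and when the page boundary is complete and unoccluded it detects the boundary features within about 0.5 seconds.

P2) Design of a more efficient page flattening algorithm: an algorithm is needed that handles different shooting angles and flattens page images more efficiently.
C2) Current flattening algorithms require the camera to shoot from directly above the page, and their accuracy and speed are insufficient. Producing page images that are highly readable for learners from images taken at different angles, within the time budget, is a challenge.
M2) A newly designed Cubic Curve Flattening Algorithm (CCF) that estimates the three-dimensional curvature distortion of a page from its two-dimensional boundary features: the algorithm estimates the three-dimensional distortion accurately from two-dimensional features and flattens one page of a page image within 1.5 seconds over the range of common shooting angles.

P3) Evaluating the quality of flattening results: an objective method that matches human visual perception is needed to judge flattening results.
C3) The optical character recognition techniques used by current evaluation methods apply only to pages with purely textual content and have no direct connection to what the human eye perceives. Selecting or designing a more comprehensive evaluation method that is closer to human perception is a new challenge.
M3) A new application of the more flexible and perceptually aligned Multi-Scale Structural Similarity (MS-SSIM) evaluation method: MS-SSIM describes the similarity of luminance, contrast, and structure between two images and is not limited to pages with purely textual content. Because the human eye habitually extracts the structural relationships among objects in an image, MS-SSIM matches human perception and is a more objective evaluation method.

The findings and contributions of this thesis are as follows:

(1) The newly designed PEDM page detection model effectively detects the page boundary features of a single book within about 0.5 seconds in the cluttered desktop environment of ordinary individual remote teaching.
(2) The newly designed CCF algorithm is more accurate than the current method, flattens pages efficiently, and tolerates desktop shooting angles from 0 to 30 degrees and page rotation angles of ±40 degrees. Compared with the current method, on the test image set the text similarity evaluation results improve by more than 14% on average, and the structural similarity index of the images improves by more than 182% on average.
(3) The PEDM deep learning model and the CCF algorithm are integrated into an application with a user interface, so instructors can complete page detection and flattening with a button click. Instructors do not need to buy additional hardware; their existing remote teaching equipment and a Windows system are sufficient. Using only the CPU, the application processes a 1080p desktop image from capture to flattening in less than 2 seconds. It is fast, low-cost, and easy to use, and it meets the needs of ordinary individual remote teaching.
(4) The Multi-Scale Structural Similarity (MS-SSIM) index is newly applied to evaluate the effectiveness of the page flattening system. Flattening test images are no longer limited to purely textual content, and the MS-SSIM results match human perception, making it a more objective evaluation method.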
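The abstract does not give the details of CCF, so the following is only a minimal sketch of the general idea behind M2: describe a curved page edge with a cubic curve and use the curve to undo horizontal compression. It fits a cubic to hypothetical edge points by least squares and then picks sample positions that are evenly spaced along the curve. The function names, the use of plain arc length in the image plane, and the input format are all assumptions made for illustration, not the thesis's method. Plain Python lists keep the sketch dependency-free.

```python
import math

def fit_cubic(points):
    """Least-squares fit of y = c0 + c1*x + c2*x^2 + c3*x^3 to (x, y) points."""
    n = 4
    # Normal equations (V^T V) c = V^T y, with V the Vandermonde matrix.
    a = [[sum(x ** (i + j) for x, _ in points) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in points) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            f = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= f * a[col][k]
            b[row] -= f * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = sum(a[i][k] * coeffs[k] for k in range(i + 1, n))
        coeffs[i] = (b[i] - s) / a[i][i]
    return coeffs

def slope(coeffs, x):
    """Derivative of the fitted cubic at x."""
    _, c1, c2, c3 = coeffs
    return c1 + 2 * c2 * x + 3 * c3 * x * x

def equal_arc_length_xs(coeffs, x0, x1, n_out, n_samples=2000):
    """Return n_out (>= 2) x positions evenly spaced along the curve, and its length."""
    step = (x1 - x0) / n_samples
    xs = [x0 + step * i for i in range(n_samples + 1)]
    seg = [math.sqrt(1.0 + slope(coeffs, x) ** 2) for x in xs]
    # Cumulative arc length by the trapezoid rule.
    s = [0.0]
    for i in range(n_samples):
        s.append(s[-1] + 0.5 * (seg[i] + seg[i + 1]) * step)
    out, j = [], 0
    for k in range(n_out):
        target = s[-1] * k / (n_out - 1)
        while j < n_samples - 1 and s[j + 1] < target:
            j += 1
        t = (target - s[j]) / (s[j + 1] - s[j])
        out.append(xs[j] + t * (xs[j + 1] - xs[j]))
    return out, s[-1]
```

On a flat edge the returned positions are uniformly spaced. Where the fitted curve is steeper, as it would be near the spine, the positions bunch together, so sampling the source image at them stretches the compressed region back out. A real flattening method must also account for perspective and the camera angle, which this sketch ignores.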

Parallel Abstract


In online teaching, instructors who teach from paper books often use a “remote teaching video system,” which typically consists of two cameras and a microphone connected to a host computer and used with an online teaching platform, to capture images of the desktop. However, the book pages in the captured images often cannot be read directly because of the curved shape of the opened book and the non-book objects on the desktop. To solve this problem, the system needs book-page detection and flattening functions that give users high-quality, readable images within a tolerable time (within 2 seconds).

The popular existing approach to flattening curved book pages is to integrate an upright document scanner into the system. It detects and flattens pages in about 4 seconds but adds a cost of up to NT$23900. An alternative approach detects the curvature features of the book pages in a camera image and flattens the pages according to these features. It achieves the same result without additional hardware costs, but its average computation time is more than 4 seconds. Both approaches also require that (1) the book be placed on a plain black background and (2) the camera be fixed directly above the target pages. As a result, neither is practical or economical for instructors to set up in an online teaching environment. A more flexible, more efficient, and lower-cost method for detecting and flattening book pages is therefore required.

This thesis addresses the following research problems (P), corresponding challenges (C), and proposed methods (M):

P1) Detecting the pages of a single book on a cluttered desktop: how can the page boundary features of a single book be detected rapidly on a desktop with various backgrounds and non-book objects?
C1) Current methods detect book pages with “basic image object contour detection techniques,” which are easily influenced by background colors and non-book objects and therefore perform poorly.
M1) A newly designed and constructed Page Edge Detection Model (PEDM): a deep learning model based on multi-scale networks and convolutional neural networks is trained to detect the boundary features of book pages. The method effectively avoids interference from non-book objects and background colors. When the page boundaries in the input image are complete and unobstructed, PEDM detects the boundary features within approximately 0.5 seconds.

P2) Designing a more efficient book-page flattening algorithm: how can an algorithm handle different shooting angles and flatten book-page images more efficiently?
C2) Current algorithms require the camera to capture the image from directly above the book, and their accuracy and computational speed are insufficient. Obtaining high-quality, readable book-page images for learners within the computation time budget while handling different shooting angles is a challenge.
M2) A newly designed Cubic Curve Flattening Algorithm (CCF): the algorithm accurately estimates the three-dimensional curvature distortion of book pages from two-dimensional features. It flattens one book page in a desktop image within 1.5 seconds over the range of common shooting angles.

P3) Evaluating the quality of flattening results: how can the quality of book-page flattening results be evaluated objectively and in a way that conforms to human visual perception?
C3) Current methods use optical character recognition (OCR) for evaluation, which is limited to book pages with purely textual content and lacks a direct correlation with human visual perception. Selecting or designing an evaluation method that is more comprehensive and more closely aligned with human visual perception is a new challenge.
M3) A new application of the Multi-Scale Structural Similarity (MS-SSIM) index: the MS-SSIM index measures the similarity between two images in terms of luminance, contrast, and structure, and it is not limited to book pages with purely textual content. Because the human visual system is accustomed to extracting the structural relationships among objects in an image, the index aligns with human perception, making it a more objective measure.

The findings and contributions of this research are as follows:

(1) The newly designed PEDM effectively detects the page boundary features of a single book within approximately 0.5 seconds in the online teaching environment.
(2) The newly designed CCF algorithm is more accurate and more efficient than the current method. It handles shooting angles relative to the desktop from 0 to 30 degrees and book-page rotation angles from -40 to +40 degrees. Compared to the current method, on the test image dataset the textual similarity evaluation results improved by more than 14% on average, and the structural similarity index of the images improved by more than 182% on average.
(3) PEDM and CCF are integrated into an application with a user interface, so instructors can perform book-page detection and flattening with simple button clicks. Instructors can run the application with an existing “remote teaching video system” on a Windows PC without purchasing additional hardware. The application uses only the CPU to process a 1080p desktop image and completes the entire process, from capturing the desktop image to flattening a book page, in less than 2 seconds. It is fast, low-cost, and easy to use, and it meets the requirements of the online teaching environment.
(4) The MS-SSIM index is newly applied to evaluate the performance of the book-page flattening system. Flattening test images are no longer restricted to purely textual content, and the MS-SSIM results align with human visual perception, making the evaluation more objective.
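To make the MS-SSIM comparison described in M3 concrete, the sketch below computes a simplified multi-scale structural similarity score between two grayscale images stored as nested lists. It is an illustration under stated assumptions, not the evaluation code used in the thesis. The full index averages the luminance, contrast, and structure terms over local Gaussian windows, whereas this version uses one global window per scale and halves resolution by 2x2 block averaging. The per-scale weights are those published with the original MS-SSIM index (Wang, Simoncelli and Bovik, 2003). The function names are hypothetical.

```python
# Per-scale exponents from the original MS-SSIM formulation.
WEIGHTS = (0.0448, 0.2856, 0.3001, 0.2363, 0.1333)

def _stats(img_a, img_b):
    """Global means, variances, and covariance of two equally sized images."""
    a = [v for row in img_a for v in row]
    b = [v for row in img_b for v in row]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    va = sum((v - ma) ** 2 for v in a) / n
    vb = sum((v - mb) ** 2 for v in b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b)) / n
    return ma, mb, va, vb, cov

def _halve(img):
    """Average non-overlapping 2x2 blocks; an odd last row or column is dropped."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * r][2 * c] + img[2 * r][2 * c + 1]
              + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0
             for c in range(w)] for r in range(h)]

def ms_ssim_global(img_a, img_b, data_range=255.0, weights=WEIGHTS):
    """Simplified MS-SSIM: global statistics per scale (an assumption, see lead-in).

    Images need at least 2**(len(weights) - 1) pixels per side.
    """
    c1 = (0.01 * data_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizes the contrast-structure term
    score = 1.0
    for i, w in enumerate(weights):
        ma, mb, va, vb, cov = _stats(img_a, img_b)
        # Contrast-structure term, clamped so the fractional power is defined.
        cs = max((2 * cov + c2) / (va + vb + c2), 0.0)
        score *= cs ** w
        if i == len(weights) - 1:
            # Luminance is compared only at the coarsest scale.
            lum = (2 * ma * mb + c1) / (ma * ma + mb * mb + c1)
            score *= lum ** w
        else:
            img_a, img_b = _halve(img_a), _halve(img_b)
    return score
```

Identical images score 1 (up to rounding), and the score falls toward 0 as structure diverges. Unlike an OCR-based comparison, nothing here depends on the page containing text, which is the property the abstract highlights.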

