Machine-Learning-Based Content-Aware SVG Comic Compression and Its Novel Applications

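SVG (Scalable Vector Graphics) has become the international standard format for describing 2D graphics in HTML5, and it is also the international standard format for comic content in EPUB e-books. Although many raster-to-SVG conversion systems have been proposed, the files they produce are large and the visual quality of the image is often degraded. We therefore previously proposed new image processing techniques that reduce the size of the converted SVG files, achieving a better compression ratio than earlier methods. However, those techniques did not specifically handle or optimize the special content found in comics, such as text, gradients, and halftone dots. To further reduce the size of SVG comic files, this dissertation proposes methods for handling such content. After converting a raster image into an SVG file, we detect and recognize text, color gradients, and textures and embed them in the SVG file. For text, SWT (Stroke Width Transform) and geometric rules are used to filter out non-text components, and HOG (Histogram of Oriented Gradients) is combined with an SVM (Support Vector Machine) to further reduce false positives. OCR (Optical Character Recognition) is then applied to the text regions. To avoid vectorizing the text regions, the OCR results are embedded in the SVG document together with their coordinates. For color gradients, we propose the CGV (Color Gradient Vectorization) method. A linear-time CGV algorithm first identifies the color and gradient direction in each region. Adjacent regions with the same color and gradient direction are then merged into a larger region, whose SVG path is expressed with gradient syntax. For textures, our method uses the CSG (Composite Sub-band Gradient) vector as the texture descriptor and an SVM to classify the texture regions in a comic. An ACM (Active Contour Model) is then incorporated to improve segmentation accuracy in contour regions. Experimental results show that the proposed methods outperform other state-of-the-art SVG vectorization systems in both SVG comic file size and perceived visual quality, and also deliver better display performance on modern handheld devices. Finally, the comics can be translated into other languages so that multilingual services are easy to provide, text-based or content-based image search can be performed efficiently, and the approach can offer digital storytellers a novel application system.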
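The text-embedding step described above can be illustrated with a minimal sketch. It assumes the OCR stage has already produced strings with coordinates; the tuple layout `(string, x, y, font_size)`, the function name `embed_text`, and the sample values are hypothetical and are not taken from the dissertation.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def embed_text(svg_root, ocr_results):
    """Append one SVG <text> element per OCR result.

    ocr_results is an iterable of (string, x, y, font_size) tuples.
    This layout is an assumption made for illustration only.
    """
    for string, x, y, size in ocr_results:
        node = ET.SubElement(
            svg_root,
            f"{{{SVG_NS}}}text",
            {"x": str(x), "y": str(y), "font-size": str(size)},
        )
        node.text = string
    return svg_root


# Serialize with SVG as the default namespace so elements carry no prefix.
ET.register_namespace("", SVG_NS)
root = ET.Element(f"{{{SVG_NS}}}svg", {"width": "800", "height": "1200"})
embed_text(root, [("Hello!", 120, 80, 24), ("Run!", 400, 560, 32)])
svg_markup = ET.tostring(root, encoding="unicode")
```

Storing a recognized string as a `<text>` element takes a few dozen bytes, whereas tracing each glyph as a path would typically take far more, and the string remains available for translation and search.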

Keywords

SVG; Vector compression; Text recognition; Machine learning; Texture recognition; Color gradient vector; Digital storytelling

Parallel Abstract

SVG has become the standard format for 2D graphics in HTML5 and EPUB. Although several image-to-SVG conversion systems have been proposed, the files they produce are still large. We previously proposed a new system for converting raster comic images into vector SVG files, with a better compression ratio than earlier methods. However, these methods do not process the special contents of comics, such as text elements, color gradients, and textures. In this dissertation, we convert raster comic images to SVG files and then recognize text elements, color gradients, and textures and embed them in the SVG files. For text, SWT (Stroke Width Transform) and geometric filtering are applied to filter out non-text elements, and HOG (Histogram of Oriented Gradients) is combined with an SVM (Support Vector Machine) to further reduce false positives. OCR (Optical Character Recognition) is then applied to the remaining text areas. Instead of encoding the text regions as vectors, the text elements are embedded in the SVG file along with their coordinate values. For color gradients, the proposed CGV (color gradient vectorization) method first applies a linear-time algorithm to identify the CG vector, which represents the color and the direction of the color gradient in each region. Neighboring regions that have the same CG vector are then merged into one large CG region, which is represented by a single SVG path with linear gradient syntax. For textures, our method uses the CSG (Composite Sub-band Gradient) vector as the texture descriptor and an SVM to classify texture areas in the comic. An ACM (Active Contour Model) combined with CSG vectors is then introduced to improve segmentation accuracy in contour regions. Experimental results show that our method outperforms other state-of-the-art SVG vectorization systems in terms of both SVG file size and perceptual quality. It also lets vectorized comics be displayed with better performance on modern e-book devices. Using the embedded text elements, comics can be translated into other languages to provide multilingual services easily, and text-based or content-based image search can be supported efficiently. The system can also provide a novel application for digital storytelling.
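The region-merging and gradient-output steps can be sketched as follows. This is not the dissertation's CGV algorithm: it assumes each region already has a CG vector, represented here as a hypothetical tuple `(start_color, end_color, angle_deg)`, it merges neighbors by exact equality using a union-find structure, and it writes the direction with a `gradientTransform` rotation. All function names and sample values are illustrative assumptions.

```python
from collections import defaultdict


def merge_regions(cg_vectors, adjacency):
    """Group neighboring regions that share an identical CG vector.

    cg_vectors: {region_id: (start_color, end_color, angle_deg)}
    adjacency:  iterable of (a, b) pairs of neighboring region ids
    Returns a list of sets of region ids, ordered by smallest id.
    """
    parent = {r: r for r in cg_vectors}

    def find(r):
        # Follow parent links to the root, halving the path on the way.
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    for a, b in adjacency:
        if cg_vectors[a] == cg_vectors[b]:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for r in cg_vectors:
        groups[find(r)].add(r)
    return sorted(groups.values(), key=min)


def linear_gradient(grad_id, start_color, end_color, angle_deg):
    """Return an SVG <linearGradient> definition for one merged region."""
    return (
        f'<linearGradient id="{grad_id}" '
        f'gradientTransform="rotate({angle_deg} 0.5 0.5)">'
        f'<stop offset="0" stop-color="{start_color}"/>'
        f'<stop offset="1" stop-color="{end_color}"/>'
        f"</linearGradient>"
    )


cg = {
    1: ("#ffffff", "#000000", 90),
    2: ("#ffffff", "#000000", 90),
    3: ("#ff0000", "#0000ff", 0),
}
merged = merge_regions(cg, [(1, 2), (2, 3)])
gradient_def = linear_gradient("g0", *cg[1])
```

In this sketch, regions 1 and 2 collapse into one group that a single path with `fill="url(#g0)"` could cover, while region 3 stays separate. A practical implementation would likely compare CG vectors within a tolerance rather than by exact equality.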

Parallel Keywords

SVG; Vector compression; Text recognition; Machine learning; Texture recognition; Color gradient vector; Digital storytelling
