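Text information extraction (TIE) from digital images has become an important application in recent years. A TIE system can automatically annotate image content and thereby provide a mechanism for indexing it. A complete TIE system comprises text detection, text localization, text tracking, image enhancement, and text recognition. However, because image documents are diverse and complex, the text in them can vary in font size, shape, and orientation, and variations in color make extraction harder still. This study attempts to develop a method for automatically extracting text from video. It performs text-block extraction directly in the frequency domain, using the discrete cosine transform (DCT) common to compression formats: the horizontal energy of each 8x8 DCT block is computed and combined with the temporal characteristics of text blocks to filter out most non-text blocks while retaining the vast majority of text blocks. Morphological operations, including erosion, dilation, opening, and closing, are then applied to find the correct text regions. Next, we use morphological image reconstruction to extract the text from the detected regions and binarize it with the method of Lin et al., obtaining a more complete text image that facilitates subsequent processing and applications such as text recognition, saving to text files, annotating image content, and image search.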
Text Information Extraction (TIE) has recently become an important application. A TIE system can not only annotate images automatically but also provide an image indexing mechanism based on the extracted text. A complete TIE system is composed of detection, localization, tracking, enhancement, and recognition stages. However, because image content is complex and varied, text may differ in font size, shape, and orientation, and variations in color make text extraction more challenging still. In our method, the DCT coefficients and temporal information of a sequence of video frames are used to evaluate horizontal energy, with which most of the non-text blocks can be filtered out. Morphological operators such as erosion, dilation, opening, and closing are then applied to remove further non-text blocks while retaining the text blocks. The detected text blocks are further enhanced to extract characters for recognition. The recognized characters are then saved as text files for later use, such as video indexing.
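
The block-filtering step can be illustrated with a minimal Python sketch. It is not the implementation used in this work, and several details are assumptions made only for illustration: the horizontal energy is taken to be the sum of absolute values of the first-row AC coefficients of each 8x8 DCT block, the threshold value is arbitrary, the temporal rule simply keeps blocks flagged in every frame, and the function names `dct2_8x8`, `horizontal_energy`, `text_candidate_mask`, and `temporally_stable` are hypothetical.

```python
import math

N = 8  # DCT block size used by JPEG/MPEG-style codecs


def dct2_8x8(block):
    """Orthonormal 2-D DCT-II of an 8x8 block (8 rows of 8 numbers)."""
    def alpha(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):          # vertical frequency index
        for v in range(N):      # horizontal frequency index
            s = 0.0
            for x in range(N):      # pixel row
                for y in range(N):  # pixel column
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


def horizontal_energy(coeffs):
    """Assumed measure: sum of |C[0][v]| for v = 1..7.

    These coefficients respond to intensity changes along each pixel row,
    which is what the vertical strokes of horizontally laid-out text produce.
    """
    return sum(abs(coeffs[0][v]) for v in range(1, N))


def text_candidate_mask(frame, threshold):
    """Flag each 8x8 block whose horizontal energy exceeds `threshold`."""
    rows, cols = len(frame) // N, len(frame[0]) // N
    mask = []
    for br in range(rows):
        mask_row = []
        for bc in range(cols):
            block = [frame[br * N + i][bc * N:(bc + 1) * N] for i in range(N)]
            mask_row.append(horizontal_energy(dct2_8x8(block)) > threshold)
        mask.append(mask_row)
    return mask


def temporally_stable(masks):
    """Keep only blocks flagged in every frame (assumed temporal rule)."""
    n_rows, n_cols = len(masks[0]), len(masks[0][0])
    return [[all(m[r][c] for m in masks) for c in range(n_cols)]
            for r in range(n_rows)]
```

In a compressed-domain setting the DCT coefficients would be read from the bitstream rather than recomputed from pixels; the explicit transform is included here only so the sketch is self-contained. The resulting block mask is the kind of binary map to which the morphological operators would then be applied.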