This paper presents a connected-component (CC)-based text extraction method that automatically detects and segments text regions in natural-scene images. The technique can be applied to computer vision and to sign extraction and recognition. First, a fast connected-component labeling algorithm generates the CCs. Simple geometric features are then used to filter out the large majority of them, and wavelet and texture features are extracted from the remaining CCs and passed as input to AdaBoost. The strong classifier trained by the AdaBoost algorithm then decides whether each CC is text or non-text. Because the strong classifier quickly rejects non-text CCs, any CC that passes it is regarded as a text component and kept as the final extraction result. By combining discrete wavelet features with texture features, the method raises the precision of text localization and extraction to 94.65 % or higher. In addition, the fast convergence of the AdaBoost algorithm reduces the computational cost.
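The first two stages of the pipeline (CC labeling followed by a simple geometric filter) can be illustrated with a minimal sketch. It is not the labeling algorithm or the filter used in this work: the breadth-first labeling, 4-connectivity, the function names, and the thresholds `min_area`, `max_aspect`, and `min_fill` are all assumptions chosen for illustration. The wavelet and texture features and the AdaBoost stage are not shown.

```python
from collections import deque


def label_components(binary):
    """Group foreground pixels (value 1) into 4-connected components.

    Returns a list of components, each a list of (row, col) pixels.
    Plain breadth-first search; an illustrative stand-in, not the fast
    labeling algorithm referred to in the text.
    """
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not binary[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and binary[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def passes_geometric_filter(pixels, min_area=4, max_aspect=5.0, min_fill=0.2):
    """Keep a component only if its size and shape are plausible for text.

    All three thresholds are hypothetical values, not taken from the paper.
    """
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    height = max(ys) - min(ys) + 1
    width = max(xs) - min(xs) + 1
    aspect = width / height
    fill = len(pixels) / (width * height)  # share of the bounding box covered
    return (
        len(pixels) >= min_area
        and 1.0 / max_aspect <= aspect <= max_aspect
        and fill >= min_fill
    )


# Toy binary image: a compact blob, an isolated pixel, and a long thin line.
image = [
    [1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 1],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
]

components = label_components(image)
candidates = [cc for cc in components if passes_geometric_filter(cc)]
```

In this toy image only the compact blob survives: the isolated pixel fails the area test and the thin line fails the aspect-ratio test. In the described method, the surviving CCs would then go on to feature extraction and classification.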