Cluster Analysis Applied to Improving the Visual Quality of PCA Image Coding

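Image coding that uses principal component analysis (PCA) to compress an image is known as PCA image coding. The original image is first divided into blocks that form a data set, and the blocks are mapped to a projection space (also called a feature space) in which the projected vectors still retain most of the original information. However, the image blocks exhibit a wide variety of features, such as smooth regions, edges, and textures, which makes PCA image coding considerably more difficult. This thesis proposes clustering to overcome the problem: the data set is first partitioned into suitable clusters, and PCA then projects the data of each cluster onto its leading eigenvectors to obtain lower-dimensional principal components, which removes the storage redundancy within each cluster. Applying this dimensionality reduction to image coding achieves image compression. PCA reconstruction is then carried out separately for each cluster. The number of principal components per cluster is either fixed in advance at an empirical value or optimized with a genetic algorithm. Clustering problems arise naturally in many data-processing tasks. The basic idea is to partition the collected data set into several mutually disjoint clusters whose union equals the original data set. The goal is to minimize a loss function defined over the whole data set, so that individuals within a cluster are highly homogeneous and individuals in different clusters are not. This thesis applies several clustering algorithms and compares the visual effect and quality of the reconstructed images.

First, we introduce the widely used K-means algorithm and the subtractive clustering algorithm. Each is used to cluster the data set, after which PCA reconstruction is carried out per cluster. Because the data consist of high-dimensional vectors, solving the characteristic equation of the covariance matrix becomes quite complicated, so we use the generalized Hebbian algorithm (GHA) instead. GHA is a neural network architecture whose trained weights are the principal component vectors (eigenvectors). The number of principal components used for reconstruction is chosen non-adaptively, that is, fixed in advance at an empirical value, and we compare the visual effect and quality of images reconstructed with different numbers of clusters.

Second, we consider clustering by image features into four clusters: smooth regions, vertical and horizontal edges, left- and right-diagonal edges, and textures. Feature extraction and the projection of multivariate data onto a lower-dimensional subspace are fundamental topics in information processing; here PCA is used to extract the features of the data, namely the principal components. Partitioning all data points into a specified number of clusters so as to minimize the coding error cannot be done by enumerating every possible partition; minimization problems of this kind are NP-hard. To address this, we propose an iterative clustering mechanism that regroups and updates the current partition, where each iteration must satisfy a clustering criterion or objective function. We expect simple structures to be reconstructed with fewer principal components and more complex structures with more. The former removes redundancy in the recorded variables, and the storage saved can be spent on the complex structures, which both raises the quality of the restored image and improves the visual effect in certain regions, such as edges. The iterative method clusters the original data and performs PCA image reconstruction per cluster, determining the number of principal components adaptively, for example by using a genetic algorithm (GA) to find the optimum. The results are compared with conventional PCA and with PCA after K-means clustering; for this comparison, a way of measuring the number of recorded variables must be defined in advance.

Finally, we propose another clustering method, the repartition algorithm. It expresses the process of repeatedly re-clustering the original data set in mathematical or algorithmic form and runs within a GA framework. The framework has three phases: the GA operations, the repartition clustering algorithm, and PCA image coding. In raising reconstruction quality and improving visual effect, the GA acts as a supervisor: for PCA, it searches for the linear combination of principal components that gives the smallest reconstruction error; for cluster analysis, it assigns each data member to a cluster so that within-cluster variation is small and between-cluster variation is large, which makes the computation comparatively simple and efficient. We combine the two mechanisms into a single algorithm and compare it experimentally with all of the clustering methods above.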
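The pipeline described in the abstract (cut the image into blocks, cluster the blocks, then reconstruct each cluster from its own leading principal components) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: the 8 x 8 block size, the synthetic test image, the helper names `image_to_blocks`, `kmeans`, `pca_reconstruct` and `clustered_pca`, and the use of a direct eigendecomposition rather than GHA.

```python
import numpy as np

def image_to_blocks(img, b):
    """Split an (H, W) image into non-overlapping b x b blocks, one row per block."""
    h, w = img.shape
    h, w = h - h % b, w - w % b  # drop any ragged border (assumption of this sketch)
    blocks = img[:h, :w].reshape(h // b, b, w // b, b).swapaxes(1, 2)
    return blocks.reshape(-1, b * b).astype(float)

def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd iterations; returns one cluster label per row of x."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = x[labels == j].mean(axis=0)
    return labels

def pca_reconstruct(x, m):
    """Project rows of x onto the top-m eigenvectors of their covariance and map back."""
    mean = x.mean(axis=0)
    xc = x - mean
    cov = xc.T @ xc / max(len(x), 1)
    _, vecs = np.linalg.eigh(cov)      # eigenvalues come back in ascending order
    basis = vecs[:, ::-1][:, :m]       # top-m principal directions
    return (xc @ basis) @ basis.T + mean

def clustered_pca(x, labels, m):
    """Apply the PCA reconstruction separately inside each cluster."""
    out = np.empty_like(x)
    for j in np.unique(labels):
        idx = labels == j
        out[idx] = pca_reconstruct(x[idx], m)
    return out

# Synthetic image (hypothetical): a smooth gradient next to a noisy, texture-like half.
rng = np.random.default_rng(0)
img = np.zeros((64, 64))
img[:, :32] = np.linspace(0, 255, 64)[:, None]
img[:, 32:] = 255 * (rng.random((64, 32)) > 0.5)

x = image_to_blocks(img, 8)
labels = kmeans(x, k=2)
mse_global = np.mean((x - pca_reconstruct(x, 4)) ** 2)
mse_clustered = np.mean((x - clustered_pca(x, labels, 4)) ** 2)
print("global PCA MSE:", mse_global, "clustered PCA MSE:", mse_clustered)
```

When the blocks fall into visibly different types, one would expect the clustered error to be lower for the same number of components per block, although this sketch does not guarantee it and does not count the extra storage needed for one basis per cluster.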

Keywords

Principal component analysis; image coding; cluster analysis; generalized Hebbian algorithm; K-means algorithm; genetic algorithm

Parallel Abstract

Image coding using principal component analysis (PCA), a type of image compression technique, projects image blocks onto a subspace that preserves most of the original information. However, the blocks of an image exhibit various inhomogeneous properties, such as smooth regions, textures, and edges, which make PCA image coding difficult. This thesis proposes several clustering methods that partition the data into groups such that individuals in the same group are homogeneous and individuals in different groups are not. PCA is then applied separately to each group. First, we apply PCA to image compression. For the PCA computation, we adopt a neural network architecture in which the synaptic weights, which serve as the principal components, are trained with the generalized Hebbian algorithm (GHA). The number of principal components is fixed at a pre-specified value, which is a non-adaptive procedure. We then partition the training set into clusters using the K-means method to obtain better reconstructed image quality, and we repeat the procedure with the subtractive clustering method in place of K-means. Second, taking image features into account, we partition the full set of image blocks into four clusters: smooth regions, vertical and horizontal edges, left- and right-diagonal edges, and textures. Because each cluster is homogeneous, PCA reduces the storage redundancy within each cluster by projecting the data onto the principal components. A genetic algorithm is employed to determine the optimal number of components that preserve most of the information in the original data. Based on this mechanism, we develop an iterative clustering method. The proposed method removes redundancy in clusters with simple structures and increases the number of principal components in clusters with complex structures to improve their reconstruction. A measure of the number of recorded variables must be defined for the comparison. The reconstructed image has higher quality and a better visual effect than those obtained with conventional PCA and with K-means clustering. Finally, we propose another clustering method, repartition clustering, which likewise partitions the data into homogeneous groups and applies PCA separately to each group. Here the genetic algorithm acts as a framework with three phases: the GA operations, the proposed repartition clustering, and PCA image coding. With this mechanism, the proposed method improves image quality and provides an enhanced visual effect.
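The GHA training step mentioned in the abstract, in which the weights of a linear network converge to the principal components, can be sketched with Sanger's update rule. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function name `gha`, the learning rate, the number of epochs, and the three-dimensional synthetic data are all hypothetical choices.

```python
import numpy as np

def gha(x, m, lr=1e-3, epochs=30, seed=0):
    """Train m linear neurons with Sanger's rule.

    Rows of the returned weight matrix approach the top-m principal
    directions of x (up to sign) when the learning rate is small enough.
    """
    rng = np.random.default_rng(seed)
    xc = x - x.mean(axis=0)                       # the rule assumes zero-mean inputs
    w = rng.normal(scale=0.1, size=(m, x.shape[1]))
    for _ in range(epochs):
        for i in rng.permutation(len(xc)):
            v = xc[i]
            y = w @ v                             # neuron outputs for this sample
            # Hebbian term minus the lower-triangular decorrelation term
            w += lr * (np.outer(y, v) - np.tril(np.outer(y, y)) @ w)
    return w

# Synthetic data (hypothetical) with clearly separated variances per axis.
rng = np.random.default_rng(1)
x = rng.normal(size=(1000, 3)) * np.array([3.0, 1.5, 0.5])
w = gha(x, m=2)

# Compare with the eigenvectors of the sample covariance matrix.
_, vecs = np.linalg.eigh(np.cov(x, rowvar=False))
top = vecs[:, ::-1][:, :2].T                      # top-2 eigenvectors as rows
cosines = np.abs(np.sum(w * top, axis=1)) / np.linalg.norm(w, axis=1)
print("absolute cosine between GHA weights and eigenvectors:", cosines)
```

The update only ever touches one sample at a time, so no covariance matrix is formed during training; the eigendecomposition at the end is used purely as a check on this small example.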

Parallel Keywords

Principal Component Analysis; Image Coding; Cluster Analysis; Generalized Hebbian Algorithm; K-means Algorithm; Genetic Algorithm

