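k-anonymity is a model for protecting released personal data from being used to identify individuals. To satisfy k-anonymity, every record in the released dataset must share the same values on a set of privacy-related attributes with at least k-1 other records in the dataset. Although transforming the original dataset to satisfy k-anonymity is not difficult, the transformed dataset should lose as little information as possible, so that it remains useful for analysis and research. This study proposes an effective algorithm for k-anonymization and compares it with several recently proposed clustering methods. It then proposes a hybrid method that remedies the shortcomings of the earlier methods; experimental results show that this algorithm further reduces the information loss of the dataset and runs faster. Finally, this work proposes a genetic-algorithm-based approach that further reduces information loss.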
K-anonymity is a model for protecting publicly released microdata from individual identification. It requires that each record in the anonymized dataset be identical to at least k-1 other records with respect to a set of privacy-related attributes. Although it is easy to anonymize the original dataset so that it satisfies k-anonymity, the anonymized dataset should preserve as much of the original dataset's information as possible. To minimize the information loss due to anonymization, it is crucial to group similar records together and then anonymize each group individually. This work compares the performance of two recently proposed clustering-based techniques for k-anonymization and proposes a hybrid of the two that achieves less information loss than either original technique. Experimental results show that the proposed hybrid technique reduces not only the total information loss but also the variance of information loss among groups. We also propose a genetic algorithm for k-anonymization; our experiments show that it reduces the information loss of the anonymized dataset.
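The k-anonymity requirement and the benefit of grouping similar records can be illustrated with a minimal Python sketch. It is not the algorithm proposed in this work: the toy records, the attribute names (`age`, `zip`, `disease`), the helper functions, and the range-based loss measure (group size times the sum of normalized range widths) are all assumptions chosen for illustration only.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    counts = Counter(tuple(r[a] for a in quasi_identifiers) for r in records)
    return all(c >= k for c in counts.values())

def generalize_group(group, quasi_identifiers):
    """Replace each numeric quasi-identifier in a group by the group's (min, max) range."""
    ranges = {
        a: (min(r[a] for r in group), max(r[a] for r in group))
        for a in quasi_identifiers
    }
    return [{**r, **ranges} for r in group]

def information_loss(groups, quasi_identifiers, domain):
    """Assumed loss measure: group size times the sum of normalized range widths."""
    total = 0.0
    for group in groups:
        for a in quasi_identifiers:
            low = min(r[a] for r in group)
            high = max(r[a] for r in group)
            width = domain[a][1] - domain[a][0]
            total += len(group) * (high - low) / width
    return total

# Hypothetical toy microdata; "disease" is the sensitive attribute.
records = [
    {"age": 25, "zip": 10001, "disease": "flu"},
    {"age": 27, "zip": 10003, "disease": "cold"},
    {"age": 52, "zip": 10040, "disease": "asthma"},
    {"age": 55, "zip": 10044, "disease": "flu"},
]
quasi_identifiers = ["age", "zip"]
domain = {"age": (25, 55), "zip": (10001, 10044)}

# Two ways to partition the records into groups of size k = 2.
similar_groups = [records[0:2], records[2:4]]
dissimilar_groups = [[records[0], records[2]], [records[1], records[3]]]

# Anonymize each group individually, then release the union of the groups.
anonymized = [
    r for g in similar_groups for r in generalize_group(g, quasi_identifiers)
]

raw_ok = is_k_anonymous(records, quasi_identifiers, 2)
anonymized_ok = is_k_anonymous(anonymized, quasi_identifiers, 2)
loss_similar = information_loss(similar_groups, quasi_identifiers, domain)
loss_dissimilar = information_loss(dissimilar_groups, quasi_identifiers, domain)
print(raw_ok, anonymized_ok, loss_similar < loss_dissimilar)
```

Both partitions yield a 2-anonymous release, but under this assumed measure the partition that groups similar records produces much narrower ranges, and therefore less information loss, than the partition that mixes dissimilar records.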