透過您的圖書館登入
IP:3.144.240.141
  • 學位論文

差分隱私之關聯規則挖掘機制

DPARM: Differential Privacy Association Rules Mining

指導教授 : 郭斯彥

摘要


在當代社會中,數據量的迅速膨脹推動了數據分析技術的發展,這使得決策自動化成為可能。關聯分析是數據分析中的一項重要任務,其目標在於從事務數據集中找出所有的共現關係,即頻繁項集或可信關聯規則。一條關聯規則由前件和後件兩部分組成,其含義是如果前件發生那麼後件也很可能發生。可信關聯規則即那些可能性較高的關聯規則,它可以幫助人們更好地發現規律,制定相應的策略。 數據分析的過程可以高度概括為一組查詢,每個查詢都是一個關於數據集的實值函數。然而毫無限制和保護地接入數據集以獲取查詢結果,可能導致個人隱私的洩露。因此,隱私保護下的數據分析技術受到了越來越多的關注,人們迫切希望可以找到一種強大的、在數學上嚴格的,且符合人們社會認知的隱私定義。差分隱私就是這樣一種隱私定義,它通過隱私水平這一參數來管理和量化個人在數據分析中的所面臨的隱私風險。一般而言,差分隱私可以通過對查詢結果添加精巧設計的噪聲來實現。 在本文中,我們重點關注滿足差分隱私的多支持度閾值的關聯規則挖掘,並解決了現有技術中存在的挑戰。我們提出並實現了DPARM算法,它利用多支持度閾值,在反映項目真實屬性的同時減少候選項集的數量,並且通過隨機截斷和均勻分割來降低數據集的維度。這兩者有助於降低查詢的敏感度,進而減小所需的噪聲規模,提升挖掘結果的效用。我們還通過自適應配置隱私水平來穩定噪聲規模,併限制了總體的隱私損失。此外,我們證明了DPARM 算法滿足事後差分隱私,並通過一系列實驗驗證了DPARM 算法的實用性。

並列摘要


In contemporary society, the rapid expansion of data volume has driven the development of data analysis techniques, which makes decision automation possible. Association analysis is an important task in data analysis. The goal is to find all co-occurrence relationships from the transactional dataset, i.e. frequent itemsets or confident association rules. An association rule consists of two parts, the antecedent and the consequent, which means that if the antecedent occurs then the consequent is also possible to happen. Confident association rules are those association rules with larger possibility, which can help people better discover patterns and develop corresponding strategies. The process of data analysis can be highly summarized as a set of queries, where each query is a real-valued function of the dataset. However, without any restriction and protection, accessing the dataset to answer the queries may lead to the disclosure of individual privacy. Therefore, techniques for privacy-preserving data analysis has received increasing attention. People are eager to find a strong, mathematically rigorous, and socio-cognitive-conform definition of privacy. Differential privacy is such a privacy definition that manages and quantifies the privacy risks faced by individuals in data analysis through the parameter called the privacy level. In general, differential privacy can be achieved by adding delicate noise to the query results. In this thesis, we focus on differential privacy association rules mining with multiple support thresholds, and solve the challenges existing in the state-of-art works. We propose and implement the DPARM algorithm, which uses multiple support thresholds to reduce the number of candidate itemsets while reflecting the real nature of the items, and uses random truncation and uniform partition to lower the dimensionality of the dataset. Both of these are helpful to reduce the sensitivity of the queries, thereby reducing the scale of the required noise and improving the utility of the mining results. We also stabilize the noise scale by adaptively allocating the privacy levels, and bound the overall privacy loss. In addition, we prove that the DPARM algorithm satisfies ex post differential privacy, and verify the utility of the DPARM algorithm through a series of experiments.

參考文獻


[1]Rakesh Agrawal, Tomasz Imieliński, and Arun Swami. Mining association rules between sets of items in large databases. In Acm sigmod record, volume 22, pages 207–216. ACM, 1993.
[2]Rakesh Agrawal, Ramakrishnan Srikant, et al. Fast algorithms for mining association rules. In Proc. 20th int. conf. very large data bases, VLDB, volume 1215, pages 487– 499, 1994.
[3]Brendan Avent, Aleksandra Korolova, David Zeber, Torgeir Hovden, and Benjamin Livshits. Blender: enabling local search with a hybrid differential privacy model. In Proc. of the 26th USENIX Security Symposium, pages 747–764, 2017.
[4]Raghav Bhaskar, Srivatsan Laxman, Adam Smith, and Abhradeep Thakurta. Discov- ering frequent patterns in sensitive data. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 503–512. ACM, 2010.
[5]Ferenc Bodon. A fast apriori implementation. In FIMI, volume 3, page 63, 2003.

延伸閱讀