  • Thesis

Multipartite Networks and Encrypted-Domain Data Mining for Search and Recommendation (original Chinese title: 多方網路與加密域資料探勘技術在搜尋與推薦之應用)

Advisor: Ja-Ling Wu (吳家麟)

Abstract


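Social media, with its free and open character, collaborative innovation, and interactive sharing, has become the mainstream of activity on today's Internet. Through shared platforms, people exchange information and collaborate online, giving rise to new web applications such as Wikipedia, blogs, Facebook, and Flickr. Unlike traditional online media, social media derives the value of a platform from the interaction of its participants: contributing information, building indexes, or rating particular items. Once the influence of these interactions passes a critical point, content spreads virally. The rise of social media highlights the importance of "Information" in the term "Information Technology (IT)": if we attend only to technology and neglect information, IT is an empty shell that cannot create the value that justifies the information industry.

Our analysis finds that the core value of social media rests on three elements: collective intelligence, cloud computing, and user interface. These elements bring opportunities, but also new challenges. On the opportunity side, the interaction, recommendation, and sharing among people in a social community create a distinctive collective intelligence. If this collective intelligence can be analyzed and used effectively, it becomes a valuable asset for developing data mining techniques that differ from traditional approaches. We therefore first analyzed the characteristics of social media in depth, capturing the multifaceted information it contains, such as person-person, person-item, and item-item relations. We then used the multipartite network relations found in social media to mine and apply that collective intelligence as effectively as possible. For multimedia retrieval we developed the application system "Building Multi-Modal Relational Graphs for Multimedia Retrieval", and for recommendation we proposed the method "Relational Term-Suggestion Graphs Incorporating Multi-Partite Concept and Expertise Networks". Both make concrete contributions to retrieval and recommendation systems and to the use of user interfaces.

Turning to the challenge, our research found that, because participants in social media interact in public, they often unintentionally reveal their personal background and preferences while contributing information, building indexes, or rating items. Most importantly, more and more sensitive social data is stored on cloud servers whose security is in doubt. How to perform information retrieval and recommendation while protecting both personal privacy and collective intelligence therefore became a second research challenge that we could not set aside.

To meet this challenge, after in-depth study and with cryptography as the theoretical basis, we were the first to propose an encrypted-domain data mining approach built on "personal encryption, self-managed protection" and end-to-end encryption. Under this approach, individuals encrypt their own contributions before sending them to the database, and accurate, efficient data mining is still carried out while the data stays encrypted, which provides retrieval and recommendation with strong privacy guarantees. In addition, in keeping with the principle of fully protecting personal privacy, we were the first to propose an encrypted-domain homomorphic computation algorithm that uses each individual's own encryption and decryption keys, making encrypted-domain data mining more practical and more secure.

In this research, the analysis and application of multipartite networks allow us to mine the treasure of social media, its collective intelligence, as effectively as possible and, together with a complete user interface, to provide more modern multimedia retrieval and recommendation. On the other hand, facing the privacy challenge in social media, we combine an innovative, highly efficient encrypted-domain data mining approach with homomorphic encryption over different encryption/decryption key pairs, so that the privacy of every contributor in the social network receives the best possible protection. We hope that everyone will be glad to move freely among online social media and to share and contribute their valuable knowledge and experience without worry.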
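The person-person, person-item, and item-item relations described above can be illustrated with a small sketch. This is not the thesis's model: it assumes a toy contributor-term dataset (the names `alice`, `bob`, `carol` and their tags are invented), uses plain co-occurrence counts for term-term links, and uses Jaccard overlap for contributor-contributor links. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: contributor-term edges (who used which tag).
contributor_terms = {
    "alice": {"sunset", "beach", "ocean"},
    "bob": {"beach", "ocean", "surf"},
    "carol": {"sunset", "mountain"},
}


def term_term_weights(contributor_terms):
    """Project contributor-term edges onto term-term edges.

    The weight of a term pair is the number of contributors who used both.
    """
    weights = defaultdict(int)
    for terms in contributor_terms.values():
        for a, b in combinations(sorted(terms), 2):
            weights[(a, b)] += 1
    return dict(weights)


def contributor_contributor_weights(contributor_terms):
    """Link contributors by the Jaccard overlap of their term sets."""
    weights = {}
    for u, v in combinations(sorted(contributor_terms), 2):
        shared = contributor_terms[u] & contributor_terms[v]
        union = contributor_terms[u] | contributor_terms[v]
        if shared:
            weights[(u, v)] = len(shared) / len(union)
    return weights


def suggest_terms(term, contributor_terms, k=3):
    """Rank the terms co-used with `term`, strongest link first."""
    scores = defaultdict(int)
    for (a, b), w in term_term_weights(contributor_terms).items():
        if a == term:
            scores[b] += w
        elif b == term:
            scores[a] += w
    # Sort by descending weight, breaking ties alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

Calling `suggest_terms("beach", contributor_terms)` ranks `ocean` first because two contributors used it together with `beach`. A real system would presumably also weight each link by contributor expertise, which this sketch leaves out.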

Parallel Abstract


Social media currently dominates activity on the World Wide Web due to its open nature, collaborative structure, and support for interactive sharing. It has made information sharing and collaboration possible through social platforms such as Wikipedia, blogs, Facebook, and Flickr. Social media's primary strength lies in user interaction: participants provide information, perform indexing, and score various items, and this activity creates value for the platform. Thus the reputation of a product or piece of content can spread from a few people to a huge crowd, and can even "go viral" and end up in international headlines. The advance of social media has strongly underlined the importance of "Information" in Information Technology (IT): IT without "Information" is nothing but an empty shell and can create no value. According to our analysis, there are three major core values in social media: collective intelligence, cloud computing, and user interface. The rapid development of social media has brought not only new opportunities but also new challenges. Opportunities abound in social media because of the interaction, recommendation, and sharing among people in social communities, which has led to the emergence of collective intelligence. Effective analysis helps identify relationships from multipartite linkages such as contributor-contributor, contributor-term, and term-term. This, along with contributor expertise, leads to new, non-traditional data mining approaches that in turn lead to new systems with high search recall, high search precision, and meaningful semantic relatedness. We have developed a series of mathematical models for different applications: for multimedia retrieval, we use multi-modal relational graphs; for recommendation systems, we propose an approach using relational term-suggestion graphs incorporating multi-partite concept and expertise networks.
This use of collective intelligence yields important contributions to data retrieval and to the development of recommendation systems, in particular their user interfaces. Because social media focuses so heavily on interpersonal relationships, privacy is now its biggest challenge. For instance, the privacy of medical records should be carefully protected. Most important of all, now that more and more of this kind of data is stored on cloud servers, the challenge has become how to protect and secure everyone's precious knowledge and experience. Retrieval can be done effectively only if such protection is offered. To meet these challenges, we propose an end-to-end encrypted-domain data mining scheme based on the belief that every individual has the right to protect his or her own privacy. We accomplish this using ring homomorphic encryption, which allows data mining to be conducted even in the encrypted domain. We are the first team to propose that each user use their own encryption and decryption keys to allow the untrusted server to conduct mathematical operations on their data in the encrypted domain, thus making encrypted-domain data mining not only more practical but also more secure.
In this research, we use multi-partite analysis to effectively explore the collective intelligence embedded in social media and then leverage it for innovative approaches to recommendation systems and data retrieval systems. The comprehensive user interface demonstrated using this approach modernizes not only data retrieval for text and multimedia but also personalized recommendations. In addition, we propose a novel, highly efficient encrypted-domain data mining scheme based on ring homomorphic encryption with user-specific encryption and decryption keys that protects the intellectual property of each social media user. It is our sincere hope that this will allow knowledge to be shared over the Internet with no nagging worries about privacy.
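To make the idea of an untrusted server computing on encrypted data concrete, here is a minimal sketch. It does not implement the ring homomorphic scheme or the per-user key design named in the abstract. As a stand-in it uses textbook Paillier encryption, which is only additively homomorphic, with a single key pair and two small fixed primes chosen purely for illustration (insecure). All function names are hypothetical.

```python
import math
import secrets


def keygen(p=104729, q=1299709):
    """Build a toy Paillier key pair from two small primes (demo only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse; valid because g = n + 1
    return n, (n, lam, mu)


def encrypt(n, m):
    """Encrypt an integer 0 <= m < n under public key n (with g = n + 1)."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random r in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    # (n + 1)^m equals 1 + m*n modulo n^2.
    return ((1 + m * n) * pow(r, n, n2)) % n2


def decrypt(private_key, c):
    """Recover the plaintext from ciphertext c."""
    n, lam, mu = private_key
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def server_weighted_sum(n, ciphertexts, weights):
    """Run on the untrusted server: it sees only ciphertexts and plain weights.

    Multiplying ciphertexts adds the plaintexts; raising a ciphertext to
    the power w multiplies its plaintext by w.
    """
    n2 = n * n
    acc = 1  # 1 is a valid encryption of 0
    for c, w in zip(ciphertexts, weights):
        acc = (acc * pow(c, w, n2)) % n2
    return acc


# The user encrypts ratings locally, the server aggregates, the user decrypts.
public_key, private_key = keygen()
encrypted_ratings = [encrypt(public_key, m) for m in [5, 3, 4]]
encrypted_total = server_weighted_sum(public_key, encrypted_ratings, [2, 1, 3])
print(decrypt(private_key, encrypted_total))  # 5*2 + 3*1 + 4*3 = 25
```

The server never holds the private key, so it learns nothing about the individual ratings in this model. Multiplying two encrypted values together, as a ring homomorphic scheme permits, is beyond what Paillier supports.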

