
以複合式模型學習法探究多社群網路媒體之使用者資訊

Multi-Modal Learning over User-Contributed Content from Cross-Domain Social Media

Advisor: 徐宏民

Abstract


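In recent years, the rapid growth of social networks has gradually changed people's lives, and more and more people habitually share moments of their daily lives on social networking sites. According to statistics, millions of multimedia items, such as photos, are uploaded to social networking sites every day on average, and such a large volume of multimedia data is very likely to contain rich information. In this dissertation, we aim to extract useful information from large amounts of multimedia data and use it to solve common problems in everyday life.

Starting from a single photo, we aim to build an information system that can determine where the photo was taken, what text tags other people would likely assign to describe its content, and whether popular activities or events take place near where it was taken. Such information has many everyday applications. When traveling, for example, we often want to know where we are, what we are looking at, and what activities or events usually take place nearby. We need only snap a photo of what is in front of us and submit it to the system, which can quickly provide this information.

A common earlier approach to finding where a photo was taken uses the photo's visual features and geo-tags, comparing them against existing data to infer the most likely location. The accuracy of such methods may drop, however, when the photo is taken indoors or when the location contains several buildings. This dissertation therefore adds check-in data to assist localization and proposes an effective image re-clustering technique to improve the accuracy of the inferred location. For finding the text tags that a photo's content would typically receive, a common technique uses an existing collection of tagged photos to predict likely tags for untagged photos. One effective approach builds a graph from the visual similarity between photos, with photos as nodes and similarities as the weights of the edges connecting them; tags are then propagated from tagged nodes to the nodes of untagged photos according to the edge weights. This approach often remains effective even when untagged photos outnumber tagged ones. Most earlier methods, however, propagate only one tag at a time and must be run repeatedly when there are multiple tags, and the number of available photos keeps growing, so efficiency has become an important issue. In view of this, this dissertation proposes a multi-label propagation method based on distributed computing that propagates multiple tags simultaneously and efficiently. For finding popular events, earlier methods mostly focus on observing changes within a single multimedia dataset, such as changes in how often a term is mentioned, where an abnormal change indicates that an activity or event may be taking place. Considering that different datasets have different properties and may complement one another, this dissertation proposes an effective two-stage framework that combines a stream-type dataset with a check-in-type dataset to find when and where popular events occur and what they are about. Because the likely location of a photo can already be found, popular events near it can be found by comparing that likely location with the locations where popular events occur.

Experimental analysis demonstrates the effectiveness of all the methods proposed in this dissertation. Finally, this dissertation suggests some possible directions for future research.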
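
The graph-based tag propagation described in the abstract can be illustrated with a small sketch. This is not the dissertation's distributed algorithm: it is a single-machine, pure-Python version of a standard iterative scheme (each photo's score is a blend of its neighbours' scores and its own known tags), and the function name `propagate_labels`, the damping factor `alpha`, and the toy data are all assumptions made for illustration. All tag columns are updated in the same pass, which is the multi-label idea.

```python
def propagate_labels(weights, labels, alpha=0.8, iterations=50):
    """Spread tag scores over a photo similarity graph (illustrative sketch).

    weights: symmetric n x n similarity matrix (list of lists), zero diagonal.
    labels:  n x k matrix, 1.0 where photo i is known to carry tag j, else 0.0.
    Returns an n x k matrix of tag scores.
    """
    n = len(weights)
    k = len(labels[0])
    # Row-normalise so each photo takes a weighted average over its neighbours.
    norm = []
    for row in weights:
        total = sum(row)
        norm.append([w / total if total > 0 else 0.0 for w in row])

    scores = [row[:] for row in labels]
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            row = []
            for j in range(k):  # every tag column is handled in the same pass
                from_neighbours = sum(norm[i][m] * scores[m][j] for m in range(n))
                row.append(alpha * from_neighbours + (1 - alpha) * labels[i][j])
            new_scores.append(row)
        scores = new_scores
    return scores


# Toy data (hypothetical): photos 0 and 1 look alike, photos 2 and 3 look alike.
weights = [
    [0.0, 0.9, 0.1, 0.0],
    [0.9, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.0, 0.9],
    [0.0, 0.1, 0.9, 0.0],
]
# Photo 0 carries tag 0, photo 3 carries tag 1; photos 1 and 2 are untagged.
labels = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 1.0]]
scores = propagate_labels(weights, labels)
```

In this toy graph the untagged photo 1 ends up scoring higher for tag 0 than for tag 1, and photo 2 the reverse. Each row update depends only on the previous iteration's scores, so the rows could in principle be split across workers; how the dissertation actually distributes the computation is not stated in the abstract.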

Parallel Abstract


Social media have changed the world and our lives. Every day, millions of media items are uploaded to social-sharing websites. The goal of this research is to discover and summarize, from the large amounts of media data on emerging social media, information of interest. Our basic idea is to perform multi-modal learning on the given data, leveraging user-contributed data from cross-domain social media. Specifically, given a photo, we intend to discover the geographical information, people's descriptions or comments, and events of interest that are closely related to the photo. This information can then be used for various purposes, such as serving as a real-time guide that improves tourists' travel experience. Accordingly, this dissertation studies the current challenges of image location identification, image annotation, and event discovery, and presents promising ways to address them. For image location identification, most previous work directly integrated the visual features and geo-tags of the given photos. The performance of such approaches, however, can be limited if the photos were taken indoors or if they show a number of buildings in close proximity. As a solution, this dissertation unifies visual features, geo-tags, and check-in data, and further presents an image cluster refinement approach for image location identification. For image annotation, label propagation is widely used to annotate photos based on similarity graphs of photos, and most previous work focused on single-label propagation. Although multi-label propagation is expected to annotate more efficiently than running single-label propagation several times, it may increase the computational complexity. Further, image datasets continue to grow, which adds to the problem complexity. As a solution, this dissertation presents a scalable multi-label propagation approach that leverages the power of distributed computing. For event discovery, most previous work investigated a single media stream. Mining multiple media streams can potentially achieve better performance than mining one stream alone, but can be more challenging. As a solution, this dissertation presents a two-stage framework that combines a stream-based media dataset and a check-in-based media dataset for events-of-interest discovery. Experimental results on real media datasets show the effectiveness of all the proposed approaches. Finally, this dissertation provides some possible directions for future studies.
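
The single-stream baseline that the abstract attributes to earlier event-discovery work, watching one stream for abnormal changes such as a jump in how often a term is mentioned, can be sketched as follows. This is a generic sliding-window outlier test, not the dissertation's two-stage framework; the function name `find_bursts`, the window size, the threshold, and the sample counts are assumptions chosen for illustration.

```python
from statistics import mean, stdev


def find_bursts(counts, window=5, threshold=3.0):
    """Return the time indices where a term's mention count spikes.

    A point is flagged when it exceeds the mean of the preceding `window`
    counts by more than `threshold` standard deviations.
    """
    bursts = []
    for t in range(window, len(counts)):
        history = counts[t - window:t]
        mu = mean(history)
        # Floor the deviation at 1.0 so a nearly flat history does not turn
        # every small fluctuation into a "burst".
        sigma = max(stdev(history), 1.0)
        if counts[t] > mu + threshold * sigma:
            bursts.append(t)
    return bursts


# Hypothetical hourly mention counts for one term; hour 6 is the spike.
hourly_mentions = [4, 5, 3, 4, 5, 4, 30, 6, 4]
burst_hours = find_bursts(hourly_mentions)
```

On the sample counts only index 6 is flagged. A detector like this says when something unusual happened in one stream but not where; the abstract's point is that a second source, such as check-in data, can supply the missing location information.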

