Exploring Keyword-Based High-Level Semantic Search for Images and Videos

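The widespread use of video capture devices and easy-to-use image and video sharing services such as Flickr, MySpace, and YouTube has led to explosive growth in the multimedia data stored on the web and on personal storage devices. How to index and retrieve this large volume of multimedia data efficiently remains an open research problem. Traditional content-based and keyword-based image retrieval still cannot satisfy users' information needs effectively because of the semantic gap. In response to the strong demand for high-level semantic search over large-scale consumer photos, which usually lack reliable tags, we examine the feasibility and open issues of a new multimedia search paradigm, concept search, which lets us find visual objects with keywords by way of concept detectors. Although concept search helps content-based image retrieval, several important issues still need to be addressed. We examine the problem from several angles: (1) effective query-to-concept mapping and concept selection methods over a large concept lexicon; (2) a comprehensive study of how the fundamental factors of keyword-based concept search affect performance, including the concept selection method, the size of the concept lexicon, and detector accuracy; (3) the feasibility and search quality of applying existing concept detectors to image search in a different domain (consumer photos); (4) the search quality obtained by fusing text-based search results with concept search results; (5) the need for efficient online indexing techniques over large-scale multimedia databases, in place of traditional offline indexing; (6) the feasibility of using both semantic and visual information to satisfy users' needs. In experiments on two large-scale benchmarks, TRECVID (news videos) and Flickr550 (consumer photos), we confirm that Google-based semantic expansion is more accurate and more efficient in concept search than traditional WordNet-based semantic expansion. From our study of how the fundamental factors affect performance, we conclude that concept search outperforms text search. Our experiments also verify the potential of applying existing concept detectors to a different domain. Because user-provided tags are often inaccurate or ambiguous, concept search can also improve the performance of traditional text search. Using semantic and visual information together, rather than semantic information alone, also improves search effectiveness. Finally, our proposed efficient online indexing technique, FRANK-TAAT, not only reduces the burden of offline indexing but also addresses the rarely discussed low-recall problem.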

Keywords

Content-based image retrieval; high-level semantic concept search; multimedia indexing and retrieval over large-scale databases

Parallel Abstract

Photos and videos are growing explosively thanks to the proliferation of capture devices and the many easy-to-use image and video sharing services such as Flickr, MySpace, and YouTube. How to effectively index and retrieve these large databases remains an open problem. Traditional content-based and keyword-based multimedia retrieval methods often fail to meet users’ expectations because of the semantic gap. In response to the strong demand for semantic search over large-scale consumer photos, which generally lack reliable user-provided annotations, we investigate the feasibility and challenges of a new multimedia search paradigm, “concept search”: retrieving visual objects with keywords through large-scale automatic concept detectors. Though concept search is promising, several important issues must be addressed. We investigate the problem from several angles: (1) effective query-to-concept mapping and concept selection methods over a large-scale concept ontology; (2) a comprehensive performance study of the fundamental factors of keyword-based concept search, including concept selection strategy, lexicon size, and detector accuracy; (3) the quality and feasibility of applying pre-trained concept detectors to cross-domain consumer-generated data (i.e., Flickr photos); (4) the search quality obtained by fusing automatic concepts with user-generated data (tags); (5) the demand for efficient query-time indexing techniques over large-scale multimedia, instead of off-line indexing (where query information is ignored); (6) leveraging both semantic meaning and visual co-occurrence to fulfill users’ information needs.
In experiments on two large-scale benchmarks, TRECVID (broadcast news videos) and Flickr550 (consumer photos), we confirm the effectiveness of concept search with Google-based semantic expansion for query-to-concept mapping and demonstrate its superiority over conventional WordNet-like methods in both effectiveness and efficiency. We investigate most of the parameterized factors in concept search, which leads to the conclusion that concept search is indeed more effective than text-based search. We further illustrate the potential of applying pre-trained concept detectors to cross-domain consumer photos. We point out that user-contributed tags are often inaccurate or ambiguous and that keyword-based search over them can be improved with semantic concepts. Auxiliary visual information mined from a large-scale image database is used to improve the effectiveness of user queries. Most of all, the proposed query-time indexing method, FRANK-TAAT, not only reduces indexing overhead for large-scale databases but also addresses the commonly observed “low recall” problem, which is seldom addressed in prior work.

Parallel Keywords

Content-based Image Retrieval; Semantic Concept Search; Large-scale Multimedia Indexing and Retrieval
