Development and Application of an Image Retrieval Tool Based on Automatic Image Annotation

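In today's information-rich era, digital images have become one of the important data types supporting digital humanities research, and their development has opened up new challenges and opportunities for humanities research in the digital age. Previous studies have pointed out that digital images should not be displayed merely as a long list or a set of thumbnails; they should instead be accompanied by object information that can be taken in visually at a glance, which helps viewers organize images more effectively. The need to recognize and analyze the objects present in large numbers of digital images has therefore become increasingly important. Image annotation plays an indispensable role here: by choosing appropriate vocabulary to describe a digital image, it narrows the semantic gap between human users' interpretation of the image and the image's low-level features. Automatic image annotation, which has emerged with advances in technology, builds on image annotation while lowering the cost of manual annotation and offering high efficiency and low subjectivity. This motivated the present study to explore how digital humanities scholars use and perceive automatic image annotation as an aid to their individual interpretation of images, to understand from the humanities scholar's perspective how and why users use images, and to develop a digital humanities tool that effectively assists humanities scholars in interpreting image context. To that end, this study developed the Image Retrieval Tool Based on Automatic Image Annotation (IRT-AIA). The system's core technology is Mask R-CNN, an algorithm from the field of image recognition that performs instance segmentation. Its main purpose is to identify physical objects in an image: besides determining the category and location of each individual object, it also delineates each object's contour. Object information can thus be extracted quickly from digital images and presented as surrogate information for an image collection, allowing users to absorb and organize image content quickly and effectively. The system is complemented by a user-friendly interface designed to improve interaction between humanities scholars and the system, so that scholars can annotate images from their own interpretive perspective, quickly obtain the metadata of digital images, and interpret image context more efficiently. To verify whether IRT-AIA helps humanities scholars interpret images, this study adopted a counterbalanced design within a quasi-experimental method. Users were divided into two groups that operated IRT-AIA and a General Image Retrieval Tool (GIRT) in different orders to complete the task sheets of each stage. Behavior process recording was used to log users' system operations in full, a technology acceptance questionnaire captured users' actual perceptions, and semi-structured in-depth interviews elicited users' thoughts and suggestions. These methods were cross-validated to examine the differences between IRT-AIA and GIRT in automatic image annotation accuracy, the correctness of image retrieval when interpreting image context, the effectiveness of image context interpretation, and technology acceptance. The results show the following. First, the automatic image annotation accuracy of IRT-AIA is sufficient to effectively assist users in interpreting image context. Second, using IRT-AIA yields better image retrieval precision as well as good recall. Third, GIRT and IRT-AIA do not differ significantly in the effectiveness of image context interpretation; the analysis shows that community tags and artificial intelligence tags each have their own strengths, so only a system that gives equal weight to both can meet users' different retrieval needs. Fourth, GIRT and IRT-AIA do not differ significantly in technology acceptance, but both achieve good technology acceptance above the median. Fifth, when using community tags and artificial intelligence tags to support browsing and retrieval, users prefer artificial intelligence tags, which also make it easier for them to reach the target images they want to browse further.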
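
The idea of using recognized objects as surrogate information for an image collection can be illustrated with a minimal Python sketch. It assumes that an instance-segmentation model such as Mask R-CNN has already produced a list of labeled, scored detections for each image. The `Detection` type, the 0.5 score threshold, the image IDs, and the sample labels are hypothetical and chosen only for illustration; this is not the IRT-AIA implementation.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Detection(NamedTuple):
    """One detected object instance (hypothetical structure)."""
    label: str    # predicted category, e.g. "person"
    score: float  # model confidence in [0, 1]


def image_tags(detections, min_score=0.5):
    """Count the confident detections per label for one image."""
    return Counter(d.label for d in detections if d.score >= min_score)


def build_tag_index(collection, min_score=0.5):
    """Map each tag to the set of image IDs in which it was detected."""
    index = defaultdict(set)
    for image_id, detections in collection.items():
        for label in image_tags(detections, min_score):
            index[label].add(image_id)
    return dict(index)


# Made-up detections for two images (assumed data, not study results).
collection = {
    "img_001": [Detection("person", 0.98), Detection("person", 0.91),
                Detection("horse", 0.42)],
    "img_002": [Detection("horse", 0.88), Detection("boat", 0.77)],
}

index = build_tag_index(collection)
print(sorted(index["horse"]))             # ['img_002']
print(image_tags(collection["img_001"]))  # Counter({'person': 2})
```

In this sketch the low-confidence "horse" detection in `img_001` is dropped, so a search for "horse" returns only `img_002`. The per-image label counts could also feed a word-frequency overview of a collection.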

Keywords

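user research; digital humanities; digital image; image recognition; deep learning; automatic image annotation; instance segmentation; Mask R-CNN; human-computer interaction; word frequency statistics; behavior analysis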

Parallel Abstract

In an era of rapid information development, the “digital image” has become an important data type supporting digital humanities research, and its development creates new challenges and opportunities for humanities studies in the digital age. Past research has indicated that digital images should not be displayed as a long list or a set of thumbnails, but should be accompanied by object information that can be absorbed visually at a glance, which gives viewers a better ability to organize images. For this reason, the need to recognize and analyze the objects in large numbers of digital images has become increasingly important. “Image annotation” plays an indispensable role here: by choosing appropriate vocabulary to describe digital images, it reduces the semantic gap between human users' interpretation of images and low-level image features. With the development of technology, “automatic image annotation” builds on image annotation to reduce the cost of manual annotation while offering high efficiency and low subjectivity. This motivates the present study to explore the differences in how humanists use and perceive automatic image annotation as an aid to individual image interpretation. The study attempts to understand, from the humanists' perspective, how and why users use images, and to develop a digital humanities tool that effectively assists humanists in interpreting image context. The “Image Retrieval Tool Based on Automatic Image Annotation (IRT-AIA)” was therefore developed in this study. The core technology of the system is Mask R-CNN, an algorithm that performs instance segmentation in image recognition, which is used to recognize physical objects in images. In addition to identifying the category and location of each individual physical object in an image, it also outlines the contour of each object, so that object information can be extracted rapidly from digital images. 
This object information is presented as a surrogate for the image collection, allowing users to absorb and organize image content rapidly and effectively. Finally, a friendly interface that enhances interaction between humanists and the system allows humanists to annotate images from their own interpretive perspective, rapidly acquire the metadata of digital images, and interpret image context more efficiently. To verify whether the IRT-AIA developed in this study helps humanists interpret images, a counterbalanced design within quasi-experimental research was adopted. Users were divided into two groups that operated IRT-AIA and a general image retrieval tool (GIRT) in different orders to complete the task sheets of different stages. Behavior process recording was used to record users' system operations completely, a technology acceptance model questionnaire was used to capture users' actual perceptions, and semi-structured in-depth interviews were used to understand users' ideas and suggestions. Through cross-validation with these methods, the study examined the differences between IRT-AIA and GIRT in automatic image annotation accuracy, image retrieval accuracy in image context interpretation, effectiveness of image context interpretation, and technology acceptance. The research results are summarized as follows. First, the automatic image annotation accuracy of IRT-AIA is sufficient to effectively assist users in interpreting image context. Second, using IRT-AIA yields better image retrieval precision and good recall. Third, GIRT and IRT-AIA do not differ significantly in the effectiveness of image context interpretation. 
The analysis reveals that community tags and artificial intelligence tags each have their own strengths, so only a system that gives equal weight to both can satisfy users' different retrieval needs. Fourth, GIRT and IRT-AIA do not differ significantly in technology acceptance, but both show good technology acceptance above the median. Fifth, when using community tags and artificial intelligence tags for browsing and retrieval, users prefer artificial intelligence tags, which make it easier for them to reach the target images they want to browse further.
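
For readers unfamiliar with the retrieval measures named in the second finding, the following Python sketch shows the standard definitions of precision and recall for a single search task. The function name `precision_recall` and the image IDs are hypothetical, and the numbers are made up for illustration; they are not results from this study.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one retrieval task.

    precision = relevant images retrieved / all images retrieved
    recall    = relevant images retrieved / all relevant images
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up task: the user retrieved four images, three images are relevant.
retrieved = ["img_001", "img_002", "img_003", "img_004"]
relevant = ["img_002", "img_004", "img_007"]

p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

Two of the four retrieved images are relevant (precision 0.50), and two of the three relevant images were found (recall 0.67).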

Parallel Keywords

user research; digital humanities; digital image; image recognition; deep learning; automatic image annotation; instance segmentation; Mask R-CNN; human-computer interaction; word frequency statistics; behavior analysis