Detailed Record

Author (Chinese): 張志泓
Author (English): Chang, Chih-Hung
Title (Chinese): 基於自動圖像標註之圖像檢索工具發展與應用研究
Title (English): Development and Application of an Image Retrieval Tool Based on Automatic Image Annotation
Advisor (Chinese): 陳志銘
Advisor (English): Chen, Chih-Ming
Committee Members (Chinese): 林巧敏; 張道行; 洪振洲
Committee Members (English): Lin, Chiao-Min; Chang, Tao-Hsing; Hung, Jen-Jou
Degree: Master's
Institution: National Chengchi University
Department: Graduate Institute of Library, Information and Archival Studies
Year of Publication: 2019
Graduation Academic Year: 107 (ROC calendar; 2018-19)
Language: Chinese
Number of Pages: 150
Keywords (Chinese): 使用者研究、數位人文、數位圖像、圖像辨識、深度學習、自動圖像標註、實例分割、Mask R-CNN、人機互動、詞頻統計、行為分析
Keywords (English): user research; digital humanities; digital image; image recognition; deep learning; automatic image annotation; instance segmentation; Mask R-CNN; human-computer interaction; word frequency statistics; behavior analysis
DOI: http://doi.org/10.6814/NCCU201900502
Usage statistics:
  • Recommendations: 0
  • Views: 80
  • Downloads: 0
  • Favorites: 0
In today's era of flourishing information, digital images have become one of the key data types supporting digital humanities research, and their development opens new challenges and opportunities for humanities scholarship in the digital age. Past research has indicated that digital images should not be displayed merely as a long list or as thumbnails; object-level information that can be absorbed visually at a glance should also be present, giving readers a better ability to organize images. The need to recognize and analyze objects in digital images at scale has therefore grown increasingly important. “Image annotation” plays an indispensable role here: by choosing appropriate vocabulary to describe a digital image, it narrows the semantic gap between human users' interpretations of an image and the image's low-level features. Building on image annotation, “automatic image annotation” has emerged with advances in technology; it reduces the cost of manual annotation while offering high efficiency and low subjectivity. This motivated the present study to explore the differences in use and perception when automatic image annotation assists digital humanities scholars with individual image interpretation, to understand from the humanist's perspective how and why users use images, and to further develop a digital humanities tool that effectively supports humanists in interpreting image contexts.

This study therefore developed the Image Retrieval Tool Based on Automatic Image Annotation (IRT-AIA). At its core, the system applies Mask R-CNN, an algorithm that performs the instance segmentation task in image recognition, to identify physical objects in images. Beyond recognizing the category and location of each independent physical object in an image, it also delineates each object's contour, so that object-level information can be rapidly extracted from digital images and presented as surrogate information for an image collection, letting users quickly absorb and effectively organize image content. Finally, a friendly interface designed to improve interaction between humanists and the system allows humanists to annotate images from the perspective of individual interpretation and quickly obtain metadata for digital images, helping them interpret image contexts more efficiently.
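To make the object-extraction step concrete, below is a minimal sketch of instance segmentation with the Matterport Keras/TensorFlow implementation of Mask R-CNN that the thesis cites (Abdulla, 2017). It is an illustration only, not the IRT-AIA's actual code: the configuration name, weight and image file names, and the use of the COCO label set are assumptions.

```python
# Minimal sketch: instance segmentation with the Matterport Mask R-CNN
# implementation (https://github.com/matterport/Mask_RCNN). Config name,
# file paths, and the input image are illustrative assumptions.
import skimage.io
from mrcnn.config import Config
from mrcnn import model as modellib

class InferenceConfig(Config):
    NAME = "irt_aia_demo"   # hypothetical configuration name
    NUM_CLASSES = 1 + 80    # background + the 80 COCO object classes
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1      # detect one image at a time

model = modellib.MaskRCNN(mode="inference",
                          config=InferenceConfig(),
                          model_dir="logs")
model.load_weights("mask_rcnn_coco.h5", by_name=True)  # COCO-pretrained weights

image = skimage.io.imread("campus_photo.jpg")          # hypothetical input image
r = model.detect([image], verbose=0)[0]

# Each detected instance carries a class id, a bounding box (y1, x1, y2, x2),
# a confidence score, and a pixel-level mask describing the object's contour.
for class_id, box, score in zip(r["class_ids"], r["rois"], r["scores"]):
    print(class_id, box, score)
masks = r["masks"]  # boolean array of shape (height, width, num_instances)
```

The per-object class ids, boxes, scores, and masks are the kind of object-level information described above as surrogate information for an image collection.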

To verify whether the IRT-AIA developed in this study helps humanists interpret images, a counterbalanced design under a quasi-experimental method was adopted. Users were divided into two groups that operated the IRT-AIA and a general image retrieval tool (GIRT) in different orders to complete task sheets at successive stages. Behavior logging was used to record users' system operations in full, a technology acceptance questionnaire captured users' actual perceptions, and semi-structured in-depth interviews probed users' thoughts and suggestions. Cross-validating these methods, the study examined the differences between the IRT-AIA and the GIRT in automatic image annotation accuracy, image retrieval accuracy during image context interpretation, effectiveness of image context interpretation, and technology acceptance.
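Concretely, the counterbalancing presumably crosses system order with the two task stages listed in the appendices; the exact assignment is given in the thesis's research-design chapter, so the following pairing is an illustrative assumption:

 Group 1: Stage 1 task ("The Beauty of NCCU's Four Seasons") with the IRT-AIA, then Stage 2 task ("Cultural Differences among Countries at the Pouchong Tea Festival") with the GIRT
 Group 2: Stage 1 task with the GIRT, then Stage 2 task with the IRT-AIA

Reversing the order across groups keeps practice and task-difficulty effects from favoring either system in the comparison.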
The research results are as follows. First, the automatic image annotation accuracy of the IRT-AIA is already sufficient to effectively assist users in interpreting image contexts. Second, using the IRT-AIA yields better image retrieval precision together with good recall. Third, the GIRT and the IRT-AIA show no significant difference in the effectiveness of image context interpretation; the analysis indicates that community tags and artificial intelligence tags each have their own strengths, so a system giving equal weight to both is needed to satisfy users' varied retrieval needs. Fourth, the GIRT and the IRT-AIA show no significant difference in technology acceptance, though both score well above the scale midpoint. Fifth, when using community tags and artificial intelligence tags to support browsing and retrieval, users prefer artificial intelligence tags, which more readily lead them to the target images they want to browse further.
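For reference, the precision and recall in the second finding above are the standard retrieval metrics; the thesis's exact operationalization appears in its data-analysis chapter, so the conventional definitions given here are an assumption. For a query with retrieved image set $R$ and relevant image set $G$:

$$\mathrm{Precision} = \frac{|R \cap G|}{|R|}, \qquad \mathrm{Recall} = \frac{|R \cap G|}{|G|}$$

Higher precision with good recall thus means the IRT-AIA returned result sets containing fewer irrelevant images while still missing few relevant ones.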
Contents i
List of Tables iii
List of Figures v
List of Equations vi
Chapter 1 Introduction 1
 Section 1 Research Background and Motivation 1
 Section 2 Research Purposes 7
 Section 3 Research Questions 8
 Section 4 Research Scope and Limitations 9
 Section 5 Definitions of Terms 10
Chapter 2 Literature Review 11
 Section 1 Digital Images in Digital Humanities Applications 11
 Section 2 Current Development of Digital Image Tools 14
 Section 3 Automatic Image Annotation 17
Chapter 3 System Design 25
 Section 1 System Design Rationale 25
 Section 2 System Architecture 28
 Section 3 System User Interface 31
 Section 4 System Development Environment 41
Chapter 4 Research Design and Implementation 43
 Section 1 Research Framework 43
 Section 2 Research Methods 47
 Section 3 Research Participants 49
 Section 4 Research Instruments 50
 Section 5 Experimental Design 52
 Section 6 Data Processing and Analysis 55
 Section 7 Research Procedure 59
Chapter 5 Analysis of Experimental Results 61
 Section 1 Participants' Background Information 62
 Section 2 Accuracy Analysis of the Developed IRT-AIA's Automatic Image Annotation 64
 Section 3 Image Retrieval Accuracy of Users of the GIRT and the IRT-AIA 74
 Section 4 Differences in Image Context Interpretation Effectiveness between the GIRT and the IRT-AIA 78
 Section 5 Differences in Technology Acceptance between the GIRT and the IRT-AIA 80
 Section 6 Behavior Log Analysis of GIRT and IRT-AIA Use 82
 Section 7 Qualitative Analysis of Semi-structured Interviews 92
 Section 8 General Discussion 118
Chapter 6 Conclusions and Suggestions 127
 Section 1 Conclusions 127
 Section 2 Suggestions for Improving the IRT-AIA 131
 Section 3 Directions for Future Research 134
Appendices 136
 Appendix 1 Informed Consent Form 136
 Appendix 2 Stage 1 Task Sheet: "The Beauty of NCCU's Four Seasons" 137
 Appendix 3 Stage 2 Task Sheet: "Cultural Differences among Countries at the Pouchong Tea Festival" 138
 Appendix 4 Interview Outline 139
 Appendix 5 IRT-AIA Technology Acceptance Scale 140
 Appendix 6 GIRT Technology Acceptance Scale 142
 Appendix 7 Technology Acceptance Scale Comparing the Two Systems 144
References 146
Chinese References
National Chengchi University. (2007). 茶言觀政:政大校園影像記憶網 [NCCU campus image memory network]. Retrieved from http://memory.lib.nccu.edu.tw/?m=c1104&doc_serial=1
Chen, Y.-T. (2017). 行為順序檢定:滯後序列分析 [Behavior analysis: Lag sequential analysis]. Retrieved from https://pulipulichen.github.io/HTML-Lag-Sequential-Analysis/
Hsiang, J., & Tu, F.-E. (2011). 導論—什麼是數位人文 [Introduction: What is digital humanities?]. In 從保存到創造:開啟數位人文研究 [From preservation to creation: Initiating digital humanities research] (pp. 9-28).

English References
Abdulla, W. (2017). Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow [Python, GitHub repository]. Retrieved from https://github.com/matterport/Mask_RCNN
Agosti, M., Ferro, N., Orio, N., & Ponchia, C. (2014). CULTURA Outcomes for Improving the User’s Engagement with Cultural Heritage Collections. Procedia Computer Science, 38, 34-39. doi:10.1016/j.procs.2014.10.007
Bates, M. J. (2007). What is browsing—really? A model drawing from behavioural science research. Retrieved from http://www.informationr.net/ir/12-4/paper330.html
Beaudoin, J. E. (2014). A framework of image use among archaeologists, architects, art historians and artists. Journal of Documentation, 70(1), 119-147. doi:10.1108/JD-12-2012-0157
Beaudoin, J. E., & Brady, J. E. (2011). Finding Visual Information: A Study of Image Resources Used by Archaeologists, Architects, Art Historians, and Artists. Art Documentation: Journal of the Art Libraries Society of North America, 30(2), 24-36. doi:10.1086/adx.30.2.41244062
Belkin, N. J. (2008). Some(What) Grand Challenges for Information Retrieval. In C. Macdonald, I. Ounis, V. Plachouras, I. Ruthven, & R. W. White (Eds.), Advances in Information Retrieval (pp. 1-1). Springer Berlin Heidelberg.
Bhagat, P. K., & Choudhary, P. (2018). Image annotation: Then and now. Image and Vision Computing, 80, 1-23. doi:10.1016/j.imavis.2018.09.017
Busa, R. (1980). The Annals of Humanities Computing: The Index Thomisticus. Computers and the Humanities, 14(2), 83-90.
Chen, C.-M., & Tsay, M.-Y. (2017). Applications of collaborative annotation system in digital curation, crowdsourcing, and digital humanities. The Electronic Library, 35(6), 1122-1140. doi:10.1108/EL-08-2016-0172
Chen, J., Wang, D., Xie, I., & Lu, Q. (2018). Image annotation tactics: transitions, strategies and efficiency. Information Processing & Management, 54(6), 985-1001. doi:10.1016/j.ipm.2018.06.009
Cheng, Q., Zhang, Q., Fu, P., Tu, C., & Li, S. (2018). A survey and analysis on automatic image annotation. Pattern Recognition, 79, 242-259. doi:10.1016/j.patcog.2018.02.017
Chew, B., Rode, J. A., & Sellen, A. (2010). Understanding the Everyday Use of Images on the Web. In Proceedings of the 6th Nordic Conference on Human-Computer Interaction: Extending Boundaries (pp. 102–111). New York, NY, USA: ACM. doi:10.1145/1868914.1868930
Dutta, A., Gupta, A., & Zisserman, A. (2016). VGG Image Annotator (VIA) [HTML, CSS and JavaScript]. Visual Geometry Group. Retrieved from http://www.robots.ox.ac.uk/~vgg/software/via/
Eklund, P., Lindh, M., Maceviciute, E., & Wilson, T. D. (2006). EURIDICE Project: The Evaluation of Image Database Use in Online Learning. Education for Information, 24(4), 177-192.
Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J., & Zisserman, A. (2010). The Pascal Visual Object Classes (VOC) Challenge. International Journal of Computer Vision, 88(2), 303-338. doi:10.1007/s11263-009-0275-4
Friedrichs, K., Münster, S., Kröber, C., & Bruschke, J. (2018). Creating Suitable Tools for Art and Architectural Research with Historic Media Repositories. In S. Münster, K. Friedrichs, F. Niebling, & A. Seidel-Grzesińska (Eds.), Digital Research and Education in Architectural Heritage (pp. 117-138). Springer International Publishing.
Girshick, R. (2015). Fast R-CNN. ArXiv:1504.08083 [Cs]. Retrieved from http://arxiv.org/abs/1504.08083
Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2013). Rich feature hierarchies for accurate object detection and semantic segmentation. ArXiv:1311.2524 [Cs]. Retrieved from http://arxiv.org/abs/1311.2524
He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN. ArXiv:1703.06870 [Cs]. Retrieved from http://arxiv.org/abs/1703.06870
Hockey, S. M. (2004). The History of Humanities Computing. In S. Schreibman, R. Siemens, & J. Unsworth (Eds.), A Companion to Digital Humanities (pp. 3-19). Oxford: Blackwell Publishing. Retrieved from http://discovery.ucl.ac.uk/12274/
Hwang, G.-J., Yang, L.-H., & Wang, S.-Y. (2013). A concept map-embedded educational computer game for improving students’ learning performance in natural science courses. Computers & Education, 69, 121-130. doi:10.1016/j.compedu.2013.07.008
Im, D.-H., & Park, G.-D. (2015). Linked tag: image annotation using semantic relationships between image tags. Multimedia Tools and Applications, 74(7), 2273-2287. doi:10.1007/s11042-014-1855-z
Ivasic-Kos, M., Ipsic, I., & Ribaric, S. (2015). A knowledge-based multi-layered image annotation system. Expert Systems with Applications, 42(24), 9539-9553. doi:10.1016/j.eswa.2015.07.068
Jin, C., & Jin, S.-W. (2016). Image distance metric learning based on neighborhood sets for automatic image annotation. Journal of Visual Communication and Image Representation, 34, 167-175. doi:10.1016/j.jvcir.2015.10.017
Lin, T.-Y., Maire, M., Belongie, S., Bourdev, L., Girshick, R., Hays, J., … Dollár, P. (2014). Microsoft COCO: Common Objects in Context. ArXiv:1405.0312 [Cs]. Retrieved from http://arxiv.org/abs/1405.0312
Llamas, J., Lerones, P. M., Zalama, E., & Gómez-García-Bermejo, J. (2016). Applying Deep Learning Techniques to Cultural Heritage Images Within the INCEPTION Project. In M. Ioannides, E. Fink, A. Moropoulou, M. Hagedorn-Saupe, A. Fresa, G. Liestøl, … P. Grussenmeyer (Eds.), Digital Heritage. Progress in Cultural Heritage: Documentation, Preservation, and Protection (pp. 25-32). Springer International Publishing.
Lorang, E., Soh, L.-K., Datla, M. V., & Kulwicki, S. (2015). Developing an Image-Based Classifier for Detecting Poetic Content in Historic Newspaper Collections. D-Lib Magazine, 21(7/8). doi:10.1045/july2015-lorang
Maihami, V., & Yaghmaee, F. (2017). A review on the application of structured sparse representation at image annotation. Artificial Intelligence Review, 48(3), 331-348. doi:10.1007/s10462-016-9502-x
McCay-Peet, L., & Toms, E. (2009). Image use within the work task model: Images as information and illustration. Journal of the American Society for Information Science and Technology, 60(12), 2416-2429. doi:10.1002/asi.21202
Münster, S., Kamposiori, C., Friedrichs, K., & Kröber, C. (2018). Image libraries and their scholarly use in the field of art and architectural history. International Journal on Digital Libraries, 19(4), 367-383. doi:10.1007/s00799-018-0250-1
Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. ArXiv:1506.01497 [Cs]. Retrieved from http://arxiv.org/abs/1506.01497
Schonfeld, R., & Long, M. (2015). Supporting the Changing Research Practices of Art Historians. New York: Ithaka S+R. doi:10.18665/sr.22833
Schreibman, S. (2012). Digital Humanities: Centres and Peripheries. Historical Social Research-Historische Sozialforschung, 37(3), 46-58.
Terras, M. (2012). Image Processing and Digital Humanities. In M. Terras, J. Nyhan, & C. Warwick (Eds.), Digital Humanities in Practice (pp. 71-90). Facet. Retrieved from http://discovery.ucl.ac.uk/1327983/
Tikka, P. (2006). Image Retrieval: Theory and Research. Leonardo, 39(3), 268-269. doi:10.1162/leon.2006.39.3.268a
Wang, J. Z., Grieb, K., Zhang, Y., Chen, C., Chen, Y., & Li, J. (2006). Machine annotation and retrieval for digital imagery of historical materials. International Journal on Digital Libraries, 6(1), 18-29. doi:10.1007/s00799-005-0121-4
Warwick, C. (2012). Studying users in digital humanities. In C. Warwick, M. Terras, & J. Nyhan (Eds.), Digital Humanities in Practice (1st ed., pp. 1-22). Facet. doi:10.29085/9781856049054.002
Whitelaw, M. (2015). Generous Interfaces for Digital Cultural Collections. Digital Humanities Quarterly, 9(1).
Zhang, D., Islam, M. M., & Lu, G. (2012). A review on automatic image annotation techniques. Pattern Recognition, 45(1), 346-362. doi:10.1016/j.patcog.2011.05.013
(Full text available for viewing after 2024-07-19)
Electronic full text