
A Study on Image Processing Method Combined with Real Human Visual System

Advisor: 易志孝
Co-advisor: 洪國銘 (Kuo-Ming Hung)

Abstract


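In image processing, the human visual system model (HVS model) is a simplification used when technology confronts complex problems from fields such as biology and psychology. The HVS model keeps improving as our understanding of the real visual system grows. However, because that understanding is still incomplete, some problems are modeled without important features. When such features are ignored, problems that real human vision resolves easily become difficult and complicated for existing techniques. To move closer to the real visual system, this dissertation draws on the theory of visual cognition proposed by the Gestalt psychologist K. Koffka. On that basis, it proposes improvements to existing outdoor fire alarm systems and trademark image retrieval systems.
Existing outdoor fire alarm systems are built on image-based fire detection, which is not limited by terrain. However, after the detection system detects and locates a fire, a firefighter on duty must still check whether it is a real fire. The reason is that a human observer sees not only the fire but also other useful information. The fire, this additional information, and human experience together are enough to predict accurately whether a detected fire will grow worse. The relevant human experience is the familiar fire triangle. Its three elements, oxygen, fuel, and heat, are what a fire needs to burn. In wildfire research, when the fire triangle is scaled up in time and space, the three elements become climate, vegetation, and ignition source. Climate cannot be captured by an ordinary camera, but vegetation and ignition sources usually can. Existing fire detection systems nevertheless cannot take vegetation into account, mainly because vegetation is subject to far more interference than the ignition source. This dissertation exploits one property of convolutional neural networks (CNNs) in deep learning: a degree of robustness to environmental variation. Building on this property, it proposes a fire alarm system that behaves more like human vision. The proposed system is trained by transfer learning and tested on classified image data. Test results show that, with zero false negatives, the proposed method reduces the false positive rate (FPR) of the previous system from 40.47% to 4.15%. This confirms that the proposed re-classification brings the existing system's fire alarm decisions closer to those of the real human visual system.
In existing trademark image retrieval systems, mathematical models find similar trademarks effectively and quickly. In courts around the world, however, these systems often cannot serve as the standard of judgment in actual trademark cases. The main reason is that, although a model can find that two trademarks have similar features, it cannot explain how those features relate to the real human visual system. Judges are therefore left to rely on their own subjective perception. Based on K. Koffka's explanation of visual cognition in seven aspects and on the principles of trademark design, this dissertation proposes seven corresponding features. We also propose a new mathematical model that implements these seven features and test retrieval on a database. The experimental results show that the seven-feature system judges similarity as correctly as existing trademark image retrieval systems. It can also indicate which features two trademarks share. Through the per-feature distances, it further explains why the real visual system perceives two trademarks as the same or different.
Based on the HVS model, this dissertation proposes two systems in different fields. Experimental tests confirm that the proposed HVS-model-based systems effectively link image processing, visual psychology, and trademark design.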
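The FPR figures reported above follow the standard definition FPR = FP / (FP + TN): the share of non-fire scenes that still trigger an alarm. The Python sketch below shows how that rate and the false-negative count can be computed two ways: for a detector alone, and for the same detector followed by a second-stage re-classifier. It rests on a hypothetical simplification that the abstract does not state: the re-classifier acts as a veto on the detector's alarms. The names `confusion`, `false_positive_rate` and `cascade`, and the toy labels and votes, are illustrative only. None of them is the dissertation's data or code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Counts:
    """Confusion-matrix counts for a binary fire / no-fire decision."""
    tp: int
    fp: int
    tn: int
    fn: int


def confusion(labels: List[bool], preds: List[bool]) -> Counts:
    """Count outcomes; True in `labels` means a real fire, True in `preds` means an alarm."""
    if len(labels) != len(preds):
        raise ValueError("labels and preds must have the same length")
    pairs = list(zip(labels, preds))
    return Counts(
        tp=sum(1 for y, p in pairs if y and p),
        fp=sum(1 for y, p in pairs if not y and p),
        tn=sum(1 for y, p in pairs if not y and not p),
        fn=sum(1 for y, p in pairs if y and not p),
    )


def false_positive_rate(c: Counts) -> float:
    """FPR = FP / (FP + TN): fraction of non-fire scenes that still raise an alarm."""
    negatives = c.fp + c.tn
    return c.fp / negatives if negatives else 0.0


def cascade(detector_alarms: List[bool], reclassifier_votes: List[bool]) -> List[bool]:
    """Hypothetical second stage: keep an alarm only if the re-classifier also votes 'fire'."""
    if len(detector_alarms) != len(reclassifier_votes):
        raise ValueError("both stages must score the same scenes")
    return [d and r for d, r in zip(detector_alarms, reclassifier_votes)]


# Toy example (illustrative values, not the dissertation's data).
labels = [True, True, False, False, False, False]    # two real fires, four non-fire scenes
detector = [True, True, True, True, False, False]    # detector alone: two false alarms
votes = [True, True, False, True, False, False]      # re-classifier rejects one false alarm

before = confusion(labels, detector)
after = confusion(labels, cascade(detector, votes))
print(before, false_positive_rate(before))  # Counts(tp=2, fp=2, tn=2, fn=0) 0.5
print(after, false_positive_rate(after))    # Counts(tp=2, fp=1, tn=3, fn=0) 0.25
```

A veto stage can only remove alarms, so it can never turn a missed fire into a detection. Keeping FN at 0 therefore requires that the re-classifier never reject a real fire the detector caught. For scale, the reported drop from 40.47% to 4.15% is 36.32 percentage points, about an 89.7% relative reduction in false alarms.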

Parallel Abstract


In image processing, the human visual system model (HVS model) is a simplification used when technology faces complex problems from fields such as biology and psychology. The HVS model continues to improve as our understanding of the real visual system grows. However, because that understanding is still insufficient, some problems are modeled without important features. When such features are ignored, problems that real human vision distinguishes easily become difficult and complicated for prior techniques. To move closer to the real visual system, this dissertation proposes improvements to outdoor fire alarm systems and trademark image retrieval systems. These improvements are based on the human visual recognition patterns described by the Gestalt psychologist K. Koffka. Existing outdoor fire alarm systems use image-based fire detection, which is not restricted by terrain. However, after the system detects and locates a fire, a responsible firefighter still has to check whether it is a real fire. The reason is that, in the real human visual system, we observe not only the fire but also other useful information. The fire, this other information, and human experience together are enough to predict accurately whether a detected fire will grow worse. The fire triangle, which comes from human experience, consists of the three ingredients a fire needs to burn: oxygen, heat, and fuel. In fire-related work, when the fire triangle is scaled up in time and space, the three elements become climate, vegetation, and ignition. Climate is a feature that ordinary cameras cannot capture, whereas vegetation and ignition can often be observed.
However, existing fire detection systems cannot take vegetation into consideration, mainly because the vegetation feature is subject to many disturbing factors. This dissertation builds on one property of convolutional neural networks (CNNs) in deep learning: a certain robustness to environmental changes. Using this property, it proposes a fire alarm system that behaves more like human vision. The proposed system applies transfer learning and is tested on image data that the proposed method reclassifies with a CNN. The test results show that, with zero false negatives (FN), the proposed method reduces the false positive rate (FPR) of the previous system from 40.47% to 4.15%. This confirms that the proposed reclassification brings the existing system's fire alarm accuracy closer to that of the real human visual system. In existing trademark image retrieval systems, mathematical retrieval models can find similar trademarks effectively and quickly. However, courts in many countries often cannot use these systems as the criteria for judging whether two trademarks are confusingly similar. The main reason is that, although a mathematical model can find that two trademarks have similar features, it cannot explain how those features relate to the real human visual system. As a result, judges must rely on their own subjective perception as the criterion of judgment, which can cause judgments to deviate from justice. Based on K. Koffka's seven-aspect explanation of human visual recognition patterns and on the principles of trademark design, this dissertation proposes seven corresponding features. We also propose a new mathematical model that implements these seven features and test it on a database.
The experimental results show that the proposed seven-feature system judges similarity as accurately as existing trademark image retrieval systems. It also explains which features of two trademarks are similar. Through the feature distances, it further explains why the real visual system perceives two trademarks as the same or different. Based on the HVS model, this dissertation proposes two systems in different fields. Experimental tests demonstrate that the proposed HVS-model-based systems can effectively assist humans. They also link image processing, visual psychology, and trademark design into technologies that enhance human well-being.
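The abstract states that per-feature distances let the trademark system explain which features two marks share. The Python sketch below illustrates that idea. Every modeling choice in it is a hypothetical assumption, not the dissertation's mathematical model, which the abstract does not describe:

- each of the seven features is a plain numeric vector;
- features are compared with Euclidean distance;
- the overall score is the unweighted mean of the seven distances;
- a user-chosen `threshold` marks a feature as "similar".

The names `explain`, `Explanation` and `NUM_FEATURES` are illustrative.

```python
import math
from typing import List, NamedTuple, Sequence

# The dissertation proposes seven Gestalt-based trademark features; this sketch
# only assumes (hypothetically) that each feature is summarized as a numeric vector.
NUM_FEATURES = 7

Vector = Sequence[float]


class Explanation(NamedTuple):
    overall: float          # unweighted mean of the per-feature distances (assumed aggregation)
    distances: List[float]  # one distance per feature, in feature order
    similar: List[int]      # indices of features whose distance is within the threshold
    ranked: List[int]       # feature indices from most to least similar


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def explain(mark_a: Sequence[Vector], mark_b: Sequence[Vector], threshold: float) -> Explanation:
    """Compare two trademarks feature by feature instead of with one opaque score."""
    if len(mark_a) != NUM_FEATURES or len(mark_b) != NUM_FEATURES:
        raise ValueError(f"each trademark needs {NUM_FEATURES} feature vectors")
    distances = [euclidean(fa, fb) for fa, fb in zip(mark_a, mark_b)]
    overall = sum(distances) / NUM_FEATURES
    similar = [i for i, d in enumerate(distances) if d <= threshold]
    ranked = sorted(range(NUM_FEATURES), key=lambda i: distances[i])  # stable sort keeps ties in order
    return Explanation(overall, distances, similar, ranked)


# Toy trademarks (illustrative only): identical except for feature index 2.
mark_a = [[0.0, 0.0] for _ in range(NUM_FEATURES)]
mark_b = [[0.0, 0.0] for _ in range(NUM_FEATURES)]
mark_b[2] = [3.0, 4.0]

result = explain(mark_a, mark_b, threshold=1.0)
print(result.distances)  # [0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0]
print(result.similar)    # [0, 1, 3, 4, 5, 6]
print(result.ranked)     # [0, 1, 3, 4, 5, 6, 2]
```

The seven distances are kept separate rather than collapsed into a single score, and that is what makes the result explainable. The ranked list shows a reader which feature drives the similarity judgment, and the overall mean still supports ordinary retrieval ranking.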

References


[1] Gonzalez, R. C. and Woods, R. E., Digital Image Processing, 3rd ed., Pearson Education, 2009.
[2] Koffka, K., Principles of Gestalt Psychology, Mimesis International, 2014.
[3] Krizhevsky, A., Sutskever, I., and Hinton, G. E., "ImageNet Classification with Deep Convolutional Neural Networks," in Advances in Neural Information Processing Systems (NIPS), 2012.
[4] Nair, V. and Hinton, G. E., "Rectified Linear Units Improve Restricted Boltzmann Machines," in International Conference on Machine Learning (ICML), Haifa, Israel, 2010.
[5] Tüske, Z., Tahir, M. A., Schlüter, R., and Ney, H., "Integrating Gaussian mixtures into deep neural networks: Softmax layer with hidden variables," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brisbane, Australia, 2015.
