
Research on Deep Learning-Based Multi-Task Super-Resolution Imaging Techniques: Taking Surveillance-System Applications as Examples

Advisor: 繆紹綱

Abstract


In recent years, with the rapid development of artificial intelligence, many deep learning methods for super-resolution have been proposed and shown to enhance image resolution effectively. However, when multiple classes of images must be handled, most deep learning-based super-resolution methods are trained on only a limited set of image categories. Training that focuses on specific categories can degrade performance on other categories, so developing a super-resolution network that applies across multiple domains and image categories has become an important research topic.

This study proposes a super-resolution imaging technique based on a multi-task learning architecture that can handle multiple super-resolution problems simultaneously. To verify the feasibility of this architecture, we selected three scenarios for study: satellite imagery, aerial imagery, and roadside imagery. These images have wide applications today; for example, satellite imagery can be used for land variation assessment, aerial imagery for monitoring specific objects, and roadside imagery for monitoring traffic conditions. The study is divided into two stages: the first stage evaluates existing methods and examines their problems, and the second stage proposes a new model architecture and investigates its performance.

In the first stage, we applied the traditional bicubic method and deep learning methods (SRCNN, FSRCNN, SRResNet, and EDSR) to perform super-resolution on the three surveillance scenarios. The results show that all of these methods improve image resolution, with EDSR being the most effective. We used databases of the three different image types to train super-resolution network models and found that a model trained on a single database performed better on its own super-resolution task; however, its improvement on the other tasks was smaller, so multiple network models were needed. Models trained on a mixed database could not outperform models trained on a single image database. Finally, a model trained on the common super-resolution database DIV2K performed worse on all three image types than models trained specifically on those image types.

In the second stage, we proposed a new super-resolution model architecture consisting of a fusion-based feature extractor, a super-resolution component, and a classifier. This architecture aims to comprehensively improve super-resolution image reconstruction by integrating feature information from multiple image types. The core goal of the fusion-based feature-extraction strategy is to let the super-resolution model learn and combine richer and more diverse feature information during training. In addition, we introduced an attention mechanism into the model design so that the model can intelligently assign category-specific generative information to different image types. Finally, to further improve generation quality, we integrated a classifier into the overall super-resolution architecture, allowing the model to use category-related information to guide image reconstruction more precisely during both training and inference. The proposed architecture showed superior performance on satellite, aerial, and roadside imagery, with significant improvements in PSNR and SSIM, and the gains were evident across all image types. In the MOS comparison, the architecture was also rated above the low-resolution images and improved upon EDSR in all cases. Object detection results show that raising image resolution improves object-detection accuracy and performance.

In summary, this study proposes a new architecture that uses multi-task learning and fuses features from different image types to handle multiple super-resolution problems at once. The architecture also incorporates an attention mechanism and a classifier to capture and reconstruct image information more comprehensively. The results show that the new architecture performs excellently on satellite, aerial, and roadside imagery and outperforms other methods on several evaluation metrics, improving both image quality and object-detection accuracy. In the future it could be applied to remote sensing and environmental monitoring, urban planning, and traffic management. The multi-task learning architecture may also be applicable to medical imaging, such as X-ray, MRI, and CT scans, to obtain clearer results. Future research could explore real-time operation of the model and how best to deploy the technique on edge-computing or embedded systems.
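
The first-stage comparison above relies on a bicubic baseline and on PSNR and SSIM as quality measures. As a point of reference only (this code is not taken from the thesis), a minimal Python sketch of such an evaluation might look as follows; the 4x scale factor and the file names lr.png and hr.png are illustrative assumptions:

    import numpy as np
    from PIL import Image
    from skimage.metrics import structural_similarity

    def psnr(hr, sr, max_val=255.0):
        # Peak signal-to-noise ratio: 10 * log10(MAX^2 / MSE).
        mse = np.mean((hr.astype(np.float64) - sr.astype(np.float64)) ** 2)
        return 10.0 * np.log10(max_val ** 2 / mse)

    # Bicubic baseline: upscale a low-resolution frame by an assumed factor of 4.
    lr = Image.open("lr.png")
    sr_bicubic = lr.resize((lr.width * 4, lr.height * 4), Image.BICUBIC)

    hr = np.array(Image.open("hr.png"))   # ground-truth high-resolution image
    sr = np.array(sr_bicubic)
    print("PSNR:", psnr(hr, sr))
    # SSIM over RGB channels (channel_axis requires scikit-image >= 0.19).
    print("SSIM:", structural_similarity(hr, sr, channel_axis=-1, data_range=255))

The deep learning methods named above (SRCNN, FSRCNN, SRResNet, EDSR) would simply replace the bicubic step, with PSNR and SSIM computed against the same ground truth.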

Parallel Abstract (English)


In recent years, with the rapid development of artificial intelligence, many deep learning methods for super-resolution have been proposed and proven to enhance image resolution effectively. However, when multiple classes of images must be handled, most deep learning-based super-resolution methods are trained on only a limited set of image categories. Focusing on specific categories may decrease performance for other categories, making it a significant challenge to develop a super-resolution network that can be applied to multiple domains and image categories.

This study proposes a multi-task learning-based super-resolution imaging technique that addresses various super-resolution problems simultaneously. We considered three scenarios to validate this framework's feasibility: satellite imagery, aerial imagery, and roadside imagery. These images have extensive applications today, such as using satellite imagery for land variation assessment, aerial imagery for monitoring specific objects, and roadside imagery for traffic surveillance. The study is divided into two stages: the first stage evaluates existing methods and identifies their problems, while the second stage proposes a new model architecture and explores its performance.

In the first stage, we applied traditional bicubic and deep learning methods (SRCNN, FSRCNN, SRResNet, and EDSR) to perform super-resolution processing on the three surveillance scenarios. The results showed that these methods improved image resolution, with EDSR demonstrating the most significant improvement. We used databases of three different types of images to train super-resolution network models and found that models trained on a single database performed better in a single super-resolution task. However, when dealing with other tasks, the improvement in image resolution was relatively low, and multiple network models were required. Models trained on a mixed database did not outperform those trained on a single image database. Finally, models trained on the common super-resolution database DIV2K performed worse on all three types of images than models trained specifically on those image types.

In the second stage, we proposed a new super-resolution model architecture consisting of a fusion-based feature extractor, a super-resolution component, and a classifier. This framework aims to comprehensively improve super-resolution image reconstruction by integrating feature information from multiple types of images. The core objective of the fusion-based feature-extraction strategy is to enable the super-resolution model to learn and combine richer and more diverse feature information during the training phase. Additionally, we incorporated attention mechanisms into the model design, allowing the model to intelligently assign category-specific generative information to different types of images. Finally, to further enhance the model's generative capabilities, we integrated a classifier into the overall super-resolution architecture, enabling the model to utilize category-related information for more precise image reconstruction during both training and inference. The proposed architecture demonstrated superior performance in tests on satellite, aerial, and roadside imagery, significantly improving PSNR and SSIM, with noticeable gains across the different image types. In the MOS scoring comparison, the architecture not only outperformed the low-resolution images but also improved upon EDSR in various scenarios. Object detection results showed that enhancing image resolution improved object-detection accuracy and performance.

This study thus proposes a novel framework that leverages multi-task learning and integrates different image features to address multiple super-resolution problems simultaneously. The architecture also incorporates attention mechanisms and a classifier to capture and reconstruct image information more comprehensively. The results indicate that the new framework performs excellently across various scenarios, such as satellite, aerial, and roadside images, and outperforms other methods on multiple evaluation metrics, significantly enhancing image quality and object detection accuracy. Future applications could extend to remote sensing and environmental monitoring, urban planning, and traffic management. The multi-task learning framework may also be applicable to medical imaging, such as X-rays, MRIs, and CT scans, to achieve clearer results. Future research could explore the real-time operating capabilities of the model, as well as how to most effectively deploy this technology in edge computing or embedded systems.
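
The second stage is described here only at the block level: a fusion-based (shared) feature extractor, an attention mechanism, a super-resolution component, and an auxiliary classifier over the three image categories, trained jointly. The thesis's actual layer configuration is not reproduced in this abstract, so the following PyTorch sketch is merely an illustration of that kind of multi-task design; every module name, channel count, block depth, and loss weight below is an assumption rather than the author's implementation:

    import torch
    import torch.nn as nn

    class ChannelAttention(nn.Module):
        # Squeeze-and-excitation style gate over feature channels
        # (an assumed form of the attention mechanism).
        def __init__(self, channels, reduction=16):
            super().__init__()
            self.gate = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
                nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),
            )

        def forward(self, x):
            return x * self.gate(x)

    class ResBlock(nn.Module):
        # EDSR-style residual block (no batch norm) followed by channel attention.
        def __init__(self, channels):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 3, padding=1),
            )
            self.attn = ChannelAttention(channels)

        def forward(self, x):
            return x + self.attn(self.body(x))

    class MultiTaskSR(nn.Module):
        # Shared feature extractor + pixel-shuffle SR branch + auxiliary category classifier.
        def __init__(self, channels=64, n_blocks=8, scale=4, n_classes=3):
            super().__init__()
            self.head = nn.Conv2d(3, channels, 3, padding=1)
            self.body = nn.Sequential(*[ResBlock(channels) for _ in range(n_blocks)])
            self.upsample = nn.Sequential(
                nn.Conv2d(channels, channels * scale ** 2, 3, padding=1),
                nn.PixelShuffle(scale),
                nn.Conv2d(channels, 3, 3, padding=1),
            )
            self.classifier = nn.Sequential(   # satellite / aerial / roadside
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(channels, n_classes),
            )

        def forward(self, lr):
            feat = self.body(self.head(lr))
            return self.upsample(feat), self.classifier(feat)

    # Joint training objective: L1 reconstruction loss plus a weighted cross-entropy
    # term on the image category (the 0.1 weight is an arbitrary placeholder).
    model = MultiTaskSR()
    lr_batch = torch.randn(2, 3, 48, 48)
    hr_batch = torch.randn(2, 3, 192, 192)
    labels = torch.tensor([0, 2])              # e.g. 0 = satellite, 1 = aerial, 2 = roadside
    sr, logits = model(lr_batch)
    loss = nn.L1Loss()(sr, hr_batch) + 0.1 * nn.CrossEntropyLoss()(logits, labels)

In this sketch the classifier shares the extractor's features, so category-related information can shape the shared representation during training, which is the role the abstract assigns to the classifier in the proposed architecture.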

