Accurate image recognition against complex backgrounds is a major challenge in machine vision. This study develops an image recognition system based on the convolutional neural network architectures YOLOv7 and U2NET to improve target recognition accuracy in environments where targets resemble their backgrounds. Combining YOLOv7's real-time object detection with U2NET's strength in background removal enables the system to identify and correctly label target objects within complex backgrounds. The study first analyzed when orchids need repotting and reviewed the real-time recognition models used in the horticulture industry, finding that background complexity significantly degrades existing image recognition models. Four YOLOv7 models were therefore trained on different datasets. The datasets were divided into background-removed and non-background-removed images, and the four models were a background-removed model, a non-background-removed model, a different-label mixed model, and a same-label mixed model. During dataset preparation, U2NET's background-removal capability was used to accelerate preprocessing, replacing YOLOv7's Mask function. Data processing was further optimized so that the YOLOv7 models achieve a higher recognition rate for objects in complex backgrounds without sacrificing recognition speed. In the experimental phase, extensive tests were conducted on complex-background images from an Oncidium dataset. The results show that, among the four models, the same-label mixed model performed best, with a significant improvement in recognition accuracy: mixed training improved recognition precision by 10% over the general models, thereby addressing the complex-background problem. Detailed experimental analysis demonstrates the models' performance under a variety of challenging backgrounds, confirming their effectiveness in real-world complex visual scenarios.
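As a rough illustration of the preprocessing step described above, and not the authors' actual code, the following Python sketch uses the rembg package (which wraps a pretrained U2-Net model) to strip backgrounds from a set of training images before YOLOv7 training. The directory names and the white-background compositing choice are illustrative assumptions.

from pathlib import Path

from PIL import Image
from rembg import remove  # rembg runs a U2-Net model under the hood

SRC_DIR = Path("dataset/original")       # hypothetical source image folder
DST_DIR = Path("dataset/no_background")  # hypothetical output folder
DST_DIR.mkdir(parents=True, exist_ok=True)

for img_path in SRC_DIR.glob("*.jpg"):
    with Image.open(img_path) as img:
        # U2-Net salient-object segmentation: returns an RGBA cutout
        # of the foreground with a transparent background.
        cutout = remove(img)
        # Composite onto a plain white canvas so YOLOv7 receives a
        # standard 3-channel image, using the alpha channel as the mask.
        flat = Image.new("RGB", cutout.size, (255, 255, 255))
        flat.paste(cutout, mask=cutout.split()[-1])
        flat.save(DST_DIR / img_path.name)

Batch-processing the dataset this way, rather than relying on per-image masking inside the detector, is what lets the background-removed and mixed training sets be assembled up front.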