
基於自注意力機制之深度學習模型於未見商品品牌文字辨識與商品主動操作上架

Applying Self-Attention-based Deep Learning to Text Spotting for Reorienting Unseen Object with Active Manipulation

Advisor: 王學誠
The full text of this thesis will be available for download on 2025/06/21.

Abstract


In recent years, convenience store operators have been facing labor shortages, rising labor costs, and limits on working hours, making physical stores increasingly costly to run, so many convenience store companies have invested resources in developing unmanned stores. To realize automatic shelving and checkout in such stores, the system must recognize each product's semantic label and use this information to infer the product's pose, so that it can grasp the product and place it on the shelf neatly with the brand text facing outward, or have products on the conveyor scanned for checkout with the brand text facing upward. In designing an automated unmanned-store system, this thesis addresses three main challenges: 1) the network model must recognize semantic labels for a wide variety of products; 2) changes in environmental factors lead to low recognition accuracy; 3) the initial pose of a product may not satisfy the grasping conditions. To address these problems, this thesis designs a network architecture and system and compares it with NCTU-PnP [1], which was previously used for automated unmanned stores. The main contributions are: 1) performing semantic segmentation of product brand text using text-region detection and text-recognition models; 2) modifying the network architecture to improve learning efficiency and capability; 3) designing a robot-arm system with a suction gripper for flipping products and for pick-and-place.
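The title refers to a self-attention-based deep learning model for spotting brand text. As a rough illustration of the mechanism only (the thesis's actual text-spotting architecture is not reproduced on this page), a minimal single-head scaled dot-product self-attention layer can be sketched in PyTorch as follows; the module name, tensor shapes, and feature dimension are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    """Minimal single-head scaled dot-product self-attention.

    Illustrative sketch only; not the network architecture used in the thesis.
    """
    def __init__(self, dim: int):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, sequence_length, dim)
        q, k, v = self.to_q(x), self.to_k(x), self.to_v(x)
        attn = F.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return attn @ v  # same shape as x

# Example: attend over a sequence of 32 visual feature vectors of size 256.
features = torch.randn(1, 32, 256)
out = SelfAttention(256)(features)
print(out.shape)  # torch.Size([1, 32, 256])
```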

Abstract (English)


In recent years, convenience stores have been facing labor shortages, rising labor costs, and limits on working hours, making physical stores too costly to operate. Many store operators have therefore invested resources in the development of automated unmanned stores. To achieve an automatic pick-and-place and checkout system in the store, the system must recognize the semantic label of each product and use this information to determine the product's pose, so that it can pick up the product and place it on the shelf neatly with the brand name facing outward, or have products on the conveyor scanned with the brand name facing upward. In designing the automated unmanned-store system, this thesis addresses three main challenges: 1) the learning model must recognize semantic labels for a wide variety of products; 2) environmental factors and lighting conditions lead to low prediction accuracy; 3) the initial pose of a product may not satisfy the grasping conditions for the arms. To solve these problems, we design a network architecture and system and compare them with NCTU-PnP [1], which was also used in automated unmanned stores in the past. The contributions of the thesis are: 1) using text detection and recognition models to find the semantic label of a product; 2) designing a network architecture that enhances learning efficiency and capability; 3) designing arms with suction grippers for flipping products and for pick-and-place.
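The abstract describes a decision flow: spot the brand text, infer from it whether the product is oriented correctly, and either place the item directly or flip it first with the suction gripper. The sketch below illustrates that control flow only; every function and type in it is a hypothetical placeholder, not the interface used in the thesis, and the text-spotting step is stubbed out.

```python
from dataclasses import dataclass

@dataclass
class TextSpot:
    """Hypothetical result of brand-text spotting on one camera view."""
    brand: str            # recognized brand string, e.g. "OREO"
    facing_camera: bool   # True if the brand text is visible to the camera

def spot_brand_text(rgb_image) -> TextSpot:
    # Placeholder for the text-detection and text-recognition models;
    # a real system would run the trained networks here.
    return TextSpot(brand="OREO", facing_camera=False)

def flip_with_suction_gripper() -> None:
    print("flip product so the brand text faces the camera")

def place_on_shelf(brand: str) -> None:
    print(f"place {brand} with the label facing outward")

def shelve(rgb_image) -> None:
    """Flip the product if its brand text is hidden, then shelve it."""
    spot = spot_brand_text(rgb_image)
    if not spot.facing_camera:
        flip_with_suction_gripper()
        spot = spot_brand_text(rgb_image)  # re-observe after flipping
    place_on_shelf(spot.brand)

shelve(rgb_image=None)  # dummy input; perception is stubbed out above
```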

References


[1] Y.-S. Su, S.-H. Lu, P.-S. Ser, W.-T. Hsu, W.-C. Lai, B. Xie, H.-M. Huang, T.-Y. Lee, H.-W. Chen, L.-F. Yu et al., "Pose-aware placement of objects with semantic labels - brandname-based affordance prediction and cooperative dual-arm active manipulation," in 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2019, pp. 4760–4767.
[2] A. Gupta, A. Vedaldi, and A. Zisserman, "Synthetic data for text localisation in natural images," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2315–2324.
[3] S. Long, J. Ruan, W. Zhang, X. He, W. Wu, and C. Yao, "TextSnake: A flexible representation for detecting text of arbitrary shapes," in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 20–36.
[4] C. Luo, L. Jin, and Z. Sun, "MORAN: A multi-object rectified attention network for scene text recognition," Pattern Recognition, vol. 90, pp. 109–118, 2019.
[5] S. M. Lucas, A. Panaretos, L. Sosa, A. Tang, S. Wong, and R. Young, "ICDAR 2003 robust reading competitions," in Proceedings of the International Conference on Document Analysis and Recognition (ICDAR). Citeseer, 2003, pp. 682–687.
