This study proposes a modular pipeline for classifying and grasping objects in a pile. An RGB-D camera captures color and depth images of the pile, which are processed by an instance segmentation model (Mask R-CNN) and a grasp-point generation network, the Generative Grasping Convolutional Neural Network (GG-CNN), to identify multiple grasp points within the pile. The grasp points of all objects are then merged back into the scene, candidates that would not interfere with neighboring objects are selected based on depth information, and the robot arm is commanded to execute the grasp. In the initial segmentation step, Mask R-CNN performs instance segmentation on the image of the pile, separating the objects one by one and providing the position and class of each object; an edge loss is added to obtain more precise boundary contours. The second step applies GG-CNN to the depth information of a single object to generate pixel-wise grasp stability scores. Because this model can predict grasp points even for unseen objects, its parameters need not be updated when new target objects are introduced. The third step combines the depth image with the object positions from the first step and the stability scores from the second step to reject grasp points that could cause collisions; the remaining candidates, sorted by stability, form the final output of the pipeline. Finally, the feasibility of this pipeline is verified on a robot-arm system, achieving a grasp success rate of 84.3%.
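As a rough sketch of the third step, the fragment below filters merged grasp candidates using the depth image alone: a candidate is rejected if either gripper finger would land on a surface closer to the camera than the depth the finger must reach. The gripper geometry constants, the (row, col, angle, quality) tuple format, and the function name filter_collision_free are illustrative assumptions, not values or APIs from the thesis.

```python
import numpy as np

# Illustrative gripper geometry; the thesis does not give exact values.
GRIPPER_HALF_WIDTH_PX = 15   # half of the gripper opening, in pixels
FINGER_CLEARANCE_M = 0.01    # how far the fingers descend past the grasp surface

def filter_collision_free(grasps, depth):
    """Keep grasp candidates whose gripper fingers clear neighboring objects.

    grasps: list of (row, col, angle, quality) tuples merged over all objects.
    depth:  (H, W) depth image in meters, larger values being farther away.
    Returns the surviving candidates sorted by descending quality.
    """
    h, w = depth.shape
    survivors = []
    for row, col, angle, quality in grasps:
        # Pixel offsets from the grasp center to the two finger landing points.
        dr = int(np.round(GRIPPER_HALF_WIDTH_PX * np.sin(angle)))
        dc = int(np.round(GRIPPER_HALF_WIDTH_PX * np.cos(angle)))
        collision = False
        for r, c in ((row - dr, col - dc), (row + dr, col + dc)):
            if not (0 <= r < h and 0 <= c < w):
                collision = True          # finger would leave the image
                break
            # A neighbor blocks the finger if the surface at its landing
            # point is closer to the camera than the depth it must reach.
            if depth[r, c] < depth[row, col] + FINGER_CLEARANCE_M:
                collision = True
                break
        if not collision:
            survivors.append((row, col, angle, quality))
    return sorted(survivors, key=lambda g: g[3], reverse=True)
```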
This thesis presents a robotic grasping and classification system for objects in cluttered environments. The system consists of three main parts: (i) instance segmentation, (ii) grasp candidate generation, and (iii) collision avoidance. In the first part, an instance segmentation model, Mask R-CNN, isolates each object in the clutter from the scene and is improved to produce accurate mask edges. In the second part, the Generative Grasping Convolutional Neural Network (GG-CNN) predicts grasp poses and their quality for every object segmented in the first part; grasp candidates are then sampled from the pixel-wise predictions of GG-CNN. In the last part, the algorithm selects collision-free grasps from the candidates based on depth information. Finally, a robotic system is presented to demonstrate the effectiveness of the pipeline, achieving a grasp success rate of 84.3%.
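To make the candidate-sampling step concrete, here is a minimal sketch of how the pixel-wise GG-CNN output could be reduced to a short list of grasp candidates for one segmented object. The function name sample_candidates, the k parameter, and the tuple format are hypothetical, and the maps are assumed to be plain NumPy arrays.

```python
import numpy as np

def sample_candidates(quality_map, angle_map, mask, k=5):
    """Sample the k most stable grasp candidates for one segmented object.

    quality_map: (H, W) pixel-wise grasp stability scores from GG-CNN.
    angle_map:   (H, W) predicted gripper angle per pixel, in radians.
    mask:        (H, W) boolean instance mask of the object from Mask R-CNN,
                 restricting candidates to pixels that belong to this object.
    Returns a list of (row, col, angle, quality) tuples, best first.
    """
    scores = np.where(mask, quality_map, -np.inf)
    top = np.argsort(scores, axis=None)[::-1][:k]    # flat indices, best first
    rows, cols = np.unravel_index(top, scores.shape)
    return [(int(r), int(c), float(angle_map[r, c]), float(scores[r, c]))
            for r, c in zip(rows, cols) if np.isfinite(scores[r, c])]
```

Sampling within each object's own mask is what allows the candidates from every object to be pooled into a single scene-level list before the depth-based collision check sketched earlier.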