In recent years, deep convolutional neural networks (CNNs) have achieved outstanding results on dense classification problems such as image segmentation. Semantic segmentation classifies each pixel of an input image, while instance segmentation segments all foreground objects, assigns each a mask, and distinguishes different objects and categories. Panoptic segmentation can be viewed as the combination of semantic segmentation and instance segmentation; its goal is to predict, for every pixel in an image, the corresponding class and, for countable objects, an instance ID. Current state-of-the-art studies adopt a two-stage detector to detect and segment the foreground objects and attach another network branch to it for semantic segmentation. They then fuse the semantic and instance segmentation results into a panoptic segmentation result with a merging algorithm. However, these studies pay little attention to the computation time. In this work, we propose an efficient panoptic segmentation network that tackles the panoptic segmentation task at fast inference speed. In essence, the proposed method generates masks from a simple linear combination of prototype masks and mask coefficients: the semantic and instance segmentation branches only need to predict mask coefficients and produce their results with the shared prototype masks predicted by the prototype network branch. Furthermore, to improve the quality of the shared prototype masks, we adopt a module called the cross-layer attention fusion module, which fuses multi-scale features with an attention mechanism and thereby helps them capture the dependencies between one another. To validate this work, we conduct various experiments on the challenging COCO panoptic dataset. The experimental results show that the proposed method achieves highly competitive results at a fast inference speed of about 51 ms on a GPU, and it outperforms all one-stage methods with 38.9\% PQ.
Recently, deep convolutional neural networks (CNNs) have shown outstanding performance on dense classification problems such as segmentation tasks. Semantic segmentation aims to provide pixel-wise classification of an input image, while instance segmentation segments all foreground objects and distinguishes different object instances. Panoptic segmentation is a scene parsing task that unifies semantic segmentation and instance segmentation, aiming to assign a semantic label and an instance ID to every pixel in an image. Current state-of-the-art studies adopt a two-stage detector that detects and segments the foreground objects, and then attach another network branch to it for semantic segmentation. They then combine both results into a panoptic segmentation with heuristic merging. However, these studies pay little attention to inference time. In this work, we propose an Efficient Panoptic Segmentation Network (EPSNet) to tackle the panoptic segmentation task with fast inference speed. EPSNet generates masks based on a simple linear combination of prototype masks and mask coefficients. The light-weight network branches for instance segmentation and semantic segmentation only need to predict mask coefficients and produce masks with the shared prototypes predicted by the prototype network branch. Furthermore, to enhance the quality of the shared prototypes, we adopt a cross-layer attention fusion module, which aggregates multi-scale features with an attention mechanism, helping them capture long-range dependencies between one another. To validate the proposed work, we conduct various experiments on the challenging COCO panoptic dataset. The experimental results show that EPSNet achieves highly promising performance with significantly faster inference speed (51 ms on GPU). Moreover, EPSNet outperforms all one-stage methods with 38.9\% PQ.
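As a brief illustration of this mask assembly (the notation below is introduced here for exposition only and is not taken from the abstract), let $P \in \mathbb{R}^{h \times w \times k}$ denote the $k$ shared prototype masks produced by the prototype branch and $c \in \mathbb{R}^{k}$ a coefficient vector predicted by one of the light-weight heads. Each output mask can then be sketched as
\[
M = \sigma\!\left(\sum_{i=1}^{k} c_i \, P_{:,:,i}\right),
\]
i.e., a per-pixel linear combination of the shared prototypes weighted by the predicted coefficients, where $\sigma$ stands for an optional nonlinearity (e.g., a sigmoid) assumed here to produce the final mask. The same shared prototypes serve both the instance and semantic branches, which differ only in the coefficients they predict.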