Moving Object Detection with a Moving Stereo Camera Using Unsupervised Spatio-Temporal Features and a Supervised Recursive Neural Network

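This thesis proposes using deep learning to learn features of moving objects directly from raw data in an unsupervised manner. Specifically, deep learning techniques such as convolution, pooling (two fixed nonlinear convolutions), and stacking are used to learn a hierarchical representation of spatio-temporal features. Learning is based on data from a moving stereo camera. Once the spatio-temporal features are combined with a Recursive Neural Network, moving objects can be identified in the images (motion segmentation). Experimental results show that, compared with methods that use point features together with egomotion estimation, the proposed method can extract features that aid moving object detection in regions where point features are hard to detect.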

Keywords

Deep learning; Recursive Neural Network; moving stereo camera; moving object detection; spatio-temporal feature learning

Parallel Abstract

In this work, deep learning is used to learn features directly from raw data without supervision. Instead of hand-engineering features for each new kind of sensor input, the system adapts to new data through unsupervised learning. More specifically, the deep learning techniques of convolution, pooling, and stacking are used to learn a hierarchical representation of spatio-temporal features from unlabeled stereo video data. The spatio-temporal features are learned with a Reconstruction Independent Component Analysis (RICA) autoencoder. The learned features are then applied to motion segmentation of moving objects in images from a moving stereo camera. To do so, the spatio-temporal features are extracted from image segments, and a Recursive Neural Network recursively builds a segmentation tree that separates moving objects from the scene. To our knowledge, this is the first time deep learning has been applied to learning spatio-temporal features together with motion segmentation (scene parsing). Compared with moving object detection methods that use point features with egomotion estimation, we show that our features can be extracted in situations where good point features are not detectable. The system is evaluated on real-world data, with results similar to the state of the art and better detection in certain situations.
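
As an illustration of the RICA objective named in the abstract, the following is a minimal sketch and not the thesis's implementation. The function name `rica_cost`, the sparsity weight `lam`, the smoothing constant `eps`, the column-per-sample data layout, and the exact placement of the weighting between the two terms are all assumptions made for this example. It combines a reconstruction penalty on W^T W x with a smooth L1 sparsity penalty on the activations W x.

```python
import numpy as np

def rica_cost(W, X, lam=0.1, eps=1e-6):
    """Sketch of a RICA-style cost (names and weighting are assumptions).

    W: filter matrix of shape (k, n); X: data of shape (n, m), one sample per column.
    """
    m = X.shape[1]
    Z = W @ X                      # feature activations, shape (k, m)
    R = W.T @ Z - X                # reconstruction residual W^T W x - x, shape (n, m)
    # Mean squared reconstruction error over the m samples.
    reconstruction = 0.5 * np.sum(R ** 2) / m
    # Smooth approximation of the L1 norm of the activations.
    sparsity = lam * np.sum(np.sqrt(Z ** 2 + eps)) / m
    return reconstruction + sparsity
```

With an orthonormal `W` the reconstruction term vanishes, so only the sparsity penalty contributes to the cost. In practice the cost would be minimized over `W` with a gradient-based optimizer, which this sketch leaves out.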

Parallel Keywords

Deep learning; Autoencoders; Motion segmentation; Moving object detection; Reconstruction Independent Component Analysis; Recursive Neural Network

