
3D Point Cloud Object Recognition and Segmentation for High-Definition Map of Autonomous Driving Vehicle with Deep Learning

Advisor: 傅楸善 (Chiou-Shann Fuh)

Abstract


This thesis proposes a deep learning method for the 3D point clouds collected by the LiDAR of an autonomous vehicle. Besides achieving strong point-wise (pixel-level) recognition and segmentation results, the method is applied to improving the high-definition (HD) map of an autonomous vehicle, and the resulting improvement is visible to the naked eye.

Recognition and segmentation based on deep learning is the main innovative contribution. In pre-processing, while preserving all 3D features, we exploit the characteristics of LiDAR (Light Detection and Ranging) and use a coordinate transformation to project the 3D data into a 2D space. There are three reasons for this choice. First, it reduces the number of model parameters, which is what makes deployment on an autonomous vehicle practical: reaction time matters, and realizing edge computing is more realistic than off-loading the computation to the cloud. Second, although the 3D point cloud is projected onto a 2D space, none of the 3D features are lost, so the data can still be processed in a 3D manner. Third, deep learning for 2D recognition and segmentation is already mature; working in 2D coordinates lets us stand on the shoulders of giants, combining prior research on 3D point clouds and 2D images with the ideas of this thesis to obtain the segmentation model.

For the deep learning network architecture on which this research focuses, we propose to raise the representative evaluation metric of semantic segmentation with a generative adversarial network and semi-supervised learning. Briefly, the segmentation model above serves as the generator, paired with a discriminator that is likewise a convolutional neural network; combined with the idea of semi-supervised learning, this mitigates the weakness that semantic segmentation classifies each pixel individually while ignoring information from neighboring regions. This improvement is a highlight of the thesis.

A main application of this work is the HD map. An HD map is like the picture that comes to mind when people head for a destination they have visited before; that mental picture is abstract, but an HD map can actually be displayed and computed with, so it plays an important role in autonomous driving. This thesis covers, with hands-on procedures and explanations, everything from collecting the LiDAR 3D point clouds that form the vehicle's vision to actually building the HD map from those data. Our 3D point cloud recognition and segmentation enters this pipeline before map construction: once every point has been assigned a class, the map can be pre-processed according to these class labels. For example, removing the dynamic points that would distort the map, much as noise is removed in 2D image processing, both reduces noise in the HD map and makes the map itself more accurate. Likewise, removing unwanted objects lets us collect an HD map of an area free of specific objects even while many cars and motorcycles are driving through it daily, which is closer to reality than mapping in a dedicated self-driving test field.
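The 3D-to-2D coordinate transformation described above can be realized as a spherical range-image projection. The sketch below is an illustrative assumption, not the thesis's exact procedure: the image size and vertical field of view follow typical rotating-LiDAR settings, and each pixel keeps x, y, z, intensity, and range as channels so that no 3D feature is lost.

```python
import numpy as np

def spherical_projection(points, H=64, W=1024, fov_up=3.0, fov_down=-25.0):
    """Project an (N, 4) LiDAR array (x, y, z, intensity) onto an
    H x W range image with 5 channels: x, y, z, intensity, range."""
    x, y, z, intensity = points[:, 0], points[:, 1], points[:, 2], points[:, 3]
    r = np.linalg.norm(points[:, :3], axis=1)          # range of each point
    fov = np.radians(fov_up - fov_down)                # total vertical FOV

    yaw = np.arctan2(y, x)                             # azimuth in [-pi, pi]
    pitch = np.arcsin(z / np.maximum(r, 1e-8))         # elevation angle

    # Map angles to integer pixel coordinates.
    u = np.clip(np.floor(0.5 * (1.0 - yaw / np.pi) * W), 0, W - 1).astype(int)
    v = np.clip(np.floor((1.0 - (pitch - np.radians(fov_down)) / fov) * H),
                0, H - 1).astype(int)

    image = np.zeros((H, W, 5), dtype=np.float32)
    order = np.argsort(r)[::-1]                        # far first, so nearer
    image[v[order], u[order]] = np.stack(              # points overwrite them
        [x, y, z, intensity, r], axis=1)[order]
    return image
```

Because the original coordinates survive as channels, a 2D convolutional segmentation network can still reason about 3D geometry at each pixel.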

Abstract (English)


In this thesis, a deep learning method is applied to the 3D point clouds collected by the LiDAR of a self-driving car. Besides achieving strong results in point-wise (pixel-level) recognition and segmentation, it is also applied to improving high-definition maps for self-driving cars, and the resulting improvement is visible to the naked eye. Recognition and segmentation based on deep learning is the main innovative contribution. In pre-processing, to preserve the 3D features, we exploit the characteristics of LiDAR and use a coordinate transformation to project the 3D data into a 2D space. There are three reasons for this method. First, the number of model parameters can be reduced, which is what makes deployment on self-driving cars practical; after all, the reaction time of a self-driving car is critical, and realizing edge computing is more practical than uploading the data for cloud computing. Second, although the 3D point cloud is projected onto a 2D space, none of the 3D features are lost, so the data can still be processed in 3D. Third, deep learning is now mature for 2D recognition and segmentation; working in 2D coordinates lets us stand on the shoulders of giants, combining previous research on 3D point clouds and 2D images with the innovative ideas of this thesis to obtain the segmentation model. For the deep learning network architecture that this research focuses on, we propose to improve the representative evaluation metric of semantic segmentation by means of a generative adversarial network and semi-supervised learning.
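The representative evaluation metric referred to above is, for semantic segmentation, usually mean Intersection over Union (mIoU). A minimal sketch of how it is computed from predicted and ground-truth label maps follows; the class count and label arrays are illustrative, not taken from the thesis's experiments.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union across all classes that occur
    in either the prediction or the ground truth."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union == 0:              # class absent everywhere: skip it
            continue
        ious.append(inter / union)
    return float(np.mean(ious))
```

Because IoU is computed per class and then averaged, improving coherence over neighboring regions (as the adversarial training above aims to do) directly raises this score.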
Briefly, the segmentation model just mentioned serves as the generator of the adversarial network, paired with a discriminator that is likewise based on a convolutional neural network. Combined with the concept of semi-supervised learning, this mitigates the shortcoming that semantic segmentation classifies each pixel individually while ignoring information from neighboring regions, and this major improvement is a highlight of this thesis. One main application of this thesis is the high-definition map. A high-definition map is like the picture that comes to mind when people head for a destination they have been to before; that mental picture is abstract, but a high-definition map can actually be displayed and computed with, so it plays a very important role in the field of self-driving cars. This thesis provides hands-on procedures and explanations, from collecting the LiDAR 3D point cloud data that form the self-driving car's vision to actually building high-definition maps from those data. Our 3D point cloud recognition and segmentation technology enters this pipeline before map construction: after the category of each point has been identified, the high-definition map can be pre-processed according to these category features. For example, removing the dynamic points that would affect map construction, just as noise is removed in 2D image processing, not only reduces the noise in the high-definition map but also makes the map itself more accurate. Likewise, removing objects we do not want in the map lets us collect a high-definition map of an area without specific objects even when many cars and motorcycles are driving there daily; compared with mapping in a dedicated self-driving field, this is closer to reality.
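The map pre-processing step described above — dropping points whose predicted class is a dynamic object before the map is built — can be sketched as a simple label filter. The class IDs below are hypothetical placeholders, not the thesis's actual label set.

```python
import numpy as np

# Hypothetical label IDs for dynamic classes (e.g. car, motorcycle, person).
DYNAMIC_CLASSES = {1, 2, 3}

def remove_dynamic_points(points, labels, dynamic_classes=DYNAMIC_CLASSES):
    """Keep only points whose predicted semantic label is static, so that
    moving vehicles do not leave ghost trails in the high-definition map."""
    mask = ~np.isin(labels, list(dynamic_classes))
    return points[mask], labels[mask]
```

Running this filter on every scan before registration removes the moving objects, analogous to denoising a 2D image before further processing.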

