
Accelerating Deep Learning Systems: A Case Study with DeepVariant

Advisor: 洪士灝
The full text will be available for download on 2024/08/13.

Abstract


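With the rapid development of next-generation sequencing, billions of fragments of an individual's genome can now be obtained at low cost. These fragments contain many errors, so variant calling is needed to determine the base at each genomic locus. DeepVariant, the case study of this thesis, uses a deep neural network to call variants from sequencing data and won the award for SNP performance in the PrecisionFDA Truth Challenge held in 2016. However, a full DeepVariant run takes several hours to complete. In this thesis, we propose methods for optimizing the performance of DeepVariant. First, using SOFA to observe the program's execution characteristics, we found that DeepVariant runs in two stages: it first converts all of the genomic data into images, and only then runs neural-network inference on those images. We implemented a new dataflow that overlaps the two stages to speed up execution. Next, we analyzed the first stage with VTune and observed that the program spent much of its time converting data between Python and C++ and on Python's inefficient function-call implementation. We therefore rewrote the entire program in C++, eliminating the unnecessary time incurred by Python. Finally, we implemented a distributed version of DeepVariant that computes in parallel across multiple CPU servers and multiple GPUs, and we customized the TensorFlow operation that receives data from the network, which reduced unnecessary data copying and conversion and raised GPU utilization. With these optimizations, we reduced the execution time of DeepVariant from 4 hours to about 1 hour. Moreover, in an environment with 8 CPU servers and 8 GPUs, we achieved near-linear speedup, requiring less than 8 minutes of execution time, and outperformed Parabricks in terms of cost-effectiveness.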

Parallel Abstract


As next-generation sequencing (NGS) rapidly evolves, the sequence of an individual’s genome can be determined at ever lower cost from billions of short, error-prone sequence reads by calling the genetic variants present in that genome (variant calling). DeepVariant, the case study in this thesis, is an open-source software package that calls genetic variants with a deep neural network (DNN) and won the PrecisionFDA Truth Challenge award for best SNP Performance in 2016. Even with a high-performance GPU device to accelerate the DNN, it still took four hours to complete variant calling on our workstation, so we analyzed the performance of DeepVariant to find ways to further reduce the time and cost of the NGS variant calling pipeline. In this thesis work, we used SOFA (Swarms of Functions Analysis) to characterize the performance of DeepVariant. The original DeepVariant program executed its tasks in two stages: in the first stage, all the sequencing data were converted into images, and in the second stage, inference was run on those images using a DNN. Based on this observation, our first optimization shortened the execution time by 26% by restructuring the program and overlapping the two stages of execution. Next, we used the Intel VTune Amplifier to profile the first stage, which revealed a large execution overhead when the Python-based main program called into C++ functions and converted data between Python and C++. We therefore re-implemented the main program in C++, which reduced the execution time by 68%. Finally, we built a distributed version of DeepVariant to further scale its performance in the datacenter by distributing the tasks of the first and second stages across multiple CPU servers and multiple GPUs.
In addition, we developed a customized TensorFlow operation to handle the data received from the ZeroMQ network socket, which reduced unnecessary data copying and data conversion and improved GPU utilization. As a result, we reduced the execution time of DeepVariant to 7 minutes and 39 seconds, with a near-linear speedup on 8 CPU servers and 8 GPUs, outperforming an industrial solution provided by Parabricks in terms of cost-performance.
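The stage overlap described in the abstract can be illustrated with a small producer-consumer sketch. This is not the thesis implementation (which the abstract says was written in C++ and later distributed over ZeroMQ); it is a minimal Python illustration, assuming a single producer thread for the image-generation stage, a single consumer thread for the inference stage, and a bounded in-memory queue between them. The function names, the placeholder "image", and the batch size are all hypothetical.

```python
import queue
import threading

SENTINEL = None  # hypothetical end-of-stream marker


def infer(batch):
    """Placeholder for DNN inference: returns one score per image."""
    return [sum(image) for image in batch]


def make_examples(regions, out_q):
    """Stage 1 stand-in: turn each region into an 'image' and enqueue it."""
    for region in regions:
        image = [region] * 3  # placeholder for a real pileup image
        out_q.put(image)      # blocks when the queue is full (back-pressure)
    out_q.put(SENTINEL)       # tell the consumer that no more data is coming


def call_variants(in_q, results, batch_size=4):
    """Stage 2 stand-in: consume images in batches as soon as they arrive."""
    batch = []
    while True:
        item = in_q.get()
        if item is SENTINEL:
            break
        batch.append(item)
        if len(batch) == batch_size:
            results.extend(infer(batch))
            batch = []
    if batch:  # flush the final partial batch
        results.extend(infer(batch))


def run_pipeline(regions, queue_size=8):
    """Run both stages concurrently instead of one after the other."""
    q = queue.Queue(maxsize=queue_size)
    results = []
    producer = threading.Thread(target=make_examples, args=(regions, q))
    consumer = threading.Thread(target=call_variants, args=(q, results))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return results
```

With this structure the second stage starts as soon as the first image is available rather than after every image has been generated, and the bounded queue keeps the producer from running arbitrarily far ahead of the consumer.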

