
科學資料之內存運算查詢系統

In-memory query system for scientific datasets

Advisor: 周志遠

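Abstract


As computing power continues to grow and data volumes keep increasing, the limited I/O bandwidth cannot scale proportionally. The widening performance gap between the two has made the traditional post-simulation data processing method a performance bottleneck. In-situ computing and query-driven data analysis are therefore important techniques for shortening the data movement path. We implemented an indexing system that combines bitmap indexing, spatial data reorganization, distributed shared memory, and location-aware parallel execution, and we evaluated it on a NERSC supercomputer, as a real environment, using two real scientific simulation datasets. The results show that, compared with traditional query systems that rely on parallel storage file systems, our system achieves a speedup of more than 10x.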

Keywords

indexing; scientific data

Parallel Abstract


The growing gap between compute performance and I/O bandwidth, coupled with increasing data volumes, has made the traditional post-simulation data processing method a bottleneck. Hence, in-situ computing and query-driven data analysis are important techniques for minimizing data movement. By taking advantage of the growing memory capacity on supercomputers, we developed an in-memory query system for scientific data analysis. Our approach combines bitmap indexing, spatial data layout reorganization, distributed shared memory, and location-aware parallel execution. Our evaluations on a NERSC supercomputer using two real scientific datasets showed that we can aggregate the memory capacity of thousands of compute nodes to analyze a 750 GB simulation dataset without transferring data to remote nodes or storage systems. Compared with traditional solutions based on out-of-core parallel file systems, we achieve more than a 10x speedup. Therefore, our system can support interactive queries and serve as a vehicle for steering simulations.
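To make the bitmap indexing component concrete, the following is a minimal sketch of a binned bitmap index answering a range query. It is not the system described in the thesis: the function names, the sample `temperature` values, and the bin edges are hypothetical, and Python integers stand in for the compressed bitmaps a production index would use.

```python
from bisect import bisect_right


def build_bitmap_index(values, bin_edges):
    """Build one bitmap per bin; a Python int serves as the bitset.

    Bin i covers bin_edges[i] <= v < bin_edges[i + 1]. Bit r of a bitmap
    is set when record r falls into that bin. Values outside all bins
    are ignored.
    """
    bitmaps = [0] * (len(bin_edges) - 1)
    for row, v in enumerate(values):
        b = bisect_right(bin_edges, v) - 1
        if 0 <= b < len(bitmaps):
            bitmaps[b] |= 1 << row
    return bitmaps


def range_query(bitmaps, bin_edges, lo, hi):
    """Return row ids with lo <= v < hi; lo and hi must be bin edges."""
    first = bin_edges.index(lo)
    last = bin_edges.index(hi)
    result = 0
    for b in range(first, last):
        result |= bitmaps[b]  # bitwise OR merges the matching bins
    return [r for r in range(result.bit_length()) if (result >> r) & 1]


# Hypothetical sample data: one variable sampled at six grid points.
temperature = [3.2, 11.5, 7.9, 25.0, 18.4, 9.1]
edges = [0, 10, 20, 30]

index = build_bitmap_index(temperature, edges)
hits = range_query(index, edges, 10, 30)
print(hits)
```

Because the query is answered with bitwise operations on the index alone, the raw values never need to be read, which is the property that makes this kind of index attractive when data movement is the bottleneck.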
