以取樣爲基礎之多執行緒程式行爲分類與預測

程式執行行爲常有重複執行的現象，因此如果可以認出重複執行的部分，則可以最佳化程式執行。若可將程式執行區段準確分類到適當的類別裡，並且可以利用這些分類資訊來預測下一區段所屬的類別，則可及時啟用一個適當的最佳化方法於下一個區段執行。這樣的最佳化機會不僅出現在單一執行緒的程式裡，也會出現在多執行緒的程式裡。在本篇論文裡，我們提出一個架構能夠蒐集每個執行緒所執行的每個區段的程式特徵，並且在程式執行時將每個區段分類到適當的類別裡。最後利用這些分類結果來預測下一區段的類別。爲了提高分類的效率，我們只儲存固定數目的代表資訊來做分類，因此分類的效率很高而且所需的記憶體空間小於三千個位元組。另外我們使用信心表格來提高預測準度。我們的架構能夠將區段分到適當類別裡使得同一類別裡的區段間行爲非常類似。且下一個區段所屬的類別的預測準度爲65%。若加入信心表格，則預測準度提高至80%。

關鍵字

程式區段；類別；多執行緒程式；程式特徵；類別預測

並列摘要

Program executions are usually repetitive; hence, we can optimize program executions if we are able to identify the program’s repetition pattern. If we can accurately classify program execution intervals into phases, and use such information to accurately predict the next phase at runtime, we will be able to apply suitable optimization on the coming intervals. Such runtime optimization opportunity not only exists in single-threaded programs but also in multi-threaded programs (e.g. OpenMP codes, which exhibit repetitive behavior). In this paper, we propose a framework that collects code signature of each interval from individual threads of a multi-threaded parallel program, and classify them into phases at runtime. We then use the classiﬁcation results to predict the next phase in an on-line fashion. In order to efﬁciently classify execution intervals on-line, we only keep a ﬁxed number of representative data in a shared table for classiﬁcation purpose. The classiﬁcation is very efﬁcient and uses less than 3K bytes of memory. We also propose a conﬁdence table to improve the prediction rate. Our system can classify intervals into phases with high homogeneity, can predict the next phase with 65% accuracy without using conﬁdence and with 80% accuracy when conﬁdence table is used.

並列關鍵字

interval ； phase ； multi-threaded program ； code signature ； phase prediction

參考文獻

[2] H. Patil, R. Cohn, M. Charney, R. Kapoor, A. Sun, and A. Karunanidhi. Pinpointing representative portions of large intel®itanium®programs with dynamic instrumentation. In Proceedings of the 37th Annual IEEE/ACM International Symposium on Microarchitecture, pages 81–92, Washington, DC, USA, 2004. IEEE Computer Society.

[3] T. Sherwood, E. Perelman, and B. Calder. Basic block distribution analysis to find periodic behavior and simulation points in applications. In Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques, pages 3–14, Washington, DC, USA, 2001. IEEE Computer Society.

[4] T. Sherwood, E. Perelman, G. Hamerly, and B. Calder. Automatically characterizing large scale program behavior. In Proceedings of the 10th international conference on Architectural support for programming languages and operating systems, pages 45–57, New York, NY, USA, 2002. ACM.

[5] E. Perelman, G. Hamerly, and B. Calder. Picking statistically valid and early simulation points. In Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques, page 244, Washington, DC, USA, 2003. IEEE Computer Society.

[6] Y. Zhang, B. Özisikyilmaz, G. Memik, J. Kim, and A.N. Choudhary. Analyzing the impact of on-chip network traffic on program phases for cmps. In Proceeding of IEEE International Symposium on Performance Analysis of Systems and Software, pages 218–226, 2009.

國際替代計量

以取樣爲基礎之多執行緒程式行爲分類與預測

全文下載

主題瀏覽