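Software performance analysis is an important and difficult topic. As hardware and software have evolved, researchers have proposed many methods for analyzing and modeling software performance. Performance modeling helps developers evaluate and predict software performance early in development. As hardware grows more capable, software architectures become increasingly complex. For multicore software running on multicore systems, it is especially important to avoid spending large amounts of time and manpower optimizing the software until it meets its predefined performance requirements. Multicore processors have gradually become the mainstream processing units, and many applications are now being developed or modified into parallel versions to achieve the best possible performance. Developing software for a multicore environment involves many more factors than a traditional single-core environment, such as system resource contention, synchronization mechanisms, and contention for shared cache memory. In the traditional development process, performance is usually measured only after the entire software has been implemented. If it meets the predefined requirements, development can conclude. If it does not, multicore software is quite difficult to modify: it is hard to determine where the performance bottleneck lies, and the overall software architecture itself may be flawed and responsible for the poor performance. Because most of the code has already been implemented, changes are very difficult, and much of the code may have to be reimplemented. It is therefore very important to have an accurate performance model that helps developers predict performance, so that software can achieve its best performance on multicore systems. This thesis first analyzes three main factors that affect multicore software: parallelism, communication pattern, and data locality. It then analyzes the correlation between multicore performance bottlenecks and these three factors. We propose a Communication-Oriented Performance Estimation (COPE) method to help developers analyze and detect performance bottlenecks in multicore software, and to advise them on how to set an appropriate number of threads under the current software architecture and configuration to obtain the best performance.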
Performance modeling can help developers estimate system performance at an early stage, so that the high cost of tuning software to reach the target performance can be avoided. Since multicore processors have become mainstream, many applications are being rewritten as parallel software to improve performance. Developing an application for multicore platforms is more complex than for single-core platforms, because more factors must be considered; resource contention, synchronization, and shared-cache conflicts are some of the critical ones. Traditionally, performance is measured after most of the code has been implemented. If the performance does not meet the application's requirements, modifying the parallel version is very difficult because a large amount of code must be changed. Thus, an accurate model is needed to estimate system performance and guide developers in tuning for optimal performance. In this work, we first analyze the impact of three factors on performance: thread parallelism, communication pattern, and data locality. We then analyze performance bottlenecks in terms of the correlation among these three factors. We propose a communication-oriented performance estimation (COPE) method to help programmers detect and analyze performance bottlenecks. Furthermore, the method suggests how to adjust the number of threads to obtain better performance under the current configuration.
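The abstract does not specify COPE's actual model. Purely to illustrate the trade-off it targets (more threads add parallelism but also communication cost), the sketch below assumes a toy model in which execution time is the serial work, plus the parallel work divided evenly among threads, plus a communication cost that grows linearly with the thread count. The functions `estimated_time` and `suggest_threads`, their parameters, and the linear communication term are hypothetical assumptions, not the COPE method.

```python
def estimated_time(n_threads: int, serial: float, parallel: float,
                   comm_per_thread: float) -> float:
    """Hypothetical toy model (not COPE): serial part + evenly split
    parallel part + communication overhead linear in extra threads."""
    if n_threads < 1:
        raise ValueError("n_threads must be >= 1")
    return serial + parallel / n_threads + comm_per_thread * (n_threads - 1)


def suggest_threads(serial: float, parallel: float,
                    comm_per_thread: float, max_threads: int) -> int:
    """Return the thread count in [1, max_threads] that minimizes the
    toy model's estimated time (the smallest such count on ties)."""
    return min(
        range(1, max_threads + 1),
        key=lambda n: estimated_time(n, serial, parallel, comm_per_thread),
    )


if __name__ == "__main__":
    # With communication cost, adding threads eventually stops paying off.
    print(suggest_threads(serial=10, parallel=100, comm_per_thread=1, max_threads=16))
    # Without communication cost, the toy model always prefers more threads.
    print(suggest_threads(serial=10, parallel=100, comm_per_thread=0, max_threads=16))
```

Under these assumed numbers the first call picks 10 threads rather than the maximum of 16, showing how communication overhead can bound useful parallelism; a real estimator would also need to account for locality and contention effects such as shared-cache conflicts.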