
A Software-based Redundant Execution Programming Model for Transient Fault Detection and Correction

Chinese title: 軟體實現瞬時故障檢測與糾錯之程式設計模型

Advisor: 陳鵬升

Abstract (translated from the Chinese)


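Software reliability is an increasingly important concern in today's computer systems. As multi-core technology matures, multi-core processors can be used to execute system tasks redundantly and thereby improve the reliability of a computing system. However, building an effective reliability-enhancing method from scratch is difficult and complicated. In this thesis, we propose a purely software-based programming model for transient fault detection and correction. We use multi-threading to perform redundant execution in order to handle transient faults. In addition, we use majority voting to correct errors, and an extra thread, the watchdog, monitors the model for unresponsive threads and recovers them. Experimental results show that programs using our programming model produce correct results 88.9% of the time, far more often than programs that do not use it. Programmers can systematically apply our programming model to their own programs to make them fault tolerant.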
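The redundant execution and majority voting described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the names `run_redundant`, `checksum_replica`, and `_NO_RESULT`, the replica count of three, and the simulated bit flip are all assumptions made for illustration.

```python
import threading
from collections import Counter

_NO_RESULT = object()  # hypothetical marker for a replica that produced no value


def run_redundant(task, args=(), replicas=3):
    """Run `task(replica_id, *args)` in several threads and majority-vote the results."""
    results = [_NO_RESULT] * replicas

    def worker(replica_id):
        try:
            results[replica_id] = task(replica_id, *args)
        except Exception:
            pass  # a crashed replica simply casts no vote

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(replicas)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Count identical results; results must be hashable for this to work.
    votes = Counter(r for r in results if r is not _NO_RESULT)
    if votes:
        value, count = votes.most_common(1)[0]
        if count > replicas // 2:  # strict majority of all replicas
            return value, count
    raise RuntimeError("no majority among replicas")


def checksum_replica(replica_id, data):
    """Hypothetical workload; replica 1 suffers a simulated single-bit fault."""
    result = sum(data) % 65521
    if replica_id == 1:
        result ^= 1 << 3  # injected fault, for demonstration only
    return result


value, count = run_redundant(checksum_replica, args=([1, 2, 3, 4],))
print(value, count)  # prints: 10 2
```

With three replicas, a single corrupted result is outvoted by the two that agree. If no value reaches a strict majority, the sketch raises an error rather than guessing.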

Keywords

multi-threading, reliability, transient fault, fault tolerance

English Abstract


Software reliability is becoming increasingly important as computer systems become more closely tied to everyday life. With the advent of multi-core technology, multi-core processors can be used to improve the reliability of computing systems through redundancy, but programming such redundancy from scratch is difficult and complicated. In this thesis, we propose a software-based programming model for transient fault detection and correction. Multi-threading is used to provide thread-level redundant execution for fault detection, and majority voting is used to recover from errors. In addition, a watchdog thread handles threads that stop responding. For the tested benchmark programs, the proposed programming model produces correct results with a probability of 88.9%, which is much higher than that of the original programs. Programmers can systematically apply the proposed programming model to their programs to make them fault tolerant.
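The watchdog idea can be sketched in the same spirit. The abstract does not say how the watchdog detects or recovers an unresponsive thread, so the deadline-based check and the restart policy below are assumptions, as are the names `launch`, `run_with_watchdog`, and `flaky_task`. Python cannot forcibly stop a thread, so this sketch abandons the hung replica (a daemon thread) and starts a fresh one.

```python
import threading
import time


def launch(task, replica_id, attempt):
    """Start one replica in a daemon thread; return its private result slot."""
    slot = {"done": threading.Event(), "value": None}

    def worker():
        slot["value"] = task(replica_id, attempt)
        slot["done"].set()  # signals the watchdog that this replica responded

    threading.Thread(target=worker, daemon=True).start()
    return slot


def run_with_watchdog(task, replicas=3, timeout=0.5, max_restarts=1):
    """Run replicas of `task`; a watchdog thread restarts those that miss the deadline."""
    slots = {}
    restarted = []

    def watchdog():
        for rid in range(replicas):
            slots[rid] = launch(task, rid, 0)
        for attempt in range(1, max_restarts + 2):
            deadline = time.monotonic() + timeout
            for slot in slots.values():
                slot["done"].wait(max(0.0, deadline - time.monotonic()))
            hung = [rid for rid, s in slots.items() if not s["done"].is_set()]
            if not hung or attempt > max_restarts:
                return
            for rid in hung:
                restarted.append(rid)
                # The old slot is abandoned, so a late result from the hung thread is ignored.
                slots[rid] = launch(task, rid, attempt)

    dog = threading.Thread(target=watchdog)
    dog.start()
    dog.join()
    values = {rid: s["value"] for rid, s in slots.items() if s["done"].is_set()}
    return values, restarted


def flaky_task(replica_id, attempt):
    """Hypothetical workload: replica 2 hangs on its first attempt only."""
    if replica_id == 2 and attempt == 0:
        time.sleep(3)  # simulated non-responsive thread
    return 42


values, restarted = run_with_watchdog(flaky_task)
print(values, restarted)
```

Each launch gets its own result slot, so a stale thread that wakes up later cannot overwrite the value produced by its replacement. The values collected this way could then be passed to a majority vote.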

