A Study on Improving Duplicate Software Bug Report Detection with a BM25 Feature Weighting Scheme

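Detecting duplicate bug reports has become a prominent topic in software engineering research in recent years, mainly because analyzing duplicate reports repeatedly consumes developer effort. Some studies, however, point out that duplicate reports can also provide rich information that helps developers with software maintenance. Many researchers have already proposed detecting duplicate bug reports with natural language processing and text mining techniques. In this thesis, we propose a BM25-based feature weighting scheme to further improve detection performance. Experiments on three open-source projects, Apache, ArgoUML, and SVN, show that the proposed method effectively improves detection performance.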

Keywords

Bug reports; duplicate detection; feature weighting; BM25

Parallel Abstract

In software engineering research, the detection of duplicate bug reports has received much attention, for two main reasons. First, processing duplicate bug reports wastes human resources. Second, duplicate bug reports may provide abundant information for further software maintenance. In past studies, many schemes have been proposed that use information retrieval and natural language processing techniques. In this thesis, we propose a novel detection scheme based on BM25 feature weighting. We have conducted empirical experiments on three open-source projects: Apache, ArgoUML, and SVN. The experimental results show that the BM25-based scheme can effectively improve detection performance in nearly all cases.

Parallel Keywords

Bug Reports; Duplication Detection; Feature Weighting; BM25
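
To make the idea of BM25 feature weighting concrete, the following is a minimal sketch and not the thesis's actual implementation. It assumes whitespace tokenization, the common parameter defaults k1 = 1.2 and b = 0.75, a non-negative IDF variant, and cosine similarity for ranking candidates; the abstract specifies none of these, and the function names (`bm25_vectors`, `cosine`, `rank_duplicates`) are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: plain lowercase whitespace tokenization, no stemming.
    return text.lower().split()


def bm25_vectors(docs, k1=1.2, b=0.75):
    """Turn each document into a {term: BM25 weight} vector.

    k1 and b are commonly used defaults, not values from the thesis.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs or 1.0
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Length normalization shared by every term in this document.
        norm = k1 * (1 - b + b * len(tokens) / avg_len)
        vec = {}
        for term, freq in tf.items():
            # Assumption: IDF variant with +1 inside the log, so it stays >= 0.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            vec[term] = idf * freq * (k1 + 1) / (freq + norm)
        vectors.append(vec)
    return vectors


def cosine(u, v):
    # Cosine similarity between two sparse weight vectors.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def rank_duplicates(new_report, existing_reports):
    """Return (score, index) pairs for existing reports, best match first."""
    vectors = bm25_vectors(existing_reports + [new_report])
    query = vectors[-1]
    scores = [(cosine(query, v), i) for i, v in enumerate(vectors[:-1])]
    return sorted(scores, reverse=True)


reports = [
    "server crashes when uploading large file",
    "typo in the installation guide",
    "ui button misaligned on settings page",
]
ranked = rank_duplicates("crash on large file upload to server", reports)
```

In this toy example the first existing report shares the most distinctive terms with the new report, so it ranks highest; a real pipeline would likely add stemming and stop-word removal before weighting.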

