A Study on Duplicate Detection for Software Bug Reports Based on N-gram Features and Cluster Shrinkage

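Previous studies have shown that detecting duplicate bug reports is an important issue in software maintenance. On one hand, duplicate bug reports consume substantial human effort to analyze. On the other hand, if the rich debugging information contained in duplicate bug reports can be consolidated, it can help software developers with debugging and testing. Among current duplicate report detection methods, those that rely on text mining techniques alone do not achieve good performance; combining them with software execution information improves performance substantially but raises user privacy issues. In this thesis, we propose a new method that uses n-gram feature information and the cluster shrinkage technique to improve duplicate report detection performance. Tests on four open-source software projects, Apache, ArgoUML, SVN, and Eclipse, show that the proposed method effectively improves detection performance.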

Keywords

bug report

Parallel Abstract

According to past research, detecting duplicate bug reports is an important issue in software maintenance. First, triaging duplicate bug reports can consume a large amount of human effort. Second, duplicate bug reports may contain abundant debugging information that can be mined in depth to help the testing and debugging processes. In previous studies, schemes using only text mining techniques could not achieve good detection performance. Although performance can be improved considerably with additional execution information, this approach raises privacy concerns. In this thesis, we propose a novel scheme that uses n-gram features and the cluster shrinkage technique to improve detection performance. We conducted empirical studies on four open-source projects: Apache, ArgoUML, SVN, and Eclipse. The experimental results show that the proposed scheme effectively improves detection performance.
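The abstract names the two ingredients but does not describe them in detail, so the following is only a minimal sketch of how they might fit together, not the thesis's actual implementation. It assumes character trigrams as the n-gram features, cosine similarity as the comparison measure, and a shrinkage factor `lam` that moves each report vector toward the centroid of its known duplicate group. All function names, the sample report texts, and the value `lam=0.5` are hypothetical.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Count overlapping character n-grams (assumed feature type)."""
    s = " ".join(text.lower().split())  # lowercase, collapse whitespace
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)  # adds counts key by key
    return {k: c / len(vectors) for k, c in total.items()}


def shrink(vec, center, lam=0.5):
    """Cluster shrinkage step: (1 - lam) * vec + lam * center."""
    keys = set(vec) | set(center)
    return {k: (1 - lam) * vec.get(k, 0.0) + lam * center.get(k, 0.0)
            for k in keys}


# Hypothetical example: two reports already known to be duplicates.
r1 = char_ngrams("Crash when saving project file")
r2 = char_ngrams("Application crashes on saving a project")
c = centroid([r1, r2])

before = cosine(r1, r2)
after = cosine(shrink(r1, c), shrink(r2, c))
print(f"similarity before shrinkage: {before:.3f}, after: {after:.3f}")
```

In this sketch, shrinkage pulls members of the same duplicate group closer together, so a new report that resembles any one member is more likely to score highly against the whole group. How the thesis weights the n-grams, forms the clusters, and chooses the shrinkage factor is not stated in the abstract.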

Parallel Keywords

bug report; n-gram; cluster shrinkage

