
Mining Program Developers’ Information Needs and Reasons for Needs Being Unfulfilled in Programming Communities

Advisor: 陳靜枝

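Abstract


In recent years, programming problems have grown increasingly complex, and developers tend to turn to developer community platforms such as Stack Overflow to seek help from more experienced developers. However, answer rates on such platforms keep falling, which makes information retrieval increasingly difficult for developers. This study therefore aims to identify developers' most critical information needs and to analyze the reasons for the declining answer rate, as guidance for future information provision.

The experimental data comprise 1,897,336 Python-related question-and-answer discussions posted on Stack Overflow between 2008 and 2021. We train topic models on these questions using Latent Dirichlet Allocation (LDA), select through experiments the best-performing combination of data and parameters, and verify the model's effectiveness in categorizing questions using the similarity of the topics' tag distributions. Finally, we extract the forty most important need topics from the model.

We then use the trained topics for further analysis and reach the following conclusions. In the analysis of topic trends, we find that declining, outdated topics are usually those whose content and applications are relatively fixed and unchanging, whereas topics with rising discussion involve technologies that have emerged in recent years and are mostly related to data analysis and machine learning. Furthermore, the analysis of topic characteristics shows that difficult topics are more popular yet have lower answer rates, so they should be regarded as the topics with the most urgent information needs. Finally, some askers have higher answer rates, and askers with good asking habits (such as attaching code and not overusing tags) are also more likely to receive answers.

Overall, this study offers several methods and findings for research on developers' needs. We hope that these can help improve developers' information retrieval in the future and create a better working environment for developers.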

English Abstract


As software development problems grow more complicated, developers tend to seek help from experienced developers in programming communities such as Stack Overflow. However, the forum's declining answer rate is making information retrieval increasingly difficult. We therefore aim to identify developers' critical information needs and the reasons for the falling answer rate, in order to guide the provision of related information.

This study collects 1,897,336 Python-related posts on Stack Overflow and trains topic models on them using Latent Dirichlet Allocation (LDA). We then run trials to select the best-performing datasets and parameters, and verify the trained model's effectiveness in categorizing posts using tag similarities. Finally, the forty most critical topics are extracted from the model and used in the subsequent analysis.

First, the trend analysis shows that topics with decreasing popularity have stable contents and applications. In contrast, the topics with increasing popularity have risen rapidly in the past decade and are mostly related to data analytics. Second, the tests of topic features reveal that difficult topics are more popular yet have lower answer rates, so the information needs on these topics should be considered the most urgent. Lastly, some askers have higher answer rates than others, and askers are more likely to receive solutions if they have good asking habits, such as attaching code snippets and not overusing tags.

This research provides several methods and conclusions on developers' needs. We expect that its findings can be adopted to better meet developers' information needs, resulting in a better working environment for developers.
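
The two per-topic quantities mentioned in the abstract, the answer rate and the similarity of tag distributions, can be illustrated with a minimal Python sketch. It is only a sketch under stated assumptions. The toy posts, the function names, and the choice of cosine similarity are all hypothetical, since the abstract does not say which similarity measure the thesis uses. Topic labels are taken as given here rather than produced by an LDA model.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy posts: (topic_id, tags, has_answer). Not data from the thesis.
posts = [
    (0, ["pandas", "dataframe"], True),
    (0, ["pandas", "numpy"], False),
    (1, ["django", "orm"], True),
    (1, ["django", "python-3.x"], True),
    (2, ["pandas", "dataframe", "numpy"], False),
]


def answer_rate_by_topic(posts):
    """Fraction of posts in each topic that received at least one answer."""
    total, answered = Counter(), Counter()
    for topic, _tags, has_answer in posts:
        total[topic] += 1
        answered[topic] += int(has_answer)
    return {topic: answered[topic] / total[topic] for topic in total}


def tag_distribution(posts, topic):
    """Relative frequency of each tag among the posts assigned to `topic`."""
    counts = Counter(tag for t, tags, _ in posts if t == topic for tag in tags)
    n = sum(counts.values())
    return {tag: c / n for tag, c in counts.items()}


def cosine_similarity(p, q):
    """Cosine similarity of two sparse tag distributions (assumed measure)."""
    dot = sum(value * q.get(tag, 0.0) for tag, value in p.items())
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


rates = answer_rate_by_topic(posts)
sim_0_1 = cosine_similarity(tag_distribution(posts, 0), tag_distribution(posts, 1))
sim_0_2 = cosine_similarity(tag_distribution(posts, 0), tag_distribution(posts, 2))
print(rates, sim_0_1, sim_0_2)
```

In this toy data, topics 0 and 2 share all their tags and so score close to 1, while topics 0 and 1 share none and score 0. A topic model that groups posts sensibly would be expected to yield topics whose tag distributions differ clearly from one another.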

