Title

應用文件探勘技術進行立法文本自動化分析

Translated Titles

Automatic Content Analysis of Legislative Documents by Text Mining Techniques

Authors

周士堯

Key Words

文件探勘 ; 支持向量機 ; 立法表現 ; 分類 ; 兩階段分群 ; text mining ; SVM ; legislative performance ; classification ; two-stage clustering

PublicationName

National Tsing Hua University, Institute of Service Science, degree thesis

Volume or Term/Year and Month of Publication

2013

Academic Degree Category

Master's

Advisor

林福仁

Content Language

English

Chinese Abstract

在立法院國會圖書館網站裡,提供了一個公開且客觀的管道,讓公民可以追蹤了解立法院每天發生的事情,諸如立委的質詢等等。然而,這些公開的資訊量其實非常大,也非常凌亂,一般民眾可能無法有效消化這些資訊,或很難透過這些資訊去清楚了解立委的問政績效,因而浪費了此公開管道的美意,因此,為了克服這個困難,本研究目的就在於透過文件探勘技術去有效分辯每位立委立法表現的類別,然後展現出他們在各領域裡的問政績效。 此研究根據中山政治所專家所建構的立法分類架構為基礎,透過兩階段分群(two-stage clustering)去做特徵值擷取,再採用支持向量機(support vector machine)去建立模型來自動預測立委立法表現到最適合的分類。 為了讓此系統可以永續執行下去,此研究同時也對政治專家與一般民眾在分類標籤貢獻上的內容差別做了實驗驗證,呈現的結果沒有顯著差別,將支持未來系統可以直接透過網路讓一般民眾做維護與更新分類的動作。 本研究提出的自動預測分類方法,輔以視覺化雷達圖的呈現,希望幫助公民更能了解立法院活動與立委的問政績效,根據實驗的結果顯示,使用本方法可以有效自動分辨立法表現類別,進而可持續利用國會圖書館的公開立法資訊,有效做到監督立委在各種面向下的問政績效。

English Abstract

The Parliamentary Library website of Taiwan’s Legislative Yuan provides a fair and objective channel for the public to track the daily activities of the Legislative Yuan, including legislators’ inquiries. However, the volume of generated documents is so large that the general public may not be able to keep track of each legislator’s legislative performance from these contents. To narrow the gap between the generation of legislative documents and the public’s ability to make sense of them, this study proposes a text mining mechanism that automatically classifies the legislative documents associated with each legislator and then presents the proportion of their legislative performance in each category. The study starts from a basic legislative category structure established by domain experts. A two-stage clustering is then applied to perform feature selection on the legislative documents, and the SVM method is used to build a model that assigns each new document to the appropriate category. To keep the classification categories up to date, we also evaluate the difference between labels assigned by domain experts and those assigned by the general public. If the categories labeled by the two groups do not differ significantly, the general public can be called on via the Internet to maintain the categories of newly generated legislative documents. Experimental results show the effectiveness of the proposed text mining mechanism, which automatically classifies legislative documents to reveal legislators’ performance. With this result, people can monitor legislators and track their legislative activities using the information from the Parliamentary Library of the Legislative Yuan, and update their perception of legislative performance in various categories.
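
The last step described in the abstract, turning classified documents into a per-legislator proportion for each category, can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `category_proportions`, the legislator names, and the category labels are hypothetical, and the input is assumed to be one (legislator, predicted category) pair per document as produced by some classifier.

```python
from collections import Counter, defaultdict


def category_proportions(predictions):
    """Map each legislator to the share of their documents in each category.

    `predictions` is an iterable of (legislator, category) pairs, one pair
    per classified document. All names used here are hypothetical.
    """
    counts = defaultdict(Counter)
    for legislator, category in predictions:
        counts[legislator][category] += 1

    # Divide each category count by the legislator's total document count.
    return {
        legislator: {cat: n / sum(cats.values()) for cat, n in cats.items()}
        for legislator, cats in counts.items()
    }


# Hypothetical classifier output: one (legislator, category) pair per document.
predicted = [
    ("Legislator A", "education"),
    ("Legislator A", "education"),
    ("Legislator A", "finance"),
    ("Legislator A", "defense"),
    ("Legislator B", "finance"),
    ("Legislator B", "finance"),
]

profiles = category_proportions(predicted)
for name, shares in profiles.items():
    print(name, shares)
```

Each resulting profile sums to 1 across categories, so its values could serve directly as the axes of a radar chart such as the one mentioned in the Chinese abstract.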

Topic Category

Basic and Applied Sciences > Information Science
College of Technology Management > Institute of Service Science
Social Sciences > Management
Reference
  1. I. Political Science references:
  2. Liao, D. L., Lin, F. R., Huang, Y. C., Liu, Z. Y., & Lee, C. X. (2012). The Establishment of Taiwanese Legislators' Campaign Promise Database. Journal of Electoral Studies, 19(1), 129-158.
  3. Lin, J. J. (2006). The Study of Interpellation System of Legislative Yuan in R.O.C. Journal of TOKO, 1(1).
  4. Liao, Y. (2006). The Research of Voter Turnout: Case Study in Taiwan. The Journal of Chinese Public Administration, (3), 185-202.
  5. II. Technical mechanism references:
  6. Berghel, H. (1997). Cyberspace 2000: Dealing with information overload. Communications of the ACM, 40(2), 19-24.
  7. Korenius, T., Laurikkala, J., Juhola, M., & Jarvelin, K. (2006). Hierarchical clustering of a Finnish newspaper article collection with graded relevance assessments. Information Retrieval, 9(1), 33-53.
  8. Lin, F., & Hsueh, C. (2006). Knowledge map creation and maintenance for virtual communities of practice. Information Processing & Management, 42(2), 551-568.
  9. Punj, G., & Stewart, D. W. (1983). Cluster analysis in marketing research: review and suggestions for application. Journal of marketing research, 20(2), 134-148.
  10. Salton, G., & Buckley, C. (1988). Term-weighting approaches in automatic text retrieval. Information processing & management, 24(5), 513-523.
  11. Sheng, H. Y. (2005). 立法委員的立法提案:第五屆立法院的分析 [Legislators' legislative proposals: An analysis of the Fifth Legislative Yuan]. Taipei, Taiwan: 2005 Annual conference of Taiwanese Political Science Association.
  12. Siao, Y. S. (2010). Investigation and research of the oral presentation of legislators: Analysis of debates about national defense, diplomacy and cross-strait relations in The Legislative Yuan Official Gazette (Master’s thesis, National Taiwan Normal University, 2010). NTNU Institutional Repository.
  13. Everitt, B. S., Landau, S., & Leese, M. (2001). Cluster Analysis (4th ed.). London: Arnold.
  14. Ku, L. W. (2000). A study on the multilingual topic detection of news articles (Master’s thesis, National Taiwan University, 2000). NDLTD in Taiwan.
  15. Burbidge, R., & Buxton, B. (2001). An Introduction to Support Vector Machines for Data Mining. Operation Research Society. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.103.7639
  16. Huang, Y. C. (2010). Incremental Clustering: An Example of Legislative Interpellation (Master’s thesis, National Tsing Hua University, 2010). NTHU Electronic Theses and Dissertations System.
Times Cited
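  1. 薛仱芸 (2014). 改善網路操弄評論分類績效之研究 [A study on improving the classification performance of manipulated online reviews]. Degree thesis, Department of Information Management, Chaoyang University of Technology, 1-91.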