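Author (Chinese): 陳郁仁
Author (English): Chen, Yu-Jen
Thesis title (Chinese): 以預先分群為基礎之情境式文件分群
Thesis title (English): Context-Aware Document Clustering: A Preclustering-Based Approach
Advisors (Chinese): 魏志平; 王俊程
Advisors (English): Wei, Chih-Ping; Wang, Jyun-Cheng
Degree: Master's
Institution: National Tsing Hua University
Department: Institute of Technology Management
Student ID: 9773518
Year of publication (ROC calendar): 99 (2010)
Academic year of graduation (ROC calendar): 98
Language: English
Number of pages: 43
Keywords (Chinese): 文件分群; 情境式文件分群; 預先分群; 文件探勘; 知識管理
Keywords (English): Document clustering; Context-aware document clustering; Preclustering; Text mining; Knowledge management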
Document clustering is an intentional act that should reflect individuals’ preferences with regard to the semantic coherency or relevant categorization of documents. Thus, effective document clustering must consider individual preferences and needs in order to support contextual document categorization. To support context-aware document clustering, Yang and Wei (2007) proposed the Context-Aware Document Clustering (CAC) technique, which takes into account a user’s categorization context, represented as a list of anchoring terms, together with a statistics-based thesaurus constructed from the Web, and then generates a set of document clusters from this preferential context. However, thesaurus construction in the CAC technique is time-consuming and requires substantial computational resources. Moreover, the thesaurus is built from Web pages belonging to various domains, so anchoring term expansion may include overly general terms. In response to these limitations, we propose a new approach to context-aware document clustering. Specifically, the anchoring terms provided by a target user are first applied to form partial clusters, which are then used to consolidate a set of representative features for clustering the document corpus. Because the approach preclusters part of the corpus, we call it a preclustering-based context-aware document clustering technique. Our empirical evaluation results suggest that the proposed technique achieves better clustering effectiveness than its benchmark techniques (i.e., the traditional content-based approach and the CAC approach).
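In this era of information explosion, developing effective document clustering techniques to manage the ever-growing volume of documents is increasingly important and necessary. Document clustering is a user-driven act that reflects which categories are appropriate for the user and how documents should be assigned to them, and this subjective judgment varies with the context the user is in. Therefore, a good document clustering technique must take the user's preferences and context into account. However, most existing document clustering techniques cluster solely on document content and cannot meet the requirements of individual preferences or contexts. Yang and Wei therefore proposed the Context-Aware Document Clustering (CAC) technique. CAC considers the user's categorization preference in a given context (expressed as a set of anchoring terms) and uses a search engine to retrieve documents from the Web in order to build a statistical thesaurus, thereby producing document clusters that match the user's preference. However, constructing the thesaurus consumes substantial time and computational resources, and its coverage is so broad that it includes overly general terms rather than terms specific to the anchoring terms the user supplied. To address these shortcomings, this thesis proposes a preclustering-based context-aware document clustering technique. It first uses the user's categorization preference to generate partial document clusters, then uses those partial clusters to extract features that represent the user's preference and context, and finally produces document clusters that match the user's preference. The empirical evaluation shows that the proposed technique achieves good clustering effectiveness and outperforms both traditional document clustering techniques and the CAC technique.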
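The pipeline outlined in the abstract (anchoring terms, partial clusters, representative features, final clustering) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the record does not specify the scoring or clustering details, so everything below is an assumption made for illustration. In particular, partial clusters are seeded by simple term matching, features are ranked by a hypothetical concentration score, and the final step swaps in a nearest-centroid assignment. All function names and the parameter `top_k` are invented here.

```python
from collections import Counter
import math

def precluster(docs, anchoring_terms):
    """Assumed seeding rule: a partial cluster holds the docs that mention the anchoring term."""
    return {t: [i for i, d in enumerate(docs) if t in d] for t in anchoring_terms}

def form_features(docs, partial_clusters, top_k=4):
    """Pick, per partial cluster, the terms most concentrated in that cluster."""
    corpus_df = Counter(term for d in docs for term in set(d))
    features = []
    for members in partial_clusters.values():
        cluster_df = Counter(term for i in members for term in set(docs[i]))
        # Hypothetical score: (in-cluster doc frequency)^2 / corpus doc frequency.
        ranked = sorted(cluster_df, key=lambda t: (-cluster_df[t] ** 2 / corpus_df[t], t))
        for term in ranked[:top_k]:
            if term not in features:
                features.append(term)
    return features

def vectorize(doc, features):
    """Represent a document as term counts over the consolidated feature set."""
    counts = Counter(doc)
    return [counts[f] for f in features]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def cluster(docs, anchoring_terms, top_k=4):
    """Assign every document to the most similar partial-cluster centroid (or None)."""
    partial = precluster(docs, anchoring_terms)
    features = form_features(docs, partial, top_k)
    vectors = [vectorize(d, features) for d in docs]
    centroids = {
        term: [sum(vectors[i][j] for i in members) / len(members)
               for j in range(len(features))]
        for term, members in partial.items() if members
    }
    labels = []
    for v in vectors:
        best = max(centroids, key=lambda t: cosine(v, centroids[t]), default=None)
        # A document sharing no feature with any centroid stays unassigned.
        if best is not None and cosine(v, centroids[best]) == 0.0:
            best = None
        labels.append(best)
    return labels

docs = [
    ["stock", "market", "price", "investor"],
    ["market", "investor", "fund"],
    ["team", "match", "score", "coach"],
    ["coach", "score", "league"],
    ["fund", "price", "bond"],     # mentions no anchoring term
    ["league", "match", "goal"],   # mentions no anchoring term
]
labels = cluster(docs, ["market", "coach"])
print(labels)
```

In this toy corpus the last two documents mention neither anchoring term, yet they can still be placed because the features extracted from the partial clusters ("fund", "league") overlap with their content. That is the intuition behind preclustering as described in the abstract; the real technique may differ in every detail.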
ABSTRACT
CHINESE ABSTRACT
ACKNOWLEDGEMENTS
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1 INTRODUCTION
1.1 Research Background
1.2 Research Motivation and Objectives
CHAPTER 2 LITERATURE REVIEW
2.1 Content-Based Document Clustering
2.2 Partial-Clustering-Based Personalized Document Clustering
2.3 Context-Aware Document Clustering
CHAPTER 3 DESIGN OF THE PRECLUSTERING-BASED CONTEXT-AWARE DOCUMENT CLUSTERING
3.1 Preclustering
3.2 Feature Formation
3.3 Document Representation
3.4 Clustering
CHAPTER 4 EMPIRICAL EVALUATION
4.1 Data Collection
4.2 Evaluation Criteria
4.3 Parameter Tuning
4.4 Comparative Evaluation
4.5 In-depth Analyses
4.5.1 Effects of Different Partial Clusters Generation Methods
4.5.2 Effects of Different Feature Formation Methods
4.5.3 Effects of Values of k2 and k3
CHAPTER 5 CONCLUSION AND FUTURE RESEARCH DIRECTIONS
REFERENCES
Bade, K., and Nürnberger, A. (2006). Personalized hierarchical clustering. In Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence, 181-187.
Barreau, D. (1995). Context as a factor in personal information management systems. Journal of the American Society for Information Science, 46(5), 327-339.
Boley, D., Gini, M., Gross, R., Han, E., Hastings, K., Karypis, G., Kumar, V., Mobasher, B., and Moore, J. (1999). Partitioning-based clustering for web document categorization. Decision Support Systems, 27(3), 329-341.
Brill, E. (1992). A simple rule-based part of speech tagger. In Proceedings of the Third Conference on Applied Natural Language Processing, 152-155.
Brill, E. (1994). Some advances in transformation-based part of speech tagging. In Proceedings of the Twelfth National Conference on Artificial Intelligence, 722-727.
Case, D. (1991). Conceptual organization and retrieval of text by historians: The role of memory and metaphor. Journal of the American Society for Information Science, 42(9), 657-668.
Cutting, D., Karger, D., Pedersen, J., and Tukey, J. (1992). Scatter/gather: A cluster-based approach to browsing large document collections. In Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 318-329.
Donovan, J. (1991). Patrons' Expectations about Collocation: Measuring the Difference between Psychologically Real and the Really Real. Cataloging and Classification Quarterly, 13(2), 23-43.
Guerrero Bote, V., Moya Anegon, F., and Herrero Solana, V. (2002). Document organization using Kohonen's algorithm. Information Processing and Management, 38(1), 79-89.
Jain, A., Murty, M., and Flynn, P. (1999). Data clustering: a review. ACM Computing Surveys, 31(3), 264-323.
Kim, H., and Lee, S. (2000). A semi-supervised document clustering technique for information organization. In Proceedings of the Ninth International Conference on Information and Knowledge Management, 30-37.
Kwasnik, B. (1991). The Importance of Factors That Are Not Document Attributes in the Organisation of Personal Documents. Journal of Documentation, 47(4), 389-398.
Lagus, K., Honkela, T., Kaski, S., and Kohonen, T. (1996). Self-organizing maps of document collections: A new approach to interactive exploration. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, 238-243.
Lakoff, G. (1987). Women, fire, and dangerous things: What categories reveal about the mind. University of Chicago Press, Chicago.
Larsen, B., and Aone, C. (1999). Fast and effective text mining using linear-time document clustering. In Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 16-22.
Pantel, P., and Lin, D. (2002). Document clustering with committees. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 199-206.
Restorick, F. (1986). Novel filing systems applicable to an automated office: a state-of-the-art study. Information Processing and Management, 22(2), 151-172.
Roussinov, D., and Chen, H. (1999). Document clustering for electronic meetings: an experimental comparison of two techniques. Decision Support Systems, 27(1-2), 67-79.
Rucker, J., and Polanco, M. (1997). Personalized navigation for the web. Communications of the ACM, 40(3), 73-75.
Voorhees, E. (1986). Implementing agglomerative hierarchic clustering algorithms for use in document retrieval. Information Processing and Management, 22(6), 465-476.
Wei, C., Chiang, R., and Wu, C. (2006a). Accommodating individual preferences in the categorization of documents: a personalized clustering approach. Journal of Management Information Systems, 23(2), 173-201.
Wei, C., Hu, P., and Dong, Y. (2002). Managing document categories in e-commerce environments: An evolution-based approach. European Journal of Information Systems, 11(3), 208-222.
Wei, C., Yang, C., Hsiao, H., and Cheng, T. (2006b). Combining preference- and content-based approaches for improving document clustering effectiveness. Information Processing and Management, 42(2), 350-372.
Yang, C., and Wei, C. (2007). Context-aware document clustering. In Proceedings of the 11th Pacific Asia Conference on Information Systems.
 
 
 
 