Semantic Analysis and Sentence Summarization of Web Documents Based on Relation Measures

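Unlike conventional Latent Semantic Analysis (LSA), this paper combines LSA with the concepts of word relatedness and sentence relatedness within a document to construct a relational matrix. The matrix is used to reinforce and extract the conceptual structure of the document, yielding a semantic-level analysis from which the most suitable sentences are selected as the basis of the document summary. For evaluation, two indices are proposed to make the assessment of summaries more objective. The experiments were run on 1,300 Chinese online news articles from 13 categories (lifestyle, local, society, politics, technology, travel, finance, health, international, education, sports, drama, and arts and culture). The results show that the proposed method selects sentences that are both less similar to one another and more representative of the article.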
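The abstract does not give the exact formula for the relational matrix, so the following is only a minimal sketch under stated assumptions. It assumes the word relation is the cosine similarity between term rows of a term-by-sentence count matrix `A`, the sentence relation is the cosine similarity between sentence columns, and the two are combined as `R = W @ A @ S`. Sentence selection then applies a common LSA extraction heuristic: for each leading right singular vector of `R`, take the sentence with the largest absolute loading. The names `relational_matrix` and `select_sentences` are hypothetical, not taken from the paper.

```python
import numpy as np


def _normalize(M, axis):
    """Scale rows (axis=1) or columns (axis=0) of M to unit L2 norm; all-zero vectors stay zero."""
    norms = np.linalg.norm(M, axis=axis, keepdims=True)
    norms[norms == 0] = 1.0
    return M / norms


def relational_matrix(A):
    """Hypothetical relational matrix R = W @ A @ S for a term-by-sentence matrix A.

    W (terms x terms): cosine similarity between term rows (assumed word relation).
    S (sentences x sentences): cosine similarity between sentence columns
    (assumed sentence relation). This is an illustrative construction only.
    """
    A = np.asarray(A, dtype=float)
    rows = _normalize(A, axis=1)
    cols = _normalize(A, axis=0)
    W = rows @ rows.T
    S = cols.T @ cols
    return W @ A @ S


def select_sentences(A, k):
    """Pick up to k distinct sentence indices from the leading right singular vectors of R."""
    R = relational_matrix(A)
    _, _, Vt = np.linalg.svd(R, full_matrices=False)
    chosen = []
    for vec in Vt:  # rows of Vt are ordered by decreasing singular value
        if len(chosen) == k:
            break
        # singular vectors have arbitrary sign, so rank sentences by |loading|
        for j in np.argsort(-np.abs(vec)):
            if int(j) not in chosen:
                chosen.append(int(j))
                break
    return sorted(chosen)


# Toy matrix: terms 0-1 cover sentences 0-1, terms 2 and 3 cover sentences 2 and 3.
A = [[1, 1, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 2, 0],
     [0, 0, 0, 1]]
print(select_sentences(A, 2))  # one of sentences 0 or 1, plus sentence 2
```

Because the two sentences in the first topic are identical, only one of them is chosen; the second pick comes from a different topic, which mirrors the goal of extracting sentences with low mutual similarity.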

Keywords

Word segmentation; document summarization; latent semantic analysis; singular value decomposition

English Abstract

In this study, relation measures between words and between sentences were integrated with Latent Semantic Analysis (LSA) to construct a relational matrix between words and sentences. This approach improves the semantic analysis of similarity between sentences and aids the extraction of a document's representative sentences for text mining. In the experiment, documents from thirteen categories were used, and two performance indices, sentence similarity and document classification rate, were used to evaluate the proposed method. The experimental results show that the proposed method extracts sentences that are less similar to one another and more representative of the document.
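The two performance indices are named but not defined in the abstract. A minimal sketch of one plausible reading, assumed here rather than taken from the paper: sentence similarity as the mean pairwise cosine similarity among the extracted sentences (lower means less redundancy), and document classification rate as the fraction of summaries that a nearest-centroid classifier assigns to their true category. The functions, the centroid representation, and the category names are all hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length term vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def mean_pairwise_similarity(sentence_vectors):
    """Average cosine similarity over all pairs of extracted sentences; lower means less redundancy."""
    pairs = list(combinations(sentence_vectors, 2))
    if not pairs:
        return 0.0
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def classification_rate(summary_vectors, true_labels, centroids):
    """Fraction of summaries whose most cosine-similar category centroid matches the true label.

    centroids maps a category name to a term vector (hypothetical nearest-centroid classifier).
    """
    if not true_labels:
        return 0.0
    correct = 0
    for vec, label in zip(summary_vectors, true_labels):
        predicted = max(centroids, key=lambda c: cosine(vec, centroids[c]))
        correct += predicted == label
    return correct / len(true_labels)


print(mean_pairwise_similarity([[1, 0, 0], [0, 1, 0], [1, 1, 0]]))  # sqrt(2)/3, about 0.4714

centroids = {"sports": [1, 0], "finance": [0, 1]}
summaries = [[2, 0.1], [0.2, 3], [1, 1.5]]
labels = ["sports", "finance", "sports"]
print(classification_rate(summaries, labels, centroids))  # 2 of 3 correct, about 0.667
```

Under this reading, a better summarizer should lower the first number (its sentences overlap less) while keeping the second high (the summary still identifies the article's category).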

English Keywords

Word segmentation; text mining; latent semantic analysis; singular value decomposition

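Cited By

張益誠, 張育傑, & 余泰毅 (2021). Exploring automatic document classification techniques for environmental education papers: A case study of the 2013–2018 Environmental Education Conference abstracts. Journal of Environmental Education Research (環境教育研究), 17(1), 85–128. https://doi.org/10.6555/JEER.17.1.085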
