Applying Latent Semantic Analysis to Similarity Comparison in Test Item Banks

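This study applies latent semantic analysis (LSA), an information retrieval technique, to determine whether an item bank contains identical or similar test items, and examines how common-word (stop-word) removal, weight adjustment, and dimension reduction affect the results. The study has two aims: (1) to examine whether LSA can effectively find identical or similar items in an item bank; and (2) to determine which choices of common-word removal, weighting scheme, and number of retained dimensions work best when analyzing item similarity. The item bank consists of 1,000 multiple-choice items from the ROC year 92 and 93 (2003 and 2004) editions of the 「電腦軟體應用技能檢定丙級學科」 (Computer Software Application Skills Certification, Level C, written test). Similarity between items is classified into four levels: completely identical, very similar, partially similar, and slightly similar. The conclusions are as follows: 1. Removing common words gives better results than keeping them at every level of similarity. 2. The weighting scheme best suited to this item bank's term-by-item matrix is log-entropy. 3. When judging whether two items are completely identical, precision rises as more dimensions are retained; when judging whether two items are very similar, partially similar, or slightly similar, precision is better when 30, 15, and 14 dimensions are retained, respectively. 4. The system correctly identifies all four types of items in this bank: (1) items with identical wording, (2) items that differ in some vocabulary, (3) items phrased differently but with the same meaning, and (4) items that use different vocabulary with the same meaning.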
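The pipeline summarized above (common-word removal, log-entropy weighting, dimension reduction, similarity comparison) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the toy English items, the stop-word list, the whitespace tokenizer, the function names, the retained dimension `k = 3`, and the 0.9 threshold are all hypothetical. The study's items are in Chinese and would need a word segmenter in place of `str.split`.

```python
import numpy as np

def term_document_matrix(docs, stop_words=frozenset()):
    """Build a term-by-item count matrix (rows: terms, columns: items)."""
    tokenized = [[w for w in d.lower().split() if w not in stop_words]
                 for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, toks in enumerate(tokenized):
        for w in toks:
            counts[index[w], j] += 1
    return counts, vocab

def log_entropy(counts):
    """Log-entropy weighting: local log(1 + tf) times global entropy weight.

    Assumes at least two items. Natural log is used for the local weight;
    a different base only rescales the matrix and leaves cosines unchanged.
    """
    n_docs = counts.shape[1]
    gf = counts.sum(axis=1, keepdims=True)        # global frequency per term
    p = counts / gf                               # share of each term per item
    plogp = np.where(p > 0, p * np.log(np.where(p > 0, p, 1.0)), 0.0)
    global_w = 1.0 + plogp.sum(axis=1) / np.log(n_docs)
    return np.log1p(counts) * global_w[:, None]

def lsa_similarity(weighted, k):
    """Cosine similarity between items after keeping k SVD dimensions."""
    _, s, vt = np.linalg.svd(weighted, full_matrices=False)
    doc_vecs = (s[:k, None] * vt[:k]).T           # one row per item
    norms = np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    norms[norms == 0] = 1.0                       # guard against empty items
    unit = doc_vecs / norms
    return unit @ unit.T

def similar_pairs(sim, threshold):
    """Return item index pairs (i < j) whose similarity reaches the threshold."""
    n = sim.shape[0]
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if sim[i, j] >= threshold]

# Hypothetical toy item bank; items 0 and 1 are exact duplicates.
items = [
    "which device is an input device of the computer",
    "which device is an input device of the computer",
    "which of the following is an output device",
    "what is the binary value of decimal number ten",
]
stop_words = frozenset({"which", "is", "an", "of", "the", "what", "following"})

counts, vocab = term_document_matrix(items, stop_words)
sim = lsa_similarity(log_entropy(counts), k=3)
print(similar_pairs(sim, threshold=0.9))
```

Rerunning the last lines with a different `k`, a different weighting function, or an empty stop-word set is one way to mimic, on a toy scale, the kind of comparison the study describes.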

Keywords

latent semantic analysis; item bank; similar test items

Parallel abstract

The purpose of this study is to apply latent semantic analysis (LSA) to determine whether an item bank contains identical or similar items, and to discuss how common-word removal, weight adjustment, and dimension reduction influence the results. The study has two major purposes: 1. To discuss whether latent semantic analysis can effectively discover identical or similar items in an item bank. 2. To discuss which choices of common-word removal, weight adjustment method, and number of retained dimensions work better for analyzing item similarity. The item bank consists of the 1,000 multiple-choice items of "the computer software application skill examination - grade-C course" for ROC years 92 and 93 (2003 and 2004). Similarity between items is classified into four kinds: completely identical, extremely similar, partially similar, and slightly similar. The conclusions are as follows: 1. For items at every degree of similarity, removing common words gives better results than not removing them. 2. The weight adjustment method for the term-by-document matrix that best suits this item bank is log-entropy. 3. When judging whether two items are completely identical, the more dimensions retained, the higher the precision rate. When judging whether two items are extremely similar, partially similar, or slightly similar, the precision rate is better when the retained dimension is 30, 15, and 14, respectively. 4. The system correctly identifies all four kinds of items: (1) items whose wording is completely identical, (2) items in which some of the wording differs, (3) items stated differently but with identical meaning, and (4) items that use different words with identical meaning.
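The precision rate used to compare settings can be illustrated with a short sketch. It assumes the common information-retrieval definition (the share of system-flagged item pairs that a human rater also labelled as similar); the abstract does not state the exact formula, and the function name and the sample pairs are hypothetical.

```python
def precision(predicted_pairs, relevant_pairs):
    """Fraction of predicted item pairs that appear among the relevant pairs.

    Pairs are treated as unordered, so (0, 1) and (1, 0) count as the same pair.
    Returns 0.0 when nothing was predicted.
    """
    predicted = {frozenset(p) for p in predicted_pairs}
    relevant = {frozenset(p) for p in relevant_pairs}
    if not predicted:
        return 0.0
    return len(predicted & relevant) / len(predicted)

# Hypothetical example: the system flags three pairs, two of which
# match the hand-labelled pairs.
flagged = [(0, 1), (2, 3), (1, 4)]
labelled = [(1, 0), (3, 2)]
score = precision(flagged, labelled)
print(round(score, 3))
```

Computing this score once per similarity level and per retained dimension is one way to produce the kind of comparison reported in conclusion 3.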

Parallel keywords

latent semantic analysis; item bank; similar item

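Cited by

黃巧媛 (2016). 應用潛在語意分析於試題相似度比對 -以中華民國物流協會認證題庫為例 [Applying latent semantic analysis to test item similarity comparison: the case of the 中華民國物流協會 certification item bank] [Master's thesis, 國立臺中科技大學 (National Taichung University of Science and Technology)]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0061-1907201621365000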
