Title

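A Study on Partitioning Feature Dimensions with Shannon's Entropy to Improve the Accuracy of the Bayesian Classifier (利用Shannon's Entropy作特徵維度切割以改善貝氏分類器正確率之研究)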

Authors

黃士恆

Key Words

Bayesian Classifier ; Golden Section ; Shannon's Entropy ; Feature Dimension Section

Publication Name

Degree theses of the Department of Industrial Education, National Taiwan Normal University

Volume or Term/Year and Month of Publication

2004

Academic Degree Category

Master's

Advisor

何宏發

Content Language

Traditional Chinese

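Chinese Abstract (English translation)

The Bayesian classifier proposed in this study is a classifier based on Bayes' theorem from probability theory. It computes the probability of each interval of each feature from the classification results of the samples, and then computes the probability of each possible outcome for a test case; it is an optimal classifier in both theory and practice. Because no suitable method exists for partitioning each feature dimension into intervals quickly and correctly, this study attempts to use the golden section as the partitioning method, with Shannon's Entropy as the criterion for judging whether a partition is appropriate, in order to obtain the best intervals. A further aim is for the method to apply across different data sets as a fully automatic partitioning procedure. The experimental results show that not every database reaches a high accuracy, owing to issues of data distribution and data importance. Although this also confirms that the independence assumption made when applying Bayes' theorem is necessary, the overall performance is good, and the Bayesian classifier of this study, with its capacity for high dimensionality and its high tolerance, remains a usable classifier.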

English Abstract

The Bayesian classifier proposed in this study is based on Bayes' theorem from probability theory. Optimal in theory as well as in practice, it calculates the probability of each interval of each input feature from the training set and its class labels, and then calculates the probability of every class for each test case. Because no suitable method exists for sectioning the intervals of the input features correctly and rapidly, this research uses the golden section to obtain the intervals of each feature, and uses Shannon's Entropy to check whether the intervals so obtained are appropriate and optimal. The research further aims to apply this method as a fully automatic sectioning procedure across different data sets. The experiments showed that not all databases reach high accuracy, owing to problems in data distribution and data importance. Although the results confirm that the independence assumption made when applying Bayes' theorem is necessary, the Bayesian classifier retains the advantages of good recognition rates and reliability in high dimensions.
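The abstract does not spell out the sectioning procedure, so the following is only a minimal sketch of one way golden-section cuts could be scored with Shannon's Entropy. It assumes a single binary cut per feature and a size-weighted class entropy as the score. All names (`shannon_entropy`, `golden_cuts`, `weighted_entropy`, `best_golden_cut`) are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter

GOLDEN = (math.sqrt(5) - 1) / 2  # golden ratio conjugate, about 0.618


def shannon_entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def golden_cuts(lo, hi):
    """The two golden-section points of the interval [lo, hi]."""
    width = hi - lo
    return lo + (1 - GOLDEN) * width, lo + GOLDEN * width


def weighted_entropy(values, labels, cut):
    """Size-weighted class entropy of the two intervals produced by `cut`."""
    left = [y for x, y in zip(values, labels) if x <= cut]
    right = [y for x, y in zip(values, labels) if x > cut]
    n = len(labels)
    return (len(left) / n) * shannon_entropy(left) + (
        len(right) / n
    ) * shannon_entropy(right)


def best_golden_cut(values, labels):
    """Assumed selection rule: keep the golden-section point with the
    lower weighted entropy, i.e. the purer pair of intervals."""
    cuts = golden_cuts(min(values), max(values))
    return min(cuts, key=lambda c: weighted_entropy(values, labels, c))


# Toy feature: class "a" occupies the low values, class "b" the high ones.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
labels = ["a"] * 4 + ["b"] * 6
cut = best_golden_cut(values, labels)
print(cut, weighted_entropy(values, labels, cut))
```

On this toy feature the lower golden-section point (about 4.44) separates the two classes cleanly, so it is selected over the upper point (about 6.56). A fully automatic procedure of the kind the abstract describes would presumably repeat such a step on the resulting intervals, but the stopping rule is not given in the source.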

Topic Category

College of Technology > Department of Industrial Education
Engineering > General Engineering
Social Sciences > Education