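We apply data mining techniques to a large body of medical records to analyze and compare diagnostic items against diagnostic results. The aim is to assess the feasibility of deriving a diagnostic result from a combination of several diagnostic items, and to further examine how well the result can be predicted. The prototype system developed in this study achieves an accuracy of up to 95%. Medical diagnostic data contain a great deal of valuable information, and data mining has become an indispensable tool for extracting useful information from such data. Data mining refers to having a computer automatically select important, potentially useful data patterns or knowledge from large volumes of data or large databases, to serve as a reference for decision analysis. Its techniques are now widely applied in many fields, for example market basket analysis of commercial transaction data and data file retrieval. This study consists of three stages. In stage 1, we compute the correlation coefficients among the diagnostic item data and prune the items with small coefficients, which reduces the large volume of data. In stage 2, we find the best-fit distribution for each diagnostic item and use it to generate random values that fill in the item's missing values. In stage 3, we generate rules relating the important diagnostic items to the diagnostic results through AND-module operations. We then apply the J-Measure (Smyth, 1992), a rule induction method, to compute the information gain (the J-Information) of each rule and retain the useful rules. Finally, we verify the correctness of the rules on thyroid diagnostic data from an Australian research institution (Thyroid Disease Database, TDD, 1987). The experimental results show that the proposed method can effectively predict diagnostic results from the test values of the diagnostic items, which directly confirms the feasibility of using the method to assist medical diagnosis.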
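For reference, the J-Measure of a rule of the form "IF Y = y THEN X = x" is commonly written as shown here. The symbols p(y), p(x), and p(x|y) are notation introduced for this illustration and are not taken from the study; the logarithm is assumed to be base 2, so the score is in bits.

```latex
J(X; Y = y) = p(y) \left[ p(x \mid y) \log_2 \frac{p(x \mid y)}{p(x)}
  + \bigl(1 - p(x \mid y)\bigr) \log_2 \frac{1 - p(x \mid y)}{1 - p(x)} \right]
```

The factor p(y) rewards rules whose condition occurs often, while the bracketed term measures how much the condition changes the belief in the outcome.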
Medical databases hold a great deal of valuable information, but discovering useful knowledge in them is often too tedious or too complicated. How to extract information effectively from large collections of medical records has therefore become an important issue. The principle of data mining is to sort through large amounts of data and filter out the relevant information. It has been described as "the nontrivial extraction of implicit, previously unknown, and potentially useful information from data" and "the science of extracting useful information from large data sets or databases." To date, data mining techniques have been widely used in fields such as education and e-commerce. Applying these techniques, we propose the Computer-aid Disease Diagnostic System (CDDS), which evaluates the relationship between diagnostic items and diagnoses in a large medical database in order to induce valuable information and rules and to predict the diagnosis. CDDS works in three stages: (1) it reduces the database size by calculating the correlation coefficients between the diagnostic items and the diagnostic decision and pruning the items whose coefficients are small; (2) it finds the best-fit probability distribution and generates random variates to fill in the missing values in the records; (3) it applies AND operations to the diagnostic items to generate rules, calculates the J-Information of each rule, retains the rules with higher J-Information, and uses them to predict the diagnosis. In our experiment, the prediction accuracy was 95%. These results indicate that CDDS can both extract valuable information from medical databases and assist medical professionals in diagnosing diseases.
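The three stages can be illustrated with a minimal Python sketch. It is not the CDDS implementation. The function names, the pruning threshold of 0.1, the item names in the usage note, and the choice of a normal distribution for imputation are all assumptions made for illustration. The study selects a best-fit distribution by a procedure not described here, and the AND-based rule generation step is omitted.

```python
import math
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(vx * vy)


def prune_items(columns, outcome, threshold=0.1):
    """Stage 1 (sketch): keep items whose |r| with the outcome is large enough.

    The threshold value is a hypothetical default, not taken from the study.
    """
    return [name for name, values in columns.items()
            if abs(pearson(values, outcome)) >= threshold]


def impute_missing(values, rng):
    """Stage 2 (sketch): replace None with draws from a fitted distribution.

    Assumes a normal distribution fitted to the observed values; the study
    instead searches for the best-fit distribution of each item.
    """
    observed = [v for v in values if v is not None]
    mu = statistics.mean(observed)
    sigma = statistics.stdev(observed)
    return [v if v is not None else rng.gauss(mu, sigma) for v in values]


def j_measure(p_y, p_x, p_x_given_y):
    """Stage 3 (sketch): J-Measure in bits of the rule IF Y=y THEN X=x."""
    def term(a, b):
        # By convention 0 * log(0 / b) is taken as 0.
        return 0.0 if a == 0 else a * math.log2(a / b)

    return p_y * (term(p_x_given_y, p_x) + term(1 - p_x_given_y, 1 - p_x))
```

For example, with hypothetical items `{"tsh": [1, 2, 3, 4], "age": [1, 2, 2, 1]}` and outcome `[0, 0, 1, 1]`, `prune_items` keeps only `"tsh"`, because `"age"` has zero correlation with the outcome. A rule whose condition holds half the time and lifts the outcome probability from 0.5 to 1.0 scores `j_measure(0.5, 0.5, 1.0) == 0.5` bits, while a rule that leaves the outcome probability unchanged scores 0.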