Optimized Query for Medical Literature

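Abstract

The volume of medical literature has grown exponentially with the spread of computers and the Internet: the number of records in MEDLINE/PubMed rose from 6.76 million at the end of 1990 to more than 20 million as of 2011. For busy healthcare professionals, finding suitable literature in such a large database is a heavy burden, so a search engine is needed to assist them. However, PubMed's default search strategy does not return relevant results effectively; inexperienced users must go through repeated trial and error to find matching articles, and there is no good ranking mechanism. This study implements a system in which users can search not only by keywords but also by entering a sentence or even a whole article. Through interaction between the user and the system interface, queries are refined quickly, and articles are ranked by relevance. In addition, the system applies a PICO classifier to the abstracts of medical articles in real time to classify their sentences, helping users locate target sentences more quickly while reading. Finally, for the experimental results and discussion, we collected sentences from PubMed that explicitly describe Patients, Intervention, or Outcome as experimental material, divided into training data and test data. Three classification models were built from the training data with the NLTK naive Bayes classifier, the test data were classified with the PICO classification algorithm, and the performance of the PICO classification system was evaluated with 10-fold cross-validation.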
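The evaluation setup described above (three classification models, a PICO labeling step, and 10-fold cross-validation) can be illustrated with a minimal sketch. This is not the study's implementation: it uses a small hand-written naive Bayes in place of the NLTK classifier so that it runs with the Python standard library alone, it assumes the three models are one-vs-rest models for P, I, and O, and it assumes the PICO step picks the label whose model is most confident. All function names and the toy sentences are invented for illustration and do not come from the study's corpus.

```python
import math
import random
from collections import Counter

LABELS = ("P", "I", "O")  # Patients, Intervention, Outcome


def tokenize(sentence):
    # Assumption: plain lowercase whitespace tokens as bag-of-words features.
    return sentence.lower().split()


def train_binary_nb(data, target):
    """Train one target-vs-rest naive Bayes model from (sentence, label) pairs."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = {True: 0, False: 0}
    for sentence, label in data:
        is_target = label == target
        doc_counts[is_target] += 1
        word_counts[is_target].update(tokenize(sentence))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab


def log_odds(model, sentence):
    """Log-odds that the sentence belongs to the model's target class."""
    word_counts, doc_counts, vocab = model
    total_docs = doc_counts[True] + doc_counts[False]
    scores = {}
    for cls in (True, False):
        # Add-one smoothed class prior.
        score = math.log((doc_counts[cls] + 1) / (total_docs + 2))
        # Add-one smoothing; the extra +1 reserves mass for unseen words.
        denom = sum(word_counts[cls].values()) + len(vocab) + 1
        for word in tokenize(sentence):
            score += math.log((word_counts[cls][word] + 1) / denom)
        scores[cls] = score
    return scores[True] - scores[False]


def classify(models, sentence):
    # Hypothetical PICO step: choose the label whose model is most confident.
    return max(LABELS, key=lambda label: log_odds(models[label], sentence))


def train_models(data):
    return {label: train_binary_nb(data, label) for label in LABELS}


def cross_validate(data, k=10, seed=0):
    """Return the mean accuracy over k folds."""
    data = list(data)
    if len(data) < k:
        raise ValueError("need at least k examples")
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]
    accuracies = []
    for i, test_fold in enumerate(folds):
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        models = train_models(train)
        correct = sum(classify(models, s) == label for s, label in test_fold)
        accuracies.append(correct / len(test_fold))
    return sum(accuracies) / k


# Invented toy sentences, for illustration only.
TOY_DATA = (
    [(f"patients aged {n} years with diabetes were enrolled", "P") for n in range(40, 50)]
    + [(f"participants received {n} mg of the drug daily", "I") for n in range(10, 20)]
    + [(f"mortality was reduced by {n} percent at follow-up", "O") for n in range(10, 20)]
)

if __name__ == "__main__":
    print("mean 10-fold accuracy:", cross_validate(TOY_DATA, k=10))
```

With real data, each fold would hold out one tenth of the labeled sentences, and the mean accuracy across folds would serve as the performance estimate. The toy sentences here are deliberately easy to separate, so the figure they produce says nothing about the actual system.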

Keywords

Search engine; information retrieval; data mining; PICO

Parallel Abstract

The medical literature has grown exponentially as computers and networks have become widespread: the number of records in MEDLINE/PubMed increased from 6.76 million at the end of 1990 to more than 20 million in 2011. For busy medical researchers and physicians, searching such a huge database for relevant literature is a major burden, and a search engine is necessary to support the user's search. However, the default search strategy provided by PubMed is not an effective way to return relevant results. Inexperienced users must rely on repeated trial and error to find related articles, and a good ranking mechanism is lacking. This study implements a system in which the user can query by keywords and can also enter a sentence or even an entire article as the query. By interacting with the system interface, the user refines the query to find related articles, and a relevance-based ranking mechanism helps the user find those articles quickly. The system also classifies the sentences of medical abstracts in real time with a PICO classifier, so that users can locate target sentences faster while reading. Finally, for the experimental results and discussion, we collected sentences that clearly describe Patients, Intervention, or Outcome as experimental material and divided them into training data and test data. The training data were used with the NLTK Bayesian classifier to construct three classification models, the test data were classified with the PICO classification algorithm, and the performance of the PICO classification was assessed by 10-fold cross-validation.

Parallel Keywords

Search Engine; Information Retrieval; Data Mining; PICO

