Natural Language Processing in Pathology Report of Breast Cancer

Advisor: 邵于宣


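Pathology reports are among the most important medical reports in a medical center. After examining a patient's specimen, the pathologist must write up the findings in a report for the clinician to use as a reference for clinical treatment. Because lesions and specimens differ from patient to patient, pathology reports are usually written as unstructured text for the pathologist's convenience. Such reports are typically assigned a pathology number when the hospital archives them, so a report can be retrieved only by criteria such as the patient's medical record number or the pathology number. The main drawback of this kind of query is that it cannot serve clinicians who want to retrieve other reports whose content is similar to that of a given pathology report. This study used 400 breast cancer pathology reports as its data source and applied natural language processing combined with machine learning to extract the specimen information in the reports. Content classification performed best with a design of 10 label types, reaching an average token accuracy above 90%; supporting more advanced content queries, however, will require improving the accuracy of the labels that mainly cover medical terminology. This study aims to classify report content automatically, in the hope that a system can later be developed that lets physicians quickly find relevant pathology reports using various keywords.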


Pathology reports are among the most important medical reports in medical centers. Pathologists examine patients’ specimens and write reports based on the examination outcomes, and clinicians use these reports as the primary reference when making treatment decisions. Because illnesses and specimens differ between patients, and examination outcomes vary accordingly, it is difficult to require pathologists to write reports in a fixed format. Pathologists therefore write their reports as free text. Pathology reports are generally queried by medical record number or pathology number. However, there is a gap between what clinicians actually want to search by, such as report content, and what these two criteria can return. We therefore used natural language processing and machine learning techniques to extract information from pathology reports and classify it. We hope the outcome of this research can reduce pathologists’ effort, and that an information retrieval system enhanced by the proposed method can enable them to retrieve the required information quickly and accurately.


