Translated Titles

Theme Classification for HIV/AIDS Medical Questions-A Case Study of Yahoo! Answers





Key Words

Text Mining ; Recommendation System ; Theme Classification ; Genetic Algorithm



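Chinese Abstract (English translation)

In recent years, research on health care services has trended toward building environments in which people seeking medical advice can consult through both physical and virtual channels. With the development of the Internet, online platforms have become a popular virtual channel for medical consultation and discussion. Because HIV/AIDS carries a negative social label, askers tend to choose online channels and avoid physical ones when seeking related information, and this tendency is especially pronounced for HIV/AIDS-related medical consultation.

This study therefore focuses on HIV/AIDS. We developed a crawler to collect HIV/AIDS question posts from 「Yahoo!奇摩知識家」 (Yahoo! Answers) and, using text mining and artificial intelligence techniques, propose an automated theme classification mechanism for HIV/AIDS-related posts. The mechanism systematically organizes HIV/AIDS discussion topics to build the database of a medical reply recommendation system for HIV/AIDS questions, for subsequent use by relevant organizations and personnel.

The experimental results show that weighting terms by both their average term weight and the block of the article in which they appear, and dividing articles into appropriate position blocks, both effectively improve classification performance. The proposed weighting method achieves a classification F-measure of 76.32%.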

English Abstract

Research on health care services has sought to provide consultation environments for people seeking medical advice through either physical or virtual channels. With the rapid development of the Internet, online platforms have become a popular place for asking and discussing medical questions. Moreover, because HIV/AIDS is stigmatized, askers tend to choose virtual channels over physical ones when asking about it. Acting on unreliable, unverified medical advice from the Internet, however, can cause problems. To address this issue, we collect HIV/AIDS questions from “Yahoo! Answers” and use text mining and artificial intelligence to build an HIV/AIDS recommendation system. Askers describe their problems or the events they experienced in free text, and the system responds promptly with dependable medical recommendations. In our experiments, we found that dividing articles into appropriate blocks improves classification effectiveness. In addition, considering both statistical and linguistic keyword weights also improves accuracy. The best classification result achieves an F-measure of 76.34%.

Topic Category

College of Informatics > Department of Information Management
Social Sciences > Management
Times Cited
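  1. 李玉嬋 (1999). A Study of the Effectiveness of Health Counseling Skills Training for Nursing Students in Medical Nursing Practicum. Degree thesis, Department of Educational Psychology and Counseling, National Taiwan Normal University, 1999, pp. 1-269.
  2. 洪愛琇 (2005). Professional Development of Nursing Staff in Traditional Chinese Medicine Outpatient Clinics in Taiwan. Master's thesis (master's and in-service master's program), School of Nursing, Taipei Medical University, 2005, pp. 1-209.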