
A Study of Sexual Issues Using Text Mining Techniques: The Case of the PTT Forum feminine_sex Board

Applying Text Mining Techniques to Sexual Issues on PTT feminine_sex

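Abstract


As the world enters the era of data science, big data sources are no longer limited to structured data; text and other unstructured data are everywhere in daily life. Gathering information online has become a major reason people use the internet, so mining the sexual issues that people pay attention to is an important way to understand their attitudes toward sex and their sexual knowledge. This study used a web crawler written in R to automatically retrieve posts from the feminine_sex board of the PTT forum, collecting 1,438 posts over one year. The large volume of text in such a corpus offers real opportunities to develop promising and interesting applications around sexual issues, which is the aim of this study's use of text mining techniques. After natural-language word segmentation of the feminine_sex posts, the three most frequent terms were "doctor" (醫生), "problem" (問題), and "boyfriend" (男友). Topic modeling with the K-Means clustering algorithm, followed by naming of the resulting clusters, showed that public discussion centers on three main topics: intimate relationships, contraceptive counseling, and health care. These results can help educational and medical institutions strengthen sex education and health education training.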

Parallel Abstract


As the world enters the era of data science, big data comprise not only structured information but also text and other unstructured information. Collecting information has become one of the main purposes of using the internet. Studying which sexual issues people are concerned about is therefore an important way to understand their attitudes toward sex and their sexual knowledge. This study used a web crawler written in R to automatically extract articles from the feminine_sex board of the PTT forum, collecting a total of 1,438 articles over one year. The large amount of information in this text corpus offers a chance to develop a variety of promising and interesting applications related to sexual issues, which is the purpose of applying text mining techniques in this study. After word segmentation with natural language processing, the results showed that the three most frequent words on the feminine_sex board are "doctor", "problem", and "boyfriend". We applied the K-Means clustering algorithm for topic modeling. After naming the resulting clusters, we found that public discussion mostly concerns three main issues: intimate relationships, contraceptive counseling, and health care. These results can be provided to the relevant educational and medical authorities to support sex education and to strengthen health education training on these topics.

Keywords

Text Mining; Sexual Issues; PTT; Web Crawler; Topic Model
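
The analysis steps summarized in the abstract (counting term frequencies over segmented posts, then grouping posts with K-Means) can be illustrated with a minimal sketch. The study itself used R; the Python below is not the authors' code. The `posts` data, the English tokens standing in for segmented Chinese words, the choice of normalized bag-of-words vectors, and the function names `term_frequencies`, `vectorize`, and `kmeans` are all assumptions made for illustration. Crawling and word segmentation are omitted.

```python
from collections import Counter
import math
import random

# Hypothetical, already-segmented posts. These tokens are illustrative
# placeholders and are NOT taken from the study's corpus.
posts = [
    ["boyfriend", "relationship", "trust", "boyfriend"],
    ["boyfriend", "relationship", "breakup"],
    ["pill", "contraception", "doctor", "pill"],
    ["contraception", "pill", "condom"],
    ["doctor", "infection", "clinic", "doctor"],
    ["clinic", "infection", "doctor", "checkup"],
]


def term_frequencies(docs):
    """Count how often each token occurs across all documents."""
    counts = Counter()
    for tokens in docs:
        counts.update(tokens)
    return counts


def vectorize(docs, vocab):
    """Turn each document into an L2-normalized bag-of-words vector."""
    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        vec = [float(counts[term]) for term in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per vector."""
    rng = random.Random(seed)
    # Initialize centroids from k distinct input vectors (assumed choice).
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


freq = term_frequencies(posts)
vocab = sorted(freq)
labels = kmeans(vectorize(posts, vocab), k=3)
print(freq.most_common(3))  # most frequent tokens in the toy corpus
print(labels)               # cluster label for each post
```

In a setting like the one the abstract describes, the clusters would then be inspected and named by hand; the cluster labels produced here are arbitrary integers and depend on the random initialization.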

