With advances in information technology and the growing use of handheld devices and social networking sites, the volume of electronic news and of social media posts and comments is growing rapidly, and the structure of this data is complex. Data can be broadly divided into structured and unstructured data. Many effective methods, such as data mining techniques, already exist for structured data, but comparatively few analysis methods are available for unstructured data such as text, audio, and images. The text mining platform built in this study extracts useful information from such data so that its significance can be explored quickly. The study integrates open-source code found online into a single platform: Python handles the back-end computation, HTML is used to write the web pages, and the text mining platform is deployed on Django. News data on the summer travel expo (TTE) is then imported into the platform for text mining analyses, such as word cloud analysis, association analysis, cluster analysis, and sentiment analysis, in order to discuss the meaning and context of the expo data.
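As a rough illustration of what the back-end computation behind two of these analyses might look like, the sketch below counts term frequencies (the input a word cloud is drawn from) and term co-occurrences within a document (a simple starting point for association analysis). It is not the platform's actual code: the sample documents, the stopword list, and the function names `tokenize`, `term_frequencies`, and `cooccurrences` are all hypothetical, and the regex tokenizer assumes English text, whereas Chinese news text would first need a word segmenter.

```python
from collections import Counter
from itertools import combinations
import re

# Hypothetical sample documents; the study's actual news data is not shown here.
DOCS = [
    "travel expo offers discount tickets",
    "summer travel expo draws crowds",
    "discount tickets sell out at expo",
]

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"at", "the", "a", "of", "out"}


def tokenize(text):
    """Lowercase, split on non-letters, and drop stopwords.

    Assumption: English input. Chinese text would need a word segmenter instead.
    """
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def term_frequencies(docs):
    """Count how often each term occurs across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts


def cooccurrences(docs):
    """Count, for each pair of terms, how many documents contain both."""
    pairs = Counter()
    for doc in docs:
        terms = sorted(set(tokenize(doc)))  # sorted so each pair has one canonical order
        pairs.update(combinations(terms, 2))
    return pairs


freqs = term_frequencies(DOCS)
pairs = cooccurrences(DOCS)

print(freqs.most_common(3))
print(pairs.most_common(3))
```

In a Django deployment, functions like these would typically be called from a view, with the resulting counts passed to an HTML template for display; how the platform in this study wires that up is not specified here.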