Using Clustering Algorithms to Build a Semantic Recommendation Platform for Government Open Datasets Based on Cloud Computing

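As data volumes grow rapidly and the big data era arrives, the relationships among data multiply as well. Once a computer knows how data items are related, it can link them. The Taiwan government's current open data platform, however, only supports queries over its own datasets. This study addresses how to link those datasets with data on other platforms and how to recommend related data. It proposes the Integration K-means and k-NN based on Semantic Cloud Computing Framework (IKKSCCF) and builds a Linked Data Query Platform (LDQP) to validate the feasibility of IKKSCCF. LDQP offers an alternative query interface and integrates Facebook data: posts from fan pages that the user has recently liked on Facebook are analyzed for correlation with government open datasets. The analysis focuses on three issues: long-term care, food safety, and environmental protection. Jieba Chinese word segmentation extracts specific terms from the open datasets and the posts, and the system counts how many of these terms each dataset and each post contains. K-means then clusters all datasets, and the clustered datasets are correlated with the Facebook posts. The k-nearest neighbors (k-NN) algorithm finds the strongest association between each post and the open datasets. These associations are converted into Resource Description Framework (RDF) objects and published to the Semantic Web for semantic inference. When a user logs in to LDQP with a Facebook account, the system identifies the datasets most relevant to the issues the user has recently followed on Facebook. It then presents them for selection, achieving personalized dataset recommendation.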
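
The term-counting and clustering steps described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' code:

- Each document is assumed to be already segmented into tokens, for example with `jieba.lcut(text)`.
- The topic lexicons in `TOPIC_TERMS` are hypothetical placeholders, since the abstract does not list the actual terms.
- Each dataset is reduced to three counts, one per issue.

Clustering uses plain Lloyd's K-means seeded with the first k points. The toy datasets are ordered so that each seed comes from a different topic.

```python
from math import dist

# Hypothetical topic lexicons; the thesis does not publish its actual term lists.
TOPIC_TERMS = {
    "long_term_care": {"長照", "照顧", "失智", "居家"},
    "food_safety": {"食品", "農藥", "添加物", "檢驗"},
    "environment": {"空氣", "垃圾", "回收", "水質"},
}
TOPICS = list(TOPIC_TERMS)


def term_vector(tokens):
    """Count the tokens that fall into each topic lexicon.

    `tokens` is assumed to be segmenter output, e.g. jieba.lcut(text).
    """
    return tuple(sum(tok in TOPIC_TERMS[t] for tok in tokens) for t in TOPICS)


def kmeans(points, k, max_iter=100):
    """Lloyd's K-means with the first k points as initial centroids."""
    centroids = [tuple(float(x) for x in p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid (Euclidean).
        new_labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its member points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Toy token lists standing in for segmented dataset descriptions (not thesis data).
datasets = {
    "長照機構名冊": ["長照", "機構", "居家", "照顧"],
    "農藥殘留檢驗": ["農藥", "檢驗", "蔬果"],
    "垃圾回收量": ["垃圾", "回收"],
    "失智據點": ["失智", "照顧", "據點"],
    "食品添加物": ["食品", "添加物"],
    "空氣品質監測": ["空氣", "品質", "監測"],
}
vectors = [term_vector(toks) for toks in datasets.values()]
labels, _ = kmeans(vectors, k=3)
for name, lab in zip(datasets, labels):
    print(lab, name)
```

Seeding with the first k points keeps the example deterministic. In practice, k-means++ initialization or several random restarts makes the result less sensitive to input order.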

Keywords

Facebook; Open Data; Jieba Chinese Word Segmentation; K-means; k-NN; Semantic Web; Cloud Computing

Parallel Abstract (English)

With the rapid growth of data, the Big Data era has arrived, and the correlations among data have become more complex. If a computer knows how data are correlated, it can link them. Currently, the open data platform provided by the Taiwan government supports queries over its internal datasets only. This study explores how to link these datasets with data from other platforms and how to recommend related data. It proposes the Integration K-means and k-NN based on Semantic Cloud Computing Framework (IKKSCCF) and builds the Linked Data Query Platform (LDQP) to validate the feasibility of IKKSCCF. LDQP provides an alternative query interface and incorporates Facebook data. It performs correlation analysis between the fan-page posts the user has recently liked on Facebook and the open datasets provided by the government. The study covers three topics: long-term care, food safety, and environmental protection. For these topics, the Jieba toolkit extracts specific vocabulary from the open datasets and the posts, and the system counts the vocabulary each contains. The system then clusters all datasets with K-means and performs correlation analysis between the clustered datasets and the Facebook posts. Using the k-nearest neighbors (k-NN) algorithm, it finds the strongest correlation between each post and the open datasets. This correlation is converted into the objects required by the Resource Description Framework (RDF) and provided to the Semantic Web for semantic inference. When the user logs in to LDQP with a Facebook account, the system finds highly correlated datasets based on the topics the user has recently followed on Facebook. It then offers these datasets for selection, achieving personalized dataset recommendation.
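
The k-NN association and RDF conversion steps can be sketched as follows. This is a minimal illustration, not the thesis's implementation, and it rests on several assumptions:

- Each post and dataset is already reduced to a term-count vector over the three topics.
- Datasets already carry cluster labels from a K-means step.
- A post is treated as a query point and assigned the majority cluster among its k nearest datasets, using Euclidean distance and k = 3.

The resulting links are serialized as N-Triples. The names `knn_match` and `to_ntriples`, the `http://example.org/ldqp/` namespace, and the `relatedDataset` predicate are all hypothetical. A real deployment would use an established vocabulary and an RDF library such as rdflib.

```python
from collections import Counter
from math import dist

BASE = "http://example.org/ldqp/"  # hypothetical namespace, not the thesis vocabulary


def knn_match(post_vec, dataset_vecs, dataset_clusters, k=3):
    """Classify a post by majority vote among its k nearest datasets.

    Returns the winning cluster and the neighbours in that cluster, nearest first.
    """
    nearest = sorted(dataset_vecs, key=lambda d: dist(post_vec, dataset_vecs[d]))[:k]
    votes = Counter(dataset_clusters[d] for d in nearest)
    cluster = votes.most_common(1)[0][0]
    return cluster, [d for d in nearest if dataset_clusters[d] == cluster]


def to_ntriples(post_id, dataset_ids):
    """Serialize post-to-dataset links as N-Triples lines (hypothetical predicate)."""
    return [
        f"<{BASE}post/{post_id}> <{BASE}relatedDataset> <{BASE}dataset/{d}> ."
        for d in dataset_ids
    ]


# Toy term-count vectors over (long-term care, food safety, environment).
dataset_vecs = {
    "ltc-1": (3, 0, 0), "ltc-2": (2, 0, 0),
    "food-1": (0, 2, 0), "food-2": (0, 1, 0),
    "env-1": (0, 0, 1), "env-2": (0, 0, 2),
}
# Cluster labels as a prior K-means step might assign them.
dataset_clusters = {"ltc-1": 0, "ltc-2": 0, "food-1": 1, "food-2": 1, "env-1": 2, "env-2": 2}

post_vec = (0, 3, 0)  # a post that mentions food-safety terms three times
cluster, related = knn_match(post_vec, dataset_vecs, dataset_clusters)
for line in to_ntriples("p1", related):
    print(line)
```

Euclidean distance on raw counts favors longer texts. Cosine similarity is a common alternative when posts and dataset descriptions differ greatly in length.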

Parallel Keywords (English)

Facebook; Open Data; Jieba; K-means; k-NN; Semantic Web; Cloud Computing

