Research on Crawling Network Information Data with Scrapy Framework

In the Internet era of big data, crawlers significantly improve the efficiency of information retrieval. This paper briefly introduces the basic structure of crawler software, the Scrapy framework, and the clustering algorithm used to improve the performance of information crawling and classification. The crawler software and the clustering algorithm were implemented in Python. Experiments were carried out with MATLAB on a laboratory LAN, using Weibo data from October 1 to October 31. A crawler that adopted the Scrapy framework but did not include the clustering algorithm served as the control. The results showed that neither Scrapy-based crawler, with or without the clustering algorithm, reproduced the actual classification of the Weibo information exactly. However, the crawler with the clustering algorithm came closer to the actual category proportions and produced classification results with higher accuracy and a lower false alarm rate in less time.

Keywords

Clustering Algorithm; Crawler Software; Network Data; Scrapy Framework
