Research on Crawling Network Information Data with Scrapy Framework

Abstract


In the Internet era of big data, crawlers significantly improve the efficiency of information retrieval. This paper briefly introduces the basic structure of crawler software, the Scrapy framework, and the clustering algorithm used to improve the performance of information crawling and classification. The crawler software and the clustering algorithm were then implemented in Python. Experiments were carried out with MATLAB on a laboratory LAN, using Weibo data posted between October 1 and October 31. A crawler that adopted the Scrapy framework but did not include the clustering algorithm served as the control. The results showed that, with or without the clustering algorithm, the Scrapy-based crawler could not reproduce the actual classification of the Weibo information exactly; however, the crawler with the clustering algorithm came closer to the actual class proportions and produced classification results with higher accuracy and a lower false alarm rate in less time.
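The abstract does not name the clustering algorithm, so the following is only a minimal sketch of how crawled posts might be grouped by topic. It assumes k-means-style clustering over bag-of-words vectors with cosine similarity. The function names (`vectorize`, `cosine`, `centroid`, `kmeans`), the seed choice, and the sample posts are all hypothetical and are not taken from the paper. Real Weibo text would also need Chinese word segmentation rather than whitespace splitting.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn a post into a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Average the term frequencies of the vectors in one cluster."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return Counter({term: count / len(vectors) for term, count in total.items()})


def kmeans(posts, seeds, max_iter=10):
    """Cluster posts around the seed posts; returns one cluster label per post."""
    vectors = [vectorize(p) for p in posts]
    centroids = [vectors[i] for i in seeds]  # assumed: seeds are post indices
    labels = None
    for _ in range(max_iter):
        # Assign each post to the most similar centroid.
        new_labels = [
            max(range(len(centroids)), key=lambda k: cosine(vec, centroids[k]))
            for vec in vectors
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Recompute each centroid from its current members.
        for k in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                centroids[k] = centroid(members)
    return labels


# Hypothetical stand-ins for crawled post text.
posts = [
    "football match final score",
    "football team wins match",
    "stock market price falls",
    "market price of stock rises",
]
labels = kmeans(posts, seeds=[0, 2])
print(labels)
```

In a Scrapy project, a step like this would plausibly run after the spider has yielded its items, for example in an item pipeline or a separate post-processing script. That placement is likewise an assumption, not something the abstract states.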
