Building Big Data Platform on Hadoop and Spark with Microservice Architecture

We design a big data platform based on the Hadoop ecosystem (HDFS, Spark, MapReduce) and Hive. The platform runs on a virtualized cloud platform: multiple virtual machines are first created on the cloud platform, and the Hadoop distributed storage and distributed computing systems are then deployed on the resulting virtual machine cluster. The Hive data query and analysis platform is deployed on top of Hadoop. Using this platform, we analyze Internet search and social data. The experimental results show that, for loading, mapping, querying, and statistical analysis of Internet big data, the multi-machine cluster achieves higher efficiency and throughput than a single machine.
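
The map and reduce steps mentioned above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation and uses no Hadoop or Spark API. The sample search log, the function names, and the round-robin partitioning are all assumptions made for illustration. The partitions are processed sequentially here, whereas on a cluster each partial count would be computed on a different node.

```python
from collections import Counter
from functools import reduce

# Hypothetical sample of search-log lines; the paper's data set is not shown.
SEARCH_LOG = [
    "hadoop spark",
    "spark hive",
    "hadoop hdfs spark",
    "hive query",
]


def split_into_partitions(records, num_partitions):
    """Distribute records round-robin, a stand-in for HDFS block placement."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def map_partition(lines):
    """Map phase: count search terms locally within one partition."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def reduce_counts(left, right):
    """Reduce phase: merge two partial term counts."""
    return left + right


def term_frequencies(records, num_partitions):
    """Count terms by mapping each partition, then reducing the partial results."""
    partitions = split_into_partitions(records, num_partitions)
    # On a real cluster each call below would run on a separate worker node.
    partials = [map_partition(p) for p in partitions]
    return reduce(reduce_counts, partials, Counter())


if __name__ == "__main__":
    result = term_frequencies(SEARCH_LOG, num_partitions=2)
    for term, count in result.most_common():
        print(term, count)
```

The result does not depend on the number of partitions, which is what lets the same job be spread over more machines to raise throughput.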

Keywords

Big data; Spark; MapReduce
