  • Degree thesis

基於Apache Spark 建構串流 XML Veracity 真實度之模型

Veracity Model of Streaming XML Document Based On Apache Spark

Advisor: 陳世穎
This thesis will be available for download on 2024/09/12.

Abstract


In recent years the volume of big data has grown rapidly, and this data drives a large number of applications, each of which exchanges data. XML (eXtensible Markup Language) is today's common format for data exchange over networks, and as Internet data has grown, XML data has taken on the characteristics of big data as well. This study builds an XML Veracity Model application programming interface (XML Veracity Model API) to address the difficulty of quantifying veracity when big data is transmitted. Because veracity depends on how the data is understood, the veracity of an XML document can be interpreted along many aspects; with the Veracity Model API, users define the dimensions and attributes that they consider relevant to document veracity, and the model produces a quantified score. Streaming applications are increasingly common, and the structural problems of streaming XML data make its veracity difficult to verify. This study therefore applies the veracity model to streaming data and uses Apache Spark to improve the model's processing performance, with the aims of processing streaming XML quickly and validating the design of the veracity model.

Keywords

Big Data, XML, Streaming, Apache Spark, Veracity

