With the spread of the Internet and the rapid growth in the number of network users, processing users' massive data (Big Data) quickly and efficiently has become an important issue. Simply upgrading hardware can no longer keep up with data at this volume, and traditional relational databases do not perform well when scaled out. With the rapid development of cloud computing in recent years, NoSQL technology has emerged to address the problem of handling massive data. Among the many NoSQL tools, MongoDB is one of the fastest growing. Unlike the row-oriented storage of relational databases, MongoDB adopts document-oriented storage. Because of this difference in architecture, not every table in a relational database is suited to a NoSQL data design, and those that are must be designed according to the characteristics of MongoDB. This paper presents a hybrid approach that combines a relational database with MongoDB. We propose several strategies for identifying which tables are better suited to MongoDB and denormalize them, then plan how to distribute the data, so that data processing scales better than it does with a relational database alone.
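The idea of reshaping relational tables into a document-oriented design can be illustrated with a minimal sketch. It is not the method proposed in this paper. The `customers` and `orders` tables, their column names, and the `denormalize` helper are all hypothetical, and plain Python dictionaries stand in for both the relational rows and the MongoDB documents.

```python
from collections import defaultdict


def denormalize(customers, orders):
    """Embed each customer's orders inside a single customer document.

    Hypothetical helper: the table and field names are assumptions
    made for illustration only.
    """
    orders_by_customer = defaultdict(list)
    for order in orders:
        # The foreign key becomes redundant once the order is embedded.
        embedded = {k: v for k, v in order.items() if k != "customer_id"}
        orders_by_customer[order["customer_id"]].append(embedded)

    return [
        {
            "_id": customer["id"],
            "name": customer["name"],
            # Customers without orders get an empty list.
            "orders": orders_by_customer.get(customer["id"], []),
        }
        for customer in customers
    ]


# Two relational tables represented as lists of rows.
customers = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
orders = [
    {"id": 10, "customer_id": 1, "total": 25.0},
    {"id": 11, "customer_id": 1, "total": 40.0},
]

documents = denormalize(customers, orders)
```

Each resulting document can be read without a join, which is the usual motivation for denormalizing. The trade-off is that embedded data is duplicated or harder to update independently, so a table that is updated often or shared by many parents may be better left in the relational database.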