
A Study of NoSQL Performance and Stability: The Case of HBase

Advisor: 王存國

Abstract


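In today's increasingly fierce business competition, every company strives to stay close to consumer needs and gain a competitive advantage, and in recent years data mining and big data have become prominent fields of study. How to extract the information and value hidden in data has attracted wide attention. For analyzing and predicting consumer behavior more precisely, and for drawing commercially valuable information from customer data in order to formulate business strategy, strengthen competitive advantage, and maximize profit, big data is a powerful tool.

Although the big data platform Hadoop brings many advantages, such as horizontal scalability and tolerance of single-node failure without affecting the whole cluster, it also has inherent limitations. Using a case-study method, this research examines the performance and stability problems encountered when running the NoSQL database HBase. It compiles the problems met in actual HBase operation and applies systematic, structured problem analysis and resolution methods to identify the key factors behind operational problems and poor performance. For each factor it formulates feasible solutions, then evaluates them and tracks the recurrence rate of the problems to confirm that the proposed solutions are effective.

The results show that HBase performance and stability are affected by underlying components, such as the stability of the Hadoop platform and the parameter settings of the Hadoop DataNode and the operating system. The number of regions hosted by each RegionServer is another key factor, and row key design also has a major impact on read and write performance. The research further finds that HBase performance and stability problems are closely related to improper management procedures. In addition to the system-level improvements above, it therefore proposes improvements to management procedures covering system parameter consistency, version control and software deployment, change management, system monitoring, and emergency problem handling, so as to reduce human error and build a more stable cluster system.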

Keywords

Big data, performance, stability, impact factors

Parallel Abstract


Business competition has become more intense. To win that competition, every company now focuses on meeting consumer demand, and in recent years data mining and big data have become hot topics. Extracting valuable information from various databases, conducting accurate analysis, and predicting consumer behavior have become critical, and big data tools help with all of these. The tools can help a business develop strategies, strengthen its competitive advantage, and maximize profits. Although Hadoop, a big data platform, brings many advantages, such as scalability and tolerance of single-node failure, it has disadvantages as well. This research is a case study of the NoSQL database HBase as applied in a prominent foundry company in Taiwan. It systematically studies the problems encountered, identifies the factors that affect the system's efficiency and stability, proposes solutions for each factor, and then evaluates the effectiveness of those solutions by tracking recurring problems. The results show that HBase's performance and stability are likely to be affected by factors such as Hadoop platform stability, the DataNode xceiver parameter, operating system parameters, and the region count of each RegionServer. Row key design also affects read/write performance. The study further finds that HBase's stability problems are related to inappropriate process management. This suggests that improving HBase's performance and stability requires work not only at the system level but also at the management level. Improving management controls, including parameter consistency, version control of parameters, deployment processes, change management, monitoring, and emergency management, can reduce human error and produce a more stable distributed cluster system.
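The abstract states that row key design affects read and write performance but does not say which design the thesis adopted. As an illustration only, the sketch below shows one widely used technique, salting, in which a hash-derived bucket prefix spreads sequential keys across regions. The names `salted_row_key`, `scan_prefixes`, and `NUM_BUCKETS`, the bucket count, and the `|` separator are all hypothetical and are not taken from the thesis or from any HBase API.

```python
import hashlib
from typing import List

# Hypothetical bucket count; in practice it would be chosen to match
# the number of regions or RegionServers in the cluster.
NUM_BUCKETS = 16


def salted_row_key(natural_key: str, num_buckets: int = NUM_BUCKETS) -> bytes:
    """Prefix a key with a deterministic bucket so sequential keys spread out."""
    digest = hashlib.md5(natural_key.encode("utf-8")).digest()
    # Use the first four bytes of the hash to pick a bucket.
    bucket = int.from_bytes(digest[:4], "big") % num_buckets
    return f"{bucket:02d}|{natural_key}".encode("utf-8")


def scan_prefixes(num_buckets: int = NUM_BUCKETS) -> List[bytes]:
    """Return every bucket prefix; a range read must scan each of them."""
    return [f"{b:02d}|".encode("utf-8") for b in range(num_buckets)]
```

The trade-off is that writes no longer concentrate on one region, while a range read over natural keys has to issue one scan per prefix and merge the results.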

Parallel Keywords

Big data, efficiency, stability, impact factor
