This study builds a cloud-based mining service for unstructured text data, delivered as a platform through which users log in, run mining jobs, monitor them, and browse the results from a web page. The Infrastructure as a Service (IaaS) layer is set up with VMware ESXi on a single host, which runs multiple virtualized operating systems. The Platform as a Service (PaaS) layer is built with the open-source Apache Hadoop software, which can also form a small cloud cluster across several physical computers. A web service verifies account permissions and gives access to the mining methods on the cluster.
Association rule mining was implemented on the cloud system. The experimental results show that a clustered cloud system built on a virtual platform uses system resources considerably more effectively than a conventional single-machine environment, which yields a marked improvement in execution performance. The results also show that Hadoop cluster settings, such as the number of nodes, the number of tasks, and the size and number of files, affect execution performance to varying degrees.
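To make the association-rule step concrete, the following is a minimal sketch of how itemset counting could be split into map and reduce phases, the pattern a Hadoop job follows. It is an illustration under stated assumptions, not the implementation used in this study: the abstract does not name the mining algorithm, so simple pair counting with support and confidence thresholds is assumed here, and the function names (`map_phase`, `reduce_phase`, `association_rules`) and the default threshold values are hypothetical. The sketch runs in a single Python process rather than on a cluster.

```python
from collections import Counter
from itertools import combinations


def map_phase(transaction, max_size=2):
    """Emit (itemset, 1) for every itemset of up to max_size items.

    A transaction is assumed to be the list of terms in one document.
    """
    items = sorted(set(transaction))
    for size in range(1, max_size + 1):
        for itemset in combinations(items, size):
            yield itemset, 1


def reduce_phase(pairs):
    """Sum the emitted counts per itemset, as a reducer would."""
    counts = Counter()
    for itemset, n in pairs:
        counts[itemset] += n
    return counts


def association_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return sorted (lhs, rhs, support, confidence) rules between single items.

    The threshold defaults are illustrative assumptions, not values from the study.
    """
    total = len(transactions)
    pairs = (kv for t in transactions for kv in map_phase(t))
    counts = reduce_phase(pairs)
    rules = []
    for itemset, n in counts.items():
        # Only frequent 2-itemsets can produce a rule lhs -> rhs.
        if len(itemset) != 2 or n / total < min_support:
            continue
        a, b = itemset
        for lhs, rhs in ((a, b), (b, a)):
            confidence = n / counts[(lhs,)]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, n / total, confidence))
    return sorted(rules)
```

On a real cluster the map outputs would be partitioned across nodes before the reduce step, which is one reason the number of nodes and tasks and the size and number of files can influence running time.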