Abstract In recent years, information security on the Internet has become an important research issue, because programmers may, intentionally or carelessly, plant backdoors and information leaks in CGI scripts; these flaws allow an enterprise's internal information to be obtained illegally and are difficult to detect. In addition, the rapid growth of the Internet has made Web usage mining an increasingly important area of research. Therefore, in order to effectively detect backdoors and information leaks that network security tools cannot find and that endanger enterprise operations, we propose a Web log mining technique to enhance the security of Web servers. First, we integrate Web server logs with Web application logs to overcome the limited information contained in ordinary Web logs; we then apply a density-based clustering algorithm to mine abnormal data patterns. Using these patterns, system administrators can easily locate backdoors or information leaks in their system programs, thereby improving Web server security. Finally, we analyze and mine the log files of a real Web site, successfully helping its administrator detect problems in that system's CGI scripts.
Abstract The problem of information security on the Web has recently become an important research issue. Backdoors and information leaks in Common Gateway Interface (CGI) scripts may be hidden by programmers, whether inadvertently or deliberately; these flaws allow an enterprise's information to be obtained illegally and are difficult for security tools to detect. In addition, the rapid growth of the Internet has made Web mining an increasingly important research topic. Therefore, in order to detect backdoors and information leaks in CGI scripts that existing security tools cannot find, and to prevent damage to enterprises, we propose a log data mining approach to enhance the security of Web servers. First, we combine Web application log data with Web server log data to overcome the limited information contained in ordinary Web logs. Then, our method applies a density-based clustering algorithm to mine abnormal patterns from the combined Web log and Web application log data. The resulting information helps system administrators detect backdoors and information leaks in their programs more easily. Moreover, the mined information helps system administrators detect problems in CGI scripts from on-line Web site log data.
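The abstract names a density-based clustering step over combined Web server and Web application log records, but does not fix a particular algorithm or feature set at this point. The following is a minimal sketch, assuming a DBSCAN-style density-based clustering (via scikit-learn) over a few illustrative numeric features per log record; the feature names, parameter values, and the choice of DBSCAN itself are assumptions for illustration, not the paper's actual design.

```python
# Minimal sketch: density-based clustering of combined log records.
# Assumptions (not from the paper): DBSCAN as the density-based algorithm,
# and three hypothetical numeric features per log record.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

def find_abnormal_records(records):
    """records: list of dicts with hypothetical keys
    'requests_per_min', 'response_bytes', 'param_count'."""
    X = np.array([[r["requests_per_min"], r["response_bytes"], r["param_count"]]
                  for r in records], dtype=float)
    X = StandardScaler().fit_transform(X)              # normalize feature scales
    labels = DBSCAN(eps=0.7, min_samples=5).fit_predict(X)
    # DBSCAN marks low-density points with label -1; treat them as abnormal.
    return [r for r, lbl in zip(records, labels) if lbl == -1]

# Example usage with synthetic records: one high-volume request pattern
# stands out from the dense cluster of ordinary traffic.
if __name__ == "__main__":
    normal = [{"requests_per_min": 10, "response_bytes": 2000, "param_count": 2}
              for _ in range(50)]
    suspicious = [{"requests_per_min": 300, "response_bytes": 500000, "param_count": 15}]
    for rec in find_abnormal_records(normal + suspicious):
        print("abnormal:", rec)
```

A density-based method fits this setting because abnormal accesses (e.g., requests triggering a backdoor) are expected to be sparse relative to ordinary traffic, so they fall outside the dense clusters and are flagged as noise rather than forced into a cluster.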