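Demand for penetration testing has been growing recently, yet the API crawlers of existing security vulnerability scanners perform very poorly on modern dynamic Ajax pages written in JavaScript. Because today's websites are very large, crawling an entire site is unrealistic, so the crawler has to be terminated at some point in time. In this thesis we propose a new crawler model that assumes the crawler will be stopped at some unknown time and is designed specifically for crawling APIs; within a fixed time it outperforms earlier crawlers. In our design, we cast the cost incurred by the crawler as the well-studied stochastic shortest path (SSP) problem. Our experimental results show that, compared with traditional strategies such as breadth-first search and depth-first search, our model crawls more APIs.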
Demand for security penetration testing has grown in recent years, but the application programming interface (API) crawlers of existing web vulnerability scanners perform poorly on modern websites that rely on JavaScript technologies such as Ajax. Moreover, modern websites are often so large that crawling them in full is impractical, so the crawl must be stopped at some point. This thesis presents a crawling algorithm design that supports being interrupted at an arbitrary time and focuses on API crawling. In this design, we redefine the crawling problem and reduce it to the well-studied stochastic shortest path (SSP) problem. We implement two simple baseline models and evaluate them on two small websites and ten large commercial websites. The results show that our simple baselines crawl more APIs than traditional strategies such as depth-first and breadth-first search.
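To make the notion of an interruptible crawl concrete, the following is a minimal sketch and not the thesis's algorithm: it does not reproduce the SSP reduction. It assumes a hypothetical toy site (`SITE`), a page budget standing in for the unknown stopping time, and a hypothetical `score` function standing in for whatever estimate a model would assign to a page. It only illustrates how breadth-first, depth-first, and priority-ordered frontiers can be compared under the same budget.

```python
from collections import deque
import heapq

# Hypothetical toy site: page -> (API endpoints observed on that page, linked pages).
SITE = {
    "/": ([], ["/a", "/b"]),
    "/a": ([], ["/a/1", "/a/2"]),
    "/b": (["GET /api/items"], ["/b/1"]),
    "/a/1": ([], []),
    "/a/2": ([], []),
    "/b/1": (["POST /api/cart", "GET /api/user"], []),
}


def crawl(site, start, budget, strategy="bfs", score=None):
    """Visit at most `budget` pages and return the set of APIs seen so far.

    The loop can be cut off after any iteration, so the partial result is
    always usable. `score` is only used when strategy == "priority".
    """
    seen = {start}
    apis = set()
    counter = 0  # insertion order, used as a tie-breaker in the heap
    if strategy == "priority":
        frontier = [(-score(start), counter, start)]
    else:
        frontier = deque([start])

    visited = 0
    while frontier and visited < budget:
        if strategy == "bfs":
            page = frontier.popleft()          # oldest discovered page first
        elif strategy == "dfs":
            page = frontier.pop()              # newest discovered page first
        else:
            page = heapq.heappop(frontier)[2]  # highest-scoring page first
        visited += 1

        page_apis, links = site[page]
        apis.update(page_apis)
        for nxt in links:
            if nxt in seen:
                continue
            seen.add(nxt)
            if strategy == "priority":
                counter += 1
                heapq.heappush(frontier, (-score(nxt), counter, nxt))
            else:
                frontier.append(nxt)
    return apis


def prefer_b(page):
    """Hypothetical scoring rule: favour pages under /b."""
    return 1 if page.startswith("/b") else 0


if __name__ == "__main__":
    for name in ("bfs", "dfs", "priority"):
        found = crawl(SITE, "/", budget=3, strategy=name, score=prefer_b)
        print(name, len(found))
```

On a graph this small the ordering of results depends entirely on how the toy site is laid out, so the sketch says nothing about which strategy is better in general; it only shows that the number of APIs found under a fixed budget depends on the order in which the frontier is expanded.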