  • Thesis

Applying Location-Based Social Network Data Analysis to Crime Prediction

Crime Analysis and Prediction with Machine Learning Approaches from Location-Based Social Network Data

Advisor: 鄭士康
Co-advisor: 李政德

Abstract


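In recent years, advances in mobile communication technology have driven the rapid growth of location-based social services and applications. Two kinds of data from these services, points of interest (POIs) and check-in records, carry geographic, social, and temporal information about buildings and users, respectively. Over the past decade or more, the relationship between crowd dynamics in urban environments and crime has been studied in depth. Although many studies have examined the relationship between population characteristics and crime, most consider only information with low temporal resolution, such as education level, age distribution, gender, and race. Many other studies use historical crime data to locate crime hot spots and predict how they move, so that police resources can be deployed appropriately. However, studies that use location-based social network data to characterize the urban environment and predict urban crime remain few. In terms of data characteristics, location-based social data compare favorably with conventional demographic statistics and are cheaper to acquire, but they also pose many challenges, such as data sparsity and the limited representativeness of the user population. This study analyzes the environment and crowd dynamics of high-crime urban areas and uncovers correlations between social network dynamics, which carry building-related information, and the spatial distribution of high-crime areas; the results can chiefly serve as a basis for urban planning aimed at reducing crime. We collected location-based social data and crime data for San Francisco and Chicago from 2009 to 2010 and analyzed, for each grid cell and time period, the association among crowd dynamics, the external environment, and crime frequency. Based on the fields available in the social data, we selected 2 geographic features and 9 building-category features for each grid cell, plus 5 crowd-dynamics features for each time period. To quantify how strongly each feature relates to each type of crime, we used the Normalized Discounted Cumulative Gain (NDCG) metric. To evaluate how well the combined features predict high crime rates, we partitioned the data with k-fold validation: part of the data was used to train models with statistical methods (linear regression, support vector machine, and random forest) and to find the parameter combination giving the best predictions, and the remaining data was used to verify whether the models help identify the grid cells ranked in the top 30 percent for each type of crime. The experimental results show that the amount of data affects how well the three methods predict crime rates, and that this can be improved by changing the grid cell size, whereas changing the grid reference point has no marked effect on prediction. Changing the ratio of training to validation data also makes little difference. In addition, the NDCG analysis of individual features already indicates the upper limit of what the combined features can predict. Overall, machine learning predicts better than considering any single feature alone, but poorly performing features reduce prediction accuracy.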
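As a rough illustration of how NDCG can score a single feature against crime counts, the sketch below ranks grid cells by a feature value and treats each cell's crime count as the relevance gain. The function names, the linear gain, the log2 discount, and the optional cutoff `k` are assumptions made for illustration; the abstract does not state the exact formulation used in the thesis.

```python
import math

def dcg(gains):
    """Discounted cumulative gain of gains listed in ranked order (log2 discount)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg(feature_values, crime_counts, k=None):
    """NDCG of the ranking obtained by sorting cells on feature_values, descending.

    crime_counts[i] is used directly as the gain of cell i (an assumption).
    """
    if len(feature_values) != len(crime_counts):
        raise ValueError("feature_values and crime_counts must have equal length")
    # Order cell indices by the feature, highest value first.
    order = sorted(range(len(feature_values)),
                   key=lambda i: feature_values[i], reverse=True)
    ranked = [crime_counts[i] for i in order]
    # The ideal ranking places the highest-crime cells first.
    ideal = sorted(crime_counts, reverse=True)
    if k is not None:
        ranked, ideal = ranked[:k], ideal[:k]
    ideal_dcg = dcg(ideal)
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0
```

Under these assumptions, a feature whose ordering of cells matches the ordering by crime count scores 1, and the score falls as the two orderings diverge.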

Parallel Abstract


With the advancement of location-based social networks (LBSNs), users can "check in" at points of interest (POIs) with their mobile devices. Compared with conventional demographics, social network data are growing at an unprecedented pace and yield a wealth of user information. As a result, human mobility has recently attracted much attention and is being studied through LBSNs in its spatial, temporal, and social aspects. Prior work in urban studies suggests a strong correlation between people dynamics and criminal activity. Most of that work used kernel density estimation to calculate the crime density distribution and predicted crime occurrence from it. With the proliferation of social media data, some studies have implemented crime prediction systems based on Twitter records. However, none of this research quantifies human activities. In our model, human activities and buildings are represented by Gowalla and Foursquare data, respectively, for San Francisco and Chicago, and each city is divided into a set of grid cells. Five temporal periods and two geographic, five social, and nine categorical factors are encoded in every grid cell. To obtain a relevance score between each factor and the crime rate, the Normalized Discounted Cumulative Gain (NDCG) metric is used to rank relevant instances. Three machine learning methods (support vector machine, linear regression, and random forest) are employed to build crime prediction models and evaluate them. The results show that data sparsity can affect precision and that combined factors have better predictive power than any individual factor. Precision may also drop slightly when less relevant factors are included.
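The grid encoding and the "top 30 percent" target described in the abstracts can be sketched as follows. This is a minimal sketch under stated assumptions: the square cells measured in degrees, the grid origin, and all function names are hypothetical choices for illustration, and the thesis may define its grid and threshold differently.

```python
import math
from collections import Counter

def cell_of(lat, lon, origin_lat, origin_lon, cell_deg):
    """Map a coordinate to the (row, col) of a square grid anchored at the origin."""
    return (math.floor((lat - origin_lat) / cell_deg),
            math.floor((lon - origin_lon) / cell_deg))

def count_per_cell(points, origin_lat, origin_lon, cell_deg):
    """Count events (check-ins or crimes) falling in each grid cell."""
    return Counter(cell_of(lat, lon, origin_lat, origin_lon, cell_deg)
                   for lat, lon in points)

def top_fraction_cells(counts, fraction=0.3):
    """Return the cells ranked in the top `fraction` by count (at least one cell)."""
    ranked = sorted(counts, key=counts.get, reverse=True)
    n_top = max(1, math.ceil(fraction * len(ranked))) if ranked else 0
    return set(ranked[:n_top])
```

Changing `cell_deg` or the origin corresponds to the grid size and reference point variations mentioned in the first abstract; the set returned by `top_fraction_cells` could serve as the positive class for a classifier.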

