
基於巨量城市資料之監測數值推估與感測器地點配置:以空氣質量爲例

Inferring Missing Sensor Values and Recommending Sensor Deployment Locations Based on Urban Big Data: A Case Study on Air Quality

Advisor: 林守德 (Shou-De Lin)

Abstract


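With the maturing of wireless communication networks, sensing devices, the Internet, and related technologies, the information and communication industry has gradually shifted from infrastructure and theory toward ubiquitous real-world applications closely tied to people's daily lives, such as the Internet of Things (IoT), sensor networks, and smart cities. Combined with the strong support that big data techniques provide for processing and analyzing massive data, this shift brings many new research opportunities and application problems, especially for the IoT. This thesis aims to solve two fundamental and important problems in the IoT and sensor networks. First, how can the real-time sensor value at any location in geographic space be accurately inferred from the historical data collected by the given sensors together with external heterogeneous social and environmental data, particularly when sensors are extremely sparsely distributed in urban areas? Second, if a company or government agency needs to deploy new sensors to improve the quality of sensor value monitoring and inference, how can a machine automatically learn and determine the best deployment locations for the new sensors within a vast urban space?

These two problems are important and challenging in urban computing and big sensor data analysis. Although sensors are installed in the urban IoT to provide real-time environmental monitoring data, in the real world the vast majority of locations (more than 99%) have no sensor installed, so almost all locations in geographic space have no environmental monitoring values. This data sparsity also prevents us from effectively training traditional machine learning models to infer the environmental monitoring values at locations without sensors. In this thesis, we take environmental sensing of air quality as an example, treating the Air Quality Index (AQI) as the environmental data measured by sensors and air quality monitoring stations as the sensors. Our goal is to design effective, efficient, and scalable algorithms for the two research problems above: (1) accurately inferring the AQI at locations without monitoring stations, and (2) recommending deployment locations for new air quality monitoring stations so that the improvement in air quality inference accuracy is maximized.

In this thesis, we develop a general semi-supervised sensor value inference model. It not only effectively learns the correlation between sensors' AQI values and heterogeneous spatial and temporal features of the city (including meteorological data, human mobility patterns, the structure of the road network, and information on different types of places), but also accurately predicts the air quality at any location in geographic space. In addition, to maximize the benefit of deploying new air quality sensors, we develop an entropy-minimization model that recommends where new sensors should be placed so that air quality inference accuracy improves the most. Experimentally, we validate the sensor value inference model and the new-sensor location recommendation model on large-scale air quality data from Beijing. The results show that the proposed methods outperform current state-of-the-art methods by a highly significant margin in accuracy, time efficiency, and scalability, demonstrating that the proposed models have great potential: they address the fundamental practical problem of sparse sensing data in big data analysis for next-generation sensors, and can serve as an excellent foundation for a wide range of subsequent IoT applications.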

Parallel Abstract


With the gradual maturing of pervasive networking, sensing, and Internet technologies, the paradigm of information and communication technology has shifted from fundamental facilities and theories to ubiquitous real-world applications, such as urban computing, sensor networks, and smart cities. Together with the power of big data techniques, these advances bring new research opportunities and practical problems, especially for the Internet of Things (IoT). This thesis aims to answer two questions. First, how can the real-time sensor value at an arbitrary location be inferred from environmental data and from sensor data collected by monitoring stations that are extremely sparse in urban areas? Second, if a government agency needs to deploy a number of new sensors to improve the quality of sensor value inference, how can the best locations for them be learned and determined automatically? These two problems are essential for urban computing and big sensor data analysis, because sensors will be deployed throughout urban areas in the coming era of the IoT, and they are considerably challenging, because most locations (more than 99%) have no sensor data from which to train a model. In this thesis, by considering air quality indices as sensor values and treating air quality monitoring stations as sensors, we aim to solve the two research problems in an effective, efficient, and scalable manner. We develop a general-purpose semi-supervised inference model that is capable not only of learning the correlation between air quality values and heterogeneous spatial and temporal features of city dynamics, including meteorology, human mobility, the structure of road networks, and points of interest (POIs), but also of accurately predicting the air quality values of arbitrary locations where no monitoring station has ever been placed.
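To make the idea of semi-supervised inference over sparsely labeled locations concrete, the following is a minimal sketch and not the thesis's actual model. It assumes a generic graph-based label propagation scheme: every location is a node, edge weights come from a Gaussian kernel on a feature vector, and values spread from the few locations with stations to all others. The function name `propagate_aqi`, the `sigma` and `n_iter` parameters, and the one-dimensional toy features are all hypothetical.

```python
import numpy as np

def propagate_aqi(features, labeled_idx, labeled_aqi, sigma=1.0, n_iter=200):
    """Infer AQI at unlabeled locations by iterative label propagation.

    Illustrative stand-in only; the thesis's model is not specified here.
    """
    features = np.asarray(features, dtype=float)
    labeled_aqi = np.asarray(labeled_aqi, dtype=float)
    n = features.shape[0]

    # Gaussian affinity between every pair of locations in feature space.
    sq_dist = ((features[:, None, :] - features[None, :, :]) ** 2).sum(axis=-1)
    weights = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(weights, 0.0)  # no self-loops

    # Row-normalise so each location averages over its neighbours.
    transition = weights / weights.sum(axis=1, keepdims=True)

    # Start unlabeled locations at the mean of the known readings.
    values = np.full(n, labeled_aqi.mean())
    values[labeled_idx] = labeled_aqi
    for _ in range(n_iter):
        values = transition @ values
        values[labeled_idx] = labeled_aqi  # clamp locations that have stations
    return values


# Toy usage: four locations on a line, stations only at both ends.
estimates = propagate_aqi([[0.0], [1.0], [2.0], [3.0]], [0, 3], [50.0, 150.0])
```

In this toy setup the two interior locations receive values between the two station readings, with the location nearer the higher reading estimated higher. A real system would build the affinity from the heterogeneous features the abstract lists (meteorology, mobility, road networks, POIs) rather than from a single coordinate.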
In addition, to facilitate the cost-effective deployment of new sensors, we devise an entropy-minimization model that efficiently recommends geographical locations such that new sensors established there lead to the maximum improvement in air quality inference accuracy. We evaluate the proposed models on a large-scale air quality dataset from Beijing. Experimental results show clear advantages over a series of state-of-the-art and commonly used methods on both tasks, and suggest that the models have strong potential for current big sensor data analysis and for the upcoming era of the Internet of Things.
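As a rough illustration of entropy-driven location recommendation, the sketch below ranks candidate locations by the Shannon entropy of their predicted air quality class distribution and returns the most uncertain ones. This is a deliberately simplified assumption, not the thesis's entropy-minimization model: it treats a new sensor as removing the uncertainty only at its own location and ignores the effect on neighbouring estimates. The names `shannon_entropy` and `recommend_locations` and the example probabilities are hypothetical.

```python
import numpy as np

def shannon_entropy(probs):
    """Shannon entropy (in nats) of each row of a probability matrix."""
    probs = np.asarray(probs, dtype=float)
    # Replace zeros by 1 inside the log: log(1) = 0, so zero-probability
    # classes contribute nothing and no NaN is produced.
    safe = np.where(probs > 0, probs, 1.0)
    return -(probs * np.log(safe)).sum(axis=-1)

def recommend_locations(class_probs, k):
    """Return indices of the k candidate locations with the highest entropy."""
    entropy = shannon_entropy(class_probs)
    order = np.argsort(-entropy, kind="stable")  # most uncertain first
    return order[:k].tolist()


# Toy usage: predicted distributions over three AQI classes at four locations.
candidate_probs = [
    [1.0, 0.0, 0.0],        # fully certain
    [0.5, 0.5, 0.0],        # two classes equally likely
    [1 / 3, 1 / 3, 1 / 3],  # maximally uncertain
    [0.9, 0.1, 0.0],        # nearly certain
]
picked = recommend_locations(candidate_probs, k=2)
```

Here `picked` contains the uniform-distribution location first, followed by the two-class tie. A closer approximation of the abstract's goal would re-run inference after each hypothetical placement and choose the location that lowers total entropy across all locations the most, at a higher computational cost.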

