  • Degree thesis

基於物聯網隱私保護性資料傳輸之時間序列模擬方法

Simulating IoT-type Time Series for Privacy-Protecting Data Sharing

Advisor: 徐茉莉

Abstract


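Ethical issues in data science research are receiving increasing attention, and governments have introduced regulations on data protection and personal privacy. Common privacy concerns include large-scale government surveillance and privacy conflicts caused by the improper use of users' personal data. For example, in the Facebook–Cambridge Analytica data scandal, social media users did not know that their personal data was being collected, shared, and sold. In response, the European Union was the first to issue a regulation, the General Data Protection Regulation (GDPR), to restrict the collection and processing of personal data. Another approach is privacy-preserving data transmission, which is used for sharing data with third parties.

Our research goal is a time series simulation method for privacy-preserving IoT data transmission, so that IoT time series data can be shared and analyzed freely without directly disclosing the data itself, thereby minimizing privacy concerns. IoT time series may involve highly sensitive user behavior: for example, whether it is sensor data collected automatically by smart home appliances or data entered by users, such data helps smart appliances understand our daily preferences and assist with daily chores. Common methods for avoiding privacy issues when collecting IoT data include collecting less privacy-sensitive user data, or using on-device models so that user data need not be uploaded to the cloud. However, collecting sensitive data still carries privacy risks such as re-identification, reconstruction, and disaggregation (Laforet et al., 2015).

We focus on a time series simulation method for privacy-preserving IoT data transmission. We study the GRATIS time series simulation framework, which simulates time series based on features of the original series. By studying the GRATIS framework for simulating IoT time series, we address the following three research questions:
1. Which set of time series features is most suitable for simulating IoT-type time series?
2. Does performance differ across time series periodicities (e.g., hourly or daily)?
3. How can feature-based time series simulation be used to protect privacy?

To study the utility of the GRATIS simulation method, we apply two simulation approaches (entire-series simulation and piecewise simulation) with three feature sets (GRATIS, CompEngine, catch22) at two periodicities (hourly and minutely). We evaluate performance by graphically comparing the original and simulated data and by computing RMSE similarity. We validate the approach on two IoT datasets: heart-rate data from a fitness band and household power consumption data. Our research helps clarify how to share simulated IoT-type time series so as to balance privacy protection and accuracy. In this thesis we also propose approaches for privacy control and sharing strategies.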

Parallel Abstract


Ethical issues in data mining have been receiving more attention, and several laws and regulations have emerged that emphasize the importance of data protection and privacy. Common privacy concerns include "Big Brother" surveillance and unintended use of personal data. For instance, the Facebook–Cambridge Analytica data scandal showed that social media users were unaware that their personal data was being collected, shared, and sold. Hence, laws are needed to protect rights related to personal data. The European Union introduced the first such regulation, the General Data Protection Regulation (GDPR), to restrict the collection and processing of personal data. Another approach relies on privacy-preserving data sharing, such as the methods employed by bureaus of statistics for sharing administrative data with various users.

This research aims to find a solution that allows IoT time-series data to be shared for purposes of analysis while preventing the harm to data subjects that comes from directly disclosing their data, thereby minimizing privacy issues. IoT time series can be very sensitive: for example, sensor data or user-entered data collected from smart home applications is necessary for understanding our preferences and assisting with our daily chores. Common methods for avoiding the sharing of sensitive IoT data include collecting less sensitive data or building models on local machines without transmitting user data to cloud services. However, collecting sensitive data still poses privacy risks such as re-identification, reconstruction, and disaggregation (Laforet et al., 2015).

We focus on a simulation approach for sharing time-series IoT data. In our research, we study the ability of the GRATIS scheme by Kang et al. (2020) to provide simulated series that are sufficiently different from the original, yet preserve the main features needed for analysis. By studying this approach, we are able to answer the following questions:
1. What is a suitable set of time series features for simulating IoT-type time series?
2. How does performance vary across different time series periodicities (e.g., hourly or daily)?
3. How can feature-based simulated time series be useful for protecting privacy?

To study the ability of the GRATIS simulation approach, we compare three feature sets (GRATIS, CompEngine, catch22) on two periodicities (hourly, minutely), for two simulation approaches (entire-series simulation, piecewise simulation). We evaluate their performance by graphically comparing the original and simulated data and by computing RMSE similarity measures. Our application uses real IoT data on household power consumption and on heart rate from a fitness band.

Our findings contribute to the body of knowledge on how to share IoT-type simulated series while balancing privacy protection and accuracy. Drawing these results together, we propose several approaches for privacy control and sharing strategy.
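
The RMSE comparison between an original and a simulated series mentioned in the abstract can be illustrated with a minimal sketch. It assumes the two series are aligned and of equal length. The function names `rmse` and `piecewise_rmse`, the segment length, and the sample values are hypothetical and are not taken from the thesis. Note that `piecewise_rmse` only scores fixed-length segments separately; it does not reproduce the thesis's piecewise simulation procedure.

```python
import math
from typing import List, Sequence


def rmse(original: Sequence[float], simulated: Sequence[float]) -> float:
    """Root mean squared error between two aligned, equal-length series."""
    if len(original) != len(simulated):
        raise ValueError("series must have the same length")
    if len(original) == 0:
        raise ValueError("series must be non-empty")
    squared_errors = [(o - s) ** 2 for o, s in zip(original, simulated)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def piecewise_rmse(
    original: Sequence[float],
    simulated: Sequence[float],
    segment_length: int,
) -> List[float]:
    """RMSE computed separately on consecutive segments of the two series.

    The last segment may be shorter if the series length is not a
    multiple of segment_length (an assumption made for this sketch).
    """
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    if len(original) != len(simulated):
        raise ValueError("series must have the same length")
    return [
        rmse(original[i:i + segment_length], simulated[i:i + segment_length])
        for i in range(0, len(original), segment_length)
    ]


# Made-up heart-rate-like values, for illustration only.
original_series = [60.0, 62.0, 65.0, 70.0]
simulated_series = [61.0, 62.0, 63.0, 74.0]

overall_error = rmse(original_series, simulated_series)
segment_errors = piecewise_rmse(original_series, simulated_series, 2)
print(overall_error, segment_errors)
```

A lower RMSE means the simulated series tracks the original more closely point by point. For privacy purposes a very low value may be undesirable, since the abstract asks for simulated series that are sufficiently different from the original while preserving its main features.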

Parallel Keywords

time-series; privacy; IoT; feature-based simulation

References


Allhoff, F., & Henschke, A. (2018). The internet of things: Foundational ethical issues. Internet of Things, 1, 55–66.
Amazon.com: Echo (3rd Gen)- Smart speaker with Alexa- Charcoal: Amazon Devices. (n.d.). Retrieved March 17, 2020, from https://www.amazon.com/all-new-Echo/dp/B07NFTVP7P/ref=sxin_0_ac_d_pm?ac_md=2-1-QmV0d2VlbiAkNTAgYW5kICQxMDA%3D-ac_d_pm&cv_ct_cx=amazon+echo&keywords=amazon+echo&pd_rd_i=B07NFTVP7P&pd_rd_r=8f222265-b2e8-4f94-96cc-0ebde9556244&pd_rd_w=onTp1&pd_rd_wg=X2Lz5&pf_rd_p=0e223c60-bcf8-4663-98f3-da892fbd4372&pf_rd_r=NPEP9CA1NCG41J804EP4&psc=1&qid=1584425825
Amazon.com: Hello MB15226/W1 Sense with Voice Sleep System—Cotton (Current Generation—2nd): Health & Personal Care. (n.d.). Retrieved March 17, 2020, from https://www.amazon.com/Hello-MB15226-W1-Sense-System/dp/B01M9F2WLE/ref=dp_ob_title_wld
Apple. (n.d.). Retrieved March 17, 2020, from https://www.apple.com/
Ashouri, M., Shmueli, G., & Sin, C.-Y. (2019). Tree-based methods for clustering time series using domain-relevant attributes. Journal of Business Analytics, 2(1), 1–23. https://doi.org/10.1080/2573234X.2019.1645574
