Legal Risks and Management Implications of Big Data Transactions: Focusing on the Re-identification of Personal Data

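Focusing on the re-identification of personal data, this paper examines the legal risks and management implications of big data transactions. It presents a comprehensive empirical study of judgments by Taiwanese courts at all levels concerning the de-identification of personal data. The study finds that the courts still mistakenly assume that deleting some direct identifiers is enough to de-identify data, and that they have no concept of re-identification risk. In the "Student Information case," the university recognized that de-identified personal data remains at risk of re-identification, but the court regrettably held that the point required no further discussion, missing an excellent opportunity to adjudicate re-identification risk. In the "Health Insurance case," the Supreme Administrative Court focused on whether the data provider or the data recipient should carry out the de-identification process, and it rejected the plaintiff's motion to test the re-identification risk. This paper respectfully submits that a sound way to verify re-identification risk is to use partial data already held about a few specific individuals to test whether those individuals can be singled out, and thereby whether all of their information can be obtained. The Supreme Administrative Court's wholesale rejection of the evidentiary method proposed by the plaintiff is therefore open to question. As for mechanisms to reduce re-identification risk, this paper argues that most research does not genuinely need absolutely precise data, so generalization is the de-identification method that best balances personal data privacy against data usability. To ensure that de-identification eliminates, or at least substantially reduces, re-identification risk and thereby lowers the risk of infringement, data providers should verify the effectiveness of de-identification before a transaction. Such verification should select a few records already known about particular individuals and check whether those records alone suffice to identify a specific person.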
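Generalization replaces exact values with coarser categories. The following is a minimal Python sketch, not the author's scheme: it assumes hypothetical quasi-identifier fields `age` and `zip_code` and hypothetical sample records, coarsens ages into ten-year bands, masks the trailing digits of postal codes, and then reports the size of the smallest group of records that share the same generalized values. That last measure is borrowed from k-anonymity, which the paper does not itself invoke.

```python
from collections import Counter


def generalize_age(age: int, width: int = 10) -> str:
    """Replace an exact age with a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize_zip(zip_code: str, keep: int = 3) -> str:
    """Keep only the leading digits of a postal code and mask the rest."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


def generalize(records: list[dict]) -> list[dict]:
    """Return copies of the records with the (hypothetical) quasi-identifiers generalized."""
    return [
        {**r, "age": generalize_age(r["age"]), "zip_code": generalize_zip(r["zip_code"])}
        for r in records
    ]


def smallest_group(records: list[dict], keys=("age", "zip_code")) -> int:
    """Size of the smallest set of records sharing identical values on `keys` (the k of k-anonymity)."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return min(counts.values())


# Hypothetical sample records; field names and values are illustrative only.
raw = [
    {"age": 34, "zip_code": "10617", "diagnosis": "A"},
    {"age": 37, "zip_code": "10648", "diagnosis": "B"},
    {"age": 52, "zip_code": "30012", "diagnosis": "C"},
    {"age": 58, "zip_code": "30078", "diagnosis": "A"},
]

print(smallest_group(raw))              # 1: every record is unique on age + zip_code
print(smallest_group(generalize(raw)))  # 2: each generalized combination is shared by two records
```

Wider bands or more masked digits raise the smallest group size but lower precision; the paper's point is that most research can tolerate this loss of precision.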

Keywords

Big data transactions; de-identification; re-identification; legal risk; management implications

Parallel Abstract

This paper focuses on the re-identification of personal data and discusses the legal risks and management implications of big data transactions. It presents a comprehensive empirical study of decisions by Taiwanese courts at all levels concerning the de-identification of personal data. The findings indicate that the courts still mistakenly assume that deleting some direct identifiers is sufficient for de-identification and show no awareness of re-identification risk. In the Student Information case, the university recognized that de-identified personal data still carries a re-identification risk; however, the court held that this concern required no further discussion, thereby missing a valuable opportunity to rule on re-identification risk. In the Health Insurance case, the Supreme Administrative Court focused on whether the de-identification process should be performed by the data provider or by the recipient, and rejected the plaintiff's request to test the re-identification risk. This paper submits that a suitable way to assess re-identification risk is to take partial data already held about a few specific individuals and test whether those individuals can be identified from it and, if so, whether all of their personal data then becomes obtainable. In light of this method, the Supreme Administrative Court's complete rejection of the evidentiary method proposed by the plaintiff is debatable. As for mechanisms to reduce re-identification risk, this paper argues that most research does not require absolute precision, and that generalization is the ideal de-identification approach because it balances the protection of personal data privacy against data usability. To ensure that de-identification has eliminated or at least substantially reduced the re-identification risk, and thereby the infringement risk, the data provider should validate the efficacy of de-identification before any transaction.
To validate whether big data has been reliably de-identified, one should select a few records already known to belong to specific individuals and confirm whether those records alone are enough to identify them.
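This validation step can be illustrated as a simple linkage test. The Python below is a minimal sketch under stated assumptions, not the author's implementation: the released dataset, its fields `birth_year`, `sex`, `district`, and `diagnosis`, and the attributes "already known" about `person_1` and `person_2` are all hypothetical. A person counts as re-identified when the known attributes match exactly one released record, because that record then exposes everything else about them.

```python
from typing import Optional


def matches(record: dict, known: dict) -> bool:
    """True if the released record agrees with every attribute already known about a person."""
    return all(record.get(field) == value for field, value in known.items())


def reidentification_test(released: list[dict], known_people: dict[str, dict]) -> dict[str, Optional[dict]]:
    """For each person, return the single matching record (re-identified) or None (not singled out)."""
    results: dict[str, Optional[dict]] = {}
    for person, known in known_people.items():
        hits = [r for r in released if matches(r, known)]
        # A unique match means the person is singled out and their full record is exposed.
        results[person] = hits[0] if len(hits) == 1 else None
    return results


def reidentification_rate(results: dict[str, Optional[dict]]) -> float:
    """Share of tested individuals who were uniquely matched."""
    return sum(r is not None for r in results.values()) / len(results)


# Hypothetical released ("de-identified") dataset: names removed, other fields kept.
released = [
    {"birth_year": 1985, "sex": "F", "district": "D1", "diagnosis": "asthma"},
    {"birth_year": 1985, "sex": "M", "district": "D1", "diagnosis": "diabetes"},
    {"birth_year": 1990, "sex": "F", "district": "D2", "diagnosis": "hypertension"},
    {"birth_year": 1990, "sex": "F", "district": "D2", "diagnosis": "asthma"},
]

# Hypothetical partial data already known about a few specific individuals.
known_people = {
    "person_1": {"birth_year": 1985, "sex": "F", "district": "D1"},
    "person_2": {"birth_year": 1990, "sex": "F", "district": "D2"},
}

results = reidentification_test(released, known_people)
print(results["person_1"])             # person_1's full record, including the diagnosis, is exposed
print(results["person_2"])             # None: two records match, so person_2 is not singled out
print(reidentification_rate(results))  # 0.5
```

A nonzero rate signals that the de-identification has not removed the re-identification risk; in that case, a provider might generalize further and rerun the test before the transaction.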