
Using Causal Structure for Counterfactual Explanations of Predicted Scores

Advisor: 徐茉莉 (Galit Shmueli)

Abstract


Counterfactual explanations are an emerging approach in interpretable machine learning: they use concrete examples to show how a machine learning (ML) model's prediction would change if its input values were changed. However, this approach only explains how the model's algorithm produced a prediction, not the underlying causes of the outcome. When a company or customer wants to use counterfactual explanations to modify their behavior or software systems, making decisions based solely on changes in the model, without understanding the root cause of the problem, can lead to harmful consequences when the model is wrong.

To address this problem, this study uses the causal structure model proposed by Judea Pearl, the DAG (directed acyclic graph). With a DAG we can map out causal relationships from domain knowledge and focus on a single variable to understand how modifying it would affect the outcome. The DAG's d-separation property lets us determine which confounding variables should be controlled for in the model, and which variables that would bias the causal effect must be excluded from it.

This study uses an electric-scooter sharing service as its case. We study the probability that a user who reserves a scooter then cancels the reservation without renting, so that the company can offer a flexible buffer time to mitigate the operating loss caused by cancellations. We aim to build a predictive model and provide counterfactual explanations of the form "if the company or the user had acted differently, would the user's probability of renting increase?"

In our experiments we compared the performance of the DAG-based model with several common machine learning models. We found that (1) the model built according to causal rules achieves predictive accuracy similar to ordinary ML models, and (2) the counterfactual explanations generated by the causal model not only use fewer variables but are also more robust.

This research makes several contributions. For data scientists, DAGs offer a new way to think about variable selection and feature engineering. For companies, we provide a simpler, causally grounded model that is easier to understand while retaining comparable accuracy. For customers and users, counterfactual explanations provide an intuitive and direct way to understand how their behavior could change to achieve a desired outcome.

English Abstract


Counterfactual explanation (CE) is a new approach used to explain the relationship between the inputs of a machine learning (ML) algorithm and the predictions it generates. However, it only explains how the algorithm arrived at its decision, not the underlying causes of the outcome. This lack of causal explanation for predicted scores can cause damage when companies or customers modify some aspect of their software or behavior based on a counterfactual prediction. To address this issue, this research constructs a "map" of causal relationships between variables in the form of a structural causal model, using Judea Pearl's directed acyclic graph (DAG) methodology. By constructing the causal DAG and using the d-separation property, we can identify which confounding variables should be controlled for when estimating the causal effect of interest, and on that basis choose the features to include in the predictive models. Such models can then provide counterfactual predictions grounded in causal arguments. In this research, we use a large dataset from a leading electric motorcycle sharing service to demonstrate the results. We focus on the problem of predicting whether users rent a scooter within the 10-minute window from the time they reserved it. Dropping such a reservation (and not renting the scooter) causes an operating loss to the company. To solve this problem, we build predictive models that identify high-risk reservations at the time the user makes the reservation, so the company can offer a more flexible buffer to induce the user to rent and minimize the loss. Importantly, we want causal explanations of predicted values, and causal counterfactual predictions. We compare the predictive performance and the CEs of machine learning (ML) models with those of DAG-based models.
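The d-separation step described above can be sketched in plain Python. This is a minimal illustration, not the thesis's implementation: it uses the standard ancestral-moralization criterion (to test whether X and Y are d-separated given Z, restrict the DAG to the ancestors of X ∪ Y ∪ Z, moralize it, delete Z, and check connectivity), with the DAG given as a node-to-parents mapping.

```python
from itertools import combinations

def d_separated(parents, xs, ys, zs):
    """Check whether node sets xs and ys are d-separated given zs in a DAG
    described by a {node: set_of_parents} mapping (ancestral moralization)."""
    # 1. Keep only xs | ys | zs and their ancestors.
    relevant = set(xs) | set(ys) | set(zs)
    frontier = set(relevant)
    while frontier:
        node = frontier.pop()
        new = parents.get(node, set()) - relevant
        relevant |= new
        frontier |= new
    # 2. Moralize: undirect every edge and "marry" co-parents of each child.
    adj = {n: set() for n in relevant}
    for child in relevant:
        ps = parents.get(child, set())
        for p in ps:
            adj[p].add(child); adj[child].add(p)
        for a, b in combinations(ps, 2):
            adj[a].add(b); adj[b].add(a)
    # 3. Delete the conditioning set zs.
    for z in zs:
        adj.pop(z, None)
    for n in adj:
        adj[n] -= set(zs)
    # 4. d-separated iff no remaining path connects xs and ys.
    seen, stack = set(), [x for x in xs if x in adj]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(adj[n])
    return not (seen & set(ys))

# Confounder fork Z -> X, Z -> Y: conditioning on Z blocks the backdoor path.
fork = {"Z": set(), "X": {"Z"}, "Y": {"Z"}}
print(d_separated(fork, {"X"}, {"Y"}, set()))    # False: X-Z-Y is open
print(d_separated(fork, {"X"}, {"Y"}, {"Z"}))    # True: adjusting for Z blocks it

# Collider X -> C <- Y: conditioning on C *opens* the path.
collider = {"X": set(), "Y": set(), "C": {"X", "Y"}}
print(d_separated(collider, {"X"}, {"Y"}, set()))   # True
print(d_separated(collider, {"X"}, {"Y"}, {"C"}))   # False
```

The fork and collider cases show exactly the distinction the abstract draws: confounders should be included in the model, while colliders (variables that would bias the causal effect) must be left out.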
As shown by our experiments on real-world data, the DAG model (1) achieves predictive performance comparable to the ML models, and (2) generates more robust CEs, which rely on causal arguments and require far fewer predictor values to be held constant. Our research provides several contributions. For data scientists, it proposes a new way of thinking about feature selection and feature engineering. For companies, the method provides a visual way to express model assumptions and a simple means of generating predictions based on causal relationships derived from domain knowledge; the model becomes more interpretable and understandable while retaining predictive accuracy similar to ML models. For customers, the model is more transparent and easier to understand, and the counterfactual explanations provide a straightforward way to learn how to change their actions to obtain the desired outcome.
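The mechanics of a counterfactual explanation of a predicted score can be illustrated with a toy logistic model. Everything here is hypothetical, not from the thesis: the feature names, the weights, and the single-feature search merely show what it means to ask "what is the smallest change to one input that lifts the predicted rental probability above a threshold?"

```python
import math

def predict_rent_prob(features, weights, bias):
    """Logistic model: probability the user actually rents after reserving."""
    z = bias + sum(weights[k] * v for k, v in features.items())
    return 1.0 / (1.0 + math.exp(-z))

def counterfactual(features, weights, bias, target=0.5):
    """Greedy single-feature counterfactual: the smallest change to ONE
    feature that pushes the predicted probability up to `target`."""
    best = None
    for k, value in features.items():
        w = weights[k]
        if w == 0:
            continue  # this feature cannot move the score
        # Solve for the value where the probability equals `target`,
        # holding all other features fixed.
        z_rest = bias + sum(weights[j] * v for j, v in features.items() if j != k)
        needed = (math.log(target / (1 - target)) - z_rest) / w
        delta = abs(needed - value)
        if best is None or delta < best[2]:
            best = (k, needed, delta)
    return best  # (feature, counterfactual value, size of change)

# Hypothetical reservation and hypothetical fitted weights.
features = {"distance_km": 1.2, "battery_pct": 0.3}
weights = {"distance_km": -2.0, "battery_pct": 1.5}
bias = 1.0

p = predict_rent_prob(features, weights, bias)   # ~0.28: high cancellation risk
name, new_value, delta = counterfactual(features, weights, bias)
# Suggests reducing distance_km from 1.2 to 0.725 to reach a 0.5 rent probability.
```

A causal DAG matters precisely here: the explanation is only actionable if `distance_km` actually causes the outcome rather than merely correlating with it through a confounder.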

