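One-class support vector machine (SVM) is an extension of SVM for handling unlabeled data, and as a mature outlier detection method it has been widely applied. However, like the standard two-class SVM used for classification, one-class SVM does not provide probabilistic outputs, so we cannot predict the probability that an instance is an outlier. Effective methods already exist for obtaining probabilistic outputs from two-class SVM, but the corresponding problem for one-class SVM has received little attention, mainly because, as an unsupervised model, it has no labels to refer to, which makes probability estimation difficult. In this work, we aim to propose practically viable methods for generating probabilistic outputs for one-class SVM, and we also discuss why the methods applicable to two-class SVM cannot be used for one-class SVM. Because one-class SVM lacks labels, we consider it a feasible approach to have the probabilistic outputs mimic the distribution of the decision values. Based on this idea we propose several new methods, and we then verify their feasibility in experiments on artificial and real-world data sets.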
One-class SVM is an extension of SVM to handle unlabeled data. As a mature technique for outlier detection, one-class SVM has been widely used in many applications. However, like the standard two-class SVM, one-class SVM is not designed to give probabilistic outputs. Thus, for a given instance, we cannot directly predict the probability that it is an outlier. For two-class SVM, several methods have been proposed to obtain probabilistic outputs effectively, but much less attention has been paid to one-class SVM. The apparent reason is the lack of label information. Our aim in this work is to propose practically viable techniques for generating probabilistic outputs for one-class SVM. We investigate existing methods of generating probabilistic outputs for two-class SVM and explain why they may not be suitable for one-class SVM. Because label information is unavailable, we think a feasible setting is to have the probabilities mimic the decision values of the training data. Based on this principle, we propose several new methods. Detailed experiments on both artificial and real-world data demonstrate the effectiveness of the proposed methods.