Vertical federated learning allows multiple institutions that hold the same samples but different feature sets to jointly learn a model without leaking private data. However, this collaboration presupposes that the institutions share a sufficient number of overlapping samples; otherwise, model performance may suffer. In practice, only a small fraction of an institution's data qualifies as usable (overlapping) samples, and most of the data cannot be used by vertical federated learning at all. We therefore propose a federated dimensionality reduction algorithm (FedDRA) that improves model performance in this situation. We design a secure computation protocol for the dimensionality reduction algorithm discriminant component analysis (DCA), so that it satisfies the privacy requirements of vertical federated learning. In addition, we design a dedicated optimization that exploits the non-overlapping data so that the model retains good accuracy. Our main contributions are threefold: first, we maintain strong performance even when the usable overlapping samples are limited; second, compared with other work in this area, our method has the lowest communication cost during the prediction phase, making it more efficient in practice; third, we are the first to provide a flexible vertical federated learning algorithm that can be combined with various downstream tasks or subsequent algorithms.
Vertical federated learning (VFL) enables parties that hold different features of the same samples to jointly learn a machine learning model without exposing their own data. However, all collaborating parties must share enough overlapping samples to ensure the performance of VFL models. In reality, the fraction of overlapping samples is usually insufficient, and most non-overlapping samples go unused. Therefore, we propose a dimensionality reduction algorithm for vertical federated learning, a supervised projection approach that improves the performance of VFL models when training samples are insufficient. We adapt discriminant component analysis (DCA) to the VFL setting and incorporate the information extracted from the non-overlapping data into FedDRA (Federated Dimensionality Reduction Algorithm). Our main contributions are threefold: (1) FedDRA performs well with a small number of overlapping samples; (2) it is an adaptable VFL framework that can be combined with any data analysis technique; (3) compared with other VFL works, it has the lowest communication cost during prediction.
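To make the supervised projection idea concrete, the following is a minimal centralized sketch of a DCA-style discriminant projection: it maximizes between-class scatter relative to a ridge-regularized within-class scatter and keeps the top-k components. This is only an illustration of the underlying dimensionality reduction; it is not the paper's secure VFL protocol, and the function name, `rho` regularizer, and toy data are assumptions for the example.

```python
import numpy as np

def dca_like_projection(X, y, k, rho=1e-3):
    """Project X onto k discriminant components (centralized sketch).

    Maximizes between-class scatter Sb against a ridge-regularized
    within-class scatter Sw, in the spirit of discriminant component
    analysis. Not the secure federated protocol described in the paper.
    """
    classes = np.unique(y)
    mean = X.mean(axis=0)
    d = X.shape[1]
    Sb = np.zeros((d, d))  # between-class scatter
    Sw = np.zeros((d, d))  # within-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sb += len(Xc) * np.outer(mc - mean, mc - mean)
        Sw += (Xc - mc).T @ (Xc - mc)
    # ridge term rho*I keeps Sw invertible (DCA-style regularization)
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw + rho * np.eye(d), Sb))
    order = np.argsort(-evals.real)[:k]  # top-k discriminant directions
    W = evecs[:, order].real
    return X @ W  # reduced-dimension representation

# toy usage: two classes, five features, reduced to one component
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 5)), rng.normal(3, 1, (20, 5))])
y = np.array([0] * 20 + [1] * 20)
Z = dca_like_projection(X, y, k=1)
```

In the VFL setting of the abstract, each party would hold only a vertical slice of `X`, so the scatter matrices above must instead be assembled through the secure computation protocol the paper designs for DCA.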