
Variational Bayesian Generative Adversarial Networks

Advisor: 簡仁宗


Abstract


In the past decade, deep neural networks have attracted considerable attention across application domains, especially in pattern recognition tasks such as image classification, object recognition, speech recognition, and speaker recognition, as well as in the synthesis or generation of many kinds of data, including images, text, audio, speech, and other complex data. For the task of data generation, instead of estimating the density function directly, we build a generative model that is capable of manipulating a high-dimensional probability distribution. In addition, generative models can be trained with missing data in a semi-supervised manner and can be incorporated into reinforcement learning in many ways. In particular, the generative model based on the generative adversarial network (GAN) can be seen as a realization of inverse reinforcement learning. Furthermore, GANs can learn to produce multi-modal outputs in tasks that intrinsically require samples drawn from a distribution, such as super-resolution imaging and image-to-image translation.

It is common to distinguish two types of generative models: implicit density models and explicit density models. An implicit density model implements a stochastic procedure that directly generates data. In practice, it transforms a latent variable into an observed random variable through a deterministic mapping, usually realized by a neural network. The transformed density is generally intractable, and the high-dimensional derivatives are difficult to compute. The GAN provides a practical solution to this problem. In general, a GAN involves a two-player game formulated as a minimax optimization over two neural networks: a generator and a discriminator. The discriminator is a classifier that determines whether a given sample is a real sample from the training data or an artificially generated one; the generator attempts to produce plausible samples that the discriminator cannot distinguish from real ones. Through this adversarial learning process, the model estimates a generative distribution that converges toward the distribution of the observed data. GANs have achieved remarkable performance in image generation but still suffer from the mode collapse problem, so a GAN cannot always guarantee the quality of its synthesized samples. On the other hand, an explicit density model provides an explicit parametric specification of the distribution of the observed data through a log-likelihood function, and maximum likelihood offers a straightforward approach to this category of models. Within this category, the variational auto-encoder (VAE) is known as one of the most popular models, with highly flexible priors and approximate posteriors. The VAE maximizes a lower bound on the data log-likelihood and achieves excellent performance in data reconstruction; however, images synthesized by a VAE tend to be blurry.

In this thesis, we develop a variational inference solution that characterizes the weight uncertainty in the construction of a generative adversarial network. It is worth noting that, from a probabilistic perspective, optimizing the weights of a standard neural network is equivalent to solving a maximum likelihood estimation (MLE) problem. MLE ignores weight uncertainty and easily produces an overfitted model; one common remedy is to take model regularization into account.
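For concreteness, the baseline objectives referred to above can be written in their standard forms; this is the usual notation of Goodfellow et al. for the GAN and of Kingma and Welling for the VAE, not necessarily the notation used in the thesis itself:

\min_{G}\max_{D} V(D,G) = \mathbb{E}_{x\sim p_{\mathrm{data}}(x)}\big[\log D(x)\big] + \mathbb{E}_{z\sim p_{z}(z)}\big[\log\big(1-D(G(z))\big)\big]

\log p_{\theta}(x) \;\ge\; \mathbb{E}_{q_{\phi}(z\mid x)}\big[\log p_{\theta}(x\mid z)\big] - \mathrm{KL}\big(q_{\phi}(z\mid x)\,\|\,p(z)\big)

w_{\mathrm{MLE}} = \arg\max_{w}\, \log p(\mathcal{D}\mid w)

The first line is the two-player minimax game between generator G and discriminator D; the second is the VAE lower bound on the data log-likelihood, trading reconstruction quality against the KL divergence between the approximate posterior and the prior; the third is the maximum likelihood view of weight estimation, which places no prior on the weights w and which the Bayesian treatment below regularizes.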
From a Bayesian perspective, regularization is performed by introducing a prior over the network weights; if the prior is Gaussian, this is equivalent to L2 regularization. Modeling the uncertainty of the weights in a GAN therefore provides a meaningful approach to model regularization and improves generalization under different amounts of training data. Traditionally, such solutions have been based on Laplace's approximation or on sampling by Markov chain Monte Carlo (MCMC); the former is too limited an approximation, and the latter takes a long time to converge. Hamiltonian Monte Carlo (HMC) was introduced to calculate gradients from samples efficiently, and stochastic gradient Hamiltonian Monte Carlo (SGHMC) scales this approach up to large training sets, but the computational overhead of exploring the posterior space with Monte Carlo methods remains.

This study addresses that computational overhead, avoids convergence to local minima of the parameter sets, and proposes a new variational inference method for the Bayesian GAN in which the weight uncertainty of both the generator and the discriminator is compensated. The variational Bayesian GAN (VB-GAN) is constructed by maximizing a variational lower bound and is combined with an auto-encoder, so that the generator synthesizes reasonable samples and mode collapse is prevented through the reconstruction of the training data. This new hybrid of the VAE and the GAN is realized by adversarial learning and avoids generating blurry data. Importantly, data reconstruction based on the Wasserstein auto-encoder is implemented as well, with optimal transport used to measure the geometric distance between two probability distributions. This distance is minimized by an adversarial procedure that regularizes the continuous mixture distribution of the latent variable to match the prior distribution rather than the conditional distribution. As a result, we obtain sharper samples.

Finally, the proposed method is evaluated in experiments on MNIST handwritten-digit image generation and classification, on synthetic mixture-of-Gaussians data, and on CelebA, the large-scale CelebFaces Attributes dataset, followed by speaker recognition on NIST i-vectors with data augmentation. The experimental results show the merits of subspace learning based on various realizations of adversarial learning.
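As a sketch of the two ingredients just described: the abstract does not state the thesis's exact objective, so the notation below follows standard Bayesian neural networks and the Wasserstein auto-encoder (WAE) of Tolstikhin et al., and should be read as an assumption rather than as the thesis's own definition. With a Gaussian prior p(w) \propto \exp(-\tfrac{\lambda}{2}\|w\|^{2}), the term -\log p(w) reduces to the L2 penalty \tfrac{\lambda}{2}\|w\|^{2} mentioned above. The variational lower bound over the weights and the optimal-transport reconstruction objective then take the forms

\log p(\mathcal{D}) \;\ge\; \mathbb{E}_{q(w)}\big[\log p(\mathcal{D}\mid w)\big] - \mathrm{KL}\big(q(w)\,\|\,p(w)\big)

\min_{G}\; \inf_{q(z\mid x)}\; \mathbb{E}_{p_{X}(x)}\,\mathbb{E}_{q(z\mid x)}\big[c\big(x, G(z)\big)\big] \;+\; \lambda_{Z}\, D_{Z}\big(q_{Z}, p_{Z}\big), \qquad q_{Z}(z) = \mathbb{E}_{p_{X}(x)}\big[q(z\mid x)\big]

where q(w) is the approximate posterior over the weights, c(\cdot,\cdot) is a transport cost such as the squared error, q_{Z} is the continuous mixture (aggregated) posterior over the latent variable that is matched to the prior p_{Z}, and the divergence D_{Z} is estimated by a discriminator, i.e., by the adversarial procedure described above.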
