An Initial Exploration of How Well Neural Networks Can Be Used to Simulate Some Functions

Advisor: 夏延德

Abstract

Since Professor Geoffrey Everest Hinton published his monumental papers in 2006 [1, 2], neural network applications have flourished and gained worldwide recognition and acceptance. Some people go so far as to suggest that neural networks and artificial intelligence are simply synonyms, claiming that "the age of artificial intelligence has arrived."

Despite the tremendous success achieved by a very large population of neural network applications, some "simple" theoretical questions remain unanswered. For example, it is well known that neural networks are very good at approximating functions; but if we put constraints on the networks used for the approximation, which architecture(s) satisfying those constraints can best approximate a given function? As another example, if, for some reason, we decide to use a particular neural network model (= a particular network architecture + a particular assignment of edge weights) to approximate a given function, what set of input data should we use for the approximation? Should it be the set of all possible inputs, or only inputs of some particular kind? If the latter, what kind of input should that be? Should there be a criterion for making such choices, that is, for deciding which kinds of input the network should be "good at" predicting function values for? And what should this criterion (or these criteria) be?

This research investigates the use of some very simple neural networks for approximating some very simple functions. First, the basic neural network models to use were decided, and 23 functions were selected for approximation. Then the selected models were trained to approximate the 23 functions, using the usual techniques of backpropagation and gradient descent, and the approximation results were evaluated. (A minimal sketch of this kind of training setup is given after the abstract.)

The study produced one simple result and one complicated one. The simple result is that if the function to be approximated is linear, then activation functions such as ReLU should not be used. The complicated result is that, when approximating a function, the set of inputs to train on should be chosen with care. When a neural network model approximates a given (non-linear) function, it is entirely possible that the results are acceptable only for a certain kind of input, even if the model used is already the best among its peers for that function. To approximate the same (non-linear) function for a different kind of input, a different neural network model is likely needed.

The main contribution of this thesis is twofold. First, the concept of "simulatable" (or "approximatable") is proposed. Under this concept, a neural network model can only be said to simulate (or approximate) a given function for some set of inputs, which may or may not be the set of all possible inputs. (Here, saying that "a neural network model can approximate a function for a set of inputs" means that the function, together with that input set, is approximatable by the network.) Second, an algorithm is given that heuristically selects, from a very limited pool of neural network models, the best models for approximating a given function on some set of inputs. (A sketch of such a selection loop also follows the abstract.)
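
A minimal sketch, assuming NumPy, of the kind of training setup described above: a one-hidden-layer ReLU network fitted to a function by manual backpropagation and plain gradient descent. The target f(x) = x^2, the hidden width, the learning rate, and the training range [-1, 1] are illustrative assumptions rather than the thesis's actual choices; the closing loop illustrates the "complicated result," namely that the fit can be acceptable on the trained input range yet drift away outside it.

import numpy as np

# Sketch only (assumed details, not the thesis's code): fit a
# one-hidden-layer ReLU network to f(x) = x^2 by manual backpropagation
# and plain gradient descent, training only on inputs drawn from [-1, 1].

rng = np.random.default_rng(0)

def target(x):
    return x ** 2                          # assumed non-linear example

x = rng.uniform(-1.0, 1.0, (256, 1))       # the chosen training input set
y = target(x)

H, lr = 8, 0.1                             # hidden width, step size (assumptions)
W1 = rng.normal(0.0, 0.5, (1, H)); b1 = np.zeros(H)
W2 = rng.normal(0.0, 0.5, (H, 1)); b2 = np.zeros(1)

n = len(x)
for _ in range(5000):
    # forward pass
    h_pre = x @ W1 + b1
    h = np.maximum(h_pre, 0.0)             # ReLU activation
    err = (h @ W2 + b2) - y                # residual (MSE gradient up to a constant)

    # backward pass: backpropagate the squared-error loss
    dh = (err @ W2.T) * (h_pre > 0)        # gradient through the ReLU
    W2 -= lr * (h.T @ err) / n;  b2 -= lr * err.mean(axis=0)
    W1 -= lr * (x.T @ dh) / n;   b1 -= lr * dh.mean(axis=0)

def predict(xv):
    h = np.maximum(np.array([[xv]]) @ W1 + b1, 0.0)
    return float(h @ W2 + b2)

# Inside the training range the fit is close; outside it, the piecewise-
# linear extrapolation of a ReLU network drifts away from x^2.
for xv in (0.5, 0.9, 2.0, 3.0):
    print(f"x = {xv}: model = {predict(xv):+.3f}, target = {target(xv):+.3f}")

The simple result can be checked with the same scaffold: swap the target for a linear function and compare this ReLU network against a model with no activation function at all, which fits the linear target directly.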
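
The selection algorithm itself is not spelled out in the abstract, so the following is only a hedged sketch of what such a procedure could look like: train every model in a very limited pool on a chosen input set, score each on held-out points from the same set, and declare the function "approximatable" on that set if the best score clears a threshold. The pool of hidden widths, the target np.sin, the input range [0, pi], and the acceptance threshold are all hypothetical choices, not the thesis's.

import numpy as np

# Sketch of a heuristic model-selection loop (an assumption, not the
# thesis's published algorithm): pick the best model from a small pool.

rng = np.random.default_rng(1)

def train(width, x, y, lr=0.05, steps=3000):
    """Fit a one-hidden-layer ReLU network by gradient descent; return a predictor."""
    W1 = rng.normal(0.0, 0.5, (1, width)); b1 = np.zeros(width)
    W2 = rng.normal(0.0, 0.5, (width, 1)); b2 = np.zeros(1)
    n = len(x)
    for _ in range(steps):
        h_pre = x @ W1 + b1
        h = np.maximum(h_pre, 0.0)
        err = (h @ W2 + b2) - y
        dh = (err @ W2.T) * (h_pre > 0)
        W2 -= lr * (h.T @ err) / n;  b2 -= lr * err.mean(axis=0)
        W1 -= lr * (x.T @ dh) / n;   b1 -= lr * dh.mean(axis=0)
    return lambda t: np.maximum(t @ W1 + b1, 0.0) @ W2 + b2

f = np.sin                                     # hypothetical target function
x_train = rng.uniform(0.0, np.pi, (256, 1))    # the input set of interest
x_test = rng.uniform(0.0, np.pi, (64, 1))      # held-out points, same range

pool = [2, 4, 8, 16]                           # a very limited pool of widths
best_width, best_mse, best_model = None, np.inf, None
for w in pool:
    model = train(w, x_train, f(x_train))
    mse = float(np.mean((model(x_test) - f(x_test)) ** 2))
    print(f"width {w:2d}: held-out MSE = {mse:.5f}")
    if mse < best_mse:
        best_width, best_mse, best_model = w, mse, model

THRESHOLD = 1e-3                               # acceptance criterion (assumption)
print("approximatable on [0, pi]:", best_mse < THRESHOLD, f"(best width = {best_width})")
print("best model at pi/2:", float(best_model(np.array([[np.pi / 2]]))))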

References


[1] Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786), 504-507.
[2] Hinton, G. E., Osindero, S., & Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18(7), 1527-1554.
[3] Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25, 1097-1105.
[4] McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4), 115-133.
[5] Hebb, D. O. (1949). The Organization of Behavior: A Neuropsychological Theory. New York: Science Editions.
