Gradient steepest descent (GSD) is the algorithm most commonly used to train the back-propagation neural network (BPN) because of its good performance in reducing training error. However, it also has drawbacks, such as slow convergence and a tendency to become trapped in local optima. Many improved methods have been proposed to remedy these shortcomings: a momentum term can be added to accelerate convergence, and global search methods such as probabilistic hill-climbing algorithms and tabu search (TS) can be introduced to escape local optima. Nevertheless, these methods have weaknesses of their own. The added momentum sometimes does little to speed up convergence; probabilistic hill-climbing algorithms often assume that the error function follows a certain distribution, which is not always the case; and although TS may find a near-global optimum, its solution quality is unstable because it relies on many random values, and it often requires heavy computation. This paper proposes an improved method that accelerates convergence and effectively reduces the training error without significantly increasing the training time.
Even so, every algorithm mentioned above eventually reaches a bottleneck: once the training error drops to a certain level, further progress becomes difficult or stagnates. At that point, combining an evolutionary algorithm such as the genetic algorithm (GA) with a proper evolving strategy can, in theory, keep refining the solution accuracy toward its limit. The GA proposed in this paper emphasizes evolving strategies rather than the evolving operators that some researchers focus on; preliminary experiments confirm that, over a long evolution process, the evolving strategy has a greater influence than the operators. Because the GA runs in parallel, combining it does not increase the training time.
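For reference, the standard GSD weight update and its momentum-augmented variant (textbook formulations, not the specific modification proposed in this paper) are:

    Δw(t) = -η · ∂E/∂w                      (plain gradient steepest descent)
    Δw(t) = -η · ∂E/∂w + α · Δw(t-1)        (with a momentum term)

where E is the training error, η the learning rate, and α the momentum coefficient. Reusing the previous step lets successive updates in a consistent direction build up speed, which is why momentum can accelerate convergence, yet contributes little when the gradient direction changes frequently.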
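The sketch below illustrates, in generic form, the two-phase idea described above: train with GSD plus momentum until the error stagnates, then let a GA keep refining the stalled weight vector. It is a minimal toy example (an XOR network with numeric gradients and a plain elitist GA); the network size, population size, and operators are illustrative assumptions, not the evolving strategy or parallel implementation proposed in this paper.

    # Generic BPN + GA hybrid sketch; NOT the method proposed in this paper.
    import numpy as np

    rng = np.random.default_rng(0)
    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    y = np.array([[0.], [1.], [1.], [0.]])

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def error(w):
        # 2-2-1 network; weight layout: W1(2x2), b1(2), W2(2x1), b2(1)
        W1, b1 = w[:4].reshape(2, 2), w[4:6]
        W2, b2 = w[6:8].reshape(2, 1), w[8:]
        out = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
        return float(np.mean((out - y) ** 2))

    def gradient(w, eps=1e-5):
        # Finite differences stand in for analytic back-propagation here.
        g = np.zeros_like(w)
        for i in range(w.size):
            wp, wm = w.copy(), w.copy()
            wp[i] += eps
            wm[i] -= eps
            g[i] = (error(wp) - error(wm)) / (2 * eps)
        return g

    # Phase 1: gradient steepest descent with momentum until progress stalls.
    w, v = rng.normal(0, 0.5, 9), np.zeros(9)
    lr, alpha, prev = 0.5, 0.9, np.inf
    for _ in range(2000):
        v = alpha * v - lr * gradient(w)
        w = w + v
        cur = error(w)
        if prev - cur < 1e-9:      # training error has stagnated
            break
        prev = cur

    # Phase 2: a GA keeps refining the stalled weights (elitism, uniform
    # crossover, Gaussian mutation); each individual is a full weight vector.
    pop = w + rng.normal(0, 0.1, (20, 9))
    pop[0] = w                     # keep the back-propagation solution
    for _ in range(300):
        pop = pop[np.argsort([error(ind) for ind in pop])]
        parents = pop[:10]
        children = []
        for _ in range(10):
            a, b = parents[rng.integers(10)], parents[rng.integers(10)]
            mask = rng.random(9) < 0.5
            children.append(np.where(mask, a, b) + rng.normal(0, 0.02, 9))
        pop = np.vstack([parents, children])

    best = min(pop, key=error)
    print("error after BP:", error(w), "after GA refinement:", error(best))

In this generic form the GA population can be evaluated independently for each individual, which is what makes a parallel implementation possible without lengthening the wall-clock training time.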