A Study of Bayesian Optimization and HYPERBAND for Hyperparameter Optimization

Advisor: 鄭振牟
Co-advisor: 廖世偉 (Shih-Wei Liao)
The full text of this thesis will be available for download on 2024/07/31.

Abstract


Machine learning and deep learning have demonstrated excellent problem-solving ability across many different scientific applications. Performance in these fields depends on a good set of hyperparameters. However, tuning hyperparameters manually requires considerable expertise or extensive experience in a specific domain, which wastes human resources. In addition, reusing the same hyperparameter configuration across different scenarios is not feasible, because each learning scenario calls for its own optimization settings, which is why finding a good hyperparameter configuration usually takes a long time. Bayesian optimization and Hyperband achieve state-of-the-art performance on a variety of hyperparameter optimization problems. We therefore use grid search (GridSearch) as the benchmark against which other hyperparameter optimization methods are compared, and we discuss which method is best suited to which datasets and model architectures under limited resources. The contribution of this thesis is the finding that, when the loss value serves as the feedback signal for hyperparameter optimization, the optimization's performance suffers if the hyperparameter configuration being searched contains a hyperparameter that directly affects that loss value. In summary, we propose two solutions to avoid this situation: first, treat any hyperparameter that directly affects the loss value as a fixed value; otherwise, change the feedback from loss to accuracy. This effectively reduces the possibility that hyperparameter optimization fails.
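To make this failure mode concrete, the following is a minimal illustrative sketch (a toy example of ours, not code from this thesis): the regularization weight lam appears directly in the training objective, so the regularized loss reported to the optimizer shrinks as lam approaches zero, independently of whether generalization actually improves. All names here (ridge_fit, regularized_loss, lam) are hypothetical.

import numpy as np

# Toy regression data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ true_w + rng.normal(scale=0.5, size=200)

def ridge_fit(X, y, lam):
    # Closed-form minimizer of ||Xw - y||^2 + lam * ||w||^2.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def regularized_loss(X, y, w, lam):
    # The feedback value itself contains the searched hyperparameter lam.
    return np.sum((X @ w - y) ** 2) + lam * np.sum(w ** 2)

for lam in [10.0, 1.0, 0.1, 0.0]:
    w = ridge_fit(X, y, lam)
    print(f"lam={lam:<5} reported loss={regularized_loss(X, y, w, lam):.3f}")

# The reported loss decreases monotonically as lam -> 0, so any optimizer
# fed this value is driven toward lam = 0 regardless of validation quality.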

Abstract (English)


Machine learning and deep learning have shown excellent power in solving problems in many different scientific fields. The performance in these domains depends on a suitable set of hyperparameters. However, tuning hyperparameters manually requires professional knowledge or extensive experience in a particular field, which is a waste of human resources. Furthermore, it is not feasible to reuse the same hyperparameter configuration in different scenarios, because each learning scenario has its own optimization settings, so finding a good configuration usually takes a long time. Bayesian optimization and Hyperband have achieved state-of-the-art performance on various hyperparameter optimization problems. Thus, we use grid search (GridSearch) as the benchmark against which other hyperparameter optimization methods are compared. Furthermore, we discuss which method best fits different datasets and model architectures when encountering limited resources. Our contribution is the discovery that, when the loss value is used as the feedback for hyperparameter optimization, including in the searched configuration a hyperparameter that directly affects that loss value damages the optimizer's update behavior and hence its performance. In conclusion, we suggest two solutions to avoid this situation. First, treat the hyperparameter that directly affects the loss value as a fixed value. Second, change the feedback from loss to accuracy. These two measures can effectively reduce the possibility of hyperparameter optimization failing.
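The two remedies can be sketched in a few lines. The sketch below is illustrative only, under assumed names (FIXED_LAM, train_logreg, val_accuracy); it is not the experimental code of this thesis. Remedy 1 pins the loss-affecting hyperparameter to a fixed value and searches only the remaining hyperparameters; remedy 2 reports validation accuracy, which the regularization weight cannot inflate directly, as the feedback.

import numpy as np

# Toy, linearly separable binary classification data.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
X_tr, y_tr = X[:200], y[:200]
X_va, y_va = X[200:], y[200:]

FIXED_LAM = 0.01  # remedy 1: the loss-affecting hyperparameter is held fixed

def train_logreg(X, y, lr, lam, steps=200):
    # Plain gradient descent on the L2-regularized logistic loss.
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * (X.T @ (p - y) / len(y) + lam * w)
    return w

def val_accuracy(w):
    # Remedy 2: feedback is held-out accuracy, not the training loss.
    return float(np.mean(((X_va @ w) > 0) == y_va))

# Grid search over the remaining hyperparameter (learning rate) only.
results = [(lr, val_accuracy(train_logreg(X_tr, y_tr, lr, FIXED_LAM)))
           for lr in (0.01, 0.1, 1.0)]
best_lr, best_acc = max(results, key=lambda t: t[1])
print(f"best lr={best_lr}, validation accuracy={best_acc:.3f}")

The same substitution carries over when the outer loop is Bayesian optimization or Hyperband instead of grid search: only the feedback value handed to the optimizer changes.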

