A Study of Bayesian Optimization and HYPERBAND for Hyperparameter Optimization

Advisor: 鄭振牟
Co-advisor: 廖世偉 (Shih-Wei Liao)
The full text of this thesis will be available for download on 2024/07/31.

Abstract


Machine learning and deep learning have demonstrated excellent problem-solving ability across many different scientific applications. Performance in these fields depends on a good set of hyperparameters. However, tuning hyperparameters manually requires considerable expertise or extensive experience in a specific domain, which wastes human resources. In addition, reusing the same hyperparameter configuration across different scenarios is not feasible, because each learning scenario calls for its own optimization settings, which is why finding a good hyperparameter configuration usually takes a long time. Bayesian optimization and Hyperband achieve state-of-the-art performance on a variety of hyperparameter optimization problems. We therefore use grid search (GridSearch) as the benchmark against which other hyperparameter optimization methods are compared, and we discuss which method is best suited to which datasets and model architectures under limited resources. The contribution of this thesis is the finding that, when the loss value serves as the feedback signal for hyperparameter optimization, the optimization's performance suffers if the hyperparameter configuration being searched contains a hyperparameter that directly affects that loss value. In summary, we propose two solutions to avoid this situation: first, treat any hyperparameter that directly affects the loss value as a fixed value; otherwise, change the feedback from loss to accuracy. This effectively reduces the possibility that hyperparameter optimization fails.
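To make this failure mode concrete, the following is a minimal illustrative sketch (a toy example of ours, not code from this thesis): the regularization weight lam appears directly in the training objective, so the regularized loss reported to the optimizer shrinks as lam approaches zero, independently of whether generalization actually improves. All names here (ridge_fit, regularized_loss, lam) are hypothetical.

import numpy as np

# Toy regression data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ true_w + rng.normal(scale=0.5, size=200)

def ridge_fit(X, y, lam):
    # Closed-form minimizer of ||Xw - y||^2 + lam * ||w||^2.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def regularized_loss(X, y, w, lam):
    # The feedback value itself contains the searched hyperparameter lam.
    return np.sum((X @ w - y) ** 2) + lam * np.sum(w ** 2)

for lam in [10.0, 1.0, 0.1, 0.0]:
    w = ridge_fit(X, y, lam)
    print(f"lam={lam:<5} reported loss={regularized_loss(X, y, w, lam):.3f}")

# The reported loss decreases monotonically as lam -> 0, so any optimizer
# fed this value is driven toward lam = 0 regardless of validation quality.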

Abstract (English)


Machine learning and deep learning have shown excellent power in solving problems in many different scientific fields. The performance in these domains depends on a suitable set of hyperparameters. However, tuning hyperparameters manually requires professional knowledge or extensive experience in a particular field, which is a waste of human resources. Furthermore, it is not feasible to reuse the same hyperparameter configuration in different scenarios, because each learning scenario has its own optimization settings, so finding a good configuration usually takes a long time. Bayesian optimization and Hyperband have achieved state-of-the-art performance on various hyperparameter optimization problems. Thus, we use grid search (GridSearch) as the benchmark against which other hyperparameter optimization methods are compared. Furthermore, we discuss which method best fits different datasets and model architectures when encountering limited resources. Our contribution is the discovery that, when the loss value is used as the feedback for hyperparameter optimization, including in the searched configuration a hyperparameter that directly affects that loss value damages the optimizer's update behavior and hence its performance. In conclusion, we suggest two solutions to avoid this situation. First, treat the hyperparameter that directly affects the loss value as a fixed value. Second, change the feedback from loss to accuracy. These two measures can effectively reduce the possibility of hyperparameter optimization failing.
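The two remedies can be sketched in a few lines. The sketch below is illustrative only, under assumed names (FIXED_LAM, train_logreg, val_accuracy); it is not the experimental code of this thesis. Remedy 1 pins the loss-affecting hyperparameter to a fixed value and searches only the remaining hyperparameters; remedy 2 reports validation accuracy, which the regularization weight cannot inflate directly, as the feedback.

import numpy as np

# Toy, linearly separable binary classification data.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
X_tr, y_tr = X[:200], y[:200]
X_va, y_va = X[200:], y[200:]

FIXED_LAM = 0.01  # remedy 1: the loss-affecting hyperparameter is held fixed

def train_logreg(X, y, lr, lam, steps=200):
    # Plain gradient descent on the L2-regularized logistic loss.
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * (X.T @ (p - y) / len(y) + lam * w)
    return w

def val_accuracy(w):
    # Remedy 2: feedback is held-out accuracy, not the training loss.
    return float(np.mean(((X_va @ w) > 0) == y_va))

# Grid search over the remaining hyperparameter (learning rate) only.
results = [(lr, val_accuracy(train_logreg(X_tr, y_tr, lr, FIXED_LAM)))
           for lr in (0.01, 0.1, 1.0)]
best_lr, best_acc = max(results, key=lambda t: t[1])
print(f"best lr={best_lr}, validation accuracy={best_acc:.3f}")

The same substitution carries over when the outer loop is Bayesian optimization or Hyperband instead of grid search: only the feedback value handed to the optimizer changes.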

