
Abstract


Traditionally, in machine learning, the quality of the result improves steadily with time (usually slowly, but still steadily). However, as we start applying reinforcement learning techniques to solve complex tasks, such as teaching a computer to play a complex game like Go, we often encounter a situation in which there is no improvement for a long time, and then the system's efficiency suddenly jumps almost to its maximum. A similar phenomenon occurs in human learning, where it is known as the aha-moment. In this paper, we provide a possible explanation for this phenomenon and show that this explanation implies a need to reward students for effort, not only for their results.
