In order to use Wavelength Division Multiplexing (WDM) networks efficiently, a lightpath is dynamically established upon the arrival of a connection request. Generally, dynamic routing is usually determined by conventional shortest path algorithm. Different link weight assignment results in different routing and network performance. Using reinforcement learning (RL), this paper deals with the problem of call admission control (CAC) and routing in differentiating the services of WDM networks to obtain maximized system revenue. The problem is formulated as a finite-state discrete-time dynamic programming problem that is too complex to be solved exactly. Here we adopt the RL method together with a decomposition approach, to solve this problem and demonstrate that it is able to earn significantly higher revenue than the alternatives.