This thesis proposes an approach for the human-robot interaction domain in which the robot learns user needs and preferences and adjusts its behaviors accordingly. As robots enter humans' daily lives, the tasks assigned to them are diverse and the number of people they interact with is immense. When facing different users, it is therefore important for robots to personalize their interactions and provide the services each user desires. Since working people spend a large amount of their time at the workplace, this thesis studies a service robot applied to the office environment.

The research problem in this work differs from traditional machine learning in several respects. In human-robot interaction, training data can be collected only, or largely, from real experiments. Moreover, different individuals hold different preferences, and an individual's preferences may vary with many internal or external factors. Finally, natural human communication and interactive behaviors add further uncertainty to robot learning.

This thesis makes three principal contributions. First, we propose an approach under which the robot adjusts its behaviors to adapt to user preferences while interacting with users. The action-selection method effectively explores actions based on past human responses, and the methods for approximating the reward and transition functions are designed specifically for human-robot interaction. Second, because human preferences vary and reactions to the same robot behavior differ from person to person, the rewards produced by the pre-constructed model must be modified online. To achieve this, we examine the correlation between the robot's action and the human's response, and then fine-tune the rewards of the predictive model for adaptive learning. Third, natural human responses and the human's interaction with the environment are incorporated into our work. In this way, learning efficiency is enhanced and the human effort required for robot learning is reduced.
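The online reward fine-tuning described in the second contribution could, under one simple reading, take a form like the sketch below. All names, the per-action correction table, and the error-driven blending rule are illustrative assumptions for exposition, not the thesis's actual method: a pre-constructed reward model supplies a baseline, and observed human responses nudge a learned correction toward each user.

```python
class AdaptiveReward:
    """Illustrative sketch: blend a pre-constructed reward model with
    corrections learned online from observed human responses."""

    def __init__(self, base_reward, lr=0.2):
        self.base_reward = base_reward  # pre-constructed model: action -> reward
        self.lr = lr                    # adaptation rate (assumed hyperparameter)
        self.adjust = {}                # per-action correction learned online

    def reward(self, action):
        # Model prediction plus the user-specific correction learned so far.
        return self.base_reward(action) + self.adjust.get(action, 0.0)

    def update(self, action, human_response):
        # Move the correction toward the discrepancy between the observed
        # human response (treated here as a scalar reward signal) and the
        # current prediction, so repeated interactions adapt the model.
        error = human_response - self.reward(action)
        self.adjust[action] = self.adjust.get(action, 0.0) + self.lr * error
```

For example, if the base model predicts a reward of 0.5 for a greeting action but a particular user consistently responds positively (signal 1.0), repeated calls to `update` shift that action's effective reward toward 1.0 for that user while leaving other actions untouched.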