In this thesis, we proposed a framework for classifying emotions from facial skin color variations. Previous approaches to emotion classification based on physiological signals are limited by the difficulty of acquiring such signals in practice. The proposed method uses a spatio-temporal filter to extract the facial skin color variation signal from a video recorded with a consumer-level camera, and classifies emotions using the extracted signal. The approach was evaluated on the public MAHNOB-HCI-Tagging database and compared against the results reported by the database providers. The results demonstrated the feasibility of the proposed approach, suggesting that emotions can be classified from physiological signals estimated remotely from the face.
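To make the idea of spatio-temporal extraction concrete, the sketch below illustrates one common form of such a pipeline: spatially averaging a color channel inside a face region for each frame, then band-pass filtering the resulting time series around typical heart-rate frequencies. The region-of-interest coordinates, the choice of the green channel, and the pass band are illustrative assumptions, not the exact filter used in this thesis.

```python
import numpy as np

def spatial_temporal_signal(frames, roi, fps, low=0.7, high=4.0):
    """Illustrative rPPG-style extraction (assumed pipeline, not the
    thesis's exact method): spatially average the green channel inside
    the face ROI per frame, then band-pass the trace in [low, high] Hz.

    frames: array of shape (T, H, W, 3); roi: (y0, y1, x0, x1)."""
    y0, y1, x0, x1 = roi
    # Spatial step: mean green value inside the ROI for every frame.
    trace = frames[:, y0:y1, x0:x1, 1].mean(axis=(1, 2))
    trace = trace - trace.mean()
    # Temporal step: zero out frequencies outside the pass band.
    spectrum = np.fft.rfft(trace)
    freqs = np.fft.rfftfreq(len(trace), d=1.0 / fps)
    spectrum[(freqs < low) | (freqs > high)] = 0.0
    return np.fft.irfft(spectrum, n=len(trace))

# Usage: a synthetic 10 s "video" with a weak 1.2 Hz pulse in the skin region.
fps, T = 30, 300
t = np.arange(T) / fps
frames = np.full((T, 32, 32, 3), 100.0)
frames[:, 8:24, 8:24, 1] += 0.5 * np.sin(2 * np.pi * 1.2 * t)[:, None, None]
signal = spatial_temporal_signal(frames, (8, 24, 8, 24), fps)
peak_hz = np.fft.rfftfreq(T, 1 / fps)[np.argmax(np.abs(np.fft.rfft(signal)))]
```

The recovered trace (or features derived from it) would then serve as input to an emotion classifier.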