In this paper, we address the problem of vehicle detection and classification in aerial images using CNN architectures. Detecting and classifying vehicles at urban intersections has long been an important issue in intelligent transportation systems. We record aerial images with a high-resolution digital camera mounted on unmanned aerial vehicles. We adopt Mask R-CNN to detect the vehicles and classify them into four types: buses, trucks, cars, and others. The detected mask of each vehicle is then fitted with a rotated rectangle, which serves as the final result. The experimental results show that the method achieves a high mean average precision on the test aerial imagery.
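As an illustration of the mask-to-rectangle step, the following is a minimal sketch of one common way to fit a rotated rectangle to a binary mask: take the convex hull of the foreground pixels and test the rectangle aligned with each hull edge, keeping the one with the smallest area. This is not necessarily the fitting procedure used in this work. The function names `convex_hull` and `min_area_rect`, the list-of-rows mask format, and the choice to treat pixel centers as points are all assumptions made for the example.

```python
import math


def convex_hull(points):
    """Andrew's monotone chain. Returns the hull vertices without collinear points."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def min_area_rect(mask):
    """Fit a minimum-area rotated rectangle to the foreground of a binary mask.

    mask: list of rows containing 0/1 values (hypothetical input format).
    Returns (center, (width, height), angle_in_degrees, area), where the
    angle is the direction of the hull edge the rectangle is aligned with.
    """
    # Assumption: each foreground pixel is represented by its (x, y) index.
    points = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    hull = convex_hull(points)
    if len(hull) < 3:
        raise ValueError("mask foreground is degenerate (fewer than 3 hull points)")

    best = None
    for i in range(len(hull)):
        (x0, y0), (x1, y1) = hull[i], hull[(i + 1) % len(hull)]
        theta = math.atan2(y1 - y0, x1 - x0)
        c, s = math.cos(theta), math.sin(theta)
        # Rotate the hull by -theta so the current edge lies along the u axis.
        us = [x * c + y * s for x, y in hull]
        vs = [-x * s + y * c for x, y in hull]
        width, height = max(us) - min(us), max(vs) - min(vs)
        area = width * height
        if best is None or area < best[3]:
            uc, vc = (max(us) + min(us)) / 2, (max(vs) + min(vs)) / 2
            # Rotate the rectangle center back into image coordinates.
            center = (uc * c - vc * s, uc * s + vc * c)
            best = (center, (width, height), math.degrees(theta), area)
    return best
```

The minimum-area enclosing rectangle of a convex polygon always has one side collinear with a polygon edge, which is why checking one candidate per hull edge is sufficient. For a vehicle mask, the returned angle gives an estimate of the vehicle's orientation up to a multiple of 90 degrees.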