Using high-resolution remote sensing images to detect urban buildings quickly and accurately is a current research focus. Traditional algorithms for extracting buildings from such images suffer from loss of small targets, rough edges, and poor semantic segmentation. To address these problems, this paper proposes an improved deep convolutional neural network based on U-Net that performs end-to-end semantic segmentation at the pixel level. A model fusion strategy is adopted to improve segmentation accuracy, and the mIoU on the data set reaches 70.4%.
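
For readers unfamiliar with the reported metric, the following is a minimal sketch of how mean intersection over union (mIoU) is commonly computed from a predicted label map and a ground-truth label map. It is not the evaluation code used in this paper: the function name `mean_iou`, the two-class (background/building) default, and the choice to skip classes absent from both maps are assumptions made for illustration.

```python
import numpy as np


def mean_iou(pred, target, num_classes=2):
    """Mean IoU over classes from two integer label maps of equal shape.

    Hypothetical helper: assumes labels are integers in [0, num_classes).
    """
    pred = np.asarray(pred).astype(np.int64).ravel()
    target = np.asarray(target).astype(np.int64).ravel()
    if pred.shape != target.shape:
        raise ValueError("pred and target must contain the same number of pixels")

    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    cm = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)

    intersection = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - intersection

    # Assumption: classes that appear in neither map are excluded from the mean.
    valid = union > 0
    return float((intersection[valid] / union[valid]).mean())


if __name__ == "__main__":
    # Toy 2x2 example: 0 = background, 1 = building.
    pred = [[0, 1], [1, 1]]
    target = [[0, 1], [0, 1]]
    print(f"mIoU = {mean_iou(pred, target):.4f}")
```

Under this definition, the per-class IoU is the number of pixels correctly assigned to a class divided by the number of pixels assigned to that class in either the prediction or the ground truth, and mIoU is the average of these values over classes.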