Recently, deep learning methods have achieved strong performance in depth estimation and visual odometry from monocular video sequences by optimizing the photometric consistency between frames. However, it remains difficult to obtain large-scale ground-truth depth maps for supervising a depth estimation network, and existing solutions typically produce low-resolution results. Inspired by recent deep learning methods for semantic segmentation, we present a simple but effective unsupervised deep network for more accurate depth and camera motion estimation. An atrous spatial pyramid pooling module and an additional refinement layer are added to an encoder-decoder base model. In addition, we introduce a consistency-regularization loss that increases robustness to illumination changes. Our approach produces high-resolution depth maps with sharper object boundaries and achieves better results on the KITTI benchmark.
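As a minimal illustration of the photometric consistency objective mentioned above (a sketch, not the paper's implementation): unsupervised depth and ego-motion networks are typically trained on a per-pixel reconstruction error between the target frame and a source frame warped into the target view using the predicted depth and camera motion. The function name, the plain L1 penalty, and the validity mask below are illustrative assumptions; the warping step itself is omitted.

```python
import numpy as np

def photometric_loss(target, warped, valid_mask=None):
    # target, warped: (H, W, 3) float arrays in [0, 1].
    # `warped` stands for the source frame re-projected into the
    # target view via predicted depth and pose (warping not shown).
    err = np.abs(target - warped).mean(axis=-1)  # per-pixel L1 error
    if valid_mask is None:
        # In practice, pixels that project outside the source image
        # are masked out; here all pixels are assumed valid.
        valid_mask = np.ones(err.shape, dtype=bool)
    return float(err[valid_mask].mean())

rng = np.random.default_rng(0)
frame = rng.random((4, 4, 3))
print(photometric_loss(frame, frame))  # identical frames give zero loss
```

Full systems usually combine this term with a structural similarity (SSIM) term and a depth smoothness prior; the consistency-regularization loss introduced in this work additionally penalizes sensitivity to illumination change.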