Noise is an inevitable artifact of photography due to hardware limitations, and researchers have developed a variety of denoising methods to address it. One family of methods, known as video denoising, uses neighboring frames from a video to help denoise each frame. In this paper, we adopt N3Net, which leverages neighboring patches for denoising, as our backbone, and extend its concept to the multi-image denoising problem. Furthermore, we train an additional sub-model to learn a detail-level map of an image, analogous to the noise-level map from photography. Finally, we use both the detail-level map and the original frames to predict the denoised result. We show that 3D N3Net achieves visual quality comparable to state-of-the-art methods, and that a close-to-ground-truth detail-level map further improves the result.