In image processing, many algorithms must be executed multiple times on a single image. For such algorithms, redundant overheads arise between kernels, and kernel fusion can be applied to eliminate them. Halide currently provides a way to perform kernel fusion, but that approach causes redundant work and redundant pixel accesses. We introduce another way to perform kernel fusion without redundant work, so that it can be added to Halide's OpenCL CodeGen to improve the performance of kernel fusion.