In this paper, we proposed a novel method of exploiting model parallelism to partition a neural network for distributed inference. By taking advantage of the multi-path structure found in certain model architectures, our partitioning strategy divides the paths into several groups and schedules communication with rule-based methods to reduce the volume of transmitted data. Across groups, computation and communication are overlapped to hide the overhead of data transfer. Moreover, we adopt neural architecture search to sample communication decisions with a learned policy and search for high-accuracy, low-transmission models. The best model we found reduces data transmission by 86.6% compared to the original model, with little impact on accuracy. Under suitable device specifications and model configurations, inference of large neural networks on edge clusters can thus be distributed and accelerated with our approach.
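The core idea of the partitioning strategy can be illustrated with a minimal sketch. This is not the paper's implementation: the round-robin grouping, the toy per-path computation, and all function names here are illustrative assumptions. It shows how the paths of a multi-path block can be split into groups, each group computed independently (e.g., on a separate edge device), with only each group's small merged output transmitted for the final aggregation:

```python
import threading
import queue

def run_path(x, weight):
    # stand-in for one path's computation (e.g., a convolutional branch)
    return [weight * v for v in x]

def device_worker(x, weights, out_q):
    # one "device" computes all paths in its group, then sends only the
    # group's aggregated output; sends from different devices can overlap
    # with other devices' computation
    partial = [0.0] * len(x)
    for w in weights:
        y = run_path(x, w)
        partial = [a + b for a, b in zip(partial, y)]
    out_q.put(partial)

def distributed_block(x, path_weights, n_devices=2):
    # rule-based grouping (assumption): round-robin paths across devices
    groups = [path_weights[i::n_devices] for i in range(n_devices)]
    out_q = queue.Queue()
    threads = [threading.Thread(target=device_worker, args=(x, g, out_q))
               for g in groups]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # merge point: sum the group outputs, as in aggregating residual paths
    merged = [0.0] * len(x)
    for _ in range(n_devices):
        part = out_q.get()
        merged = [a + b for a, b in zip(merged, part)]
    return merged

x = [1.0, 2.0]
weights = [0.5, 1.5, 2.0, 1.0]  # four parallel paths
print(distributed_block(x, weights))  # → [5.0, 10.0], identical to running all paths locally
```

Because only the per-group aggregate crosses the network instead of every path's activation, the amount of transmitted data shrinks with the number of paths folded into each group.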