To address the cold-start problem and the low recommendation accuracy that recommender systems suffer when information is sparse, researchers usually draw on auxiliary information, such as additional information about users and target products and other related information. Some go further and build a knowledge graph over users or products, using the extra knowledge to help predict user preferences and improve recommendation performance. We argue that relying on textual information alone, or simply concatenating information from different modalities, is not enough to predict user preferences. We therefore propose a recommendation system based on a multi-modal knowledge graph. From the videos in a user's click history, we construct the corresponding multi-modal knowledge graph and then use our recommendation model to extract features from the graph layer by layer. Based on these features, we filter candidates and recommend videos better suited to the user. Experiments on a large number of datasets show that the recommendation system based on the multi-modal knowledge graph substantially outperforms several state-of-the-art baselines and markedly improves the click-through rate.