
Automated Intelligent Video Editing Using LSTM-GAN

Advisor: 施吉昇

Abstract


Experienced video editors use different editing techniques, including camera movement, shot types, and shot composition, to create different video semantics that deliver different messages to the viewers. In the video production process, the content itself matters, but so does the way the shots are assembled. Our goal is to train a model that learns how to edit a video so that it complies with videography rules. We propose a deep generative model, in which both the generator and the discriminator are unidirectional LSTM networks, that generates sequences of shot transitions for video editing. Different kinds of productions use different types of editing transitions, and our model learns two such styles from two different productions: the performance stages of a Korean music program and those of a Chinese music program. By combining different types of shots and camera movements, our AI video editor brings a variety of viewing experiences to the viewers. We measure the quality of the generated shot-transition sequences from three aspects: creativity, inheritance, and diversity. In terms of creativity (range [0, 1]), the sequences generated by the LSTM-GAN are on average 0.35 better than those generated by the Markov chain, but slightly worse (by 0.0204) than those generated by the LSTM. In terms of inheritance (range [-1, 1]), they are 0.0007 and 0.0223 better than those generated by the Markov chain and the LSTM, respectively. In terms of diversity (range [0, 1]), they are 0.2957 and 0.37305 better than those generated by the Markov chain and the LSTM, respectively. Weighing all three aspects, the sequences generated by the LSTM-GAN outperform those generated by the Markov chain; compared with the LSTM, they are slightly worse in creativity but better in inheritance and diversity. In summary, when creativity, inheritance, and diversity are ensured at the same time, the quality of the sequences generated by the LSTM-GAN is better than that of the sequences generated by the Markov chain or the LSTM.
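
For quick reference, the reported average differences in quality (LSTM-GAN score minus baseline score; positive values favor the LSTM-GAN) are summarized below:

Aspect        Range      vs. Markov chain    vs. LSTM
Creativity    [0, 1]     +0.35               -0.0204
Inheritance   [-1, 1]    +0.0007             +0.0223
Diversity     [0, 1]     +0.2957             +0.37305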

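To make the model architecture concrete, the following is a minimal PyTorch sketch of an LSTM-GAN over discrete shot-transition tokens. It is illustrative only, not the thesis implementation: the vocabulary size, sequence length, layer dimensions, and the soft-sample (softmax output) trick that keeps discrete generation differentiable are all assumptions.

# Minimal sketch of an LSTM-GAN for shot-transition sequences (PyTorch).
# Assumptions (not from the thesis): shot transitions are discrete tokens,
# VOCAB_SIZE token types, sequences of length SEQ_LEN, and soft (softmax)
# samples are fed to the discriminator to keep generation differentiable.
import torch
import torch.nn as nn

VOCAB_SIZE = 16   # hypothetical number of shot/transition types
SEQ_LEN    = 32   # hypothetical sequence length
HIDDEN     = 128
Z_DIM      = 64

class Generator(nn.Module):
    """Unidirectional LSTM that unrolls a noise vector into a token sequence."""
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(Z_DIM, HIDDEN, batch_first=True)
        self.out  = nn.Linear(HIDDEN, VOCAB_SIZE)

    def forward(self, z):                              # z: (batch, Z_DIM)
        steps = z.unsqueeze(1).repeat(1, SEQ_LEN, 1)   # feed z at every step
        h, _  = self.lstm(steps)                       # (batch, SEQ_LEN, HIDDEN)
        return torch.softmax(self.out(h), dim=-1)      # soft one-hot tokens

class Discriminator(nn.Module):
    """Unidirectional LSTM scoring a (soft) one-hot sequence as real or fake."""
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(VOCAB_SIZE, HIDDEN, batch_first=True)
        self.out  = nn.Linear(HIDDEN, 1)

    def forward(self, x):                              # x: (batch, SEQ_LEN, VOCAB)
        h, _ = self.lstm(x)
        return torch.sigmoid(self.out(h[:, -1]))       # score from last state

G, D = Generator(), Discriminator()
bce   = nn.BCELoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

real = torch.nn.functional.one_hot(                    # stand-in for real editor data
    torch.randint(0, VOCAB_SIZE, (8, SEQ_LEN)), VOCAB_SIZE).float()
z = torch.randn(8, Z_DIM)

# One discriminator step: push real sequences toward 1, generated ones toward 0.
d_loss = bce(D(real), torch.ones(8, 1)) + bce(D(G(z).detach()), torch.zeros(8, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# One generator step: try to make the discriminator label generated sequences as real.
g_loss = bce(D(G(z)), torch.ones(8, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()

Discrete-sequence GANs in the literature often replace the soft-sample trick with a Gumbel-softmax relaxation or policy-gradient training (as in SeqGAN); the thesis may well use a different training scheme.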
