
Two-Stream Attention Model for Highlight Extraction

Advisor: 陳建錦

Abstract


In recent years, as conversation-oriented streaming videos have become increasingly popular, live-streaming platforms have gradually become another channel through which people absorb new information. However, conversation-oriented live streams are usually lengthy, so most viewers cannot follow a broadcast from start to finish. To attract viewers to join a stream and, further, to become subscribers, providing highlight clips has become especially important for streamers and streaming platforms. Many recent studies have addressed video highlight extraction, and most of them use visual information as features to extract highlight segments. Such approaches, however, are not suitable for conversation-oriented live streams, because the highlights of these videos are not directly related to the visual frames; instead, they depend on the streamer's discourse and the viewers' reactions. In this thesis, we take the streamer's discourse and viewer messages as model inputs and propose a highlight extraction model tailored to conversation-oriented live streams, further strengthening the feature vectors with position enrichment and an attention mechanism. In addition, a self-adaptive weighting network assigns different weights to the prediction scores of the two textual streams to improve model performance. Experiments show that, on real-world data, our method outperforms several well-known highlight extraction models proposed in recent years.

Parallel Abstract


As more and more conversation-oriented streaming videos become available, streaming platforms have gradually taken the place of traditional media as a way for people to access information. Nevertheless, conversation-oriented streaming videos are often lengthy, which makes people reluctant to watch an entire video. Highlight extraction has thus become essential for streamers and platform providers to attract viewers to watch their videos and to convert them into subscribers. Previous highlight extraction methods analyzed visual features of videos and were unable to deal with conversation-oriented streaming videos, whose highlights are related to streamer discourses and viewer responses. In this research, we investigate highlight extraction for conversation-oriented streaming videos. Instead of evaluating visual features, the proposed highlight extraction method simultaneously examines the textual streams of streamer discourses and viewer messages. Two techniques, position enrichment and message attention, are developed to distill meaningful embeddings of the two textual streams. Also, a self-adaptive weighting scheme is deployed to effectively leverage the embeddings for highlight extraction. Experiments based on real-world streaming data demonstrate that the two textual streams, the self-adaptive weighting scheme, position enrichment, and message attention are all useful for extracting highlights of conversation-oriented streaming videos. Moreover, the extraction results are superior to those derived by well-known deep learning-based highlight extraction methods.
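The pipeline described above can be illustrated with a minimal NumPy sketch. Everything here is a hypothetical placeholder, not the thesis's actual architecture: the dimensions, the sinusoidal form of position enrichment, the dot-product message attention, the linear scorers, and the random weights are all assumptions chosen only to show how the two streams, the attention pooling, and the self-adaptive gate fit together.

```python
import numpy as np

# Illustrative sketch of the two-stream idea in the abstract. All dimensions,
# weights, and inputs are hypothetical placeholders, not learned parameters.
rng = np.random.default_rng(0)
d, n_seg, n_msg = 16, 4, 5  # embedding size, video segments, messages/segment

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def positional_encoding(n, dim):
    # Position enrichment: sinusoidal features encoding where a segment
    # sits in the stream (an assumed form; the thesis's may differ).
    pos = np.arange(n)[:, None]
    i = np.arange(dim)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / dim)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

def message_attention(discourse, messages):
    # Pool each segment's viewer messages, weighting them by similarity
    # to that segment's streamer-discourse embedding (scaled dot product).
    scores = np.einsum("sd,smd->sm", discourse, messages) / np.sqrt(d)
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)  # softmax over messages per segment
    return np.einsum("sm,smd->sd", w, messages)

# Toy stand-ins for text embeddings of the two streams.
discourse = rng.normal(size=(n_seg, d)) + positional_encoding(n_seg, d)
messages = rng.normal(size=(n_seg, n_msg, d))
pooled = message_attention(discourse, messages)

# Each stream yields a per-segment highlight score (random linear scorers).
s_disc = sigmoid(discourse @ rng.normal(size=d))
s_msg = sigmoid(pooled @ rng.normal(size=d))

# Self-adaptive weighting: a gate computed from both streams decides,
# per segment, how much each stream's score contributes.
gate = sigmoid(np.concatenate([discourse, pooled], axis=1) @ rng.normal(size=2 * d))
highlight = gate * s_disc + (1 - gate) * s_msg  # one score per segment
```

In this sketch the gate is recomputed per segment, so a segment whose viewer messages are noisy can lean on the discourse score and vice versa, which is the intuition behind weighting the two streams adaptively rather than with a fixed ratio.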

