This thesis proposes a new task concerning affordance: locating the affordance region and determining the presence or absence of affordance in every frame of a video. Previous affordance research has focused only on detection in single images. For this new task of affordance detection in videos, we build a new affordance dataset, the Support Affordance Video (SAV) dataset, which collects videos of support affordance and stages a series of action scenarios so that the affordance existence status changes with the actions and environment in each scenario. We propose a network architecture that uses two separate branches together with a temporal module to predict the affordance attention area, the affordance region, and the affordance existence label in a video. We evaluate our method on the SAV dataset to validate its effectiveness.
This thesis proposes a new task on affordance: detecting the affordance region and predicting the existence of affordance for each frame in a video sequence. Past research on affordance has focused only on detection in single images. For this new task of affordance detection in videos, we build a new affordance dataset, the Support Affordance Video (SAV) dataset. It consists of support affordance videos that exhibit a series of action scenarios, so that the affordance existence status changes as actions and environments change within each scenario. We propose a network architecture that uses two different branches together with temporal modules to predict the affordance attention area, the affordance region, and the affordance existence label in a video. Experimental results on the SAV dataset provide a baseline for the new task and validate the effectiveness of our method.
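To make the interface implied by the abstract concrete, the following is a minimal toy sketch, not the thesis's actual network: it assumes per-frame features are already extracted, uses a simple recurrent update as a stand-in for the temporal module, and attaches two heads, one producing a per-frame affordance region map and one producing a per-frame existence probability. All class and variable names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TwoBranchAffordanceSketch:
    """Toy sketch (not the thesis model): per-frame features feed a shared
    recurrent temporal state, which two heads read out per frame."""

    def __init__(self, feat_dim=32, hidden=16, region_hw=(8, 8)):
        self.hidden = hidden
        self.region_hw = region_hw
        # Randomly initialized weights; a real model would be trained.
        self.W_in = rng.normal(0.0, 0.1, (feat_dim, hidden))      # feature -> hidden
        self.W_rec = rng.normal(0.0, 0.1, (hidden, hidden))       # recurrent (temporal) update
        self.W_region = rng.normal(0.0, 0.1, (hidden, region_hw[0] * region_hw[1]))
        self.W_exist = rng.normal(0.0, 0.1, (hidden, 1))

    def forward(self, frame_feats):
        """frame_feats: array of shape (T, feat_dim), one feature vector per frame.
        Returns (T, H, W) region maps in [0, 1] and (T,) existence probabilities."""
        h = np.zeros(self.hidden)
        regions, exists = [], []
        for f in frame_feats:
            h = np.tanh(f @ self.W_in + h @ self.W_rec)  # carry state across frames
            regions.append(sigmoid(h @ self.W_region).reshape(self.region_hw))
            exists.append(sigmoid(h @ self.W_exist)[0])
        return np.stack(regions), np.array(exists)

T, D = 5, 32
net = TwoBranchAffordanceSketch(feat_dim=D)
region_maps, exist_probs = net.forward(rng.normal(size=(T, D)))
print(region_maps.shape, exist_probs.shape)  # (5, 8, 8) (5,)
```

The point of the sketch is the shared temporal state: because both heads read the same recurrent hidden vector, the predicted region and the existence label can change frame by frame as the scenario unfolds, which is the behavior the SAV dataset is designed to exercise.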