With rapid advances in information technology, direct user access to the vast amount of video on the Internet is desirable and often required. In this thesis, we propose a method for extracting the high-level narrative structure of films. Our objective is to use knowledge of film production to analyze and extract that structure, which we achieve by combining visual and aural cues with cinematic principles. We develop an aesthetic model that integrates these visual and aural cues (aesthetic fields) to compute an aesthetic intensity curve associated with the narrative structure. Finally, we conduct experiments on films of different genres. The experimental results demonstrate the effectiveness and significance of our approach.