This paper proposes a method of video analysis for two-level semantic descriptors. The goal is to construct a metadata generator based on off-line video analysis. Moving objects are first detected, and their visual features are analyzed to extract candidate frames that may carry middle-level semantics. Important key frames are then selected by the proposed algorithm, a revision of the prior 'Perceived Motion Energy' method. Finally, middle-level and low-level metadata are computed for the extracted objects in all key frames.
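To make the key-frame selection step more concrete, the following is a minimal sketch of choosing key frames from a per-frame motion measure. It is a generic peak-picking stand-in, not the 'Perceived Motion Energy' formulation and not the revised algorithm proposed here. The function names `motion_energy` and `select_key_frames`, the mean-absolute-difference energy measure, and the `threshold` parameter are all assumptions introduced for illustration.

```python
from typing import List, Sequence

Frame = Sequence[Sequence[int]]  # grayscale frame as rows of pixel intensities


def motion_energy(prev: Frame, curr: Frame) -> float:
    """Mean absolute pixel difference between two frames.

    Assumption: this simple difference stands in for a real motion measure.
    """
    total = 0
    count = 0
    for row_prev, row_curr in zip(prev, curr):
        for p, c in zip(row_prev, row_curr):
            total += abs(c - p)
            count += 1
    return total / count if count else 0.0


def select_key_frames(energies: Sequence[float], threshold: float) -> List[int]:
    """Return indices where the energy is a local peak of at least `threshold`.

    Assumption: a key frame is taken where motion stops rising and starts
    falling; the first and last entries are never selected.
    """
    keys = []
    for i in range(1, len(energies) - 1):
        rising = energies[i - 1] < energies[i]
        not_rising_after = energies[i] >= energies[i + 1]
        if energies[i] >= threshold and rising and not_rising_after:
            keys.append(i)
    return keys


if __name__ == "__main__":
    # Hypothetical energy values, one per frame.
    energies = [0.1, 0.5, 2.0, 0.7, 0.2, 1.5, 3.0, 1.0]
    print(select_key_frames(energies, threshold=1.0))
```

In this sketch the threshold filters out peaks caused by small or noisy motion, so only frames with pronounced activity are kept as key-frame candidates.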