Humanoid robots are highly anticipated to compensate the labor in our daily lives. To use humanoid robots in the real world, they have to understand human actions as linguistic expressions and perform human-like motions from linguistic commands. In this paper, we propose a statistical system to realize these abilities. Previous research symbolizes motion data as Hidden Markov Models. However, they use only joint angle or joint position data and subsequently recognize and generate simple motions but not complex motions such as manipulations. To understand human behaviors and generate robot motions, objects information like positions and name is helpful. This paper proposes a system to detect and recognize the object on which motions act and understand human behaviors as sentences by establishing the statistical networks among motions, objects and language.