The present study examined the effect of media presentation on English listening comprehension and cognitive load in a ubiquitous learning environment. Learners’ attitudes toward ubiquitous learning were also investigated. The participants were 162 university students majoring in Applied Foreign Languages, who were randomly assigned to either a single-mode group (82 students) or a dual-mode group (80 students). Students in the single-mode group received spoken messages only, whereas students in the dual-mode group received spoken messages and captions simultaneously. The results revealed that: (1) Captions significantly enhanced English listening comprehension and effectively decreased cognitive load. (2) Learners with better English listening comprehension had lower cognitive load, whereas learners with worse English listening comprehension had higher cognitive load. (3) Captions benefited English listening comprehension but hindered schema automation in long-term memory.