The purpose of this research was to examine the validity of the Mandarin Token Test within the framework of Rasch analysis. Two thousand children aged 3 to 6 participated in the study. The major findings can be summarized as follows. First, most of the test items demonstrated good model fit. Second, we used concurrent estimation to achieve vertical scaling and rescaled the performance of each age group, taking each of the other age groups in turn as the reference. As a result, although the age groups were given different sets of test items, their ability estimates can be compared on a common vertical scale. Finally, the results showed that item difficulty increased with the complexity and length of the commands, and that the difficulty of the common items decreased with age.
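
As a rough illustration of why placing all age groups on one scale makes their abilities comparable, the sketch below evaluates the standard dichotomous Rasch model for a single common item. It assumes items are scored right or wrong, which the abstract does not state, and the function name and all logit values are hypothetical and not taken from the study.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct response under the dichotomous Rasch model.

    Both arguments are in logits on the same scale.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Hypothetical values in logits, chosen only for illustration.
common_item_difficulty = 0.5  # one item administered to both age groups
younger_ability = -0.5        # illustrative child from a younger group
older_ability = 1.5           # illustrative child from an older group

p_younger = rasch_probability(younger_ability, common_item_difficulty)
p_older = rasch_probability(older_ability, common_item_difficulty)

print(f"younger child: {p_younger:.3f}")
print(f"older child:   {p_older:.3f}")
```

Because both abilities and the item difficulty sit on the same logit scale, the two probabilities can be compared directly even if the two children otherwise answered different items. A concurrent calibration estimates all parameters jointly from the pooled responses, with the common items linking the groups; this sketch only evaluates the model and does not perform that estimation.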