Purpose: The reliability of Direct Observation of Procedural Skills (DOPS) has been a matter of concern. To examine the reliability of DOPS rigorously, we produced three video-recorded clinical scenarios played by standardized patients/family members and standardized house officers. The objective of this study was to investigate the test-retest reliability, interrater reliability, and internal consistency reliability of DOPS for assessing house officers' performance of procedural skills.

Methods: We developed Scenarios I, II, and III to ensure that all raters assessed exactly the same procedural skills performed by standardized house officers. Using purposive sampling, we recruited raters with extensive experience in rating house officers' performance of procedural skills from one governmental university-affiliated medical center, three medical university-affiliated medical centers, and two Veterans Affairs Council-affiliated medical centers. Each rater was required (1) to watch the video-recorded performances in consecutive order, starting from Scenario I, in a single rating session; and (2) to rate each house officer's performance of procedural skills in each scenario. We examined the test-retest reliability, interrater reliability, and internal consistency reliability of DOPS using Pearson's correlation coefficient (PCC), the intraclass correlation coefficient (ICC), and Cronbach's alpha, respectively.

Results: We found that: (1) a rater gave similar scores when rating exactly the same procedural skill, as indicated by acceptable or better test-retest reliability (PCC = 0.75 ~ 0.81, P < .01); (2) different raters gave similar scores for a given house officer's performance of procedural skills, as indicated by excellent interrater reliability (PCC = 0.93 ~ 0.96, P < .01); and (3) all DOPS items reflected a single underlying construct, a house officer's performance of procedural skills, as supported by high internal consistency reliability (Cronbach's alpha = 0.84 ~ 0.91).

Conclusions: Our results indicate that DOPS is a reliable tool for assessing a house officer's performance of procedural skills and can be widely used in clinical encounters to rate the performance of procedural skills.
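
For readers less familiar with these coefficients, the sketch below illustrates how the three reliability statistics named in the Methods (Pearson's correlation for test-retest, ICC(2,1) for interrater agreement, and Cronbach's alpha for internal consistency) can be computed from rating matrices in Python. The data layout, number of raters, number of DOPS items, and all scores are hypothetical and are not the study's data.

import numpy as np
from scipy.stats import pearsonr

def cronbach_alpha(item_scores: np.ndarray) -> float:
    # item_scores: (n_observations x n_items) matrix of DOPS item ratings.
    k = item_scores.shape[1]
    item_vars = item_scores.var(axis=0, ddof=1)
    total_var = item_scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_vars.sum() / total_var)

def icc_2_1(ratings: np.ndarray) -> float:
    # ICC(2,1): two-way random effects, absolute agreement, single rater.
    # ratings: (n_subjects x n_raters) matrix.
    n, k = ratings.shape
    grand = ratings.mean()
    ss_total = ((ratings - grand) ** 2).sum()
    ss_rows = k * ((ratings.mean(axis=1) - grand) ** 2).sum()   # between subjects
    ss_cols = n * ((ratings.mean(axis=0) - grand) ** 2).sum()   # between raters
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)

# Hypothetical overall scores (9-point scale) from one rater viewing the same six videos twice
# (test-retest reliability):
first_viewing  = np.array([6, 8, 5, 7, 4, 8], dtype=float)
second_viewing = np.array([7, 8, 6, 7, 5, 9], dtype=float)
r, p = pearsonr(first_viewing, second_viewing)
print(f"Test-retest (Pearson r) = {r:.2f}, p = {p:.3f}")

# Hypothetical scores from three raters for six house officers (interrater reliability):
rater_scores = np.array([
    [6, 7, 6],
    [8, 8, 7],
    [5, 5, 6],
    [7, 7, 7],
    [4, 5, 4],
    [8, 9, 8],
], dtype=float)
print(f"Interrater (ICC 2,1) = {icc_2_1(rater_scores):.2f}")

# Hypothetical item-level scores (six house officers x five DOPS items, one rater)
# for internal consistency:
item_scores = np.array([
    [6, 6, 7, 6, 5],
    [8, 7, 8, 8, 7],
    [5, 5, 6, 5, 5],
    [7, 7, 7, 6, 7],
    [4, 5, 4, 4, 5],
    [8, 8, 9, 8, 8],
], dtype=float)
print(f"Internal consistency (Cronbach's alpha) = {cronbach_alpha(item_scores):.2f}")

By common rules of thumb, correlation or ICC values above roughly 0.75 and Cronbach's alpha values above roughly 0.80 are regarded as acceptable to excellent, which is how the ranges reported in the Results would typically be interpreted.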