In this study, we focus on improving the performance of a hidden Markov model (HMM)-based text-to-speech (TTS) system for Mandarin Chinese, with the goal of making the synthesized speech smoother and more fluent. We consider two factors: the design of the acoustic model and the choice of pitch tracking algorithm used during training. We implement three acoustic models, based on “consonants and vowels”, “consonants and tonal vowels”, and “right context dependent phonemes of syllables”. For pitch tracking, we compare “RAPT” against “UPDUDP”. We evaluate the synthesized speech with preference tests. Based on the results, we choose “right context dependent phonemes of syllables” as the acoustic model and “RAPT” as the pitch tracking algorithm for our speech synthesis system. The implemented system is publicly available at http://mirlab.org/Demo/TTS/.
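The three acoustic models differ mainly in which unit inventory they use. The abstract does not spell out the labeling scheme, so the following is only a minimal sketch under stated assumptions. It assumes input syllables are tone-numbered pinyin strings such as `ma3`, uses a hypothetical initial list, and reads “right context dependent” as one possibility: each initial is conditioned on the final that follows it. The names `split_syllable`, `units`, and the scheme keys are illustrative. They do not come from the paper.

```python
# Hypothetical initial inventory; two-letter initials (zh, ch, sh) come first
# so that the longest match wins. Treating y/w as initials is an assumption.
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")


def split_syllable(syl):
    """Split a tone-numbered pinyin syllable like 'ma3' into (initial, final, tone)."""
    has_tone = syl[-1].isdigit()
    tone = syl[-1] if has_tone else "5"  # assumption: '5' marks the neutral tone
    base = syl[:-1] if has_tone else syl
    for ini in INITIALS:
        if base.startswith(ini) and len(base) > len(ini):
            return ini, base[len(ini):], tone
    return "", base, tone  # zero-initial syllable, e.g. 'an4'


def units(syl, scheme):
    """Map one syllable to model units under one of three illustrative schemes."""
    ini, fin, tone = split_syllable(syl)
    if scheme == "cv":           # consonants and vowels: tone not part of the label
        out = [ini, fin]
    elif scheme == "cv_tonal":   # consonants and tonal vowels
        out = [ini, fin + tone]
    elif scheme == "right_ctx":  # one reading: initial conditioned on its following final
        out = [f"{ini}+{fin}" if ini else "", fin + tone]
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return [u for u in out if u]  # drop the empty initial of zero-initial syllables


if __name__ == "__main__":
    for scheme in ("cv", "cv_tonal", "right_ctx"):
        print(scheme, units("ma3", scheme), units("zhong1", scheme))
```

In this sketch, `units("ma3", "right_ctx")` yields `["m+a", "a3"]`. The initial `m` therefore receives a separate model for each following final. This gives finer modeling of consonant-vowel transitions than the plain scheme, at the cost of a larger unit inventory.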