Performance is measured by throughput: the higher the throughput, the higher the performance. Instruction-level parallelism (ILP) and thread-level parallelism (TLP) are two major techniques for improving CPU performance. This thesis designs a 6-stage pipelined ARM-like CPU to exploit ILP, and then extends it to TLP with 2-thread fine-grained multithreading. For power efficiency, a considerable amount of work has been done on low-power design, from the architecture level down to the gate level.