Digital signal processors (DSPs) are widely used to process streaming video and audio data. As the volume of streaming data grows, increasing throughput has become the key issue in DSP architecture design. One way to increase DSP throughput is to increase instruction-level parallelism. Many architectures have been proposed for this purpose, and they fall into two main approaches: superscalar and very long instruction word (VLIW) architectures. Of the two, VLIW has attracted considerable attention because of its low hardware complexity. However, the VLIW architecture suffers from memory explosion caused by the overhead of instruction grouping. To mitigate this problem, we propose a novel DSP architecture that contains three pipelines and performs dynamic instruction grouping in hardware. Experimental results show that our architecture reduces memory requirements by 6% on average while still achieving a 2% performance improvement.