Sparse matrix multiplication (SMM) is widely used in scientific and engineering computations such as least-squares problems, eigenvalue problems, partial differential equations, and image reconstruction. However, it is time-consuming, and the irregular structure of sparse matrices usually causes general-purpose processors to perform poorly and to incur frequent cache misses. In this paper, we develop an SMM system based on network-on-chip (NoC) technology to parallelize the required computations. To improve load balancing and the efficiency of packet distribution in the proposed SMM system, we also propose a method for partitioning a large matrix and mapping it onto the system. In addition, the proposed SMM system is fully parameterizable, so that it can be adapted flexibly to the hardware resources available. The proposed SMM system has been verified with a variety of network sizes, including 2 × 2, 2 × 4, 4 × 4, 4 × 8, and 8 × 8, on a Xilinx Virtex-5 device (XC5VLX110T) operating at 100 MHz. A number of random and real-application matrices are used to evaluate its performance, and the effects of network size, matrix size, and sparsity on system performance are examined. The results show that the proposed SMM system can achieve up to 40× speedup over a MicroBlaze processor and up to 2× speedup over an Intel processor. The proposed SMM system is also realized with a TSMC 0.18 µm cell library. The core area of the 4 × 4 system is 1,986.5 µm × 1,985.4 µm, equivalent to 259,026 gates. The average power consumption is 417 mW at an operating frequency of 166 MHz.
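
As a rough software illustration of partitioning a sparse matrix across processing elements with balanced load, the following minimal sketch may help. It is not the mapping method proposed in this paper. It assumes a sparse matrix stored in compressed sparse row (CSR) form, multiplied by a dense vector for brevity, with rows assigned greedily to a hypothetical number of processing elements (PEs) so that each PE receives a similar number of nonzeros. All function names are illustrative.

```python
from typing import List, Tuple

# CSR storage keeps only the nonzeros: (values, column indices, row pointers).
CSR = Tuple[List[float], List[int], List[int]]


def dense_to_csr(dense: List[List[float]]) -> CSR:
    """Convert a dense row-major matrix to CSR form."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def partition_rows_by_nnz(row_ptr: List[int], num_pes: int):
    """Greedy balancing (an assumption, not the paper's method):
    take rows heaviest first and give each to the least-loaded PE."""
    n_rows = len(row_ptr) - 1
    nnz = [row_ptr[i + 1] - row_ptr[i] for i in range(n_rows)]
    loads = [0] * num_pes
    assignment = [[] for _ in range(num_pes)]
    for i in sorted(range(n_rows), key=lambda r: -nnz[r]):
        pe = loads.index(min(loads))
        assignment[pe].append(i)
        loads[pe] += nnz[i]
    return assignment, loads


def spmv_partitioned(csr: CSR, x: List[float], assignment) -> List[float]:
    """Compute y = A x; each group of rows stands in for the work of one PE."""
    values, col_idx, row_ptr = csr
    y = [0.0] * (len(row_ptr) - 1)
    for rows in assignment:
        for i in rows:
            acc = 0.0
            for k in range(row_ptr[i], row_ptr[i + 1]):
                acc += values[k] * x[col_idx[k]]
            y[i] = acc
    return y
```

In this sketch the PE groups are processed one after another; on parallel hardware each group would run concurrently, and the per-PE nonzero counts returned by `partition_rows_by_nnz` give a first-order estimate of how evenly the work is spread.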