A novel hardware architecture for c-means clustering is presented in this paper. Our architecture is fully pipelined for both the partitioning and centroid computation operations so that multiple training vectors can be concurrently processed. A simple divider circuit based on table lookup, multiplication and shift operations is employed for reducing both the area cost and latency for centroid computation. The proposed architecture is used as a hardware accelerator for a softcore NIOS CPU implemented on a FPGA device for physical performance measurement.Numerical results reveal that our design is an effective solution with low hardware complexity and high computation performance for c-means design.