Extreme classification, which involves a vast number of categories, has become essential in modern applications such as product search, recommendation systems, and language models. As the number of categories grows to the million scale, the weights of the final classification layer can easily reach several hundred gigabytes, leading to enormous memory requirements. In the traditional von Neumann architecture, moving this massive volume of weight data causes a memory wall problem. To alleviate this problem, existing solutions apply in-storage processing techniques based on approximate algorithms. However, they still incur redundant data movement inside the SSD to transfer weights, which makes further performance improvements difficult. We leverage a computing-in-memory 3D-NAND flash architecture to overcome these challenges. Our architecture provides a higher filter rate, which reduces overall data transfer, while eliminating the transfer of low-precision weights. We present a software-hardware co-design that uses clustering data placement and adaptive threshold adjustment to improve performance for extreme classification. Clustering data placement improves the efficiency of our computing-in-memory architecture during execution, and adaptive threshold adjustment ensures that our system maintains the expected filter rate across different inferences. Overall, our work achieves an 8.1x speedup and 6.2x energy savings compared with the state-of-the-art in-storage processing baseline.
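The filter-then-refine flow described above can be illustrated with a small software sketch. It rests on several assumptions that do not come from the paper:

- Low-precision class scores, computed here from 4-bit per-row quantized weights, stand in for what the computing-in-memory array would produce.
- "Filter rate" is taken to mean the fraction of classes discarded before full-precision rescoring.
- The adaptive threshold is chosen per input as a score quantile, so that roughly the target fraction is filtered out.

The names `quantize_rows`, `adaptive_threshold`, and `two_stage_topk` are hypothetical. The sketch does not model clustering data placement or the 3D-NAND hardware, and it is not the authors' algorithm.

```python
import random
from typing import List, Sequence, Tuple


def quantize_rows(weights: Sequence[Sequence[float]], bits: int = 4) -> Tuple[List[List[int]], List[float]]:
    """Per-row symmetric uniform quantization to signed `bits`-bit integers.

    Hypothetical stand-in for the low-precision weights held in the CIM array.
    """
    qmax = 2 ** (bits - 1) - 1
    q_rows: List[List[int]] = []
    scales: List[float] = []
    for row in weights:
        scale = (max(abs(w) for w in row) / qmax) or 1.0  # avoid a zero scale
        q_rows.append([round(w / scale) for w in row])
        scales.append(scale)
    return q_rows, scales


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


def adaptive_threshold(scores: Sequence[float], filter_rate: float) -> float:
    """Return the score threshold that discards about `filter_rate` of the classes.

    Recomputed for every input, so the fraction of surviving classes stays
    near 1 - filter_rate even when the score distribution shifts.
    """
    keep = min(len(scores), max(1, round(len(scores) * (1.0 - filter_rate))))
    return sorted(scores, reverse=True)[keep - 1]


def two_stage_topk(
    x: Sequence[float],
    weights: Sequence[Sequence[float]],
    q_rows: Sequence[Sequence[int]],
    scales: Sequence[float],
    k: int,
    filter_rate: float = 0.99,
) -> Tuple[List[int], int]:
    """Filter with low-precision scores, then rescore survivors at full precision."""
    # Stage 1: coarse scores from quantized weights (the part CIM would compute in place).
    coarse = [s * dot(q, x) for q, s in zip(q_rows, scales)]
    threshold = adaptive_threshold(coarse, filter_rate)
    survivors = [c for c, score in enumerate(coarse) if score >= threshold]
    # Stage 2: only surviving classes need their full-precision weights fetched.
    exact = sorted(((dot(weights[c], x), c) for c in survivors), reverse=True)
    return [c for _, c in exact[:k]], len(survivors)


if __name__ == "__main__":
    random.seed(0)
    dim, n_classes = 64, 1000
    x = [random.gauss(0.0, 1.0) for _ in range(dim)]
    W = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_classes)]
    q_rows, scales = quantize_rows(W, bits=4)
    top, n_survivors = two_stage_topk(x, W, q_rows, scales, k=5, filter_rate=0.99)
    print(top, f"{n_survivors}/{n_classes} classes rescored at full precision")
```

The threshold is derived from each input's own score distribution rather than fixed in advance. This keeps the number of full-precision fetches roughly constant across inferences. A fixed threshold would let the filter rate drift whenever the scores shift in scale.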