Recently, Transformer-based methods have attracted widespread attention and research interest due to their excellent performance on various downstream tasks, yet their architectures still rely heavily on domain expertise and manual design. In the field of neural architecture search (NAS), many Transformer-related studies have emerged, but most of them perform architecture search within a fixed search space; only a few methods optimize the search space before searching for architectures. Although these few methods do account for search space updates, they cannot constrain model size at the source and therefore often produce very large and complex models, leading to heavy consumption of computational resources. To address these issues, we introduce an optimization framework that updates the Transformer architecture search space while taking resource constraints into account. The framework allows the search space to evolve gradually from the previous one under specified constraints (e.g., model size, FLOPS) for better architecture exploration. Specifically, an approximate accuracy gradient is computed based on the impact of each search dimension on accuracy. We then update the search space along the legitimate direction closest to this approximate accuracy gradient. Experiments on various datasets, including Cifar10, Cifar100, Tiny ImageNet, and SUN397, show that our method consistently discovers more lightweight architectures that outperform both the original models and those found by other NAS methods. In addition, we show that the proposed method can help explore effective and lightweight adapters for the recently popular CLIP model on new downstream tasks.
Recently, transformer-based methods have gained significant attention and research interest due to their superior performance on various tasks, whereas their architectures still rely heavily on manual design by human experts. Although neural architecture search (NAS)-based methods have recently been introduced to automate this process, they either require humans to manually specify a fixed search space for architecture search, or allow search space updates but usually yield large and complex models to achieve satisfactory performance. To address these issues, we introduce a constrained optimization framework for resource-aware search space exploration for transformer architecture search (Se-TAS), which allows the search space to evolve gradually from the previous one under user-specified constraints (e.g., model size, FLOPS) for better architecture exploration. Specifically, the impact of each search dimension of a search space is calculated as the unit accuracy differential over that dimension, yielding an approximate accuracy gradient. We then update the search space along the legitimate direction with the highest cosine similarity to the approximate accuracy gradient. Through extensive experiments on various benchmarks, including Cifar10, Cifar100, Tiny ImageNet, and SUN397, we demonstrate that our method consistently finds much more lightweight architectures while achieving better performance than the original models and those found by the compared NAS methods. Furthermore, we show that the proposed method can help explore effective and lightweight adapters for adapting recently popular foundation models to new downstream tasks.
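The search-space update described above can be sketched in a few lines: estimate the accuracy gradient by the unit accuracy differential along each search dimension, then step the space toward the legitimate direction most aligned with that gradient while respecting a resource budget. This is a minimal illustrative sketch, assuming a search space is encoded as an integer vector of dimension sizes; `eval_fn`, `cost_fn`, `directions`, and `budget` are hypothetical placeholders, not the paper's actual implementation:

```python
import numpy as np

def approx_accuracy_gradient(eval_fn, space, step=1):
    """Approximate the accuracy gradient of a search space by the unit
    accuracy differential along each search dimension.
    `eval_fn` (hypothetical) scores a search space; `space` is an integer
    vector of dimension sizes (e.g., [depth, embed_dim, num_heads])."""
    base = eval_fn(space)
    grad = np.zeros(len(space))
    for i in range(len(space)):
        probe = space.copy()
        probe[i] += step               # perturb one search dimension
        grad[i] = (eval_fn(probe) - base) / step
    return grad

def update_search_space(space, grad, directions, cost_fn, budget):
    """Among the legitimate update directions that keep the evolved space
    within the resource budget (model size, FLOPS, ...), pick the one with
    the highest cosine similarity to the approximate accuracy gradient."""
    best, best_sim = None, -np.inf
    for d in directions:
        cand = space + d
        if cost_fn(cand) > budget:     # resource constraint violated
            continue
        sim = np.dot(d, grad) / (np.linalg.norm(d) * np.linalg.norm(grad) + 1e-12)
        if sim > best_sim:
            best, best_sim = cand, sim
    return best if best is not None else space
```

In practice `eval_fn` would be an expensive proxy (e.g., evaluating sampled sub-networks from a supernet), so the gradient is only approximated once per search-space evolution step rather than per architecture.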