Collaboration-of-Task-Expert: Task Decomposition and Expert Assignment for Multi-Agent Systems

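Large language models (LLMs) such as GPT-4 have demonstrated remarkable ability across a wide range of tasks. However, no single LLM handles every type of task and application optimally. Addressing this limitation requires combining the strengths of domain-specific LLMs so that their distinct capabilities offset the constraints of any individual model. Because accessing the most powerful LLMs through paid API services is expensive, a cost-effective alternative is to integrate LLMs of different sizes according to task difficulty. Routing can also be extended to more complex tasks made up of sub-tasks that span multiple domains or difficulty levels. In that setting, assigning the whole task to a single model is not enough to complete it: the task must be decomposed by domain or difficulty, and each sub-task must then be assigned to a different LLM. We propose a new LLM-based method, Collaboration-of-Task-Expert (CoTE), which uses the reasoning ability and broad knowledge of LLMs to perform task decomposition and expert assignment. CoTE incorporates the distinguishing characteristics of each LLM, including its area of expertise and model size, into the prompt. In extensive experimental comparisons on simple tasks, CoTE achieves superior routing accuracy. In complex-task experiments, CoTE achieves 95.00% routing accuracy and a 17.00% improvement in overall accuracy on multi-domain MMLU tasks, and 79.60% routing accuracy with a significant cost reduction on multi-difficulty MMLU tasks, highlighting its effectiveness in task decomposition and expert assignment.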
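The prompt-based assignment described above can be illustrated with a minimal sketch. It is not the prompt or parsing logic used in this work: the `Expert` fields, the prompt wording, the `name | sub-task` reply format, and the model names are all assumptions made for illustration. The sketch only shows the general idea of listing each expert's domain and size in a prompt and reading back a sub-task assignment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expert:
    name: str    # hypothetical model identifier
    domain: str  # area of expertise, e.g. "math"
    size: str    # rough model scale, e.g. "small" or "large"


def build_routing_prompt(task: str, experts: list[Expert]) -> str:
    """Describe every expert in the prompt, then ask for a decomposition."""
    lines = ["You are a router. Available experts:"]
    for e in experts:
        lines.append(f"- {e.name}: domain={e.domain}, size={e.size}")
    lines.append("Split the task into sub-tasks and assign each to one expert.")
    lines.append("Answer with one line per sub-task: <expert name> | <sub-task>")
    lines.append(f"Task: {task}")
    return "\n".join(lines)


def parse_assignments(reply: str, experts: list[Expert]) -> list[tuple[str, str]]:
    """Keep only 'name | sub-task' lines that name a known expert."""
    known = {e.name for e in experts}
    pairs = []
    for line in reply.splitlines():
        name, sep, subtask = line.partition("|")
        name, subtask = name.strip(), subtask.strip()
        if sep and name in known and subtask:
            pairs.append((name, subtask))
    return pairs
```

In this sketch the router's reply is treated as untrusted text, so lines that name an unknown expert or lack the separator are dropped rather than guessed at.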

Keywords

Large Language Models; Query Routing; Multi-Agent Systems

Parallel Abstract

Large Language Models (LLMs) like GPT-4 have shown remarkable proficiency across various tasks. However, no single LLM can optimally manage all types of tasks and applications. To address this limitation, it is essential to combine the strengths of various domain-specific LLMs, leveraging their unique capabilities to overcome the constraints of individual models. Given the high costs associated with accessing the most powerful LLMs via paid API services, a cost-effective approach involves incorporating LLMs of different sizes based on task difficulty. The scope of routing can be extended to encompass more complex tasks that consist of sub-tasks spanning multiple domains or varying levels of difficulty. In such cases, assigning the whole task to a single model is insufficient to complete it. These complex tasks must be decomposed by task domain or difficulty, with sub-tasks then assigned to different LLMs. We propose a novel LLM-based method, Collaboration-of-Task-Expert (CoTE), which uses the reasoning ability and vast knowledge of LLMs for task decomposition and expert assignment. CoTE integrates the unique characteristics of each LLM, including its area of expertise and model size, into prompts. Extensive experimental comparisons with previous routing methods on simple tasks demonstrate CoTE's superior routing accuracy. In complex task experiments, CoTE achieves a 95.00% routing accuracy and a 17.00% overall accuracy improvement on multi-domain MMLU tasks, as well as a 79.60% routing accuracy and a significant cost reduction on multi-difficulty MMLU tasks, highlighting its effectiveness in task decomposition and expert assignment.

Parallel Keywords

Large Language Models; Query Routing; Multi-Agent System

