
Generalized Sequence Model: A General Architecture for Natural Language Sequence-to-Sequence Learning

GS2S: An Architecture for Generalized Natural Language Sequence to Sequence Learning

Advisors: 李育杰, 吳沛遠

Abstract


Building intelligent dialogue systems has long been a goal of artificial intelligence in both academia and industry. In recent years, a variety of neural network architectures have been proposed for sequence-to-sequence natural language problems such as machine translation, text-to-SQL, knowledge-based question answering, and social chatbots. However, existing architectures cannot effectively handle complex real-world natural language tasks, such as open-domain dialogue that involves knowledge reasoning. In this thesis, we introduce an end-to-end natural language processing model, called the Generalized Sequence Model, that achieves general-purpose sequence-to-sequence learning through a novel two-stage architecture: a single model takes a natural language input, performs all of the work conventionally done by multiple modules, such as slot filling, classification, named entity recognition, and domain-specific rules, and generates an appropriate natural language reply. We propose original design and inference strategies, including functional tokenization and dynamic sequence decoding, that help the architecture realize these capabilities. Experiments on our labeled dataset show that, for question answering involving knowledge reasoning, our architecture is more effective than existing end-to-end models and superior in generalizability, interpretability, and maintainability.

Abstract (English)


Building intelligent dialog systems has been a long-running goal of artificial intelligence in both academic and industrial communities. In recent years, various neural network architectures have been proposed to deal with sequence to sequence (S2S) NLP problems, such as machine translation, text to SQL, knowledge question answering, social chatbots, etc. However, existing architectures fail to work effectively on real-world complicated NLP tasks like open-domain conversation that involve knowledge reasoning. In this paper, we introduce a sequence to sequence learning model called GS2S, which stands for generalized sequence to sequence learning and contains a novel "two-stage" architecture: a single model takes a natural language input, performs all of the tasks together that are conventionally done by multiple modules such as slot filling, classification, named entity recognition, domain-specific rules, etc., and generates a proper response. In addition, novel design and inference strategies including functional tokenization and dynamic sequence decoding are proposed that help the architecture realize its capabilities. Experiments on our labeled dataset illustrate the effectiveness and superiority of our architecture over existing end-to-end models in terms of accuracy, maintainability, interpretability, and generalizability.
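The abstract names three mechanisms without showing them: a two-stage architecture, functional tokenization, and dynamic sequence decoding. Since the thesis body is not reproduced on this page, the Python sketch below is only one plausible reading of that data flow under stated assumptions; every token name, type, and function in it (FUNCTIONAL_TOKENS, KnowledgeBase, stage_one, stage_two) is a hypothetical illustration, not the authors' implementation.

    from dataclasses import dataclass

    # Functional tokens: vocabulary entries that stand for an action or a slot
    # rather than a surface word -- our assumed reading of "functional
    # tokenization". The token names are invented for this demo.
    FUNCTIONAL_TOKENS = {"<API:weather>", "<SLOT:city>", "<EOS>"}

    @dataclass
    class KnowledgeBase:
        # Stand-in for whatever external knowledge the second stage consults.
        weather: dict

        def lookup(self, api_token: str, arg: str) -> str:
            if api_token == "<API:weather>":
                return self.weather.get(arg, "unknown")
            return "unknown"

    def stage_one(utterance: str) -> list:
        # Stage 1: map natural language to an intermediate sequence mixing
        # surface tokens with functional tokens (slot filling, classification,
        # and NER done jointly). A trained seq2seq model would do this in the
        # thesis; a toy rule suffices to show the interface.
        tokens = utterance.lower().rstrip("?").split()
        if "weather" in tokens:
            city = tokens[-1]  # naive slot filler, good enough for the demo
            return ["<API:weather>", "<SLOT:city>", city, "<EOS>"]
        return ["<EOS>"]

    def stage_two(intermediate: list, kb: KnowledgeBase) -> str:
        # Stage 2: decode the intermediate sequence into a reply, resolving
        # functional tokens against external knowledge. "Dynamic sequence
        # decoding" is modeled only in the weakest sense: the reply is
        # conditioned on whatever functional tokens stage 1 emitted.
        if intermediate and intermediate[0] == "<API:weather>":
            city = intermediate[2]
            condition = kb.lookup("<API:weather>", city)
            return f"The weather in {city.title()} is {condition}."
        return "Sorry, I did not understand that."

    if __name__ == "__main__":
        kb = KnowledgeBase(weather={"taipei": "rainy"})
        intermediate = stage_one("What is the weather in Taipei?")
        print(intermediate)                 # ['<API:weather>', '<SLOT:city>', 'taipei', '<EOS>']
        print(stage_two(intermediate, kb))  # The weather in Taipei is rainy.

The point of the sketch is the interface, not the models: both stages read and write token sequences, so a single seq2seq network could in principle subsume the rule-based functions shown here, which is the claim the abstract makes.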

