Databases are widely used in commercial and industrial applications, and converting natural language into SQL has received considerable attention. Deep learning is now widely applied in natural language processing (NLP). Building on it, text-to-SQL techniques translate a user's question into an SQL query, so that users who do not know SQL syntax can still query a database easily. Previous studies match the keywords in a user's natural-language question against the table using simple word matching. These studies take the relevance between the question and the table into account and allow the keywords in a question to differ from the table's column names. Even so, the relationship between the database and the question is regarded as the current bottleneck in text-to-SQL tasks. This thesis proposes a text-to-SQL approach based on a neural network that embeds a pre-trained model, which is then fine-tuned. Named entity recognition (NER) types are extracted from the table and supplied to the model as additional input, identifying the kind of content each column holds. These column NER types strengthen the link between the question and the table and improve the model's logical form accuracy and execution accuracy.
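The abstract does not say how a column's NER type is derived or how it is fed to the model. The sketch below shows one plausible arrangement, and every detail in it is an assumption. A toy rule-based tagger (regular expressions plus a tiny hypothetical gazetteer) stands in for a real NER model. It labels each cell, takes a majority vote to type the column, and appends the type as an extra token after each column header in a BERT-style `[CLS] question [SEP] header ... [SEP]` input string. The names `cell_type`, `column_type`, `build_input`, and the tag set `DATE`/`NUM`/`LOC`/`PER`/`TEXT` are hypothetical and are not taken from the thesis.

```python
import re
from collections import Counter

# Hypothetical coarse type patterns; a real system would run a trained NER model.
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")
NUM_RE = re.compile(r"^-?\d+(\.\d+)?$")

# Toy gazetteer standing in for an NER model (assumption for illustration only).
GAZETTEER = {
    "taipei": "LOC", "tokyo": "LOC", "paris": "LOC",
    "alice": "PER", "bob": "PER",
}


def cell_type(value: str) -> str:
    """Assign a coarse entity type to a single cell value."""
    v = value.strip()
    if DATE_RE.match(v):
        return "DATE"
    if NUM_RE.match(v):
        return "NUM"
    # Fall back to the gazetteer, then to a generic TEXT label.
    return GAZETTEER.get(v.lower(), "TEXT")


def column_type(values: list[str]) -> str:
    """Type a column by majority vote over the types of its cells."""
    counts = Counter(cell_type(v) for v in values)
    return counts.most_common(1)[0][0]


def build_input(question: str, headers: list[str], rows: list[list[str]]) -> str:
    """Concatenate the question and headers, adding each column's type as an extra token."""
    parts = ["[CLS]", question, "[SEP]"]
    for i, header in enumerate(headers):
        ctype = column_type([row[i] for row in rows])
        parts.extend([header, f"[{ctype}]", "[SEP]"])
    return " ".join(parts)


if __name__ == "__main__":
    headers = ["City", "Population", "Founded"]
    rows = [
        ["Taipei", "2646204", "1884-01-01"],
        ["Tokyo", "13960000", "1868-09-03"],
    ]
    print(build_input("Which city has the largest population?", headers, rows))
```

Running the script prints `[CLS] Which city has the largest population? [SEP] City [LOC] [SEP] Population [NUM] [SEP] Founded [DATE] [SEP]`. The `[LOC]` token gives the encoder an explicit hint: a question asking "which city" should link to the `City` column even if the word "city" were phrased differently. In an actual fine-tuning setup, the bracketed type tags would normally be registered as special tokens in the pre-trained model's tokenizer, and the majority vote could be replaced by whatever aggregation the real NER pipeline supports.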