Automatic Voice Control of Mobile Robots

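This paper proposes a method for controlling a mobile robot with a speech model. The speech model is trained with deep neural networks. It recognizes commands in the speech of a specific speaker and sends them to the robot for execution. The speech model consists of two main parts: (1) speaker separation and (2) automatic speech recognition (ASR). For speaker separation, this paper uses the VoiceFilter network model to separate the target speaker's voice. The VoiceFilter model has three parts: (1) speaker voiceprint feature extraction, (2) spectrogram masking, and (3) the loss function. Given a reference audio clip of a specific speaker, the model isolates and retains that speaker's voice in noisy input audio and filters out the voices of all other speakers. For automatic speech recognition, this paper uses the Conformer speech model to convert speech to text. Finally, experiments show that the robot can be controlled by voice, which verifies that the proposed method is effective.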
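The spectrogram-masking and loss-function steps of speaker separation can be illustrated with a minimal NumPy sketch. It assumes that magnitude spectrograms of shape (frames, frequency bins) have already been computed. In VoiceFilter the mask is predicted by the network from the mixture and the target speaker's embedding. Here, to show only how a mask is applied and how a spectrogram loss is evaluated, that prediction is replaced by a hypothetical oracle "ideal ratio mask" computed from known clean and interfering signals. The function names and the choice of a plain mean-squared-error loss are assumptions, not the paper's exact formulation.

```python
import numpy as np


def apply_soft_mask(mixture_mag: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Multiply the mixture magnitude spectrogram element-wise by a soft mask.

    In VoiceFilter the mask comes from the network; here it is passed in (assumption).
    """
    if mixture_mag.shape != mask.shape:
        raise ValueError("mask and spectrogram must have the same shape")
    return np.clip(mask, 0.0, 1.0) * mixture_mag


def spectrogram_mse(estimate: np.ndarray, target: np.ndarray) -> float:
    """Mean squared error between estimated and clean spectrograms (assumed loss)."""
    return float(np.mean((estimate - target) ** 2))


def ideal_ratio_mask(target_mag: np.ndarray, interference_mag: np.ndarray,
                     eps: float = 1e-8) -> np.ndarray:
    """Hypothetical oracle mask standing in for the network's prediction."""
    return target_mag / (target_mag + interference_mag + eps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames, bins = 100, 257                    # hypothetical STFT dimensions
    target = rng.random((frames, bins))        # target speaker's magnitude spectrogram
    interference = rng.random((frames, bins))  # other speakers and noise
    # Simplifying assumption: magnitudes add (ignores phase interaction).
    mixture = target + interference

    mask = ideal_ratio_mask(target, interference)
    estimate = apply_soft_mask(mixture, mask)

    print("loss without mask:", spectrogram_mse(mixture, target))
    print("loss with mask:   ", spectrogram_mse(estimate, target))
```

Training replaces the oracle mask with the network output and minimizes the same kind of loss, so that the masked mixture approaches the clean spectrogram of the reference speaker.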

Keywords

Automatic speech recognition; source separation; speaker recognition; speaker verification; speech-to-text

English Abstract

This paper proposes a speech model for controlling a mobile robot. After being trained with deep neural networks, the speech model recognizes commands spoken by a specific speaker and transmits them to the robot for execution. The speech model consists of two main parts: (1) speaker separation and (2) automatic speech recognition (ASR). For speaker separation, this paper uses the VoiceFilter network model to separate the target speaker's voice. The VoiceFilter model can be divided into three parts: (1) speaker voiceprint feature extraction, (2) spectrogram masking, and (3) the loss function. Given a reference audio clip of a specific speaker, the model separates and retains that speaker's voice in noisy input audio and filters out the voices of all other speakers. For automatic speech recognition, this paper uses the Conformer model to perform speech-to-text conversion. Finally, experiments demonstrate that the robot can be controlled by voice, verifying that the proposed method is effective.
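The last stage of the pipeline, turning the recognized text into a robot action, can be sketched as a simple keyword lookup. The abstract does not specify how commands are encoded or transmitted. The command vocabulary, the velocity values, and the `VelocityCommand` type below are therefore hypothetical and illustrate only one plausible mapping from an ASR transcript (for example, Conformer output) to a motion command.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VelocityCommand:
    """Hypothetical motion command: linear speed (m/s) and angular speed (rad/s)."""
    linear: float
    angular: float


# Hypothetical command vocabulary; the paper's actual command set is not given.
COMMANDS = {
    "forward": VelocityCommand(0.2, 0.0),
    "backward": VelocityCommand(-0.2, 0.0),
    "left": VelocityCommand(0.0, 0.5),
    "right": VelocityCommand(0.0, -0.5),
    "stop": VelocityCommand(0.0, 0.0),
}


def parse_command(transcript: str) -> Optional[VelocityCommand]:
    """Return the command for the first known keyword in an ASR transcript.

    "stop" is checked first so that it overrides any motion keyword.
    """
    # Lowercase and strip trailing punctuation that ASR output may contain.
    words = [w.strip(".,!?") for w in transcript.lower().split()]
    if "stop" in words:
        return COMMANDS["stop"]
    for word in words:
        if word in COMMANDS:
            return COMMANDS[word]
    return None  # no recognized command: the utterance is ignored


if __name__ == "__main__":
    for text in ["Robot move forward", "turn LEFT now.", "please stop", "hello"]:
        print(repr(text), "->", parse_command(text))
```

Giving "stop" priority over motion keywords is a conservative choice for a moving robot: an utterance that mixes a motion word with "stop" results in the robot halting rather than moving.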