Taiwanese Tone Recognition

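This thesis investigates the automatic recognition of Taiwanese tones. Using the DeZongGing (地藏經) Taiwanese speech corpus, we first segment the speech into syllables with HTK (Hidden Markov Model Toolkit). For each syllable, we compute acf/amdf (autocorrelation function divided by absolute mean difference function) on overlapping short-time frames and use the result as the basic features. We then classify Taiwanese tones in two ways. The first approach derives a pitch track from the basic features; conceptually, it finds the ridge line of the largest island on a contour map of pitch. A third-order polynomial is fitted to the pitch track, the coefficients of the fitted polynomial serve as the final features, and classification is done with methods such as linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA). Under cross-validation, the accuracy is roughly 52% to 59%. The second approach treats the basic features as an image, normalizes the image, and feeds it as the input features to a deep belief network (DBN), a method that has performed well in recent research. Its cross-validated recognition accuracy can reach 72% or higher, showing that DBNs obtain better results when a large amount of data is available.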
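As a rough illustration of the acf/amdf basic feature described above, the following minimal sketch computes the ratio for a single frame over a range of lags. The function names (`acf_over_amdf`, `best_lag`), the per-lag averaging, the small `eps` guard against division by zero, and the synthetic 200 Hz test frame are all assumptions made for this example; the abstract does not specify the frame length, lag range, or normalization actually used.

```python
import math


def acf_over_amdf(frame, max_lag, eps=1e-9):
    """Return acf(lag) / amdf(lag) for lag = 1..max_lag on one frame.

    Both functions are averaged over the overlapping samples; eps is a
    hypothetical guard so that a perfectly periodic frame does not divide by 0.
    """
    n = len(frame)
    if not 0 < max_lag < n:
        raise ValueError("max_lag must satisfy 0 < max_lag < len(frame)")
    ratios = []
    for lag in range(1, max_lag + 1):
        overlap = n - lag
        # Autocorrelation: large when the frame matches itself shifted by lag.
        acf = sum(frame[i] * frame[i + lag] for i in range(overlap)) / overlap
        # Mean absolute difference: small when the frame matches its shifted copy.
        amdf = sum(abs(frame[i] - frame[i + lag]) for i in range(overlap)) / overlap
        ratios.append(acf / (amdf + eps))
    return ratios


def best_lag(ratios):
    """Lag (in samples) with the largest ratio; index 0 corresponds to lag 1."""
    return 1 + max(range(len(ratios)), key=ratios.__getitem__)


# Synthetic frame: a sine with a period of 40 samples (200 Hz at 8 kHz).
sample_rate = 8000
period = 40
frame = [math.sin(2 * math.pi * i / period) for i in range(320)]

ratios = acf_over_amdf(frame, max_lag=60)
estimated_period = best_lag(ratios)
print(sample_rate / estimated_period)  # estimated pitch in Hz
```

Stacking the `ratios` vectors of successive overlapping frames gives a lag-by-time map for each syllable, which is the kind of two-dimensional feature that both classification approaches start from.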

Keywords

Taiwanese tone recognition; deep belief networks

Parallel Abstract

This thesis explores the automatic recognition of Taiwanese tones. Using the DeZongGing (地藏經) Taiwanese speech corpus and the Hidden Markov Model Toolkit (HTK), we first segment each speech waveform into syllables. For each syllable segment, short-time speech analysis is then performed using acf/amdf (autocorrelation function divided by absolute mean difference function). Using these values as basic features, we explore two kinds of classifiers for Taiwanese tones. For the first kind, we reduce the basic features to the coefficients of a third-order polynomial fitted to the pitch track. Pitch tracks can be obtained in a number of different ways; we use the ridge of the largest island in the acf/amdf map. With four coefficients for each syllable, we then classify the syllables by tone using LDA (linear discriminant analysis) and QDA (quadratic discriminant analysis). Under cross-validation, the accuracies of these classifiers range from 52% to 59%. For the second kind, we treat the basic features as a gray-level picture, normalize it to size 28×28, and then use Deep Belief Networks (DBN) for classification, as in the recognition of handwritten digits. The cross-validation accuracies can go up to 72%, with or without noise perturbations.
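The reduction of a pitch track to four polynomial coefficients can be sketched as below. This is only an illustration under stated assumptions: the function name `fit_cubic`, the rescaling of time to the interval [0, 1], the plain normal-equations solver, and the synthetic pitch track are hypothetical choices for this example, not details taken from the thesis.

```python
def fit_cubic(pitch_track):
    """Least-squares fit of c0 + c1*t + c2*t^2 + c3*t^3 to a pitch track.

    Time is rescaled to t in [0, 1] (an assumption of this sketch) so that
    syllables of different lengths give comparable coefficients.
    Returns [c0, c1, c2, c3].
    """
    n = len(pitch_track)
    if n < 4:
        raise ValueError("need at least 4 points to fit a cubic")
    ts = [i / (n - 1) for i in range(n)]
    size = 4
    # Normal equations A c = b with A[j][k] = sum t^(j+k), b[j] = sum y * t^j.
    a = [[sum(t ** (j + k) for t in ts) for k in range(size)] for j in range(size)]
    b = [sum(y * t ** j for t, y in zip(ts, pitch_track)) for j in range(size)]
    # Forward elimination with partial pivoting.
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, size):
            factor = a[row][col] / a[col][col]
            for k in range(col, size):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * size
    for row in reversed(range(size)):
        tail = sum(a[row][k] * coeffs[k] for k in range(row + 1, size))
        coeffs[row] = (b[row] - tail) / a[row][row]
    return coeffs


# Synthetic pitch track (in Hz) generated from a known cubic, for checking.
true_coeffs = [200.0, -30.0, 10.0, 5.0]
track = [
    sum(c * (i / 19) ** p for p, c in enumerate(true_coeffs)) for i in range(20)
]
coeffs = fit_cubic(track)
print([round(c, 6) for c in coeffs])
```

Each syllable then contributes one four-dimensional feature vector, which could be passed to any standard LDA or QDA implementation together with its tone label.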

Parallel Keywords

Taiwanese tone recognition; deep belief networks

