Exploring the Effect of Audio CAPTCHA Design on Auditory Selective Attention

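CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is a verification system used to determine whether a user is a human or a computer program. It usually consists of a main question plus distractors, and common designs are text-, audio-, and image-based. This study examines audio CAPTCHAs, which are currently used mostly for voice authentication or in non-visual interfaces suited to blind users. The usual attack is to analyze the signal with an Automatic Speech Recognition (ASR) system and guess its content. Related research has found that current audio CAPTCHAs, in order to be harder to break, are generally too difficult for humans, which indicates room for improvement. The Cocktail Party Effect refers to the human ability, in a noisy conversational environment, to give priority to the main sound signal and temporarily ignore other irrelevant sounds. An audio CAPTCHA designed to fit this effect could increase the advantage that selective attention gives humans. Taking this theory as its basis, this study examines how two factors affect error rate and preference: whether the interference in existing audio CAPTCHA combinations carries informational meaning, and the pitch difference between male and female voices. The results show that interference containing related informational meaning does raise the error rate and should be avoided in design as far as possible, but the most critical factor is the pitch difference of the male/female voice combination. Groups with a larger pitch difference (e.g., a male announcer with a group of female interfering voices) had markedly lower error rates and higher subjective preference scores, whereas groups with a smaller pitch difference (e.g., a male announcer with a group of male interfering voices) had the highest error rates and the lowest subjective preference ratings. It is recommended that future audio CAPTCHA systems adopt combinations with a larger pitch difference (e.g., a female announcer with male-voice interference), which can both effectively reduce the likelihood of being broken by programs and match the advantage of human auditory selective attention, improving recognizability.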

Keywords

Cocktail Party Effect; pitch difference; information meaning interference; audio CAPTCHA

Parallel Abstract

CAPTCHA is a verification system that distinguishes human users from programs by means of a main question combined with interference. Common designs are based on text, audio, or images. This study examines the audio CAPTCHA, which is currently used mostly for voice verification or in settings suited to blind users. The common cracking method is to analyze the signal content and guess it using Automatic Speech Recognition (ASR). Related studies have found that current audio CAPTCHAs, having been made harder to crack, are too difficult for most humans, which means there is still room for improvement. The Cocktail Party Effect refers to the ability of the human brain, in a noisy environment, to process the main audio signal preferentially and ignore other irrelevant signals. If the design of an audio CAPTCHA accords with this effect, it can increase the advantage of human selective attention. Based on this theory, this study explores how the presence of information meaning interference in current audio CAPTCHA combinations, and the pitch difference between male and female voices, influence error rate and degree of preference. The results show that interference containing relevant information meaning does increase the error rate and should be avoided in design as far as possible, but the most critical issue is the pitch difference in the combination of male and female voices. Groups with a large pitch difference (for example, a male broadcaster with female interference) had significantly lower error rates and higher subjective preference scores, while groups with a small pitch difference (for example, a male broadcaster with male interference) had the highest error rates and the lowest subjective preference scores.
It is suggested that future audio CAPTCHA designs adopt combinations with a greater pitch difference (for example, a female broadcaster with male interference), which can both effectively lower the likelihood of being cracked by malicious programs and accord with the advantage of selective attention in the human auditory system, improving recognizability. In addition, the main message can more often be broadcast in a female voice, to which human listeners are rather sensitive.

Parallel Keywords

Cocktail Party Effect; audio CAPTCHA; information meaning interference; pitch difference
