Translated Titles

A study of the effect of Audio CAPTCHA design on auditory selective attention



Key Words

Cocktail Party Effect ; Audio CAPTCHA ; information meaning interference ; pitch difference



Chinese Abstract

CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is a verification system that distinguishes human users from computer programs. It typically consists of a main question plus distracting elements, and common designs use text, audio, and images. This study examines audio CAPTCHAs, which are currently used mainly in voice authentication and in non-visual interfaces suited to blind users. The most common attack analyzes the signal content with an Automatic Speech Recognition (ASR) system and guesses the answer. Related research has found that, in order to resist such attacks, current audio CAPTCHAs are generally too difficult for humans, which indicates there is still room for improvement.

The Cocktail Party Effect refers to the human brain's ability, in a noisy conversational environment, to process the primary audio signal preferentially while temporarily ignoring other, irrelevant sounds. An audio CAPTCHA designed to match this effect can exploit the advantage of human selective attention. Based on this theory, this study investigates whether existing audio CAPTCHA combinations contain semantically meaningful interference, and how the pitch difference between male and female voices affects error rate and user preference.

The results show that interference carrying relevant semantic meaning does raise the error rate and should be avoided in design, but the most critical factor is the male/female pitch combination. Groups with a large pitch difference (e.g., a male announcer with group-female interference) showed significantly lower error rates and higher subjective preference scores, while groups with a small pitch difference (e.g., a male announcer with group-male interference) showed the highest error rates and the lowest preference scores. Future audio CAPTCHA systems are therefore advised to use combinations with a large pitch difference (e.g., a female announcer with male-voice interference), which both lowers the probability of automated cracking and exploits the advantage of human auditory selective attention, improving recognizability.

English Abstract

CAPTCHA is a verification system that distinguishes whether a user is a human or a program by means of a main question surrounded by interference. Common designs use patterns such as characters, audio, and images. This study focuses on audio CAPTCHAs, which are currently used mostly for voice verification or in non-visual environments appropriate for blind users. The common cracking method is to analyze the signal content and guess the answer with Automatic Speech Recognition (ASR). Related research has found that, in order to raise the difficulty of cracking, current audio CAPTCHAs are generally too difficult for humans, which means there is still room for improvement. The Cocktail Party Effect refers to the human brain's ability, in a noisy environment, to process the main audio signal preferentially and ignore other, irrelevant sounds; an audio CAPTCHA designed to match this effect can exploit the advantage of human selective attention. Based on this theory, this study examines whether existing audio CAPTCHA combinations contain semantically meaningful interference and how the pitch difference between male and female voices affects error rate and preference. The results show that interference carrying relevant semantic meaning does increase the error rate and should be avoided in design as far as possible, but the most critical factor is the male/female pitch combination. Groups with a large pitch difference (e.g., a male announcer with group-female interference) had significantly lower error rates and higher subjective preference scores, whereas groups with a small pitch difference (e.g., a male announcer with group-male interference) had the highest error rate and the lowest subjective preference score.
It is suggested that future audio CAPTCHA designs adopt combinations with a large pitch difference (e.g., a female announcer with male-voice interference), which can both effectively lower the probability of cracking by malicious programs and match the advantage of selective attention in the human auditory system, improving recognizability. In addition, the main message can preferably be broadcast in a female voice, to which human listeners respond more sensitively.
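The recommended design principle above, mixing an announcer's voice with interference whose fundamental pitch is far from (rather than close to) the announcer's, can be illustrated with a minimal sketch. This is not code from the thesis: the signal model (a fundamental plus two harmonics as a crude stand-in for a voiced sound), the sample rate, and all pitch and SNR values are illustrative assumptions.

```python
# Hypothetical sketch: mix a synthetic "announcer" signal with interference
# whose fundamental is either far from (female-like) or close to (male-like)
# the announcer's pitch. All names and parameters are illustrative.
import numpy as np

SR = 8000  # sample rate in Hz (assumed)

def voiced_tone(f0, seconds=1.0, sr=SR):
    """Crude stand-in for a voiced signal: fundamental plus two harmonics."""
    t = np.arange(int(sr * seconds)) / sr
    return (np.sin(2 * np.pi * f0 * t)
            + 0.5 * np.sin(2 * np.pi * 2 * f0 * t)
            + 0.25 * np.sin(2 * np.pi * 3 * f0 * t))

def mix_captcha(target_f0, interferer_f0, snr_db=0.0):
    """Mix target and interferer at a given signal-to-noise ratio."""
    target = voiced_tone(target_f0)
    noise = voiced_tone(interferer_f0)
    # Scale the interference so that target power / noise power = snr_db.
    gain = np.sqrt(np.mean(target**2) / (np.mean(noise**2) * 10**(snr_db / 10)))
    return target + gain * noise

# "Male announcer" (~120 Hz) with "female" interference (~220 Hz): the large
# pitch separation keeps target and masker in distinct frequency regions.
far_pitch = mix_captcha(120, 220)

# Same announcer with "male" interference (~110 Hz): the spectra overlap
# heavily, the condition the study found hardest for listeners.
near_pitch = mix_captcha(120, 110)
```

The point of the sketch is only the pitch relationship between announcer and masker; a real audio CAPTCHA would of course use recorded speech rather than harmonic tones.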

Topic Category College of Science and Engineering > Department of Industrial Engineering and Management
Engineering > General Engineering
Social Sciences > Management