Building a User Experience Model for SKYPE/SILK VoIP via Crowdsourcing: Data Collection, Filtering, and Analysis

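Modeling and measuring user experience for VoIP services has long been an open problem. With the help of crowdsourcing platforms, user experience researchers can run large-scale user studies that draw on a larger and more diverse population, and data from many subjects can be collected easily within a limited time. However, the details of deriving a reliable QoE model through crowdsourcing are often overlooked and deserve attention. This study makes three main contributions. First, a three-phase user study was run on a crowdsourcing platform: participants rated their satisfaction with several VoIP calls that had different network delays, and the resulting data were used to build a prediction model. The results show that the effect of end-to-end delay on user experience follows an exponential decay. Second, this thesis analyzes all the user data previously collected by our laboratory and provides methods for checking data reliability. Three main tests are proposed: a cheat-proof test, a normality test, and a convergence test. The cheat-proof test automatically decides whether data should be filtered out based on the user's ratings and information; the normality test checks whether the distribution of user ratings follows a normal distribution; and the convergence test uses numerical analysis to examine whether the user ratings converge. Third, this thesis uses these three tests to cross-compare the results across different data sets (bit rate, packet loss rate, and network delay), and analyzes and discusses in detail the effectiveness of the three tests and the characteristics of the data.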
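As a rough illustration of what a numerical convergence check on user ratings might look like, the sketch below tracks the running mean of the scores for one test condition and reports convergence once the most recent running means stay within a small band. The criterion itself, the function names, and the `window` and `tolerance` parameters are assumptions made for this example; the abstract does not state the actual criterion used in the thesis.

```python
def running_means(scores):
    """Return the cumulative mean after each additional score."""
    means = []
    total = 0.0
    for count, score in enumerate(scores, start=1):
        total += score
        means.append(total / count)
    return means


def has_converged(scores, window=5, tolerance=0.1):
    """Hypothetical criterion: the last `window` running means
    differ from each other by at most `tolerance`."""
    if len(scores) < window:
        # Too few ratings to judge stability.
        return False
    tail = running_means(scores)[-window:]
    return max(tail) - min(tail) <= tolerance
```

Under this assumed criterion, a condition whose running mean still shifts noticeably when a new rating arrives would be flagged as needing more participants.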

Keywords

VoIP ； User Perception ； Crowdsourcing ； Psychophysics

Parallel Abstract

Modeling and measuring user experience for Voice over IP (VoIP) services has long been a subject of study. With the help of crowdsourcing platforms, user perception researchers can run user studies on a large and diverse population and collect a large amount of subject and score data in a limited time. However, the details of deriving a reliable QoE model through crowdsourcing have often been neglected and need to be addressed. This study makes three main contributions. First, 60 participants were recruited to score emulated Skype calls with different levels of delay, and data from 44 of them were used to build a closed-form QoE model. The results show that end-to-end delay affects user experience on an exponential scale. Second, taking all of our previous user studies as examples, we provide a set of analysis and quality-control methods for user score data to increase the reliability of the study. The methods comprise three tests: a cheat-proof test, a normality test, and a convergence test. The cheat-proof test details how users' data were screened based on their rating behavior. The normality test shows that the scores for most tracks are normally distributed. The convergence test examines numerically whether the scores reached a pre-defined convergence criterion. Third, by cross-comparing the results of the three tests, we discuss and analyze their effectiveness and results across three data sets (bit rate, loss rate, and delay).
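To make the idea of an exponential relationship between delay and rated quality concrete, the sketch below fits a curve of the form score = a * exp(-b * delay) by ordinary least squares on the logarithm of the scores. This functional form, the parameter names `a` and `b`, and the function names are assumptions chosen for illustration; the abstract does not give the thesis's actual closed-form model or its coefficients.

```python
import math


def fit_exponential_decay(delays, scores):
    """Fit score = a * exp(-b * delay) via linear least squares
    on log(score). Returns the pair (a, b)."""
    if len(delays) != len(scores) or len(delays) < 2:
        raise ValueError("need at least two (delay, score) pairs")
    if any(s <= 0 for s in scores):
        raise ValueError("scores must be positive to take logarithms")
    logs = [math.log(s) for s in scores]
    n = len(delays)
    mean_x = sum(delays) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in delays)
    if sxx == 0:
        raise ValueError("delays must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(delays, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # log(score) = log(a) - b * delay, so a = exp(intercept), b = -slope.
    return math.exp(intercept), -slope


def predict_score(a, b, delay):
    """Predicted score at a given delay under the assumed model."""
    return a * math.exp(-b * delay)
```

In practice the inputs would be the mean rating for each delay level after unreliable ratings have been screened out; a log-linear fit is only one simple option, and a nonlinear fit on the raw scores would weight the errors differently.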

Parallel Keywords

VoIP ； Crowdsourcing ； User Perception ； QoE ； Psychophysics
