  • Thesis

基於問卷資料中問答語意之個人偏好推論

Personal Preference Inference Based on Semantics of Question-Answer Pairs in Questionnaire Data

Advisor: 張時中

Abstract


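Questionnaire surveys are a general-purpose method for learning personal preferences, and techniques that let computers infer personal preferences from questionnaire data quickly and accurately matter all the more in today's commercially competitive society. A questionnaire uses its questions to describe an object or behavior and then asks each respondent to state his or her preference for that object or behavior. A question-answer pair therefore reveals a person's preference for a behavior.

In 2015, Ming-Chieh Sung (宋名傑) designed the semantics-based QPIE to analyze questionnaire data and infer personal preferences from it. QPIE first extracts three keywords according to the syntax of a question to represent the question's semantics. It then converts each keyword into a semantic vector and stacks these vectors into the semantic vector of the question. Finally, the question semantic vectors are classified according to the person's answers to those questions. Treating question semantics and answers separately, however, is where QPIE can be improved.

The research question of this thesis is: "How can personal preferences be inferred based on question-answer semantics?" The basic idea is to transform the semantics of a question plus its answer into a Q&A semantic vector that represents a personal preference behavior, and then to use the semantic relations between new and existing personal preference behaviors to infer a new personal preference behavior, that is, the answer to a new question. Two challenges must be overcome:
(1) Each input question and its answer are two separate sentences. How can the semantics of both be extracted?
(2) How should the semantic similarity between question-answer pairs be defined and computed, so that the personal preference behavior for a new question can be inferred?

To address these challenges, this thesis designs the "Question and Answer-based Personal Preference Inference Engine" (QAPIE). Its design has two parts, one for each challenge:

1. A method that combines and extracts question-answer keywords, based on the syntactic structure of the question, to form the Q&A semantic vector
Words are composed into a sentence according to syntax. A personal preference behavior is formed jointly by a question and an answer, which belong to different sentences. The syntactic structures of the question and the answer are first analyzed and then merged. The "Syntax-based Keywords Extraction Algorithm 2.0 (SKEA2.0)" then extracts the keywords of the question and answer in the order verb, object, answer, modifier. Finally, the semantic vectors of the keywords are stacked into the Q&A semantic vector.

2. Preference inference from the semantic relations between existing personal preference behaviors and candidate new preference behaviors
When a respondent selects an answer option, that option is a positive instance of his or her preference, and the other, unselected options are negative instances. The semantic vectors of all "existing question plus selected answer" pairs form the positive set of personal preference behaviors, and the semantic vectors of all "existing question plus unselected answer" pairs form the negative set. A new question is combined with its possible answer options into 4 candidate new Q&A semantic vectors. An SVM classifier then finds the candidate that is closest to the positive set and farthest from the negative set, and the answer option in that candidate is inferred to be the person's answer to the new question.

To implement and validate the design of QAPIE, this thesis uses questionnaire data from the Ministry of Science and Technology's Taiwan Communication Survey database [TCS13], and tests the inference accuracy of QAPIE on the responses of 1313 people to 44 questions that ask about the degree of preference for a behavior. Using the prediction accuracy defined in [Sun15], QAPIE reaches a prediction accuracy of 78.86%, significantly higher than the 66.65% of QPIE.

The contribution of this thesis is the novel use of the semantics of both the questions and the answers in a questionnaire to describe personal preference behaviors, and the design of QAPIE, which uses the semantic relations among personal preference behaviors to infer a new personal preference behavior, that is, the person's preferred answer to a new question. The specific contributions are:
(1) Answer semantics are incorporated into the inference process to capture the preference implied in them.
(2) The syntax that builds sentences is used to combine the question and the answer, two separate sentences, and to extract the keywords that represent their semantics.
(3) The semantics of the question and the answer are combined and considered as one personal preference behavior, to find which question plus which answer is semantically most similar to the known personal preference behaviors.
(4) Validation on real questionnaire data shows that QAPIE, the system implemented from the methods above, reaches an average inference accuracy of 78.86%, significantly higher than the 66.65% of the reference baseline QPIE.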
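The construction of a Q&A semantic vector described above can be sketched as follows. This is only an illustration under stated assumptions, not the SKEA2.0 implementation of the thesis: keyword extraction is taken as already done, and the embedding table `TOY_EMBEDDINGS`, the function `qa_semantic_vector`, the three-dimensional vectors, and the zero-padding of missing slots are all hypothetical choices made for this example.

```python
from typing import Dict, List

# Toy 3-dimensional "embeddings". Real semantic vectors would come from a
# trained word-embedding model; every value here is made up for illustration.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "watch": [0.9, 0.1, 0.0],
    "video": [0.2, 0.8, 0.1],
    "often": [0.1, 0.1, 0.9],
    "online": [0.3, 0.6, 0.2],
}

# Keyword order stated in the abstract: verb, object, answer, modifier.
SLOT_ORDER = ("verb", "object", "answer", "modifier")


def qa_semantic_vector(keywords: Dict[str, str],
                       embeddings: Dict[str, List[float]],
                       dim: int = 3) -> List[float]:
    """Concatenate keyword vectors in slot order.

    A missing slot or an unknown word is padded with zeros (an assumption
    of this sketch, not something the abstract specifies).
    """
    vector: List[float] = []
    for slot in SLOT_ORDER:
        word = keywords.get(slot)
        vector.extend(embeddings.get(word, [0.0] * dim))
    return vector


# Hypothetical keywords for one question-answer pair.
pair = {"verb": "watch", "object": "video", "answer": "often", "modifier": "online"}
qa_vec = qa_semantic_vector(pair, TOY_EMBEDDINGS)
print(len(qa_vec))  # four slots of three dimensions each
```

Keeping the slots in a fixed order means that the same positions of the vector always hold the same syntactic role, which is what allows vectors from different Q&A pairs to be compared.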

Parallel Abstract


Questionnaire surveys are a widely used approach to acquiring personal preferences. Techniques for inferring personal preferences by computer are increasingly important for businesses in a competitive modern society. A questionnaire describes behaviors or objects in its questions and asks respondents to rate their preferences for those behaviors or objects by answering the questions. A question and answer (Q&A) pair therefore reveals a Personal Preference to a Behavior (PPB).

In 2015, Ming-Chieh Sung designed a semantics-based approach, QPIE, to analyze and infer personal preferences from a questionnaire survey. QPIE first extracts three keywords from every question, according to the syntax of the question, to represent its semantics. Second, it transforms every keyword into a semantic vector and concatenates the three keyword vectors to form the semantic vector of the question. Finally, the semantic vectors of questions are classified into four categories, which represent the four answer options. The main deficiency of QPIE is that it considers a question and its answer separately.

The research problem of this thesis is: how can personal preferences be extracted and inferred based on the semantics of each Q&A pair? Two ideas address this problem. First, a Q&A pair is transformed into a Q&A semantic vector that represents a PPB. Second, the answer to a test question is inferred by identifying the answer that forms the Q&A pair most similar to the training PPBs of the person. Accordingly, there are two challenges:
(1) An input question and its associated answer are essentially two different sentences. How should the semantics be extracted from them?
(2) How should the semantic similarity among Q&A pairs be defined and calculated in order to infer the PPB of a test question?

To meet these challenges, this thesis designs a "Question and Answer-based Personal Preference Inference Engine (QAPIE)," which consists of two parts, one for each challenge.

1. Q&A semantic vector construction by extracting and combining keywords from a Q&A pair based on its syntax
Syntax is the structure by which words are composed into a sentence. The question and the answer are first parsed into dependency trees, which are then merged. The Syntax-based Keywords Extraction Algorithm 2.0 (SKEA2) then identifies the verb, object, answer, and modifier, in that order, as keywords of the merged Q&A. Finally, the semantic vectors of the keywords are concatenated to form a Q&A semantic vector.

2. Preference inference based on the semantic similarity between the test PPB and the training PPBs
When a respondent selects an answer option for a training question, that option is a positive instance of the respondent's preference, and the unselected options are negative instances. One PPB set, of positive instances, is constructed from every "training question & selected answer option" pair, and another PPB set, of negative instances, is constructed from every "training question & unselected answer option" pair. A test question is then combined with each possible answer option to form the candidate test PPBs. An SVM classifier determines which candidate is most likely to belong to the positive PPB set and is most dissimilar from the negative PPB set, and the answer in that candidate is the inferred answer to the test question.

To implement and verify QAPIE, this thesis tests its Inference Accuracy on questionnaire data from the Taiwan Communication Survey, with 1313 respondents and 44 questions on personal preferences for Internet usage behaviors. In this experiment QAPIE achieves 78.86% Inference Accuracy, significantly higher than the 66.65% achieved by QPIE, owing to the use of ordinal relationships among PPBs.

The main contribution of this thesis is to describe personal preferences by the semantics of answers as well as questions, and on that basis to design QAPIE, which infers test personal preferences from the semantic relationships among PPBs. Specifically, the contributions are:
(1) Answer semantics are used in the preference inference process to capture the preference implied in them.
(2) The syntax that constructs sentences from words is used to combine the question and the answer, two separate sentences, and to extract keywords from them.
(3) The question and the answer are bound together as a PPB so that the semantics of both are compared, and the answer to a test question is inferred by identifying the answer that forms the Q&A pair most similar to the training PPBs.
(4) On real questionnaire data, QAPIE achieves 78.86% Inference Accuracy, significantly higher than the 66.65% achieved by the QPIE benchmark.
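The inference step (positive and negative PPB sets, then scoring each candidate test PPB) can be sketched as below. This is a simplified stand-in under stated assumptions, not the classifier used in the thesis: the abstract does not name an SVM library or kernel, so the sketch trains a plain linear SVM by full-batch subgradient descent on the hinge loss. The names `train_linear_svm` and `infer_answer`, the two-dimensional question vectors, and the one-hot answer vectors are all hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def train_linear_svm(samples: List[Vector], labels: List[int],
                     lam: float = 0.01, lr: float = 0.1,
                     epochs: int = 200) -> Tuple[Vector, float]:
    """Full-batch subgradient descent on the L2-regularised hinge loss."""
    n, dim = len(samples), len(samples[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [lam * wi for wi in w], 0.0
        for x, y in zip(samples, labels):
            if y * (dot(w, x) + b) < 1.0:  # this sample violates the margin
                for i in range(dim):
                    grad_w[i] -= y * x[i] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def infer_answer(question_vec: Vector, answer_vecs: List[Vector],
                 w: Vector, b: float) -> int:
    """Return the index of the answer option whose candidate PPB scores highest."""
    scores = [dot(w, question_vec + a) + b for a in answer_vecs]
    return max(range(len(scores)), key=scores.__getitem__)


# Four answer options, represented by toy one-hot vectors (an assumption).
ANSWERS = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
           [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
train_questions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy question vectors
chosen = [2, 2, 2]  # option index the respondent selected for each question

# Selected options become positive PPBs (+1), unselected ones negative (-1).
samples, labels = [], []
for q, picked in zip(train_questions, chosen):
    for idx, a in enumerate(ANSWERS):
        samples.append(q + a)
        labels.append(1 if idx == picked else -1)

w, b = train_linear_svm(samples, labels)
predicted = infer_answer([0.5, 0.5], ANSWERS, w, b)
print(predicted)
```

In this toy data the respondent always selected the option at index 2, so the candidate built from that option receives the highest score for the unseen question. Picking the candidate with the largest decision value is one way to read "most likely to belong to the positive set". A kernel SVM or a different scoring rule would fit the description equally well.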

References


[Hug68] Hughes, G., 1968. On the mean accuracy of statistical pattern recognizers. IEEE Transactions on Information Theory, 14(1), pp.55-63.
[JPK11] Han, J., Pei, J. and Kamber, M., 2011. Data Mining: Concepts and Techniques. Elsevier.
[MaJ00] Martin, J.H. and Jurafsky, D., 2000. Speech and Language Processing. International Edition, 710, p.25.
[MYZ13] Mikolov, T., Yih, W.T. and Zweig, G., 2013, June. Linguistic regularities in continuous space word representations. In HLT-NAACL (Vol. 13, pp. 746-751).
[Sch05] Scherer, K.R., 2005. What are emotions? And how can they be measured? Social Science Information, 44(4), pp.695-729.
