
Design of a Personal Preference Inference Model Based on Questionnaire Data (基於問卷資料之個人偏好推論模型設計)

Design of Personal Preference Inference from Questionnaire Data with Exemplary Application

Advisor: 張時中
Co-advisor: 陸寶森 (Peter B. Luh)

Abstract


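In a rapidly changing digital era, automatically identifying and inferring personal preferences by computer is increasingly important for predicting market demand and providing customers with suitable services. Questionnaires are widely used to learn personal preferences. Existing questionnaire analysis methods, however, can only analyze the questions actually asked and cannot predict a person's latent preference on a new question. Inferring a person's preference on a new question by analyzing the relationships among questions in existing questionnaires is therefore an important and challenging problem.

Predicting a new preference requires understanding how different questions relate to one another, and this thesis holds that such relationships can be built on the semantic relatedness between questions. Preference inference therefore involves four challenges: i) building a questionnaire knowledge base that contains both personal answers and the meaning of the questions; ii) representing questions numerically so that they can be analyzed automatically by computer; iii) inferring the semantic relationship between existing questions and new questions; and iv) predicting the preference probability for a new question.

In view of these challenges, this thesis designs a series of semantics-based preference analysis methods and integrates them into a system, the "Questionnaire data-based Personal preference Inference Engine" (QPIE). The system's inference capability is tested on questionnaire data from the Taiwan Communication Survey database of the Ministry of Science and Technology. Methodologically, QPIE integrates existing software with programs and communication interfaces designed by the author, as described below:

(1) Semantic abstraction of single-sentence questions
Automatically extracting suitable keywords to represent the meaning of a sentence is a challenging task. QPIE therefore uses the grammatical structure of a sentence to help abstract its meaning. It adopts a probabilistic natural language parser, the Stanford Dependency Parser, to obtain the dependency tree of each sentence, and uses the dependency results to help select suitable keywords. Based on the dependency tree, a Syntax-based Keyword Extraction Algorithm (SKEA) is proposed to identify the keywords that carry the meaning of the sentence.

(2) Numerical representation of single-sentence questions in semantic space
QPIE uses word2vec to encode the keywords with semantic information, converting words into vectors. Word2vec is a neural-network-based model that learns the relationships between words in a corpus and provides a numerical vector for each word; these word vectors are the basis for inferring semantic similarity. If each sentence is regarded as a combination of its keywords, a numerical, semantics-bearing representation of the sentence can be obtained by combining the vectors of its keywords.

(3) Inference of semantic relationships among questions
Once the semantic vectors of the questions are available, QPIE uses a classification algorithm, implemented with a support vector machine (SVM), to infer the answer to a related question from the class to which the question belongs. The SVM learns the semantic relationships between existing questions and new questions, groups questions into classes according to their answers, and uses the relationships among questions to predict the preference probabilities for a new question.

(4) Inferring preferential answers to new questions from existing data
A reference implementation realizes the QPIE design by integrating the existing Stanford Dependency Parser, word2vec, and LIBSVM with the SKEA designed in this research, using a data exchange interface between VirtualBox and MATLAB for system integration and operation. The system is tested on real responses from 1,313 people to 44 questionnaire questions, each with four choices of preference level (never, seldom, sometimes, often). Experiment 1 shows that high semantic similarity between existing questions and test questions is reflected in the predicted preference probabilities. In Experiment 2, over 14 test questions and 1,313 people, QPIE achieves an average prediction accuracy of 66.65%, significantly higher than random guessing.

The contribution of this thesis is to go beyond existing questionnaire analysis methods by designing an innovative semantics-based preference analysis method, QPIE, that infers personal preferences on new questions more effectively from the semantic relationships among questions. The design is integrated into a working system whose performance is evaluated by its accuracy on test questions. The experimental results also show that the strength of the semantic relationship between questions is reflected in the preference probabilities, and that closer analysis of these probabilities gives insight into individual answering patterns. The specific contributions are:
(1) An abstract representation of single-sentence multiple-choice questions.
(2) A numerical representation of complete sentences for automated computer processing, built on existing word vector representations.
(3) Use of an SVM to learn the semantic relationships among numerically represented sentences.
(4) A reference implementation of the QPIE system.
(5) Verification of the relationship between preference probability and semantic similarity.
(6) Validation on real questionnaire data that QPIE's personal preference inference reaches an average accuracy of 66.65%, significantly higher than random guessing (47.66%).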
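Parts (1) and (2) above can be illustrated with a minimal Python sketch. It is not the thesis's SKEA or its word2vec model: the example question is invented rather than taken from the survey, the dependency triples are hand-written stand-ins for parser output, the set of relations kept (`KEPT_RELATIONS`) is an assumption chosen for illustration, and the 3-dimensional word vectors are toy values rather than trained embeddings. The fixed number of keyword slots and the zero padding are also assumptions, made so that concatenated vectors have equal length.

```python
from math import sqrt

# Hypothetical dependency triples (relation, head, dependent) for the
# invented question "How often do you watch online news?". Hand-written
# for illustration; a real system would obtain them from a parser.
TRIPLES = [
    ("root", "ROOT", "watch"),
    ("advmod", "watch", "often"),
    ("advmod", "often", "How"),
    ("aux", "watch", "do"),
    ("nsubj", "watch", "you"),
    ("dobj", "watch", "news"),
    ("amod", "news", "online"),
]

# Assumption: keep the root verb, its direct object, and adjectival
# modifiers. The actual SKEA rules are not given in the abstract.
KEPT_RELATIONS = ("root", "dobj", "amod")

# Toy 3-dimensional "word vectors"; real word2vec vectors are learned.
TOY_VECTORS = {
    "watch": [0.9, 0.1, 0.0],
    "read": [0.8, 0.2, 0.1],
    "news": [0.1, 0.9, 0.2],
    "online": [0.0, 0.3, 0.9],
}
DIM = 3


def extract_keywords(triples, kept=KEPT_RELATIONS):
    """Return dependents whose relation is kept, ordered by relation."""
    keywords = []
    for relation in kept:
        keywords += [dep for rel, _head, dep in triples if rel == relation]
    return keywords


def sentence_vector(keywords, vectors=TOY_VECTORS, slots=3):
    """Concatenate keyword vectors into one fixed-length vector."""
    out = []
    for i in range(slots):
        word = keywords[i] if i < len(keywords) else None
        # Unknown or missing keywords are padded with zeros.
        out += vectors.get(word, [0.0] * DIM)
    return out


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


keywords = extract_keywords(TRIPLES)
vec_watch = sentence_vector(keywords)
vec_read = sentence_vector(["read", "news", "online"])
print(keywords)
print(len(vec_watch), round(cosine(vec_watch, vec_read), 3))
```

With concatenation, keyword order matters: two questions are only comparable slot by slot, which is why the sketch orders keywords by relation type before concatenating.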

Parallel Abstract


In a rapidly developing digital society, computer identification and inference of personal preference is more important than ever for predicting market trends and tailoring services to customers. Questionnaires are often used as a direct way to assess personal preference. Current methods of questionnaire analysis, however, can only derive preferences stated directly in the questionnaires. Predicting a person's preferential answer to a new question requires a methodology that profiles the person and performs inference over a knowledge base of existing questionnaire data.

This thesis designs a semantic-based methodology, the "Questionnaire data-based Personal preference Inference Engine" (QPIE), to predict the preferential answer to a new question by analyzing the relationships between the new question and the existing questions. Such relationships include the semantic meaning of each question and the associated answer. QPIE innovatively integrates existing methods and the corresponding public-domain tools into an implemented system, and successfully solves the following four challenges arising in personal preference inference: i) constructing a knowledge base of questionnaires, including a personal preference profile from the answers and the meaning of the questions; ii) representing the meaning of questions and answers numerically for further computer processing; iii) inferring semantic relationships between existing questions and new questions; and iv) predicting the preferential answer.

The design of QPIE consists of the following four parts, one for each challenge:

(1) Semantic abstraction of single-sentence questions
It is challenging to extract proper keywords by computer to represent the meaning of a question. QPIE first exploits the grammatical structure of a sentence to facilitate abstraction, adopting a probabilistic natural language parser, the Stanford Dependency Parser, to derive the dependency parse tree of each question.
Based on the parsing result, a Syntax-based Keyword Extraction Algorithm (SKEA) identifies keywords that represent the meaning of each single-sentence question.

(2) Numerical representation of a single-sentence question in "semantic" space
QPIE then applies word2vec to encode each keyword of a question as a numeric vector based on its semantics. Word2vec is a class of neural-network models that assigns each word a set of numerical coordinates in a semantic space learned from an unlabeled corpus. Word vectors serve as the foundation of semantic similarity calculation. Treating a sentence as a concatenation of syntax-based keywords, QPIE encodes the semantics of a single-sentence question by concatenating the vectors of its syntax-based keywords.

(3) Semantic inference among questions
Once semantic-based vectors of questions are available, QPIE classifies questions according to their respective answers, one class per preferential answer choice. To infer the preferential answer to a new question, QPIE adopts a support vector machine (SVM) as a probabilistic classifier that uses the semantic-based vectors to calculate the similarity of the new question to the existing questions in each class and the preference probability of choosing the answer of that class.

(4) Preferential answer prediction for new questions based on real questionnaire data
A reference implementation realizes the QPIE methodology as a system built from existing tools, including the Stanford Dependency Parser, word2vec, and LIBSVM, together with the newly designed SKEA. System integration is realized by sharing a data folder between MATLAB® and VirtualBox®. The training and testing data set consists of 44 single-sentence questions, each with the same four possible choices, {never, seldom, sometimes, often}, selected from the Taiwan Communication Survey.
In Experiment 1, higher preference probability is shown to be related to higher semantic similarity between training and testing questions. In Experiment 2, QPIE outperforms the random guess approach with statistical significance, reaching a personal average accuracy of 66.65% over 1,313 people in predicting the answers to 14 testing questions.

The contribution of this thesis is an innovative design of a semantic-based methodology, QPIE, that enriches questionnaire analysis with a personal preference inference capacity: it predicts personal preferential answers to new questions according to the semantic relationships among questions. Based on the design, an integrated system is developed and evaluated by prediction accuracy. The experiments also show that the preference probability for a new question reflects semantic similarity, and that further analysis of preference probabilities gives insight into personal answering patterns. Specifically, the contributions include:
(1) Abstracting the meaning of single-sentence, multiple-choice questions;
(2) Representing each question numerically for computer processing based on externally trained word vectors;
(3) Semantic inference from the numerical representation of questions by adopting an SVM model;
(4) A reference implementation of QPIE as a system;
(5) Verification that the preference probability reflects semantic similarity;
(6) A personal average accuracy of 66.65%, significantly higher than random guess (47.66%).
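The "personal average accuracy" reported above can be read as the mean, over people, of each person's fraction of correctly predicted test answers. The sketch below computes that quantity on made-up data. This reading of the metric, the function name `personal_average_accuracy`, and the three-person example are assumptions for illustration. The abstract does not say how the 47.66% random-guess baseline was computed, so it is not reproduced here.

```python
def personal_average_accuracy(predicted, actual):
    """Mean over people of the per-person fraction of correct predictions.

    predicted, actual: dicts mapping a person ID to a list of answers,
    one answer per test question, in the same question order.
    """
    per_person = []
    for person, truth in actual.items():
        guesses = predicted[person]
        if len(guesses) != len(truth):
            raise ValueError(f"answer count mismatch for {person}")
        correct = sum(g == t for g, t in zip(guesses, truth))
        per_person.append(correct / len(truth))
    return sum(per_person) / len(per_person)


# Made-up answers from three hypothetical people to four test questions,
# each drawn from the four choices never / seldom / sometimes / often.
actual = {
    "p1": ["never", "often", "seldom", "often"],
    "p2": ["sometimes", "sometimes", "never", "seldom"],
    "p3": ["often", "often", "often", "never"],
}
predicted = {
    "p1": ["never", "often", "often", "often"],        # 3 of 4 correct
    "p2": ["sometimes", "seldom", "never", "seldom"],  # 3 of 4 correct
    "p3": ["often", "sometimes", "seldom", "never"],   # 2 of 4 correct
}

accuracy = personal_average_accuracy(predicted, actual)
print(f"{accuracy:.2%}")
```

Averaging per person first, rather than pooling all answers, gives every respondent equal weight; the two coincide when everyone answers the same number of test questions, as in this example.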

