
Author: 簡培修 (Pei Hsiu Chien)
Thesis title: Support Vector Machine Based Questionnaire Marking Recognition Research and Applications (以支持向量機為基礎之問卷填答識別研究)
Advisor: 李忠謀 (Lee, Chung-Mou)
Degree: Doctoral
Department: Department of Computer Science and Information Engineering
Year of publication: 2012
Academic year of graduation: 100
Language: Chinese
Number of pages: 74
Keywords (Chinese): 問卷填答識別, 表單處理系統, 支持向量機, 試卷評分系統
Keywords (English): questionnaire marking recognition, form processing system, support vector machine, exam grading system
Thesis type: Academic thesis
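  • In today's era of flourishing computer networks, some paper questionnaires have moved online so that results can be tallied quickly. Yet there are still many settings where computers and networks are inconvenient to use, such as dining in a restaurant, shopping in a store, making deposits or withdrawals at a bank, attending a product launch or seminar, or handling business at a government office. In these settings it is usually impractical to provide a computer and network access for filling out a questionnaire, so when feedback is wanted right away, a paper questionnaire remains the most direct and convenient channel. To make questionnaires easy to fill out and quick to tally, most are designed around multiple-choice questions. In both academic research and commercial software, this type of question is still handled mainly by counting the visible pixels in each answer region to decide whether it has been marked. However, noise and the variety of ways respondents mark their answers (check marks, crosses, filled-in boxes, and so on) often prevent these pixel-counting methods from correctly identifying whether an option has been marked.
    This thesis proposes a complete questionnaire processing workflow. It automatically extracts the answer regions from a blank questionnaire and groups them by question order, helping the questionnaire designer build a model file of the answer regions. It then recognizes marks automatically using a support vector machine (SVM) combined with auxiliary decision rules, taking a machine learning approach to the noise problem in order to raise recognition accuracy. It also applies the idea of "respondent's intent" in an attempt to handle answers that respondents have crossed out and changed. In the experiments, system performance is verified on two real questionnaire applications, and the system is further extended to grade an intellectual property rights test for incoming university freshmen. The results show that the SVM determines whether an option is marked with an accuracy above 99%, and that question-level accuracy is above 98%. Finally, the thesis proposes a blended SVM approach for handling non-standard option symbols. Experiments applying the blended SVM to the same questionnaires and test show accuracy above 95% in every case, indicating that the blended SVM is suitable for questionnaires whose accuracy requirements are less strict.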

    Even in this electronic age, paper-based forms remain very much a part of daily life. Filling out a service quality questionnaire during a flight, completing a survey after attending a seminar, and filling out a passport application form are all common tasks that still require pen-and-paper input. When a large number of forms must be collected, a form processing system that can automatically extract and tally the inputs is needed to save time and prevent errors. Most systems recognize marks in regions of interest by counting the visible pixels in them. However, the accuracy of this approach is strongly affected by noise and by the various types of marks that respondents may use.
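    The pixel-counting approach mentioned above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the 8x8 binarized regions, the names `fill_ratio` and `is_marked_by_count`, and the 0.15 threshold do not come from the thesis or from any particular product.

```python
from typing import List

Region = List[List[int]]  # binarized answer box: 1 = dark (ink) pixel, 0 = background


def fill_ratio(region: Region) -> float:
    """Fraction of dark pixels inside one answer box."""
    total = sum(len(row) for row in region)
    if total == 0:
        return 0.0
    dark = sum(sum(row) for row in region)
    return dark / total


def is_marked_by_count(region: Region, threshold: float = 0.15) -> bool:
    """Pixel-counting rule: the box counts as marked once enough pixels are dark.

    The threshold value is a hypothetical choice for this example.
    """
    return fill_ratio(region) >= threshold


# A thin check mark: only 8 dark pixels, but clearly an intentional stroke.
light_check: Region = [
    [0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 1, 0, 0, 0],
    [0, 1, 0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

# An unmarked box with a 3x4 smudge in one corner: 12 dark pixels of noise.
smudged_empty: Region = [[0] * 8 for _ in range(8)]
for r in range(5, 8):
    for c in range(0, 4):
        smudged_empty[r][c] = 1

print(fill_ratio(light_check), is_marked_by_count(light_check))
print(fill_ratio(smudged_empty), is_marked_by_count(smudged_empty))
```

    In this toy case the real mark has a lower fill ratio than the smudge, so no single threshold classifies both boxes correctly. That is the kind of failure the abstract attributes to noise and to varied marking styles.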
    The proposed system divides the automatic mark recognition process into two stages. The first stage recognizes the regions of interest and automatically groups them by question. The second stage recognizes the marks made by respondents. The system uses an SVM as its main technique to cope with the noise problem, and it also considers the respondent’s intent in order to disregard crossed-out marks. The proposed system was put to use in two settings. First, it was used to automatically tally and report the results of a university service quality questionnaire and an end-of-semester course survey. Second, it was used to automatically grade the Intellectual Property Rights Exam taken by incoming freshmen. The accuracy of the SVM classifier for checked/unchecked mark detection is higher than 99%, and the accuracy of recognizing the chosen option for each question is above 98%. Finally, we propose a blended SVM for new types of option symbols, which would otherwise require training a new SVM. The same questionnaires and test were used to evaluate the blended SVM. Its accuracy is slightly lower but remains above 95%, which means the blended SVM is suitable for new questionnaires that can tolerate slightly lower accuracy.
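    As a rough illustration of treating checked/unchecked detection as an SVM classification problem, the sketch below trains a small linear soft-margin SVM in pure Python. It is not the thesis's implementation: the two features (fill ratio and ink spread), the hand-made training vectors, the hyperparameters, and the names `features`, `train_linear_svm`, and `is_marked` are all assumptions chosen for this example, and a real system would more likely rely on an established SVM library.

```python
from typing import List, Sequence, Tuple

Region = List[List[int]]  # binarized answer box: 1 = dark (ink) pixel, 0 = background


def features(region: Region) -> Tuple[float, float]:
    """Hypothetical features: fill ratio, and how widely the ink is spread."""
    rows, cols = len(region), len(region[0])
    fill = sum(map(sum, region)) / (rows * cols)
    rows_hit = sum(1 for row in region if any(row)) / rows
    cols_hit = sum(1 for c in range(cols) if any(row[c] for row in region)) / cols
    return fill, (rows_hit + cols_hit) / 2


def train_linear_svm(
    xs: Sequence[Sequence[float]],
    ys: Sequence[int],
    lam: float = 0.001,
    lr: float = 0.05,
    epochs: int = 20000,
) -> Tuple[List[float], float]:
    """Linear soft-margin SVM fitted by full-batch sub-gradient descent on the hinge loss."""
    n, dim = len(xs), len(xs[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the L2 regularizer
        grad_b = 0.0
        for x, y in zip(xs, ys):
            # Only samples inside the margin contribute to the hinge sub-gradient.
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) < 1:
                for i in range(dim):
                    grad_w[i] -= y * x[i] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def is_marked(region: Region, w: Sequence[float], b: float) -> bool:
    """Classify one answer box with the trained hyperplane."""
    x = features(region)
    return sum(wi * xi for wi, xi in zip(w, x)) + b > 0


# Hand-made (fill ratio, spread) vectors; labels: +1 = marked, -1 = unmarked.
TRAIN_X = [
    (0.125, 0.875), (0.20, 0.90), (0.85, 1.00), (0.15, 0.75),   # checks, cross, filled box
    (0.0, 0.0), (0.02, 0.10), (0.1875, 0.4375), (0.05, 0.225),  # empty boxes, specks, smudge
]
TRAIN_Y = [1, 1, 1, 1, -1, -1, -1, -1]

w, b = train_linear_svm(TRAIN_X, TRAIN_Y)

# A thin diagonal stroke (8 dark pixels) versus a corner smudge (12 dark pixels).
diagonal_stroke = [[1 if r == c else 0 for c in range(8)] for r in range(8)]
corner_smudge = [[1 if r >= 5 and c < 4 else 0 for c in range(8)] for r in range(8)]

print(is_marked(diagonal_stroke, w, b), is_marked(corner_smudge, w, b))
```

    With this toy data the stroke is classified as marked and the smudge as unmarked, even though the smudge contains more dark pixels. The point of the sketch is only that a learned boundary over several features can separate cases that a single pixel-count threshold cannot.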

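    Chapter 1 Introduction
        Section 1 Research Background and Motivation
        Section 2 Problems and Challenges
        Section 3 Thesis Organization
        Section 4 Definition of Terms
    Chapter 2 Literature Review
        Section 1 Processing of Form Documents
        Section 2 Automatic Ballot Recognition
        Section 3 Review of Support Vector Machine Theory
    Chapter 3 System Architecture and Research Methods
        Section 1 System Architecture
        Section 2 Automatic Recognition of Answer Regions
        Section 3 Automatic Grouping of Answer Regions
        Section 4 Overlay Alignment of Filled-in and Blank Questionnaires
        Section 5 Mark Recognition
    Chapter 4 Experimental Results and Discussion
        Section 1 Recognition of Answer Regions in Various Types of Questionnaires
        Section 2 Evaluation of the Suitability of the SVM Method
        Section 3 Processing of Questionnaire Responses
        Section 4 Extended Application: Automatic Exam Grading
        Section 5 Building and Testing a General-Purpose Support Vector Machine
    Chapter 5 Conclusion
    References
    Appendices
        A. Sample Mark Recognition Result File
        B. Rule Templates for Answer Regions
        C. Ballot Data Recognition Results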

