Title

雲端環境上的複合式知識融合與推論

Translated Titles

Hybrid Knowledge Fusion and Inference on Cloud Environment

Authors

張景棠

Key Words

Knowledge fusion ; Knowledge inference ; Probabilistic model

PublicationName

Theses of the Department of Computer Science and Information Engineering, National Taipei University

Volume or Term/Year and Month of Publication

2016

Academic Degree Category

Master's

Advisor

張玉山;戴志華

Content Language

English

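Chinese Abstract (English translation)

In the era of big data, an astonishing amount of knowledge is being produced at every moment and in all kinds of ways. Fusing and inferring knowledge has become an important issue in applying knowledge. In this thesis, we focus on hybrid knowledge that can be expressed as association rules (containing categorical or numerical information) and discuss the problems of hybrid knowledge fusion and inference in a cloud environment. We enumerate six issues and propose an effective solution: HyKFICE (Hybrid Knowledge Fusion and Inference on Cloud Environment). HyKFICE uses probability theory to fuse and infer knowledge from facts known to have occurred, in order to predict events that may occur in the future and their probabilities. HyKFICE achieves parallel computation on the cloud by converting similar knowledge into a 3-layer directed bipartite graph. The experiments use real data to demonstrate performance and show that the system's execution time is affected not only by the size of the data but also by the associations among the data.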

English Abstract

In the age of big data, an enormous amount of knowledge is produced everywhere, every day, in various ways. Knowledge fusion and inference have thus become an important issue for making better use of knowledge. In this paper, we focus on hybrid knowledge that can be represented in the form of association rules (with categorical and/or numerical information), and address the problem of fusing and inferring such hybrid knowledge in a cloud environment. Considering how this problem arises in practice, we specify six issues of knowledge fusion and inference and propose HyKFICE (Hybrid Knowledge Fusion and Inference on Cloud Environment), a system that offers an effective solution. HyKFICE infers the probabilities of events occurring under a given condition through knowledge fusion and inference based on probability theory. HyKFICE can also perform the computation in parallel on clouds by grouping and summing up similar knowledge in 3-layer bipartite graphs. Experiments conducted on real data sets demonstrate the efficiency of HyKFICE and show that the execution time of hybrid knowledge fusion and inference is dominated not only by the amount of knowledge but also by the associations among pieces of knowledge.

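Topic Category Basic and Applied Sciences > Information Science
College of Electrical Engineering and Computer Science > Department of Computer Science and Information Engineering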