  • Thesis


Maximum Likelihood Estimation for Parametric Interval Symbolic Data

Advisor: 黃怡婷


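In the twenty-first century, as the total volume of global data grows multiplicatively, innovative methods in statistical science and information technology have flourished, signaling the arrival of the big-data era. To process and integrate large volumes of data, Diday (2006) proposed symbolic data analysis, in which each observation represents a category or group and is called a symbolic observation or concept. Because each symbolic observation comprises multiple individual observations, the variables in symbolic data carry a more complex data structure. Taking a parametric approach, Le-Rademacher and Billard (2011) discussed maximum likelihood estimation of the parameters of interval-valued and histogram-valued symbolic data, but their parametrization directly assumes that the interval-valued symbolic data follow a parametric family, rather than making inference about the parameters of the original data. Assuming that the original data follow a specific distribution, this thesis derives the parametric distribution that results when classical data are converted into symbolic form, uses maximum likelihood to estimate the parameters of the original data's distribution, and finally uses Monte Carlo simulation to examine the performance of the parameter estimates under different parametric distributions and different conversion methods.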


The amount of global data has grown dramatically over the past 20 years, and many new methods in statistical science and information technology have been developed in response, marking the arrival of the big-data era. To handle and integrate massive data, Diday (2006) proposed symbolic data analysis, in which each symbolic object, also known as a concept, represents a category or a group. Because a symbolic object may contain many individual observations, a variable describing it need not be a single real number; it can be, for example, an interval. Under parametric assumptions, Le-Rademacher and Billard (2011) discussed maximum likelihood estimation for interval and histogram symbolic data. However, their approach assumes that the interval-valued variable itself follows a specific distribution, whereas the features of the underlying population are usually what is of interest. This thesis instead assumes that the variable of interest in the underlying population follows a specific distribution, derives the resulting distribution of the variable for the symbolic objects, and obtains parameter estimators by maximum likelihood. Finally, Monte Carlo simulations are used to evaluate the performance of the parameter estimates under various parameter settings.
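
The workflow described above (derive the interval distribution from the underlying distribution, maximize the likelihood, then evaluate by Monte Carlo) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: the underlying data are taken to be i.i.d. Exponential(rate), each interval is taken to be [min, max] of a group of known size n, and the function names and the golden-section optimizer are hypothetical choices. Under these assumptions the standard joint density of the sample minimum a and maximum b is n(n-1) [F(b) - F(a)]^(n-2) f(a) f(b), and the resulting log-likelihood is concave in the rate, so a one-dimensional search suffices.

```python
import math
import random

def interval_loglik(rate, intervals, n):
    """Log-likelihood of `rate` given (min, max) intervals, each built from
    n i.i.d. Exponential(rate) draws (assumed model, for illustration only)."""
    if rate <= 0:
        return -math.inf
    total = 0.0
    for a, b in intervals:
        # log[F(b) - F(a)] = -rate*a + log(1 - exp(-rate*(b - a)))
        log_spread = -rate * a + math.log(-math.expm1(-rate * (b - a)))
        # log of n(n-1) [F(b)-F(a)]^(n-2) f(a) f(b)
        total += (math.log(n) + math.log(n - 1)
                  + (n - 2) * log_spread
                  + 2 * math.log(rate) - rate * (a + b))
    return total

def fit_rate(intervals, n, lo=1e-6, hi=100.0, tol=1e-6):
    """Maximize the concave log-likelihood over [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    while hi - lo > tol:
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
        if interval_loglik(c, intervals, n) > interval_loglik(d, intervals, n):
            hi = d  # maximum lies in [lo, d]
        else:
            lo = c  # maximum lies in [c, hi]
    return (lo + hi) / 2

def simulate_intervals(rate, m, n, rng):
    """Draw m groups of n exponential observations; keep only (min, max) of each."""
    intervals = []
    for _ in range(m):
        draws = [rng.expovariate(rate) for _ in range(n)]
        intervals.append((min(draws), max(draws)))
    return intervals

def monte_carlo(true_rate, m, n, reps, seed=0):
    """Return (bias, RMSE) of the interval-based MLE over `reps` replications."""
    rng = random.Random(seed)
    estimates = [fit_rate(simulate_intervals(true_rate, m, n, rng), n)
                 for _ in range(reps)]
    bias = sum(estimates) / reps - true_rate
    rmse = math.sqrt(sum((e - true_rate) ** 2 for e in estimates) / reps)
    return bias, rmse

if __name__ == "__main__":
    bias, rmse = monte_carlo(true_rate=2.0, m=40, n=10, reps=100)
    print(f"bias = {bias:.4f}, RMSE = {rmse:.4f}")
```

Other underlying distributions or other ways of converting raw data to intervals would change the density inside `interval_loglik`; the simulation and evaluation steps would stay the same.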


Bock, H.-H., Probabilistic Modeling for Symbolic Data. Compstat, 55-65, 2008.
Le-Rademacher, J. and Billard, L., Likelihood Functions and Some Maximum Likelihood Estimators for Symbolic Data. Journal of Statistical Planning and Inference, 141:1593-1602, 2011.
Billard, L. and Diday, E., Symbolic Data Analysis: Conceptual Statistics and Data Mining. John Wiley & Sons, England, 2006.
Iris Data Set. Machine Learning Repository, University of California, Irvine, 1980. URL https://archive.ics.uci.edu/ml/datasets/Iris/
Moore, R. E., Kearfott, R. B. and Cloud, M. J., Introduction to Interval Analysis. Society for Industrial and Applied Mathematics, Philadelphia, 2009.
