
Center-based clustering with the string data

Advisor: 曾富祥

Abstract


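Clustering has been widely discussed and applied in many studies. Its goal is to maximize the similarity among data within a cluster while minimizing the similarity between clusters. Clustering is applied to many data types; numerical data is currently the most common, whereas string data is rarely discussed or used. Yet string data appears in many forms in daily life, for example the production process of a product, the repair procedure for a part, and the order in which the symptoms of a disease appear. Compared with other data types, string data also requires order to be taken into account, so this study proposes a feasible clustering method for this data type.

In earlier cluster analyses of numerical or categorical data, the objects mostly shared the same dimensionality, and many scholars have fully defined the clustering process for that case. String data, however, contains objects of differing dimensionality, that is, of unequal length. For example, product 1 must pass through machines A, B, and C in that order, while product 2 must pass through machines B, C, D, and A. How to measure the similarity of string data without disturbing the order is therefore an important issue. In this study we use two measures, Edit distance and Simple matching distance, to measure the similarity of the data. Existing approaches to clustering string data mostly use hierarchical methods, such as Tian et al. (1996), Dinu and Sgarro (2006), and Tseng (2013). This thesis instead takes a non-hierarchical method as the basis for clustering and measures the similarity of string data by finding the center of each cluster.

Scholars have proposed many different algorithms for non-hierarchical clustering, among which center-based algorithms are comparatively more efficient. The study therefore builds its model on the concept of the center used by two non-hierarchical methods, K-means and K-modes; each has certain advantages that allow us to achieve the goal of clustering.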
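As an illustration of the two measures named above, the following minimal sketch computes them for the two machine sequences in the example. The abstract does not give the thesis's exact definitions, so two things here are assumptions: the edit distance is the standard Levenshtein form with unit costs, and the simple matching distance handles unequal lengths by padding the shorter sequence at the end with a hypothetical placeholder symbol `"-"`.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of insertions, deletions,
    and substitutions (unit cost each, an assumption) turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def simple_matching_distance(a, b, pad="-"):
    """Number of positions at which the two sequences differ.
    Assumption: the shorter sequence is padded at the end with `pad`,
    a hypothetical symbol that does not occur in the data."""
    n = max(len(a), len(b))
    a = list(a) + [pad] * (n - len(a))
    b = list(b) + [pad] * (n - len(b))
    return sum(x != y for x, y in zip(a, b))


product_1 = ["A", "B", "C"]       # machines visited by product 1
product_2 = ["B", "C", "D", "A"]  # machines visited by product 2

d_edit = edit_distance(product_1, product_2)
d_match = simple_matching_distance(product_1, product_2)
print(d_edit, d_match)
```

Under these assumptions the edit distance tolerates the shift between the two routes (both contain B followed by C), while the position-by-position comparison does not, which is one reason the choice of measure matters for sequences of unequal length.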

Parallel Abstract


Clustering has been studied and applied in much past research. Its goal is for objects in the same cluster to be highly similar while objects in different clusters are dissimilar. Clustering handles many data types, of which numerical data is the most widely used. String data has so far received little attention, yet it has considerable potential for application, for example in part repair processes, product manufacturing processes, and the order in which disease signs occur. Compared with other data types, string data has two elements that must be considered: the characters and their order. In this study we therefore propose a viable method for clustering string data.

Most past studies deal with objects of the same dimensionality, a case for which many scholars have completely defined the clustering process. In string data, however, most objects differ in dimensionality, that is, their lengths are not equal. For example, product 1 is processed on machines A, B, and C, while product 2 is processed on machines B, C, D, and A. How to measure similarity without affecting the order of the string data is an important issue. In our study we apply the Edit distance and the Simple matching distance to measure dissimilarity between strings. At present, string data is mostly handled with hierarchical clustering methods, such as Tian et al. (1996), Dinu and Sgarro (2006), and Tseng (2013). Our study is instead based on non-hierarchical clustering.

Compared with other types of clustering algorithms, center-based algorithms are very efficient. We therefore propose a new model that combines the concepts of K-means and K-modes to achieve the goal of clustering string data.
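To make the idea of a center-based, non-hierarchical method concrete, here is a small sketch that alternates between assigning each string to its nearest center and recomputing each center. It is not the model proposed in the thesis, whose details the abstract does not give. As a stand-in, the center of a cluster is taken to be its medoid (the member with the smallest total edit distance to the other members). The function names, the sample routes, and the initial centers are all hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance with unit costs (an assumption)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]


def medoid(cluster):
    """Member with the smallest total distance to all members."""
    return min(cluster, key=lambda c: sum(edit_distance(c, s) for s in cluster))


def cluster_strings(strings, initial_centers, max_iter=20):
    """Alternate assignment and center update until the centers stop changing."""
    centers = list(initial_centers)
    for _ in range(max_iter):
        # Assignment step: each string joins its nearest center.
        clusters = [[] for _ in centers]
        for s in strings:
            nearest = min(range(len(centers)),
                          key=lambda k: edit_distance(s, centers[k]))
            clusters[nearest].append(s)
        # Update step: an empty cluster keeps its previous center.
        new_centers = [medoid(c) if c else centers[k]
                       for k, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


# Hypothetical machine routes, one letter per machine.
routes = ["ABC", "ABCD", "AB", "XYZ", "XYZW", "XY"]
final_centers, final_clusters = cluster_strings(routes, ["ABC", "XYZ"])
print(final_centers, final_clusters)
```

Using a medoid keeps every center an actual observed string, which sidesteps the question of how to average strings of different lengths. A K-means or K-modes style center would have to be constructed differently.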

