Title |
Finding Sequence Clusters: A Shared Near Neighbors Approach |
DOI |
10.6688/JISE.2015.31.5.9 |
Authors |
Jia-Lien Hsu;Tzu-Han Hsiao |
Key Words |
SNN ; MBR ; multi-label clustering ; subsequences clustering ; sequence clustering |
PublicationName |
Journal of Information Science and Engineering |
Volume or Term/Year and Month of Publication |
Vol. 31, No. 5 (2015 / 09 / 01) |
Page # |
1647 - 1667 |
Content Language |
English |
English Abstract |
Sequence clustering is one of the most fundamental topics and can be applied in various research fields. Most previous work on sequence clustering is dedicated to single-label clustering, in which the overall similarity of equal-length sequences is considered and measured by the Euclidean distance function. However, the intrinsic properties of sequences demand multi-label clustering. In addition, the Euclidean distance in high-dimensional space suffers from the curse of dimensionality. Therefore, in this paper, we employ the concept of shared near neighbors (SNN) for sequence similarity, which is integrated into the multi-label clustering process. Given a set of sequences, our approach first applies the sliding window technique and the DCT mapping to the sequences to obtain feature vectors. Those feature vectors, associated with the SNN similarity, are further grouped by applying graph-based clustering and hierarchical clustering, respectively. We also design a validity measure and perform experiments to show the efficiency and effectiveness of our approach. Meanwhile, those feature vectors are also approximated by minimum bounding rectangles (MBRs). Because there are fewer MBRs than feature vectors, the computational complexity can be reduced accordingly without compromising clustering validity. |
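The feature-extraction and similarity steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the unnormalized DCT-II, the window width, the number of retained coefficients, and the plain "count of shared k-nearest neighbors" definition of SNN similarity are all assumptions made for illustration, and the clustering and MBR stages are omitted.

```python
import math

def sliding_windows(seq, width, step=1):
    """Cut a sequence into overlapping subsequences of a fixed width."""
    return [list(seq[i:i + width]) for i in range(0, len(seq) - width + 1, step)]

def dct_features(window, n_coeffs):
    """Return the first n_coeffs unnormalized DCT-II coefficients of one window."""
    n = len(window)
    return [
        sum(x * math.cos(math.pi / n * (t + 0.5) * k) for t, x in enumerate(window))
        for k in range(n_coeffs)
    ]

def k_nearest(features, k):
    """For each feature vector, the index set of its k nearest neighbors (Euclidean)."""
    neighbors = []
    for i, f in enumerate(features):
        others = sorted(
            (j for j in range(len(features)) if j != i),
            key=lambda j: math.dist(f, features[j]),
        )
        neighbors.append(set(others[:k]))
    return neighbors

def snn_similarity(features, k):
    """SNN similarity (assumed variant): size of the overlap of two k-NN sets."""
    nn = k_nearest(features, k)
    n = len(features)
    return [[len(nn[i] & nn[j]) for j in range(n)] for i in range(n)]

# Hypothetical usage: windows -> DCT feature vectors -> SNN similarity matrix.
sequence = [0, 1, 2, 3, 4, 5, 4, 3, 2, 1]
features = [dct_features(w, n_coeffs=2) for w in sliding_windows(sequence, width=4)]
similarity = snn_similarity(features, k=3)
```

In this sketch, two subsequences are similar when their nearest-neighbor lists overlap, so the measure depends on neighborhood rank rather than on the raw Euclidean distance itself.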
Topic Category |
Basic and Applied Sciences >
Information Science |
Times Cited |