Finding Sequence Clusters: A Shared Near Neighbors Approach




Jia-Lien Hsu; Tzu-Han Hsiao

Key Words

SNN; MBR; multi-label clustering; subsequences clustering; sequence clustering


Journal of Information Science and Engineering

Volume or Term/Year and Month of Publication

Vol. 31, No. 5 (2015/09/01)

Page #

1647 - 1667

Content Language


English Abstract

Sequence clustering is one of the most fundamental topics in data analysis and can be applied in various research fields. Most previous work on sequence clustering is dedicated to single-label clustering, in which the overall similarity of equal-length sequences is considered and measured by the Euclidean distance function. However, the intrinsic properties of sequences call for multi-label clustering. In addition, Euclidean distance in high-dimensional space suffers from the curse of dimensionality. Therefore, in this paper, we employ the concept of shared near neighbors (SNN) to measure sequence similarity and integrate it into the multi-label clustering process. Given a set of sequences, our approach first applies the sliding window technique and the DCT mapping to the sequences to obtain feature vectors. These feature vectors, together with the SNN similarity, are then grouped using graph-based clustering and hierarchical clustering, respectively. We also design a validity measure and perform experiments to show the efficiency and effectiveness of our approach. Furthermore, the feature vectors are approximated by minimum bounding rectangles (MBRs). Because there are fewer MBRs than feature vectors, the computational complexity is reduced accordingly without compromising clustering validity.
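The feature-extraction and similarity steps summarized above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The window length `w`, the number of retained DCT coefficients `n_coeffs`, the neighborhood size `k`, the fixed-size grouping of consecutive feature vectors into MBRs, and all function names are hypothetical. SNN similarity is taken in its common Jarvis-Patrick form (the number of neighbors shared by two points' k-nearest-neighbor lists), which may differ from the exact definition used in the paper.

```python
import math
from typing import List, Sequence, Set, Tuple


def sliding_windows(seq: Sequence[float], w: int, step: int = 1) -> List[List[float]]:
    """All length-w subsequences of seq, starting every `step` positions."""
    return [list(seq[i:i + w]) for i in range(0, len(seq) - w + 1, step)]


def dct_features(window: Sequence[float], n_coeffs: int) -> List[float]:
    """First n_coeffs coefficients of the orthonormal DCT-II of `window`."""
    n = len(window)
    feats = []
    for k in range(n_coeffs):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                    for i, x in enumerate(window))
        feats.append(scale * total)
    return feats


def knn_lists(points: List[List[float]], k: int) -> List[Set[int]]:
    """Indices of the k nearest other points (Euclidean) for every point.

    Ties in distance are broken by the smaller index.
    """
    neighbors = []
    for i, p in enumerate(points):
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append({j for _, j in ranked[:k]})
    return neighbors


def snn_similarity(neighbors: List[Set[int]], i: int, j: int) -> int:
    """Shared-near-neighbor similarity: size of the overlap of two k-NN lists."""
    return len(neighbors[i] & neighbors[j])


def mbr(points: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Minimum bounding rectangle as (per-dimension minima, per-dimension maxima)."""
    lows = [min(col) for col in zip(*points)]
    highs = [max(col) for col in zip(*points)]
    return lows, highs


# Example: a periodic sequence whose windows 0 and 6 are identical.
seq = [0, 1, 2, 3, 2, 1, 0, 1, 2, 3, 2, 1, 0]
windows = sliding_windows(seq, w=4)
feats = [dct_features(win, n_coeffs=2) for win in windows]
nbrs = knn_lists(feats, k=3)
print(snn_similarity(nbrs, 0, 6))  # identical windows share their other neighbors
# Hypothetical grouping: one MBR per run of 5 consecutive feature vectors.
boxes = [mbr(feats[s:s + 5]) for s in range(0, len(feats), 5)]
print(len(boxes))
```

Because SNN similarity depends only on how much two neighbor lists overlap, not on raw distance values, it is intended to be less affected by the way Euclidean distances behave in high dimensions. Summarizing runs of feature vectors by their bounding boxes likewise shrinks the number of objects a clustering step has to compare.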

Topic Category: Basic and Applied Sciences > Information Science