A Framework for Discovering Variable-length Motifs in Medical Data Streams

In this paper, we explore two key problems in time series motif discovery: relaxing the trivial-matching constraint between subsequences of different lengths, and improving time and space efficiency. Trivial matching is avoided so that subsequences that largely repeat one another are not counted as similar when their similarities are calculated. We describe LiSAM, a framework based on a limited-length enhanced suffix array, to address both problems. We first convert the continuous time series into a discrete time series using the Symbolic Aggregate approXimation (SAX) procedure. We then introduce two covering relations on the discrete subsequences to support motif discovery: α-covering between the instances of an LCP (Longest Common Prefix) interval, and β-covering between LCP intervals. If an LCP interval is β-uncovered, its instances form a motif. LiSAM's βUncover algorithm identifies the β-uncovered l-intervals; within it, we introduce two LCP tables, presuf and nextsuf, to support the identification of the α-uncovered instances of an l-interval. Experimental results on electrocardiogram (ECG) signals demonstrate the accuracy of LiSAM in finding motifs of different lengths.
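The first two stages the abstract outlines (SAX discretization, then an enhanced suffix array whose LCP intervals group identical symbolic subsequences) can be sketched in Python. This is a minimal illustration under stated assumptions, not LiSAM itself: the alphabet size of 4, the equal-width PAA frames, the naive suffix sort, and all helper names (`sax`, `suffix_array`, `lcp_array`, `lcp_intervals`, `motif_candidates`, `min_len`) are hypothetical. The α-/β-covering tests, the βUncover algorithm, and the presuf/nextsuf tables are not reproduced, because the abstract does not define them precisely. The interval enumeration uses the standard bottom-up stack traversal for enhanced suffix arrays.

```python
import math
from typing import Dict, List, Tuple

# Assumption: alphabet size 4; these breakpoints are the standard-normal quartiles
# commonly used by SAX, so each symbol is roughly equiprobable.
BREAKPOINTS = [-0.6745, 0.0, 0.6745]
ALPHABET = "abcd"


def sax(series: List[float], n_segments: int) -> str:
    """Z-normalize, reduce with PAA, and map each frame mean to a symbol.

    Assumes len(series) >= n_segments so that every frame is non-empty.
    """
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n) or 1.0  # guard flat input
    z = [(x - mean) / std for x in series]
    word = []
    for k in range(n_segments):
        start, end = k * n // n_segments, (k + 1) * n // n_segments
        frame_mean = sum(z[start:end]) / (end - start)  # PAA coefficient
        # Symbol index = number of breakpoints the frame mean exceeds.
        word.append(ALPHABET[sum(frame_mean > b for b in BREAKPOINTS)])
    return "".join(word)


def suffix_array(s: str) -> List[int]:
    """Naive suffix sort; adequate for a sketch, not for long streams."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def lcp_array(s: str, sa: List[int]) -> List[int]:
    """Kasai's algorithm: lcp[r] = LCP(suffix sa[r-1], suffix sa[r]); lcp[0] = 0."""
    n = len(s)
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] == 0:
            h = 0  # smallest suffix has no predecessor
            continue
        j = sa[rank[i] - 1]
        while i + h < n and j + h < n and s[i + h] == s[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1  # LCP drops by at most one for the next text position
    return lcp


def lcp_intervals(lcp: List[int]) -> List[Tuple[int, int, int]]:
    """Return every lcp-interval (l, i, j) with l > 0, via a bottom-up stack scan."""
    n = len(lcp)
    intervals = []
    stack = [(0, 0)]  # (lcp value, left bound); the bottom entry is the root 0-interval
    for k in range(1, n + 1):
        cur = lcp[k] if k < n else 0  # trailing 0 closes every open interval
        lb = k - 1
        while cur < stack[-1][0]:
            l, lb = stack.pop()
            intervals.append((l, lb, k - 1))
        if cur > stack[-1][0]:
            stack.append((cur, lb))
    return intervals


def motif_candidates(symbols: str, min_len: int = 2) -> Dict[str, List[int]]:
    """Map each repeated pattern of length >= min_len to its sorted start positions.

    Hypothetical helper: no covering relation is applied, so overlapping
    (trivially matching) occurrences are still reported.
    """
    sa = suffix_array(symbols)
    lcp = lcp_array(symbols, sa)
    result = {}
    for l, i, j in lcp_intervals(lcp):
        if l >= min_len:
            result[symbols[sa[i]:sa[i] + l]] = sorted(sa[i:j + 1])
    return result
```

On the toy string `"banana"` with `min_len=2`, `motif_candidates` returns `{"ana": [1, 3], "na": [2, 4]}`. The two occurrences of `"ana"` overlap, which is the kind of trivial match the covering relations are meant to handle. According to the abstract, LiSAM would keep only the β-uncovered intervals as motifs; a call such as `motif_candidates(sax(ecg_window, 64), min_len=3)` (where `ecg_window` is any hypothetical list of samples) only produces the unfiltered candidates.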

Keywords

motif discovery; suffix array; time series
