TY - GEN
T1 - Measuring similarity for multidimensional sequences
AU - Wang, Hui
AU - Lin, Zhiwei
AU - McClean, Sally
AU - Liu, Jun
N1 - 10th IEEE International Conference on Data Mining Workshops, ICDMW 2010 ; Conference date: 14-12-2010 Through 17-12-2010
PY - 2010
Y1 - 2010
N2 - Multidimensional sequences are common, and measuring their similarity is a key to any analysis of such data. There is a wealth of similarity measures for sequences in the literature, but most of them are designed for a special type of sequence and later extended to more general types. These extensions are usually ad hoc, and the extended versions may lose the original conceptual interpretation of the measure. In this paper we consider the problem of how to measure similarity for the general type of multidimensional sequences effectively in a conceptually uniform way. We show that the subsequence concept behind longest common subsequence and all common subsequences can be extended from the temporal dimension to the spatial dimension, and we generalize the all common subsequences similarity to multidimensional sequences. The hard problem is how to compute the generalized similarity. We present a theorem that combines the temporal and spatial dimensions in a simple formula. This theorem suggests a dynamic programming algorithm to compute the generalized similarity. A preliminary experiment shows that this similarity produces competitive outcomes. However, this approach counts some subsequences multiple times when a sequence has repeated elements. We present a theorem that allows counting of distinct common subsequences.
KW - All common subsequences
KW - Dynamic time warping
KW - Multidimensional sequences
KW - Similarity
KW - The longest common subsequence
U2 - 10.1109/ICDMW.2010.202
DO - 10.1109/ICDMW.2010.202
M3 - Conference contribution
SN - 9780769542577
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 281
EP - 287
BT - Proceedings - 10th IEEE International Conference on Data Mining Workshops, ICDMW 2010
ER -