Disclosed herein are an apparatus and a method for similarity searches using
hyper-rectangle based multidimensional data segmentation. The similarity
search apparatus has MBR generation means, first sequence pruning means,
second sequence pruning means, and subsequence finding means. The MBR
generation means segments a multidimensional data sequence into
subsequences and represents each subsequence by a Minimum Bounding
Rectangle (MBR), such that a set of MBRs is generated from each
multidimensional data sequence, and the MBR sets are stored in a
database. The first sequence pruning means prunes irrelevant data
sequences using a distance D.sub.mbr, measured in a multidimensional
Euclidean space, between MBRs extracted from an input query sequence and
the MBR sets stored in the database. The second sequence pruning means prunes
irrelevant data sequences using a normalized distance D.sub.norm, measured
in a multidimensional Euclidean space, between MBRs extracted from the
query sequence and the MBR sets of the data sequences remaining after the
first pruning. The subsequence finding means detects subsequences
similar to the query sequence by obtaining, from each sequence retained
by the pruning based on the distance D.sub.norm, the sets of points
contained in the MBRs involved in the calculation of that distance.
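
The MBR generation and first pruning steps described above can be
illustrated by the following minimal sketch. It is not the disclosed
implementation and rests on several assumptions that the abstract does not
state: subsequences are cut at a fixed number of points (the parameter
max_points is hypothetical), D.sub.mbr is taken to be the minimum Euclidean
distance between two hyper-rectangles, and a sequence is kept when any pair
of query and data MBRs lies within a threshold epsilon. The function names
are illustrative only. The normalized distance D.sub.norm is not defined in
this passage and is therefore not sketched.

```python
import math

def make_mbr(points):
    """Return the (low corner, high corner) of the smallest axis-aligned
    hyper-rectangle enclosing the given points."""
    dims = range(len(points[0]))
    low = tuple(min(p[d] for p in points) for d in dims)
    high = tuple(max(p[d] for p in points) for d in dims)
    return low, high

def segment_to_mbrs(sequence, max_points):
    """Assumed segmentation rule: cut the sequence into consecutive chunks
    of at most max_points points and bound each chunk by an MBR."""
    return [make_mbr(sequence[i:i + max_points])
            for i in range(0, len(sequence), max_points)]

def mbr_distance(a, b):
    """Assumed form of D_mbr: minimum Euclidean distance between two
    hyper-rectangles (0.0 when they overlap or touch)."""
    total = 0.0
    for a_lo, a_hi, b_lo, b_hi in zip(a[0], a[1], b[0], b[1]):
        # Gap between the two intervals on this axis, or 0 if they overlap.
        gap = max(b_lo - a_hi, a_lo - b_hi, 0.0)
        total += gap * gap
    return math.sqrt(total)

def prune_first(query_mbrs, database, epsilon):
    """Keep the ids of sequences having at least one MBR within epsilon of
    some query MBR; all other sequences are pruned."""
    kept = []
    for seq_id, mbrs in database.items():
        nearest = min(mbr_distance(q, m) for q in query_mbrs for m in mbrs)
        if nearest <= epsilon:
            kept.append(seq_id)
    return kept
```

For example, under these assumptions a database would map each sequence
identifier to the result of segment_to_mbrs, and prune_first would be
called with the MBRs of the query sequence to obtain the candidates passed
on to the second pruning step.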