A partition-based high dimensional similarity join method allows
similarity to be measured efficiently by dynamically selecting, before
the space is partitioned, the space partitioning dimensions and the
number of partitioning dimensions using a dimension selection algorithm.
The method efficiently performs similarity joins on high dimensional data
in a relatively short period of time without requiring massive storage
space. The method according to the present invention comprises the steps of
partitioning a high dimensional data space and performing joins between
predetermined data sets. The dimensions used to partition the high
dimensional data space and the number of partitioning dimensions are
determined before the space is partitioned, and the joins are performed
only between cells of the data sets that overlap or neighbor each other.
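
The two steps described above (partitioning on a pre-selected subset of dimensions, then joining only overlapping or neighboring cells) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the abstract does not specify the dimension selection algorithm, so `select_dimensions` substitutes a simple largest-variance heuristic, and the function names, the parameter `k` (number of partitioning dimensions), and the Euclidean threshold `eps` are hypothetical. The cell width is set to `eps`, so any two points within distance `eps` must fall in the same cell or in adjacent cells along every partitioning dimension.

```python
from collections import defaultdict
from itertools import product
import math


def select_dimensions(points, k):
    """Pick k partitioning dimensions before the space is partitioned.

    Assumption: largest variance is used as a stand-in criterion; the
    source does not describe its actual dimension selection algorithm.
    """
    n, d = len(points), len(points[0])
    variances = []
    for j in range(d):
        mean = sum(p[j] for p in points) / n
        variances.append(sum((p[j] - mean) ** 2 for p in points) / n)
    return sorted(range(d), key=lambda j: variances[j], reverse=True)[:k]


def build_cells(points, dims, eps):
    """Map each point index to a grid cell over the selected dimensions."""
    cells = defaultdict(list)
    for i, p in enumerate(points):
        key = tuple(math.floor(p[j] / eps) for j in dims)
        cells[key].append(i)
    return cells


def similarity_join(r, s, k, eps):
    """Return index pairs (i, j) with Euclidean distance(r[i], s[j]) <= eps."""
    dims = select_dimensions(r + s, k)
    r_cells = build_cells(r, dims, eps)
    s_cells = build_cells(s, dims, eps)
    # Offsets reaching the same cell and every neighboring cell.
    offsets = list(product((-1, 0, 1), repeat=len(dims)))
    result = []
    for key, r_ids in r_cells.items():
        for off in offsets:
            neighbor = tuple(a + b for a, b in zip(key, off))
            s_ids = s_cells.get(neighbor)
            if not s_ids:
                continue  # no points of s in this cell: skip the join
            for i in r_ids:
                for j in s_ids:
                    # The full-dimensional check decides the final result.
                    if math.dist(r[i], s[j]) <= eps:
                        result.append((i, j))
    return result
```

In this sketch the number of neighboring cells examined per cell is 3 to the power `k`, which is one reason the number of partitioning dimensions would be chosen in advance rather than using every dimension of the data.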