A system and method for providing similarity indexing and searching in multidimensional
databases. In one aspect, given a set of data points in a multidimensional space,
the values of the data points on each dimension are partitioned into a plurality
of grids, wherein each grid is assigned a grid value. Given a target data point,
similarity candidates (i.e., data points that are similar to the target data point)
are identified based on matching grid values. An inverted grid index, which indexes
the data points falling into each grid of each dimension, is used
to identify similarity candidates. A similarity selection process then uses a
similarity function to measure the closeness of each identified similarity candidate
to the target data point and selects the closest candidates for output. A preferred
similarity function considers only the subset of dimensions in which a point falls
within a grid similar to that of the target point. In addition, the similarity
function may capture a correlation effect among the grids in different dimensions.
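
The flow described above (partition each dimension into grids, build an inverted grid index, collect candidates by matching grid values, then rank them) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and is not taken from the source: the class `InvertedGridIndex`, the helper `grid_value`, the equal-width partitioning, and the similarity function, which simply counts the dimensions in which a candidate shares the target's grid. The correlation effect among grids is not modeled.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Point = Sequence[float]


def grid_value(x: float, lo: float, hi: float, n_grids: int) -> int:
    """Map a coordinate to an equal-width grid number in [0, n_grids - 1].

    Equal-width partitioning is an assumption; the source does not say
    how the grids are chosen.
    """
    if hi <= lo:
        return 0
    g = int((x - lo) / (hi - lo) * n_grids)
    return min(max(g, 0), n_grids - 1)  # clamp values on or beyond the bounds


class InvertedGridIndex:
    """Hypothetical inverted index: (dimension, grid value) -> set of point ids."""

    def __init__(self, points: List[Point], n_grids: int = 4) -> None:
        self.points = points
        self.n_grids = n_grids
        dims = len(points[0])
        # Per-dimension value range, used to partition that dimension into grids.
        self.bounds = [
            (min(p[d] for p in points), max(p[d] for p in points))
            for d in range(dims)
        ]
        self.index: Dict[Tuple[int, int], Set[int]] = defaultdict(set)
        for pid, p in enumerate(points):
            for d, g in enumerate(self.grid_values(p)):
                self.index[(d, g)].add(pid)

    def grid_values(self, p: Point) -> List[int]:
        """Return the grid value of point p on every dimension."""
        return [
            grid_value(p[d], lo, hi, self.n_grids)
            for d, (lo, hi) in enumerate(self.bounds)
        ]

    def query(self, target: Point, k: int = 3) -> List[Tuple[int, int]]:
        """Return up to k (point id, matching-dimension count) pairs.

        Candidates are points that share the target's grid on at least one
        dimension. The score is the number of such dimensions, which is an
        assumed stand-in for the similarity function.
        """
        matches: Dict[int, int] = defaultdict(int)
        for d, g in enumerate(self.grid_values(target)):
            for pid in self.index.get((d, g), ()):
                matches[pid] += 1
        # Highest score first; ties broken by point id for determinism.
        return sorted(matches.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


if __name__ == "__main__":
    pts = [(1.0, 1.0, 1.0), (1.2, 0.9, 9.0), (9.0, 9.0, 9.0), (5.0, 5.0, 5.0)]
    idx = InvertedGridIndex(pts, n_grids=4)
    print(idx.query((1.1, 1.0, 8.5), k=2))  # [(1, 3), (0, 2)]
```

In this sketch a point that shares no grid with the target on any dimension never becomes a candidate, so only the index entries for the target's own grid values are touched at query time.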