Image features are generated by performing wavelet transformations at
sample points on images stored in electronic form. Multiple wavelet
transformations at a point are combined to form an image feature vector.
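Below is a minimal sketch of this feature-extraction step, assuming a simple Haar-style wavelet evaluated at several scales around each point of a regular sample grid; the function names, scales, and grid step are illustrative, not the patent's specification.

```python
import numpy as np

def haar_responses(image, y, x, scales=(2, 4, 8)):
    """Stack Haar-style wavelet responses at one sample point over several
    scales to form an image feature vector (scales are illustrative)."""
    feats = []
    for s in scales:
        patch = image[y - s:y + s, x - s:x + s].astype(float)
        # Horizontal detail: mean of left half minus mean of right half.
        feats.append(patch[:, :s].mean() - patch[:, s:].mean())
        # Vertical detail: mean of top half minus mean of bottom half.
        feats.append(patch[:s, :].mean() - patch[s:, :].mean())
    return np.array(feats)

def grid_features(image, step=16, scales=(2, 4, 8)):
    """Compute a feature vector at each point of a regular sample grid."""
    h, w = image.shape
    m = max(scales)  # margin so every window stays inside the image
    points = [(yy, xx) for yy in range(m, h - m, step)
                       for xx in range(m, w - m, step)]
    feats = np.array([haar_responses(image, yy, xx, scales)
                      for yy, xx in points])
    return feats, points
```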
A prototypical set of feature vectors, or atoms, is derived from the set
of feature vectors to form an "atomic vocabulary." The prototypical
feature vectors are derived using a vector quantization method, e.g.,
neural network self-organization techniques, in which a vector
quantization network is also generated. The atomic vocabulary is used to
define new images. Meaning is established between atoms in the atomic
vocabulary. High-dimensional context vectors are assigned to each atom.
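A minimal sketch of the vector-quantization step above, using plain k-means as a stand-in for the neural network self-organization the text describes; the atom count and iteration count are illustrative.

```python
import numpy as np

def build_atomic_vocabulary(features, n_atoms=64, n_iters=25, seed=0):
    """Derive prototype vectors (atoms) from the full set of feature
    vectors with a simple k-means vector quantizer."""
    feats = np.asarray(features, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize atoms from randomly chosen feature vectors.
    atoms = feats[rng.choice(len(feats), size=n_atoms, replace=False)].copy()
    for _ in range(n_iters):
        # Quantize: assign every feature vector to its nearest atom.
        d2 = ((feats[:, None, :] - atoms[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update: move each atom to the centroid of its members.
        for k in range(n_atoms):
            members = feats[labels == k]
            if len(members):
                atoms[k] = members.mean(axis=0)
    return atoms, labels

def quantize(features, atoms):
    """Describe a new image as atom indices, its 'words' in the vocabulary."""
    feats = np.asarray(features, dtype=float)
    d2 = ((feats[:, None, :] - atoms[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)
```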
The context vectors are then trained as a function of the proximity and
co-occurrence of each atom with the other atoms in the image. After
training, the context vectors associated with the atoms that make up an
image are combined to form a summary vector for the image.
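A toy sketch of the training and combination steps, under loud assumptions: the update rule here simply pulls each atom's context vector toward the context vectors of spatially nearby co-occurring atoms, which reflects the proximity/co-occurrence idea but is not the patent's exact procedure; all names and parameters are hypothetical.

```python
import numpy as np

def train_context_vectors(images_as_atoms, positions, n_atoms, dim=512,
                          radius=32.0, lr=0.1, epochs=5, seed=0):
    """Train one high-dimensional context vector per atom so that atoms
    that co-occur near one another in images end up with similar vectors.
    images_as_atoms: per-image sequences of atom indices.
    positions: matching per-image sample-point (y, x) coordinates."""
    rng = np.random.default_rng(seed)
    ctx = rng.normal(size=(n_atoms, dim))
    ctx /= np.linalg.norm(ctx, axis=1, keepdims=True)
    for _ in range(epochs):
        for atoms, pts in zip(images_as_atoms, positions):
            pts = np.asarray(pts, dtype=float)
            atoms = np.asarray(atoms)
            for i, a in enumerate(atoms):
                # Proximity weighting: nearby co-occurring atoms pull harder.
                d = np.linalg.norm(pts - pts[i], axis=1)
                w = np.exp(-(d / radius) ** 2)
                w[i] = 0.0  # exclude the atom's own occurrence
                target = (w[:, None] * ctx[atoms]).sum(axis=0)
                ctx[a] += lr * target
                ctx[a] /= np.linalg.norm(ctx[a])  # keep unit length
    return ctx

def summary_vector(atoms, ctx):
    """Combine the context vectors of an image's atoms into one
    unit-length summary vector for the whole image."""
    v = ctx[np.asarray(atoms)].sum(axis=0)
    return v / np.linalg.norm(v)
```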
Images are retrieved using a number of query methods, e.g., whole images,
image portions, vocabulary atoms, or index terms. The user's query is
converted into a query context vector. A dot product is calculated
between the query context vector and each summary vector to locate the
images closest in meaning to the query.
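A minimal sketch of the retrieval step: with unit-length vectors, the dot product against each stored summary vector ranks images by closeness of meaning. The atom-based query example assumes the hypothetical `summary_vector` helper from the sketch above.

```python
import numpy as np

def retrieve(query_vector, summary_vectors, top_k=10):
    """Rank stored images by the dot product between the query context
    vector and each image's summary vector (all unit-normalized, so the
    dot product is cosine similarity)."""
    scores = summary_vectors @ query_vector
    order = np.argsort(-scores)[:top_k]
    return [(int(i), float(scores[i])) for i in order]

# Example usage: a query built from a few vocabulary atoms reuses the
# summary_vector helper sketched above (indices are illustrative).
# query = summary_vector([3, 17, 42], ctx)
# hits = retrieve(query, all_summary_vectors)
```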
The invention is also applicable to video or other temporally related
images, and can be used in conjunction with other context vector data
domains such as text or audio, thereby linking images to those domains.