A technique for efficient representation of dependencies between
electronically-stored documents, such as in an enterprise data processing
system. A document distribution path is developed as a directed graph
that represents the historic dependencies between documents and is
constructed in real time as documents are created. The system
preferably maintains a lossy hierarchical representation of the
documents, indexed in a way that allows fast queries for similar but not
necessarily equivalent documents. A distribution path, coupled with a
document similarity service, can support a number of applications, such
as a security solution capable of finding and restricting access to
documents whose contents are similar to those of existing files known to
contain sensitive information.
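
The combination described above can be illustrated with a minimal sketch.
It is not the implementation described here: the class name
DistributionPath, its methods, the 0.5 threshold, and the file names are
all hypothetical, and word-shingle sets compared by Jaccard similarity
stand in for the lossy hierarchical representation, whose actual form is
not specified in this text. The sketch records a directed edge from each
source document to any document derived from it, then flags documents
that either descend from a known sensitive file or resemble one.

```python
from collections import defaultdict


def shingles(text, k=3):
    """Return the set of k-word shingles in text (an assumed lossy fingerprint)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 if either is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


class DistributionPath:
    """Hypothetical directed graph of document dependencies plus fingerprints."""

    def __init__(self):
        self.children = defaultdict(set)  # source doc id -> ids derived from it
        self.fingerprints = {}            # doc id -> shingle set

    def add_document(self, doc_id, text, derived_from=()):
        # Called as each document is created, so the graph grows incrementally.
        self.fingerprints[doc_id] = shingles(text)
        for parent in derived_from:
            self.children[parent].add(doc_id)

    def descendants(self, doc_id):
        # Depth-first walk along the directed edges.
        seen, stack = set(), [doc_id]
        while stack:
            for child in self.children.get(stack.pop(), ()):
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        return seen

    def similar_to(self, doc_id, threshold=0.5):
        # Linear scan for clarity; a real index would avoid comparing every pair.
        target = self.fingerprints[doc_id]
        return {other for other, fp in self.fingerprints.items()
                if other != doc_id and jaccard(target, fp) >= threshold}

    def documents_to_restrict(self, sensitive_ids, threshold=0.5):
        flagged = set()
        for sid in sensitive_ids:
            flagged |= self.descendants(sid)
            flagged |= self.similar_to(sid, threshold)
        return flagged - set(sensitive_ids)


path = DistributionPath()
path.add_document("payroll.xls",
                  "employee salary table for fiscal year 2023 with bonus details")
path.add_document("payroll_copy.xls",
                  "employee salary table for fiscal year 2023 with bonus details and notes",
                  derived_from=["payroll.xls"])
path.add_document("memo.txt",
                  "employee salary table for fiscal year 2023 with bonus figures attached")
path.add_document("lunch.txt",
                  "the cafeteria menu for this week includes soup and salad")

print(sorted(path.documents_to_restrict(["payroll.xls"])))
# prints ['memo.txt', 'payroll_copy.xls']
```

In this sketch payroll_copy.xls is flagged because it descends from the
sensitive file in the graph, while memo.txt has no recorded dependency
and is flagged only by content similarity. That is the reason for using
both sources of evidence together.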