A method and apparatus for reconstructing new documents from a group of old
ones by removing redundant information. Redundant information (images,
text paragraphs) is removed from retrieved multimedia documents. Each document
consists of two main parts stored in different databases. The first part of a document
contains the text paragraphs; the second part consists of the images and drawings
related to those text paragraphs. An information reduction methodology first examines
the text paragraphs of each document related to a specific topic and removes
the redundant information, such as identical or similar paragraphs, while keeping pointers
that allow a future reconstruction of the original documents. The remaining text
paragraphs and the set of pointers are used to compose the first version of a new
document. The invention also examines all the images related to the set of original
documents and removes identical or similar images while keeping pointers that could
assist a future reconstruction of the original documents. The invention then merges
the text paragraphs and images to create the first-stage new document.
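
The text-reduction step described above can be illustrated with a minimal sketch. The abstract does not specify how similarity is measured or how pointers are stored, so everything here is an assumption made for illustration: the function names `reduce_paragraphs` and `reconstruct` are hypothetical, similarity is approximated with Python's standard `difflib.SequenceMatcher`, and a pointer is modelled as an index into the list of retained paragraphs.

```python
from difflib import SequenceMatcher


def reduce_paragraphs(documents, threshold=0.9):
    """Remove identical or similar paragraphs across documents.

    documents: dict mapping a document id to its list of paragraph strings.
    threshold: assumed similarity cutoff (not taken from the abstract).
    Returns (kept, pointers), where kept is the list of retained paragraphs
    and pointers maps each document id to indices into kept.
    """
    kept = []      # retained paragraphs, in first-seen order
    pointers = {}  # document id -> indices into kept
    for doc_id, paragraphs in documents.items():
        refs = []
        for text in paragraphs:
            for idx, existing in enumerate(kept):
                # Treat the paragraph as redundant if it is close enough
                # to one that has already been retained.
                if SequenceMatcher(None, text, existing).ratio() >= threshold:
                    refs.append(idx)
                    break
            else:
                # No similar paragraph found: retain this one.
                kept.append(text)
                refs.append(len(kept) - 1)
        pointers[doc_id] = refs
    return kept, pointers


def reconstruct(doc_id, kept, pointers):
    """Rebuild a document's paragraph sequence from its pointers."""
    return [kept[i] for i in pointers[doc_id]]


docs = {
    "a": ["Cats are mammals.", "Dogs bark."],
    "b": ["Cats are mammals.", "Birds fly."],
}
kept, pointers = reduce_paragraphs(docs)
```

In this sketch, a paragraph that was merely similar (not identical) to a retained one is reconstructed as the retained representative, so recovering the exact original wording would require storing additional information alongside each pointer. The same index-based scheme could be applied to images with a suitable image-similarity measure in place of the string comparison.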