Image information and text information that are correlated with each other
are retrieved in pairs as information sets. Word frequency information is
extracted from the text information in a group of information sets, and
text features are extracted based on that frequency
information. Text features are used to lay out information sets in a
virtual space such that similar pieces of text are located close to each
other, and the images are displayed at those positions. Further, important
words are selected from the words extracted from the text information in the
group of information sets, laid out in the virtual space in the same manner
as the information sets, and displayed as labels.
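
The processing described above can be illustrated with a minimal sketch. It is not the claimed implementation: the abstract does not name a weighting scheme, a layout method, or a rule for choosing important words, so the sketch assumes smoothed TF-IDF weights as the text features, a two-dimensional principal-component projection as the virtual space, and the highest total TF-IDF weight as the measure of importance. Labels are placed at the weighted centroid of the information sets that use the word, which is a simplification of laying words out "in the same manner". All function names, the stopword list, and the image file names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"a", "by", "on", "the", "with"}

def tokenize(text):
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def tfidf_vectors(texts):
    """Turn word frequencies into one unit-length TF-IDF vector per text."""
    counts = [Counter(tokenize(t)) for t in texts]
    vocab = sorted(set().union(*counts))
    n = len(texts)
    idf = {w: math.log((1 + n) / (1 + sum(w in c for c in counts))) + 1.0
           for w in vocab}
    vectors = []
    for c in counts:
        v = [c[w] * idf[w] for w in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vocab, vectors

def top_eigen(G, iters=200):
    """Power iteration for the dominant eigenpair of a symmetric PSD matrix.
    The fixed start vector is an assumption; a degenerate input could need another."""
    n = len(G)
    v = [float(i + 1) for i in range(n)]
    lam = 0.0
    for _ in range(iters):
        w = [sum(G[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, [0.0] * n
        v = [x / norm for x in w]
        lam = norm
    return lam, v

def layout_2d(vectors):
    """Project feature vectors onto their top two principal axes."""
    n, dim = len(vectors), len(vectors[0])
    mean = [sum(v[k] for v in vectors) / n for k in range(dim)]
    centered = [[v[k] - mean[k] for k in range(dim)] for v in vectors]
    # Gram matrix of the centered vectors (n x n).
    G = [[sum(a * b for a, b in zip(ci, cj)) for cj in centered] for ci in centered]
    coords = [[0.0, 0.0] for _ in range(n)]
    for axis in range(2):
        lam, u = top_eigen(G)
        for i in range(n):
            coords[i][axis] = math.sqrt(lam) * u[i]
        # Deflate so the next pass finds the next principal axis.
        G = [[G[i][j] - lam * u[i] * u[j] for j in range(n)] for i in range(n)]
    return coords

def label_positions(vocab, vectors, coords, top_k=3):
    """Pick the top_k words by total weight; place each at a weighted centroid."""
    totals = [sum(v[k] for v in vectors) for k in range(len(vocab))]
    best = sorted(range(len(vocab)), key=lambda k: -totals[k])[:top_k]
    return {
        vocab[k]: [sum(v[k] * c[axis] for v, c in zip(vectors, coords)) / totals[k]
                   for axis in range(2)]
        for k in best
    }

# Each information set pairs an image (hypothetical file name) with its text.
info_sets = [
    ("car1.jpg", "Red sports car on a mountain road"),
    ("car2.jpg", "Red sports car parked by the road"),
    ("food1.jpg", "Fresh pasta with tomato sauce"),
    ("food2.jpg", "Fresh pasta with basil sauce"),
]
vocab, vectors = tfidf_vectors([text for _, text in info_sets])
coords = layout_2d(vectors)
labels = label_positions(vocab, vectors, coords)
for (image, _), (x, y) in zip(info_sets, coords):
    print(f"{image}: ({x:+.2f}, {y:+.2f})")
for word, (x, y) in labels.items():
    print(f"label {word!r}: ({x:+.2f}, {y:+.2f})")
```

In this sketch a display layer would draw each image at its coordinates and each label at its own position; rendering is left out because the abstract does not describe it.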