On receipt of a tagged file, that is, a tagged document, at step S1, a document
processing apparatus derives, at step S2, attribute information for read-out
from the tags of the tagged file and embeds this attribute information to
generate a speech read-out file. Then, at step S3, the document processing apparatus
performs processing suited to a speech synthesis engine, using the generated speech
read-out file. At step S4, the document processing apparatus performs processing
in accordance with operations made by the user through a user interface.
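
By way of illustration only, the flow of steps S1 to S4 may be sketched as follows. The sketch assumes an HTML-like tagged file, a hypothetical mapping from tags to pause lengths, and a hypothetical inline marker syntax of the form `[pause=N]`; none of these names, values, or formats is taken from the description above, and the function names `generate_read_out_file`, `prepare_for_engine` and `handle_user_operation` are likewise invented for the example.

```python
import re
from html.parser import HTMLParser

# Hypothetical tag-to-attribute mapping (pause length in milliseconds).
# The actual tags and attributes used by the apparatus are not specified here.
PAUSE_MS = {"h1": 600, "p": 400}


class ReadOutBuilder(HTMLParser):
    """Step S2: derive read-out attributes from tags and embed them in the text."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in PAUSE_MS:
            # Embed the derived attribute as an inline marker (assumed syntax).
            self.parts.append(f"[pause={PAUSE_MS[tag]}]")

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.parts.append(text)


def generate_read_out_file(tagged_text):
    """Steps S1 and S2: accept the tagged file and build the speech read-out text."""
    builder = ReadOutBuilder()
    builder.feed(tagged_text)
    builder.close()
    return " ".join(builder.parts)


def prepare_for_engine(read_out):
    """Step S3: turn embedded markers into (command, value) pairs for an engine."""
    commands = []
    for token in re.split(r"(\[pause=\d+\])", read_out):
        token = token.strip()
        if not token:
            continue
        match = re.fullmatch(r"\[pause=(\d+)\]", token)
        if match:
            commands.append(("pause", int(match.group(1))))
        else:
            commands.append(("speak", token))
    return commands


def handle_user_operation(operation, commands):
    """Step S4: select what to send to the engine for a given user operation."""
    if operation == "play":
        return commands
    if operation == "stop":
        return []
    raise ValueError(f"unsupported operation: {operation}")


if __name__ == "__main__":
    read_out = generate_read_out_file("<h1>Title</h1><p>Hello world.</p>")
    for command in handle_user_operation("play", prepare_for_engine(read_out)):
        print(command)
```

In this sketch the speech read-out file is represented as a plain string with embedded markers, and step S3 is reduced to splitting that string into pause and speak commands; a real speech synthesis engine would likely require its own control format.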