Modality information indicating various functions of a multimodal document
reception processing apparatus is transmitted to a multimodal document
transmission apparatus, and a multimodal document, which is generated by
the multimodal document transmission apparatus based on the modality
information, is received. A speech synthesis unit synthesizes speech of
text data to be output as speech in the multimodal document, and a speech
output unit outputs the synthesized speech. A GUI display unit
displays text data to be displayed in the multimodal document.
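
The exchange described above can be illustrated with a minimal sketch. It is
not the claimed implementation: the names ModalityInfo, MultimodalDocument,
generate_document and render, the "speak" flag on content items, and the rule
that speech-marked text falls back to display when speech synthesis is
unavailable are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ModalityInfo:
    """Hypothetical record of the functions the reception apparatus reports."""
    speech_synthesis: bool
    gui_display: bool


@dataclass
class MultimodalDocument:
    """Hypothetical document: text to be spoken and text to be displayed."""
    speech_text: list = field(default_factory=list)
    display_text: list = field(default_factory=list)


def generate_document(content, modality):
    """Transmission side: build a document based on the modality information.

    Assumption: an item marked "speak" goes to speech only if the receiver
    can synthesize speech; otherwise it falls back to display when a GUI
    is available, and is dropped when neither function is reported.
    """
    doc = MultimodalDocument()
    for item in content:
        if modality.speech_synthesis and item.get("speak"):
            doc.speech_text.append(item["text"])
        elif modality.gui_display:
            doc.display_text.append(item["text"])
    return doc


def render(doc):
    """Reception side: route each part to a stubbed speech or GUI unit."""
    spoken = [f"[speech] {text}" for text in doc.speech_text]
    shown = [f"[gui] {text}" for text in doc.display_text]
    return spoken, shown


if __name__ == "__main__":
    content = [
        {"text": "Welcome", "speak": True},
        {"text": "Menu", "speak": False},
    ]
    # The reception apparatus reports both functions to the transmitter.
    doc = generate_document(content, ModalityInfo(True, True))
    print(render(doc))
```

In this sketch the transmission side decides the split between spoken and
displayed text, so the reception side only needs to hand each part to the
matching output unit.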