A method is provided for customizing a multi-media message created by a
sender for a recipient, in which the multi-media message includes an
animated entity audibly presenting speech converted from text entered by the
sender. At least one image is received from the sender. Each of the at
least one image is associated with a tag. The sender is presented with
options to insert the tag associated with one of the at least one image
into the sender text.
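
The tagging steps above can be illustrated with a minimal sketch. It is not the claimed implementation: the class name `MessageDraft`, the method names, and the `<<imgN>>` tag syntax are all hypothetical, chosen only to show images being received, each image being associated with a tag, the tags being offered as options, and a chosen tag being inserted into the sender text.

```python
from dataclasses import dataclass, field


@dataclass
class MessageDraft:
    """Hypothetical draft of a multi-media message (illustrative only)."""

    text: str = ""
    # Maps each tag to the image it stands for, in the order received.
    images_by_tag: dict = field(default_factory=dict)

    def add_image(self, image_ref: str) -> str:
        """Receive an image from the sender and associate it with a new tag."""
        # The "<<imgN>>" format is an assumption, not taken from the source.
        tag = f"<<img{len(self.images_by_tag) + 1}>>"
        self.images_by_tag[tag] = image_ref
        return tag

    def tag_options(self) -> list:
        """Return the tags that can be presented to the sender as options."""
        return list(self.images_by_tag)

    def insert_tag(self, tag: str, position: int) -> None:
        """Insert a known tag into the sender text at a character offset."""
        if tag not in self.images_by_tag:
            raise KeyError(f"unknown tag: {tag}")
        # Clamp the offset so it always falls inside the current text.
        position = max(0, min(position, len(self.text)))
        self.text = self.text[:position] + tag + self.text[position:]
```

In this sketch a sender who uploads one image and chooses its tag would see the tag appear inline in the text, where a later rendering step (not shown, and not described in this passage) could use it to decide which image to display.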