The present media editing device makes it easy to generate media that includes
messages on a communication terminal such as a mobile terminal. In the device, a moving
image data storage part stores moving image data recorded by a user. A region extraction
part extracts any region including the user from the moving image data. A front
determination part determines whether the user in the extracted region is facing
the front. A sound detection part detects the presence or absence of a sound signal
of a predetermined level or higher. A frame selection part determines starting
and ending frames based on the results output by the front determination part
and the sound detection part. An editing part clips the media at the starting and
ending frames thus determined and performs editing such as an image conversion
process. A transmission data storage part stores the edited media as
transmission data.
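
The frame selection step described above can be illustrated with a minimal sketch. The abstract does not state the exact selection rule, so the sketch assumes the simplest one: the starting frame is the first frame in which the user faces the front and sound is present, and the ending frame is the last such frame. The names select_clip_frames and clip, and the per-frame boolean inputs, are hypothetical and stand in for the outputs of the front determination part and the sound detection part.

```python
from typing import List, Optional, Sequence, Tuple


def select_clip_frames(
    facing_front: Sequence[bool],
    sound_present: Sequence[bool],
) -> Optional[Tuple[int, int]]:
    """Return (start, end) frame indices, inclusive, or None if no frame qualifies.

    Assumed rule (not taken from the source): a frame qualifies when the user
    faces the front AND a sound of the predetermined level or higher is present.
    """
    if len(facing_front) != len(sound_present):
        raise ValueError("per-frame inputs must have the same length")
    qualifying = [
        i
        for i, (front, sound) in enumerate(zip(facing_front, sound_present))
        if front and sound
    ]
    if not qualifying:
        return None
    # First qualifying frame starts the clip, last qualifying frame ends it.
    return qualifying[0], qualifying[-1]


def clip(frames: Sequence, bounds: Tuple[int, int]) -> List:
    """Clip the frame sequence to the inclusive (start, end) bounds."""
    start, end = bounds
    return list(frames[start : end + 1])


if __name__ == "__main__":
    front = [False, True, True, True, False]
    sound = [False, False, True, True, True]
    bounds = select_clip_frames(front, sound)
    print(bounds)  # (2, 3)
    if bounds is not None:
        print(clip(["f0", "f1", "f2", "f3", "f4"], bounds))  # ['f2', 'f3']
```

An actual device would probably require each condition to hold for some minimum duration before accepting a frame as a boundary, so that a brief glance or a short noise does not start or end the clip; the sketch omits that for brevity.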