An embedded device for playing media files generates a play list of media files
based on speech input from a user. The device includes an indexer that generates
a plurality of speech recognition grammars. According to one aspect of the invention,
the indexer generates speech recognition grammars based on contents of a media
file header of the media file. According to another aspect of the invention, the
indexer generates speech recognition grammars based on categories in a file path
used to retrieve the media file to a user location. When a speech recognizer receives
input speech from a user while in a selection mode, a media file selector compares
that input speech to the plurality of speech recognition grammars, thereby selecting
the media file.
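
The indexing and selection flow described above can be illustrated with a minimal sketch. It is not the claimed implementation: a real device would compile grammars for a speech recognizer, whereas this sketch treats each grammar as a plain phrase and assumes the recognizer has already turned the user's speech into text. The names `MediaFile`, `grammars_from_header`, `grammars_from_path`, `build_index`, and `select`, as well as the sample paths and header fields, are hypothetical.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class MediaFile:
    path: str                                   # hypothetical location of the file
    header: dict = field(default_factory=dict)  # hypothetical header fields (title, artist, ...)


def normalize(text):
    """Lowercase and turn separators into spaces so phrases compare cleanly."""
    return " ".join(text.lower().replace("_", " ").replace("-", " ").split())


def grammars_from_header(media):
    """Phrases drawn from the contents of the media file header."""
    return {normalize(str(value)) for value in media.header.values() if value}


def grammars_from_path(media):
    """Phrases drawn from directory names in the file path, treated as categories."""
    parts = PurePosixPath(media.path).parent.parts
    return {normalize(part) for part in parts if part != "/"}


def build_index(files):
    """Indexer stand-in: map each grammar phrase to the files it can select."""
    index = {}
    for media in files:
        for phrase in grammars_from_header(media) | grammars_from_path(media):
            index.setdefault(phrase, []).append(media.path)
    return index


def select(index, recognized_text):
    """Selector stand-in: compare the recognized utterance to the grammar phrases."""
    return index.get(normalize(recognized_text), [])


files = [
    MediaFile("/music/jazz/artist_a/track_one.mp3",
              {"title": "Track One", "artist": "Artist A"}),
    MediaFile("/music/rock/artist_b/track_two.mp3",
              {"title": "Track Two", "artist": "Artist B"}),
]
index = build_index(files)
play_list = select(index, "Jazz")  # category phrase taken from the file path
```

In this sketch, saying a path category such as "jazz" or a header value such as "Track Two" selects the matching file, and a phrase shared by several files (here "music") returns all of them, which is one way a play list could be assembled from a single utterance.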