A content provider system enables content providers to create voice pages that
include audio files. The voice pages are used in a voice page delivery network,
in which subscribers request voice pages and a voice page server system delivers
them audibly to the subscribers. A content provider selects the voice page into
which an audio file is to be incorporated and then selects the audio file. The
content provider system then transfers the audio file to a voice page server
system, which generates a voice page that includes the audio file by using
XML-based tags designated for audio files. The audio files can be uploaded from a
variety of user devices, including a telephony device, a web-based system and a PDA.
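To illustrate the server-side step in which an uploaded audio file is wrapped in an XML-based voice page, here is a minimal Python sketch. It assumes VoiceXML 2.0 as the XML dialect and its `<audio>` element as the tag designated for audio files. The abstract does not name a specific markup language. The function `build_voice_page`, its parameters, and the example URL are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: VoiceXML 2.0 is the voice page markup; the abstract only says
# "XML-based tags designated for audio files".
VXML_NS = "http://www.w3.org/2001/vxml"
ET.register_namespace("", VXML_NS)  # serialize with a default xmlns


def _q(tag: str) -> str:
    """Return a namespace-qualified VoiceXML tag name."""
    return f"{{{VXML_NS}}}{tag}"


def build_voice_page(intro_text: str, audio_url: str, fallback_text: str) -> str:
    """Hypothetical helper: build a voice page that plays one uploaded audio file."""
    vxml = ET.Element(_q("vxml"), {"version": "2.0"})
    form = ET.SubElement(vxml, _q("form"), {"id": "main"})
    block = ET.SubElement(form, _q("block"))
    # Plain text in a <block> is rendered with text-to-speech before the audio.
    block.text = intro_text
    # <audio src=...> references the uploaded file; its text content is
    # alternate content that is spoken if the file cannot be fetched.
    audio = ET.SubElement(block, _q("audio"), {"src": audio_url})
    audio.text = fallback_text
    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(build_voice_page(
        "Here is today's message.",
        "http://example.com/audio/msg1.wav",
        "Sorry, the message is unavailable.",
    ))
```

Using the VoiceXML fallback text means a subscriber still hears something meaningful when the audio file is missing or unreachable.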