System and method for generating voice pages with included audio files for use in a voice page delivery system page

A content provider system for enabling content providers to create voice pages with audio files included for use in a network for voice page delivery through which subscribers request a voice page and a voice page server system delivers the voice page audibly to the subscriber. A content provider selects a voice page into which the audio file is to be incorporated, selects the audio file and the content provider system then transfers the audio file to a voice page server system which generates a voice page with the audio file included using XML-based tags designated for audio files. The audio files are uploaded from a number of user devices including a telephony device, a web-based system and a PDA.