A unified web-based voice messaging system provides voice application control between
a web browser and an application server via a Hypertext Transfer Protocol (HTTP)
connection on an Internet Protocol (IP) network. The web browser receives an HTML
page from the application server having an Extensible Markup Language (XML) element
that defines data for an audio operation to be performed by an executable audio
resource. The application server executes the voice-enabled web application at runtime
by executing XML documents that define the application. The application server
includes a runtime environment that establishes
an efficient, high-speed connection to a web server. The application server, in
response to receiving a request from a user, accesses a selected XML page
that defines at least a part of the voice application to be executed for the user.
The XML page may describe any one of: a user interface operation, such as dynamic
generation of a menu of options or a prompt for a password; an application logic
operation; or a function capability, such as generating a function call to an
external resource.
The application server then parses the XML page and executes the operation described
by the XML page, for example by dynamically generating an HTML page having voice
application control content, or by fetching another XML page to continue application processing.
In addition, the application server may access an XML page that stores application
state information, enabling the application server to be state-aware relative to
the user interaction. Hence, the XML page, which can be written using a conventional
editor or word processor, defines the application to be executed by the application
server within the runtime environment, enabling voice-enabled web applications
to be generated and executed without the need for a programming language environment.
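
The parse-and-execute step described above can be illustrated with a minimal sketch. It assumes a hypothetical XML page schema (a `page` root with a `type` attribute, `option` children, and a `prompt` attribute) and a hypothetical `audio-op` element embedded in the generated HTML; none of these names come from the system itself, and the handlers stand in for whatever operations the runtime environment actually supports.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical XML page describing a menu. All element and attribute
# names here are illustrative assumptions, not a documented schema.
MENU_PAGE = """
<page type="menu" prompt="main_menu.wav">
  <option key="1" next="inbox.xml">Listen to messages</option>
  <option key="2" next="compose.xml">Send a message</option>
</page>
"""


def render_menu(page):
    """Generate an HTML page with a menu and an embedded audio element."""
    items = "".join(
        '<li><a href="?page={}">{}: {}</a></li>'.format(
            escape(opt.get("next", "")),
            escape(opt.get("key", "")),
            escape(opt.text or ""),
        )
        for opt in page.findall("option")
    )
    # Assumed XML element carrying data for the audio operation.
    audio = '<audio-op action="play" src="{}"/>'.format(
        escape(page.get("prompt", ""))
    )
    return "<html><body>{}<ul>{}</ul></body></html>".format(audio, items)


def render_prompt(page):
    """Generate an HTML page that asks the user for a single input."""
    audio = '<audio-op action="play" src="{}"/>'.format(
        escape(page.get("prompt", ""))
    )
    return (
        "<html><body>{}"
        '<form method="post"><input type="password" name="value"/></form>'
        "</body></html>"
    ).format(audio)


# Map each page type to the operation that executes it.
HANDLERS = {"menu": render_menu, "prompt": render_prompt}


def execute_page(xml_text):
    """Parse an XML page and execute the operation it describes."""
    page = ET.fromstring(xml_text.strip())
    handler = HANDLERS.get(page.get("type"))
    if handler is None:
        raise ValueError("unsupported page type: {!r}".format(page.get("type")))
    return handler(page)
```

Calling `execute_page(MENU_PAGE)` returns an HTML string containing the assumed `audio-op` element and one list item per option. Under these assumptions, changing the application means editing the XML page rather than the handler code.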