A communication architecture delivers grammar and speech-related
information, such as text-to-speech (TTS) data, to a speech recognition
server operating with a wireless telecommunication system, for use with
automatic speech recognition and interactive voice-based applications. In
the invention, a mobile client retrieves a Web page containing
multi-modal content hosted on an origin server via a WAP gateway. The
content may include a grammar file and/or TTS strings, either embedded in
the content or referenced by URL(s) pointing to their storage locations. The
client then sends the grammar and/or TTS strings to the speech recognition
server (SRS) via a wireless packet streaming protocol channel. When the
client instead receives URL(s) and forwards them to the SRS, the SRS obtains
the grammar file and/or TTS strings over a high-speed HTTP connection. The speech
processing results and the synthesized speech are returned to the client
over the established wireless UDP connection.
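
The two delivery paths described above (embedded data forwarded by the client, or a URL forwarded for the SRS to fetch) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the invention: the `SpeechResource` type, the JSON message layout, the `mode` field, and the injected `fetch` callable that stands in for the server's HTTP retrieval.

```python
import json
import socket
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SpeechResource:
    """Hypothetical container: grammar or TTS data, embedded or referenced by URL."""
    kind: str                      # "grammar" or "tts"
    inline: Optional[str] = None   # data embedded in the page content
    url: Optional[str] = None      # reference to the data's storage location


def build_client_message(resource: SpeechResource) -> bytes:
    """Client side: forward embedded data as-is, otherwise forward only the URL."""
    if resource.inline is not None:
        body = {"kind": resource.kind, "mode": "inline", "data": resource.inline}
    elif resource.url is not None:
        body = {"kind": resource.kind, "mode": "url", "data": resource.url}
    else:
        raise ValueError("resource has neither inline data nor a URL")
    return json.dumps(body).encode("utf-8")


def resolve_on_server(message: bytes, fetch: Callable[[str], str]) -> str:
    """SRS side: use inline data directly, or retrieve the URL via `fetch`.

    `fetch` is an assumed stand-in for the server's HTTP retrieval.
    """
    body = json.loads(message.decode("utf-8"))
    if body["mode"] == "url":
        return fetch(body["data"])
    return body["data"]


def send_datagram(message: bytes, host: str, port: int) -> None:
    """Send one message as a UDP datagram (the wireless channel is assumed UDP)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, (host, port))
```

In this sketch the client never downloads URL-referenced data itself; it passes the reference along, so the larger transfer happens on the server's HTTP link rather than on the wireless channel.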