The system is designed to interface with external devices and services to
transcribe audio that is stored elsewhere, such as a wireless phone's
voice mail, or that occurs between two or more parties, such as a
conference call. An audio stream is separated into many audio shreds,
each of which has a duration of only a few seconds and cannot reveal the
context of the conversation. A geographically distributed workforce of
transcription agents transcribes the audio shreds and can generate a
transcription in real time, with many agents working in parallel on a
single conversation. No one agent (or group of agents) receives a
sufficient number of audio shreds to reconstruct the context of any
conversation. The use of human transcribers allows the system to overcome
limitations typical of computer-based speech recognition and permits
accurate transcription of general-quality speech even in acoustically
hostile environments.
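
The shredding and distribution described above can be illustrated with a minimal sketch. It is not the system's actual implementation: the names `Shred`, `shred_stream`, and `assign_shreds`, the 3-second shred length, the per-agent cap, and the rule that adjacent shreds go to different agents are all assumptions chosen for illustration. The sketch works on time offsets only and does not touch real audio data.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Shred:
    """A short slice of one audio stream (hypothetical structure)."""
    index: int
    start_s: float
    end_s: float


def shred_stream(total_s, shred_s=3.0):
    """Split a stream of total_s seconds into consecutive shreds of
    at most shred_s seconds each (the 3 s default is an assumption)."""
    shreds = []
    index = 0
    while index * shred_s < total_s:
        start = index * shred_s
        end = min(start + shred_s, total_s)
        shreds.append(Shred(index, start, end))
        index += 1
    return shreds


def assign_shreds(shreds, agents, max_per_agent, seed=None):
    """Assign each shred to an agent so that no agent receives more than
    max_per_agent shreds of this conversation and no agent receives two
    adjacent shreds. Both limits are illustrative assumptions.

    Returns a dict mapping shred index to agent id. Raises ValueError
    when no eligible agent remains for some shred.
    """
    rng = random.Random(seed)
    load = Counter()
    assignment = {}
    previous = None
    for shred in shreds:
        eligible = [
            a for a in agents
            if a != previous and load[a] < max_per_agent
        ]
        if not eligible:
            raise ValueError(f"no eligible agent for shred {shred.index}")
        chosen = rng.choice(eligible)
        assignment[shred.index] = chosen
        load[chosen] += 1
        previous = chosen
    return assignment
```

For example, a 60-second stream cut into 3-second shreds yields 20 shreds. With 10 agents and a cap of 3 shreds per agent, each agent hears at most 9 seconds of the conversation, and never two consecutive slices. The random choice among eligible agents is a simple greedy strategy; a production scheduler would also have to account for agent availability and latency, which this sketch ignores.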