A system and method are provided for detecting emotional states using
statistics. First, a speech signal is received. At least one acoustic
parameter is extracted from the speech signal. Statistics, or features,
are then calculated over samples of the voice from the extracted
parameters. The features serve as inputs to a classifier, which can be a
computer program, a device or both. The classifier assigns at least one
emotional state from a finite number of possible emotional states to the
speech signal. The classifier also estimates the confidence of its
decision. The calculated features may include a maximum value of a
fundamental frequency, a standard deviation of the fundamental frequency,
a range of the fundamental frequency, a mean of the fundamental
frequency, and a variety of other statistics.
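
The pipeline described above can be illustrated with a minimal sketch in Python. It is not the claimed implementation. It assumes the fundamental frequency has already been extracted as a list of per-frame values in Hz, with 0 marking unvoiced frames. The function names, the two emotion labels, the centroid values, and the scale constant are all hypothetical, and a nearest-centroid rule with a softmax-style confidence stands in for whatever classifier is actually used.

```python
import math
import statistics


def f0_features(f0_hz):
    """Compute the four fundamental-frequency statistics named in the text."""
    # Assumption: a value of 0 marks an unvoiced frame and is skipped.
    voiced = [f for f in f0_hz if f > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")
    return {
        "f0_max": max(voiced),
        "f0_std": statistics.stdev(voiced),      # sample standard deviation
        "f0_range": max(voiced) - min(voiced),
        "f0_mean": statistics.fmean(voiced),
    }


# Hypothetical per-emotion feature centroids, for illustration only.
CENTROIDS = {
    "neutral": {"f0_max": 180.0, "f0_std": 15.0, "f0_range": 50.0, "f0_mean": 140.0},
    "agitated": {"f0_max": 320.0, "f0_std": 50.0, "f0_range": 180.0, "f0_mean": 220.0},
}


def classify(features, centroids=CENTROIDS, scale=50.0):
    """Return (label, confidence) from a finite set of emotional states.

    Stand-in classifier: nearest centroid by Euclidean distance, with the
    confidence taken as a softmax over negative scaled distances.
    """
    dists = {
        label: math.sqrt(sum((features[k] - c[k]) ** 2 for k in c))
        for label, c in centroids.items()
    }
    nearest = min(dists.values())
    # Subtracting the smallest distance keeps the exponentials well scaled.
    weights = {label: math.exp(-(d - nearest) / scale) for label, d in dists.items()}
    label = min(dists, key=dists.get)
    return label, weights[label] / sum(weights.values())


if __name__ == "__main__":
    contour = [0, 120, 130, 150, 0, 140, 160]  # made-up pitch contour in Hz
    feats = f0_features(contour)
    print(feats)
    print(classify(feats))
```

In this sketch the confidence is simply the share of the softmax weight held by the winning state, so it approaches 1 when one centroid is much closer than the others and approaches 1/N when all N states are equally distant.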