The present invention relates to a method and system for distinguishing
speech from music in a digital audio signal in real time. A method for
distinguishing speech from music in a digital audio signal in real time,
applied to sound segments that a segmentation unit has separated from the
input signal of a digital sound processing system on the basis of the
homogeneity of their properties, comprises the steps of: (a)
framing the input signal into a sequence of overlapping frames using a
windowing function; (b) calculating a frame spectrum for every frame by a
fast Fourier transform (FFT); (c) calculating a segment harmony measure on
the basis of the frame spectrum sequence; (d) calculating a segment noise
measure on the basis of the frame spectrum sequence; (e) calculating a
segment tail measure on the basis of the frame spectrum sequence; (f)
calculating a segment drag out measure on the basis of the frame spectrum
sequence; (g) calculating a segment rhythm measure on the basis of the
frame spectrum sequence; and (h) making the distinguishing decision based
on the calculated characteristics.
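
Steps (a) and (b) above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the Hann window, the frame length, the hop size, the power-of-two restriction, the use of magnitude spectra, and all function names (hann, frame_signal, fft, frame_spectra) are assumptions chosen for illustration. The measures of steps (c) to (g) are not sketched, because the text above does not define how they are computed.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n (an assumed windowing function)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def frame_signal(samples, frame_len, hop):
    """Step (a): split samples into overlapping, windowed frames.

    hop < frame_len gives overlap; trailing samples that do not fill
    a whole frame are dropped.
    """
    window = hann(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def frame_spectra(samples, frame_len=256, hop=128):
    """Steps (a) and (b): one magnitude spectrum per frame.

    Returns bins 0..frame_len/2 for each frame. The default sizes are
    illustrative assumptions, not values taken from the source.
    """
    if frame_len < 2 or frame_len & (frame_len - 1):
        raise ValueError("frame_len must be a power of two >= 2")
    spectra = []
    for frame in frame_signal(samples, frame_len, hop):
        bins = fft(frame)[:frame_len // 2 + 1]
        spectra.append([abs(c) for c in bins])
    return spectra
```

The resulting list of spectra corresponds to the frame spectrum sequence from which the segment measures are derived. With the default arguments, consecutive frames overlap by half a frame; a production system would more likely use an optimized FFT library than this recursive version.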