A method of determining the time relation between an original or input
speech signal (10) and an output speech signal (15) affected by time
warping in a communications system, such as a VoIP (Voice over Internet
Protocol) system, wherein corresponding speech bursts (11, 12; 16, 17) of
the input (10) and output (15) speech signals are located in accordance
with a predefined signal property thereof. The corresponding speech
bursts (11, 12; 16, 17) thus located are time aligned (10, 30) to
correct continuous and discontinuous warping effects. A performance
estimate is generated by comparing the time-aligned input and output
speech signals (10, 30) using cross-correlation techniques and PSQM
(Perceptual Speech Quality Measure) or PSQM+ (Enhanced Perceptual Speech
Quality Measure) techniques.
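
The burst location and alignment steps can be illustrated with a minimal sketch. It is not the claimed implementation: it assumes that the "predefined signal property" is short-term frame energy, that one delay per burst is enough to model the warping, and that silence between bursts can be zero-filled. The function names (find_bursts, burst_delay, align), the frame size, the threshold and the lag range are all hypothetical. PSQM/PSQM+ scoring is not shown; the aligned signal would simply be passed on to such a measure.

```python
def find_bursts(signal, frame=4, threshold=0.1):
    """Return (start, end) sample ranges whose mean frame energy exceeds
    the threshold. Frame energy is an assumed stand-in for the
    'predefined signal property'."""
    bursts, start = [], None
    for i in range(0, len(signal), frame):
        chunk = signal[i:i + frame]
        energy = sum(x * x for x in chunk) / len(chunk)
        if energy > threshold and start is None:
            start = i                      # burst begins
        elif energy <= threshold and start is not None:
            bursts.append((start, i))      # burst ends
            start = None
    if start is not None:
        bursts.append((start, len(signal)))
    return bursts


def burst_delay(inp, out, start, end, max_lag):
    """Lag (in samples) that maximises the cross-correlation between the
    input burst inp[start:end] and the output signal."""
    def corr(lag):
        return sum(inp[n] * out[n + lag]
                   for n in range(start, end)
                   if 0 <= n + lag < len(out))
    return max(range(-max_lag, max_lag + 1), key=corr)


def align(inp, out, max_lag=10):
    """Shift each output burst back by its own estimated delay, giving a
    signal on the input's time base. Silence is zero-filled."""
    aligned = [0.0] * len(inp)
    delays = []
    for start, end in find_bursts(inp):
        d = burst_delay(inp, out, start, end, max_lag)
        delays.append(d)
        for n in range(start, end):
            if 0 <= n + d < len(out):
                aligned[n] = out[n + d]
    return aligned, delays


# Toy signals: two bursts separated by silence. In the output, the first
# burst is delayed by 3 samples and the inter-burst silence is stretched
# by 4 more, a simple case of discontinuous warping.
b1 = [1.0, -0.5, 0.8, -1.0, 0.6, 0.9, -0.7, 0.4]
b2 = [0.9, 0.3, -0.8, 0.5, -0.6, 1.0, -0.4, 0.7]
inp = [0.0] * 8 + b1 + [0.0] * 16 + b2 + [0.0] * 8
out = [0.0] * 11 + b1 + [0.0] * 20 + b2 + [0.0] * 8

aligned, delays = align(inp, out)
print(delays)  # [3, 7]
```

Estimating a separate delay for each burst, rather than one global delay, is what lets this sketch absorb a change in the length of the silence between bursts. Warping inside a burst would need a finer-grained treatment than is shown here.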