An improved template spotting technique may be implemented as part of a text-dependent
speaker verification system to authenticate a user of a wireless communication
device. This technique may be suitable for use in noisy environments and for wireless
communication devices with limited processing power. Endpoints of a test utterance
are identified by first computing local distances between test frames and a target
template. Accumulated distances are then computed from the local distances. Endpoints
of the utterance may be identified when one or more of the accumulated distances
fall below a predetermined threshold. Once the endpoints of the test utterance are identified,
a dynamic time warp (DTW) process may be used to determine whether the test utterance
matches a training template. One embodiment of the present invention aligns multiple
training templates to reduce the probability of falsely rejecting a speaker who
should have been verified.
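
The endpoint search described above (local distances, then accumulated distances, then a threshold test) can be illustrated with a minimal Python sketch. It is not the claimed implementation. It assumes Euclidean local distances, a standard three-way dynamic-programming recursion with a free starting frame, and normalization of the accumulated distance by template length. The names `local_distances` and `spot_template` and the threshold value are hypothetical, chosen only for this illustration.

```python
import math


def local_distances(test, template):
    """Euclidean distance between every test frame and every template frame."""
    return [[math.dist(t, r) for r in template] for t in test]


def spot_template(test, template, threshold):
    """Return (start, end, score) for the best match below threshold, else None.

    Assumption: a path may begin at any test frame (free start), and the
    score is the accumulated distance divided by the template length.
    """
    d = local_distances(test, template)
    n, m = len(test), len(template)
    acc = [[0.0] * m for _ in range(n)]    # accumulated distances
    start = [[0] * m for _ in range(n)]    # test frame where each path began

    for i in range(n):
        # First template frame: a new path may start at test frame i.
        acc[i][0] = d[i][0]
        start[i][0] = i
        for j in range(1, m):
            # Predecessors: same test frame, or the previous test frame.
            candidates = [(acc[i][j - 1], start[i][j - 1])]
            if i > 0:
                candidates.append((acc[i - 1][j - 1], start[i - 1][j - 1]))
                candidates.append((acc[i - 1][j], start[i - 1][j]))
            best_cost, best_start = min(candidates)
            acc[i][j] = d[i][j] + best_cost
            start[i][j] = best_start

    # An endpoint is declared where the normalized accumulated distance
    # at the last template frame falls below the threshold.
    best = None
    for i in range(n):
        score = acc[i][m - 1] / m
        if score < threshold and (best is None or score < best[2]):
            best = (start[i][m - 1], i, score)
    return best


# Toy one-dimensional "feature vectors": the template is embedded in noise.
template = [(1.0,), (2.0,), (3.0,)]
test = [(9.0,), (9.0,), (1.0,), (2.0,), (3.0,), (9.0,)]
result = spot_template(test, template, threshold=0.5)
```

In this toy input the template occupies test frames 2 through 4, so the sketch reports those frames as the endpoints. A real system would operate on spectral feature vectors rather than scalars, and the frames between the detected endpoints would then be passed to the DTW comparison against the training template.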