An unsupervised adaptation method and apparatus are provided that reduce the storage
and time requirements associated with adaptation. Under the invention, utterances
are converted into feature vectors, which are decoded to produce a transcript and
alignment unit boundaries for the utterance. Individual alignment units and the
feature vectors associated with those alignment units are then provided to an alignment
function, which aligns the feature vectors with the states of each alignment unit.
Because the alignment is performed within alignment unit boundaries, fewer feature
vectors are used and the time for alignment is reduced. After alignment, the feature
vector dimensions aligned to a state are added to dimension sums that are kept
for that state. After all the states in an utterance have had their sums updated,
the speech signal and the alignment units are deleted. Once sufficient frames of
data have been received to perform adaptive training, the acoustic model is adapted.
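
The accumulation step described above can be illustrated with a minimal Python sketch. It assumes the decoder and alignment function have already produced (state, feature vector) pairs for an utterance. The class name `StateSums`, its methods, and the `min_frames` threshold are hypothetical names chosen for illustration and do not come from the source. The per-state mean is shown only as one quantity that can be derived from the stored sums; the source does not specify the adaptation update itself.

```python
from collections import defaultdict


class StateSums:
    """Running per-state sums of feature-vector dimensions (hypothetical helper)."""

    def __init__(self, dim):
        self.dim = dim
        # state id -> list of per-dimension sums
        self.sums = defaultdict(lambda: [0.0] * dim)
        # state id -> number of frames aligned to that state
        self.counts = defaultdict(int)
        self.total_frames = 0

    def add_utterance(self, aligned_frames):
        """Fold one utterance into the sums.

        aligned_frames: iterable of (state_id, feature_vector) pairs, assumed
        to come from aligning frames within alignment unit boundaries.
        Only the sums and counts are retained; the frames are not stored.
        """
        for state_id, vec in aligned_frames:
            acc = self.sums[state_id]
            for d in range(self.dim):
                acc[d] += vec[d]
            self.counts[state_id] += 1
            self.total_frames += 1

    def ready(self, min_frames):
        """True once enough frames have been seen (threshold is an assumption)."""
        return self.total_frames >= min_frames

    def state_means(self):
        """Per-state mean vectors derived from the sums (illustrative only)."""
        return {
            s: [x / self.counts[s] for x in self.sums[s]]
            for s in self.sums
        }
```

Because only the sums and frame counts persist between utterances, the storage needed grows with the number of states rather than with the amount of speech received, which is the saving the abstract describes.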