Sound source separation that avoids the permutation ambiguity, using convolutional mixing
independent component analysis (ICA) based on a priori knowledge of the target sound source, is disclosed.
The target sound source can be a human speaker. The reconstruction filters used
in the sound source separation take into account the a priori knowledge of the
target sound source, such as an estimate of the spectra of the target sound source.
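As an illustration of how an a priori spectral estimate could steer the reconstruction, one can score each separated output against a stored target spectrum and keep the closest match. The sketch below is an assumption, not the disclosed filter design: the function names, the FFT size and hop, and the mean-removed log-spectral distance are all hypothetical choices. Removing the mean of each log spectrum makes the score essentially insensitive to the arbitrary gain that ICA leaves on each output.

```python
import numpy as np


def average_log_spectrum(x, n_fft=256, hop=128):
    """Log of the mean power spectrum over Hann-windowed frames of x."""
    window = np.hanning(n_fft)
    frames = np.array([x[s : s + n_fft] * window for s in range(0, len(x) - n_fft + 1, hop)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(power.mean(axis=0) + 1e-12)  # small floor avoids log(0)


def spectral_distance(x, prior_log_spectrum, n_fft=256, hop=128):
    """Mean squared log-spectral difference after removing each spectrum's mean,
    so an overall gain on x (the ICA scaling ambiguity) barely changes the score."""
    est = average_log_spectrum(x, n_fft, hop)
    diff = (est - est.mean()) - (prior_log_spectrum - prior_log_spectrum.mean())
    return float(np.mean(diff ** 2))


def pick_target(outputs, prior_log_spectrum):
    """Index of the separated output whose spectrum best matches the prior (hypothetical rule)."""
    return int(np.argmin([spectral_distance(y, prior_log_spectrum) for y in outputs]))
```

Working in the log domain turns an unknown output gain into an additive constant, which the mean removal then cancels.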
In general, the filters may be constructed based on a speech recognition system.
Matching the words of the dictionary of the speech recognition system to a reconstructed
signal indicates whether proper separation has occurred. More specifically, the
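The dictionary check can be illustrated with template matching. Practical speech recognizers are typically statistical, so the sketch below swaps in a simpler technique, dynamic time warping (DTW) against one feature template per dictionary word; the feature sequences, the threshold, and all function names are hypothetical. A reconstruction whose features align closely with some dictionary word is treated as properly separated.

```python
import numpy as np


def dtw_cost(a, b):
    """Length-normalized dynamic time warping cost between feature sequences
    a (n frames x d) and b (m frames x d), using Euclidean frame distance."""
    n, m = len(a), len(b)
    local = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Cheapest way to reach (i, j) by a step down, right, or diagonal.
            acc[i, j] = local[i - 1, j - 1] + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return acc[n, m] / (n + m)


def best_word(features, dictionary):
    """Return (word, cost) for the dictionary template closest to the features."""
    costs = {word: dtw_cost(features, template) for word, template in dictionary.items()}
    word = min(costs, key=costs.get)
    return word, costs[word]


def separation_ok(features, dictionary, threshold):
    """Hypothetical decision rule: separation is deemed proper when some
    dictionary word matches the reconstructed signal's features closely enough."""
    return best_word(features, dictionary)[1] < threshold
```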
filters may be constructed based on a vector quantization codebook of vectors representing
typical sound source patterns. Matching the vectors of the codebook to a reconstructed
signal indicates whether proper separation has occurred. The vectors may be linear
prediction vectors, among others.
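A minimal sketch of the codebook test follows, under several assumptions: LPC vectors are computed per frame by the autocorrelation method and the Levinson-Durbin recursion; the codebook is trained with plain k-means (a stand-in for whatever VQ training the system actually uses); and codewords are compared by Euclidean distance between coefficient vectors (practical systems often use a spectral distortion measure such as Itakura-Saito instead). Frame length, hop, model order, and all function names are hypothetical. A low average quantization distortion suggests the reconstructed signal resembles the target's typical patterns.

```python
import numpy as np


def lpc(frame, order):
    """LPC predictor coefficients a[1..order], with x[n] ~ sum_k a[k] * x[n - k],
    computed by the autocorrelation method and Levinson-Durbin recursion."""
    r = np.array([np.dot(frame[: len(frame) - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    if r[0] <= 0.0:  # silent frame: nothing to predict
        return a[1:]
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # signal perfectly predicted; stop the recursion
            break
        k = (r[i] - np.dot(a[1:i], r[i - 1 : 0 : -1])) / err  # reflection coefficient
        prev = a.copy()
        a[i] = k
        a[1:i] = prev[1:i] - k * prev[i - 1 : 0 : -1]
        err *= 1.0 - k * k  # updated prediction error energy
    return a[1:]


def lpc_vectors(x, order=10, frame_len=240, hop=120):
    """One LPC vector per Hamming-windowed frame of signal x."""
    win = np.hamming(frame_len)
    starts = range(0, len(x) - frame_len + 1, hop)
    return np.array([lpc(x[s : s + frame_len] * win, order) for s in starts])


def nearest_distances(vectors, codebook):
    """Distance from each vector to its nearest codeword, and that codeword's index."""
    d = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
    return d.min(axis=1), d.argmin(axis=1)


def train_codebook(vectors, size, iters=20, seed=0):
    """Plain k-means codebook (a stand-in for LBG or other VQ training)."""
    rng = np.random.default_rng(seed)
    codebook = vectors[rng.choice(len(vectors), size, replace=False)].copy()
    for _ in range(iters):
        _, labels = nearest_distances(vectors, codebook)
        for c in range(size):
            members = vectors[labels == c]
            if len(members):  # leave empty cells unchanged
                codebook[c] = members.mean(axis=0)
    return codebook


def codebook_distortion(x, codebook, order=10):
    """Average distance from the LPC vectors of x to their nearest codewords."""
    dist, _ = nearest_distances(lpc_vectors(x, order), codebook)
    return float(dist.mean())
```

For example, a codebook trained on one realization of an autoregressive process should give lower distortion for a fresh realization of the same process than for white noise, which is the kind of contrast a separation check would rely on.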