A system and a method are disclosed for automatic question classification
and answering. A multipart artificial neural network (ANN) comprising a
main ANN and an auxiliary ANN classifies a received question according to
one of a plurality of defined categories. Unlabeled data is received from
a source, such as a plurality of human volunteers. The unlabeled data
comprises additional questions that might be asked of an autonomous
machine such as a humanoid robot, and is used to train the auxiliary ANN
in an unsupervised mode. The unsupervised training can comprise multiple
auxiliary tasks that generate labeled data from the unlabeled data,
thereby allowing the auxiliary ANN to learn an underlying structure of the
data. Once the auxiliary ANN has been trained, its weights are frozen and
transferred to the main ANN. The main
ANN can then be trained using labeled questions. The original question to
be answered is applied to the trained main ANN, which assigns one of the
defined categories. The assigned category is used to map the original
question to a database that most likely contains the appropriate answer.
An object and/or a property within the original question can be
identified and used to formulate a query, using, for example, structured
query language (SQL), to search for the answer within the chosen
database. The invention makes efficient use of available information and
reduces training time and error rate relative to the use of single-part
ANNs.
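
The final answering step described above, in which the assigned category selects a database and an identified object and property are used to formulate an SQL query, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the disclosure: the category names, the `facts` table layout, the helper names `make_database` and `answer`, and the use of in-memory SQLite databases. The category, object, and property are assumed to have been produced already by the trained main ANN and an upstream extraction step.

```python
import sqlite3


def make_database(rows):
    """Create a hypothetical in-memory database holding (object, property, value) facts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE facts (object TEXT, property TEXT, value TEXT)")
    conn.executemany("INSERT INTO facts VALUES (?, ?, ?)", rows)
    return conn


# Hypothetical mapping from each defined category to the database
# most likely to contain the answer for questions in that category.
DATABASES = {
    "geography": make_database([("France", "capital", "Paris")]),
    "science": make_database([("water", "boiling point", "100 C")]),
}


def answer(category, obj, prop):
    """Look up the answer in the database selected by the assigned category.

    The object and property are bound as query parameters rather than
    concatenated into the SQL string.
    """
    conn = DATABASES[category]
    row = conn.execute(
        "SELECT value FROM facts WHERE object = ? AND property = ?",
        (obj, prop),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    # "What is the capital of France?" is assumed to have been classified as
    # "geography", with object "France" and property "capital".
    print(answer("geography", "France", "capital"))
```

In this sketch a question routed to the wrong category simply finds no matching row and returns `None`, which shows why the accuracy of the category assignment matters for the lookup.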