An apparatus and method are provided for efficiently constructing the
learning data required by statistical methods used in information
retrieval, information extraction, machine translation, natural language
processing, and similar fields. The method includes the steps of: generating learning
models by performing machine learning on the learning data;
automatically tagging a raw corpus using the generated learning models
to generate learning data candidates; calculating
confidence scores of the generated learning data candidates, and then
selecting a learning data candidate using the confidence scores; and
allowing a user to correct errors in the selected learning data
candidate through an interface and adding the corrected candidate to the
learning data, thereby generating new learning models incrementally.
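The iterative procedure above can be illustrated with a minimal Python sketch. It is not the patented implementation, and it rests on several assumptions. The "learning model" is a hypothetical per-word most-frequent-tag counter. The confidence score is the mean relative frequency of each chosen tag, with unseen words scoring 0. The lowest-confidence candidate is the one selected; the abstract does not say in which direction the scores are used. The user interface is replaced by a `correct` callback. All function names (`train`, `auto_tag`, `build_incrementally`) are illustrative.

```python
from collections import Counter, defaultdict
from typing import Callable

# Hypothetical data types: a raw sentence is a list of tokens,
# a tagged sentence is a list of (token, tag) pairs.
Sentence = list[str]
Tagged = list[tuple[str, str]]


def train(learning_data: list[Tagged]) -> dict[str, Counter]:
    """Assumed learning model: count how often each word carries each tag."""
    model: dict[str, Counter] = defaultdict(Counter)
    for sentence in learning_data:
        for word, tag in sentence:
            model[word][tag] += 1
    return model


def auto_tag(model: dict[str, Counter], sentence: Sentence,
             default_tag: str = "O") -> tuple[Tagged, float]:
    """Tag a raw sentence and return (candidate, confidence score).

    Assumed confidence: mean relative frequency of the chosen tag per token;
    words never seen in the learning data get default_tag and score 0.0.
    """
    tagged, scores = [], []
    for word in sentence:
        counts = model.get(word)  # .get() does not insert into the defaultdict
        if counts:
            tag, n = counts.most_common(1)[0]
            scores.append(n / sum(counts.values()))
        else:
            tag = default_tag
            scores.append(0.0)
        tagged.append((word, tag))
    confidence = sum(scores) / len(scores) if scores else 0.0
    return tagged, confidence


def build_incrementally(learning_data: list[Tagged], raw_corpus: list[Sentence],
                        correct: Callable[[Tagged], Tagged],
                        rounds: int) -> tuple[list[Tagged], dict[str, Counter]]:
    """Train, auto-tag, score, select, correct, add, and retrain."""
    data = list(learning_data)
    pool = list(raw_corpus)
    model = train(data)
    for _ in range(rounds):
        if not pool:
            break
        candidates = [auto_tag(model, s) for s in pool]
        # Assumption: pick the least confident candidate for human correction.
        idx = min(range(len(candidates)), key=lambda i: candidates[i][1])
        tagged, _ = candidates[idx]
        data.append(correct(tagged))  # stands in for the user interface
        pool.pop(idx)
        model = train(data)  # regenerate the learning model incrementally
    return data, model


if __name__ == "__main__":
    # Hypothetical gold lexicon used to simulate the user's corrections.
    GOLD = {"the": "DET", "cat": "NOUN", "dog": "NOUN"}
    seed = [[("the", "DET"), ("cat", "NOUN")]]
    raw = [["the", "dog"], ["the", "cat"]]
    data, model = build_incrementally(
        seed, raw, lambda t: [(w, GOLD[w]) for w, _ in t], rounds=1)
    print(data[-1])  # [('the', 'DET'), ('dog', 'NOUN')]
```

Here the sentence containing the unseen word "dog" scores 0.5 and is selected over the fully known "the cat" (score 1.0). After correction, the retrained model tags "dog" as `NOUN`. Selecting the least confident candidate concentrates the user's effort where the current model is weakest. Selecting the most confident candidates instead would minimize correction effort per sentence. Either policy fits the abstract's wording.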