Embodiments of the invention relate to improvements to the support vector
machine (SVM) classification model. When text data is significantly
unbalanced (i.e., the numbers of positively and negatively labeled
examples are highly disproportionate), the classification quality of a
standard SVM deteriorates.
Embodiments of the invention are directed to a weighted proximal SVM
(WPSVM) model that achieves substantially the same accuracy as the
traditional SVM model while requiring significantly less computational
time. A WPSVM model in accordance with
embodiments of the invention may include a weight for each training error
and a method for estimating the weights, which automatically addresses the
unbalanced data problem. Further, instead of solving the optimization problem
via the KKT (Karush-Kuhn-Tucker) conditions and the
Sherman-Morrison-Woodbury formula, embodiments of the invention use an
iterative algorithm to solve an unconstrained optimization problem, which
makes WPSVM suitable for classifying relatively high dimensional data.
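
By way of illustration only, the following minimal sketch shows one way a
weighted proximal SVM could be trained iteratively. It is not the claimed
implementation. It assumes an objective of the form
(nu/2)(||w||^2 + b^2) + (1/2) * sum_i s_i * (1 - y_i (w . x_i + b))^2,
assumes inverse class frequency weights s_i, and assumes the conjugate
gradient method as the iterative algorithm; none of these choices is
specified in the text above. The names fit_wpsvm, class_balance_weights,
predict, and nu are hypothetical.

```python
import numpy as np


def class_balance_weights(y):
    """Assumed heuristic: weight each example by the inverse frequency of its class."""
    n_pos = np.sum(y == 1)
    n_neg = np.sum(y == -1)
    n = len(y)
    return np.where(y == 1, n / (2.0 * n_pos), n / (2.0 * n_neg))


def fit_wpsvm(X, y, nu=1.0, weights=None, max_iter=200, tol=1e-8):
    """Minimize (nu/2)*||beta||^2 + (1/2)*sum_i s_i*(y_i - z_i.beta)^2 by conjugate gradient.

    beta stacks (w, b) and z_i = (x_i, 1). Because y_i is +1 or -1, the squared
    error (1 - y_i*z_i.beta)^2 equals (y_i - z_i.beta)^2, so the minimizer solves
    the linear system (nu*I + Z^T S Z) beta = Z^T S y with S = diag(s).
    """
    n, d = X.shape
    s = class_balance_weights(y) if weights is None else weights
    Z = np.hstack([X, np.ones((n, 1))])  # append a constant column for the bias

    def matvec(v):
        # Apply (nu*I + Z^T S Z) without forming the (d+1) x (d+1) matrix.
        return nu * v + Z.T @ (s * (Z @ v))

    rhs = Z.T @ (s * y)
    beta = np.zeros(d + 1)
    r = rhs.copy()  # residual for the all-zero starting point
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs) <= tol:  # stop once the residual norm is small
            break
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        beta += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return beta[:-1], beta[-1]


def predict(X, w, b):
    """Label each row of X as +1 or -1 by the sign of w . x + b."""
    return np.where(X @ w + b >= 0, 1, -1)
```

Because each iteration in this sketch needs only products with the data
matrix and its transpose, the cost per iteration grows with the number of
nonzero entries in the data rather than with the square of the
dimensionality, which is one reason an iterative solver can be attractive
for high dimensional data.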