Automatic charset detection using SIM algorithm with charset grouping

The invention relates, in an embodiment, to a computer-implemented method for automatic charset detection, that is, for detecting the encoding scheme of a target document. The method includes a training step that uses a plurality of text document samples to obtain a set of machine learning models. Training uses SIM (Similarity Algorithm) to generate the set of machine learning models from feature vectors obtained from the text document samples. The method then applies the set of machine learning models to a set of feature vectors converted from the target document in order to detect its encoding scheme.
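
The train-then-apply flow described above can be illustrated with a minimal sketch. The abstract does not specify how SIM computes similarity or how feature vectors are built, so everything here is an assumption for illustration: byte-bigram counts stand in for the feature vectors, a summed count vector per charset stands in for each model, and cosine similarity stands in for the similarity measure. The function names (`byte_bigram_vector`, `train`, `detect`) are hypothetical and do not come from the patent.

```python
import math
from collections import Counter


def byte_bigram_vector(data: bytes) -> Counter:
    """Assumed feature extraction: counts of adjacent byte pairs."""
    return Counter(zip(data, data[1:]))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b.get(key, 0) for key, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(samples: dict) -> dict:
    """Build one model per charset by summing the feature vectors
    of that charset's sample documents (a stand-in for SIM training)."""
    models = {}
    for charset, documents in samples.items():
        total = Counter()
        for document in documents:
            total.update(byte_bigram_vector(document))
        models[charset] = total
    return models


def detect(models: dict, data: bytes) -> str:
    """Return the charset whose model is most similar to the target."""
    target = byte_bigram_vector(data)
    return max(models, key=lambda charset: cosine(target, models[charset]))


if __name__ == "__main__":
    # Toy training set: the same sentence stored in three encodings.
    text = "日本語の文書です"
    charsets = ("utf-8", "shift_jis", "euc_jp")
    models = train({cs: [text.encode(cs)] for cs in charsets})
    print(detect(models, "日本語です".encode("shift_jis")))
```

A real detector would train on many documents per charset and would likely use a richer feature set; the sketch only shows how models built from sample feature vectors can be matched against a target document's feature vector.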
