A computer method for representing a natural-language document in a vector form suitable for text manipulation operations is disclosed. For each of a plurality of terms in the document, composed of non-generic words and, optionally, proximately arranged word groups, the method determines a selectivity value related to the frequency of occurrence of that term in a library of texts in one field, relative to the frequency of occurrence of the same term in one or more other libraries of texts in one or more other fields. The document is then represented as a vector of terms in which the coefficient assigned to each term includes a function of the selectivity value determined for that term and, optionally, is related to the inverse document frequency of that term in one or more libraries of texts. Also disclosed are computer-readable code for carrying out the method, a computer system that employs the code, and a vector produced by the method.
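
The representation described above can be illustrated with a minimal sketch. It is not the disclosed implementation, and the abstract does not fix any formula. The sketch assumes that each library is a list of token sets, that "frequency of occurrence" is a smoothed fraction of texts containing the term, that selectivity is the ratio of the target-field rate to the mean rate over at least one other library, and that the coefficient is the selectivity value itself, optionally multiplied by a smoothed inverse document frequency. All function names (`doc_rate`, `selectivity`, `inverse_document_frequency`, `document_vector`) are hypothetical, and word groups are omitted for brevity.

```python
import math


def doc_rate(term, library):
    """Smoothed fraction of texts in `library` (a list of token sets) containing `term`.

    Add-one smoothing is an assumption made here to avoid division by zero.
    """
    hits = sum(1 for text in library if term in text)
    return (hits + 1) / (len(library) + 2)


def selectivity(term, target_library, other_libraries):
    """Ratio of the term's rate in the target field to its mean rate elsewhere.

    Assumes `other_libraries` contains at least one library.
    """
    target = doc_rate(term, target_library)
    others = [doc_rate(term, lib) for lib in other_libraries]
    return target / (sum(others) / len(others))


def inverse_document_frequency(term, libraries):
    """Smoothed IDF over all texts in the given libraries (one common variant)."""
    texts = [text for lib in libraries for text in lib]
    df = sum(1 for text in texts if term in text)
    return math.log((len(texts) + 1) / (df + 1)) + 1.0


def document_vector(tokens, target_library, other_libraries, stop_words, use_idf=False):
    """Map each non-generic term of the document to a coefficient."""
    vector = {}
    for term in set(tokens) - set(stop_words):
        coeff = selectivity(term, target_library, other_libraries)
        if use_idf:
            all_libraries = [target_library, *other_libraries]
            coeff *= inverse_document_frequency(term, all_libraries)
        vector[term] = coeff
    return vector


# Toy libraries: two texts in the target field, two in one other field.
target = [{"electrode", "cardiac", "lead"}, {"electrode", "signal"}]
other = [{"signal", "code"}, {"code", "vector"}]

vec = document_vector(["the", "electrode", "signal"], target, [other], stop_words={"the"})
print(vec)
```

In this toy setup "electrode" appears only in the target field and so receives a larger coefficient than "signal", which is equally common in both fields. A different function of the selectivity value (for example a root or a threshold) could be substituted in `document_vector` without changing the overall structure.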

 