Systems and methods for determining the topic structure of a document that includes text use a Probabilistic Latent Semantic Analysis (PLSA) model and select segmentation points based on similarity values between pairs of adjacent text blocks. PLSA forms a framework for both text segmentation and topic identification. The use of PLSA provides an improved representation for the sparse information in a text block, such as a sentence or a sequence of sentences. The topic characterization of each text segment is derived from PLSA parameters that relate words to "topics" (the latent variables in the PLSA model) and "topics" to text segments. A system executing the method exhibits significant performance improvement. Once determined, the topic structure of a document may be employed for document retrieval and/or document summarization.
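
The selection of segmentation points from adjacent-block similarity can be illustrated with a minimal Python sketch. It is not the patented method: it assumes each text block has already been mapped to a topic distribution P(topic | block) by a fitted PLSA model, and it assumes cosine similarity, a local-minimum test, and a threshold of 0.5 as the selection rule. The names `cosine`, `segmentation_points`, and `block_topics`, as well as the sample values, are hypothetical.

```python
import math


def cosine(p, q):
    """Cosine similarity between two topic-distribution vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def segmentation_points(block_topics, threshold=0.5):
    """Return adjacent-block similarities and the chosen boundary indices.

    block_topics: list of P(topic | block) vectors, one per text block
                  (assumed to come from an already fitted PLSA model).
    threshold:    assumed cutoff; a boundary needs similarity below it.
    A boundary index k means the split falls between block k-1 and block k.
    """
    sims = [
        cosine(block_topics[i], block_topics[i + 1])
        for i in range(len(block_topics) - 1)
    ]
    boundaries = []
    for i, s in enumerate(sims):
        left = sims[i - 1] if i > 0 else float("inf")
        right = sims[i + 1] if i + 1 < len(sims) else float("inf")
        # Assumed rule: a dip in similarity that is also below the threshold.
        if s < threshold and s <= left and s <= right:
            boundaries.append(i + 1)
    return sims, boundaries


# Hypothetical topic distributions for four blocks over two latent topics.
block_topics = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
    [0.2, 0.8],
]
sims, boundaries = segmentation_points(block_topics)
print(boundaries)
```

With these sample values the similarity dips only between the second and third blocks, so the sketch places a single boundary there. Comparing blocks in topic space rather than by raw word overlap reflects the abstract's point that PLSA gives a better representation of sparse text blocks.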

 