Method and apparatus for identifying semantic structures from text

A method and apparatus for identifying a semantic structure from an input text form at least two candidate semantic structures. A semantic score is determined for each candidate semantic structure based on the likelihood of that structure. A syntactic score is also determined for each candidate semantic structure based on the position of a word in the text and the position, within the semantic structure, of a semantic entity formed from that word. The syntactic score and the semantic score are combined to select a semantic structure for at least a portion of the text. In many embodiments, the semantic structure is built incrementally by building and scoring candidate structures for a portion of the text, pruning low-scoring candidates, and adding further semantic elements to the retained candidates.
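
The incremental build, score, and prune loop described above can be illustrated with a minimal sketch. It is not the claimed method: the names `Candidate`, `score`, `build`, `lexicon`, `entity_logp`, and `beam_width` are hypothetical, the semantic score is assumed to be a simple sum of per-entity log-probabilities, and the syntactic score is assumed to be a toy penalty on the distance between a word's position in the text and the position of its entity in the structure.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Slot = Tuple[str, int]  # (semantic entity, index of the word it was formed from)


@dataclass(frozen=True)
class Candidate:
    """One candidate semantic structure, modelled here as an ordered tuple of slots."""
    slots: Tuple[Slot, ...] = ()
    semantic: float = 0.0   # assumed: sum of entity log-probabilities
    syntactic: float = 0.0  # assumed: negative word-position/slot-position distance

    @property
    def total(self) -> float:
        # The two scores are combined by simple addition in this sketch.
        return self.semantic + self.syntactic


def score(slots: Tuple[Slot, ...], entity_logp: Dict[str, float]) -> Tuple[float, float]:
    """Return (semantic, syntactic) scores for a structure under the toy assumptions."""
    semantic = sum(entity_logp[entity] for entity, _ in slots)
    syntactic = sum(-abs(word_pos - slot_pos)
                    for slot_pos, (_, word_pos) in enumerate(slots))
    return semantic, syntactic


def build(words: List[str],
          lexicon: Dict[str, List[str]],
          entity_logp: Dict[str, float],
          beam_width: int = 2) -> Candidate:
    """Grow candidate structures word by word, keeping only the best few each step."""
    beam = [Candidate()]
    for word_pos, word in enumerate(words):
        entities = lexicon.get(word, [])
        if not entities:
            continue  # this word forms no semantic entity
        expanded = []
        for cand in beam:
            for entity in entities:
                # Try placing the new entity at every position in the structure.
                for slot_pos in range(len(cand.slots) + 1):
                    slots = (cand.slots[:slot_pos]
                             + ((entity, word_pos),)
                             + cand.slots[slot_pos:])
                    sem, syn = score(slots, entity_logp)
                    expanded.append(Candidate(slots, sem, syn))
        # Prune low-scoring candidates before adding further elements.
        expanded.sort(key=lambda c: c.total, reverse=True)
        beam = expanded[:beam_width]
    return beam[0]


# Illustrative data only; the entities and probabilities are made up.
lexicon = {"book": ["Reserve", "Publication"], "flight": ["Flight"]}
entity_logp = {"Reserve": -0.5, "Publication": -2.0, "Flight": -0.3}
best = build(["book", "flight"], lexicon, entity_logp)
print([entity for entity, _ in best.slots])
```

A small `beam_width` keeps the search cheap at the cost of possibly discarding a candidate that would have scored best once later words were added; a real system would tune this trade-off and use trained models for both scores.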
