Audio-visual selection process for the synthesis of photo-realistic talking-head animations

A system and method for generating photo-realistic talking-head animation from a text input uses an audio-visual unit selection process. Lip synchronization is obtained by optimally selecting and concatenating variable-length video units of the mouth area. The unit selection process uses the acoustic data to determine the target costs for the candidate images and uses the visual data to determine the concatenation costs. The image database is prepared in a hierarchical fashion, including high-level features (such as a full 3D model of the head and the geometric size and position of its elements) and pixel-based, low-level features (such as a PCA-based metric for labeling the various feature bitmaps).
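The selection step described above can be illustrated with a small dynamic-programming sketch. It is only a sketch under stated assumptions: the abstract says that target costs come from acoustic data and concatenation costs from visual data, but it does not give the search procedure, the cost formulas, or any weights. The Viterbi-style search, the names `select_units`, `w_target`, `w_concat`, and `mouth_opening`, and the absolute-difference concatenation cost are all hypothetical choices made for illustration.

```python
from typing import Callable, List, Sequence


def select_units(
    target_costs: Sequence[Sequence[float]],
    concat_cost: Callable[[int, int, int], float],
    w_target: float = 1.0,
    w_concat: float = 1.0,
) -> List[int]:
    """Pick one candidate per frame so the summed cost is minimal.

    target_costs[t][j] is the (assumed, acoustically derived) cost of using
    candidate j at frame t. concat_cost(t, i, j) is the (assumed, visually
    derived) cost of joining candidate i at frame t to candidate j at t + 1.
    """
    n = len(target_costs)
    if n == 0:
        return []
    # best[t][j]: cheapest total cost of any path ending in candidate j at frame t
    best = [[w_target * c for c in target_costs[0]]]
    # back[t][j]: index of the predecessor candidate on that cheapest path
    back = [[-1] * len(target_costs[0])]
    for t in range(1, n):
        row, ptr = [], []
        for j, tc in enumerate(target_costs[t]):
            # Cost of arriving at candidate j from each candidate of frame t - 1
            arrive = [
                best[t - 1][i] + w_concat * concat_cost(t - 1, i, j)
                for i in range(len(target_costs[t - 1]))
            ]
            i_min = min(range(len(arrive)), key=arrive.__getitem__)
            row.append(arrive[i_min] + w_target * tc)
            ptr.append(i_min)
        best.append(row)
        back.append(ptr)
    # Trace back from the cheapest final candidate
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    path = [j]
    for t in range(n - 1, 0, -1):
        j = back[t][j]
        path.append(j)
    path.reverse()
    return path


# Hypothetical toy data: three frames, two candidate mouth images per frame.
# mouth_opening stands in for a visual feature stored in the image database.
mouth_opening = [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
target_costs = [[0.0, 0.5], [0.6, 0.0], [0.0, 0.5]]


def visual_jump(t: int, i: int, j: int) -> float:
    # Assumed concatenation cost: visual difference between adjacent candidates
    return abs(mouth_opening[t][i] - mouth_opening[t + 1][j])


print(select_units(target_costs, visual_jump))                # [0, 0, 0]
print(select_units(target_costs, visual_jump, w_concat=0.0))  # [0, 1, 0]
```

With the concatenation cost active, the search keeps the visually smooth sequence even though the middle frame has a cheaper acoustic match. With `w_concat=0.0` it falls back to the per-frame best acoustic match, which shows how the two cost types trade off.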
