The techniques and mechanisms described herein are directed to a system
for stylizing video, such as interactively transforming video to a
cartoon-like style. Briefly stated, the techniques include determining a
set of volumetric objects within a video, each volumetric object being a
segment. Mean shift video segmentation may be used for this step. Using
that segmentation information, the techniques further include indicating,
on a limited number of keyframes of the video, how segments should be
merged into a semantic region. Finally, a contiguous volume is created by
interpolating between the keyframes with a mean shift constrained
interpolation technique, which propagates the semantic regions between
keyframes.
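
By way of illustration only, the merging and propagation steps may be sketched in Python as follows. The sketch assumes that the segmentation step has already produced a label volume (one grid of segment ids per frame), and it stands in a much simpler rule for the mean shift constrained interpolation: a segment marked on any keyframe carries its region label to every frame in which that segment id appears. The names `merge_segments`, `propagate`, and the `"unassigned"` label are hypothetical and are not drawn from the techniques described herein.

```python
from typing import Dict, List, Set

Frame = List[List[int]]  # one frame: a 2D grid of segment ids (assumed input)


def merge_segments(keyframe_marks: Dict[int, Dict[str, Set[int]]]) -> Dict[int, str]:
    """Build a segment-id -> region-name map from marks made on keyframes.

    keyframe_marks maps a keyframe index to {region name: segment ids}.
    """
    seg_to_region: Dict[int, str] = {}
    for frame_idx in sorted(keyframe_marks):
        for region, seg_ids in keyframe_marks[frame_idx].items():
            for seg in seg_ids:
                # Assumption: the earliest keyframe wins on conflicting marks.
                seg_to_region.setdefault(seg, region)
    return seg_to_region


def propagate(volume: List[Frame], seg_to_region: Dict[int, str],
              unassigned: str = "unassigned") -> List[List[List[str]]]:
    """Relabel every pixel of every frame with its semantic region.

    Simplification: a volumetric segment keeps one region label in all
    frames it spans. This is NOT mean shift constrained interpolation.
    """
    return [[[seg_to_region.get(seg, unassigned) for seg in row]
             for row in frame]
            for frame in volume]


# Toy volume: 3 frames of 2x3 pixels. Segments 1, 2 and 4 make up a
# character; segment 3 is background and is never marked.
volume = [
    [[1, 1, 3], [2, 2, 3]],   # frame 0 (keyframe)
    [[1, 2, 3], [2, 4, 3]],   # frame 1 (no user input)
    [[4, 2, 3], [4, 4, 3]],   # frame 2 (keyframe)
]
marks = {0: {"character": {1, 2}}, 2: {"character": {2, 4}}}

regions = propagate(volume, merge_segments(marks))
for row in regions[1]:
    print(row)
```

In this toy example the unmarked middle frame receives the "character" label wherever segments 1, 2, or 4 appear, even though the user marked only frames 0 and 2. An actual implementation would additionally have to handle segments that are only partly inside a region between keyframes, which this sketch does not attempt.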