Media and gesture recognition apparatus and methods are disclosed. A
computerized system views a first item of printed media using an electronic
visual sensor. The system retrieves information corresponding to the
viewed printed media from a database. Using the electronic visual sensor,
the system views at least a first user gesture relative to at least a
portion of the first printed media. The system interprets the first
gesture as a command. Based at least in part on the first gesture and the
retrieved information, the system electronically speaks aloud at least a
portion of the retrieved information.
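
The flow summarized above (identify the printed media, retrieve its information, interpret a gesture as a command, speak a portion of the information) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the disclosure: the `Observation` record stands in for the output of the electronic visual sensor, `MEDIA_DB` stands in for the database, the two gesture types are hypothetical, and `speak` is any caller-supplied callable in place of a real speech synthesizer.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Optional


class Gesture(Enum):
    """Hypothetical gesture vocabulary (not specified by the source)."""
    POINT = "point"  # finger placed on a region of the page
    COVER = "cover"  # hand laid over the page


@dataclass(frozen=True)
class Observation:
    """What an assumed vision front end reports for one camera frame."""
    media_id: str                    # identifier of the recognized printed media
    gesture: Optional[Gesture] = None
    region: Optional[str] = None     # portion of the media the gesture refers to


# Stand-in for the database: media identifier -> region name -> text.
MEDIA_DB: Dict[str, Dict[str, str]] = {
    "book-001/page-3": {
        "paragraph-1": "A kite rose over the hill.",
        "paragraph-2": "The wind carried it out to sea.",
    }
}


def retrieve(media_id: str) -> Dict[str, str]:
    """Return the stored information for the viewed media, or an empty dict."""
    return MEDIA_DB.get(media_id, {})


def interpret(obs: Observation, info: Dict[str, str]) -> Optional[str]:
    """Treat the gesture as a command and pick the text to speak, if any."""
    if obs.gesture is Gesture.POINT and obs.region in info:
        return info[obs.region]  # "read this portion aloud"
    return None                  # COVER or no gesture: nothing is spoken


def handle(obs: Observation, speak: Callable[[str], None]) -> Optional[str]:
    """Run one observation through retrieve -> interpret -> speak."""
    info = retrieve(obs.media_id)
    text = interpret(obs, info)
    if text is not None:
        speak(text)
    return text
```

For example, calling `handle(Observation("book-001/page-3", Gesture.POINT, "paragraph-2"), print)` passes the second paragraph's text to `print`, which here plays the role of the speech output. Keeping `speak` as a parameter is a design choice of this sketch so that the decision logic can be exercised without audio hardware.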