Disney: Voice Search Helps Viewers Find Best Movie Scenes

Walt Disney Co. details in a patent application published today how viewers could quickly navigate to the most compelling action or love scenes in a movie by uttering voice commands.

“For example, the user may initiate a voice-based search requesting love scenes in a movie between a character and the name of the actress portraying the character’s love interest. Search results may therefore display an icon representative of the scene(s) relevant to the voice-based search at the forefront. Additionally, related scenes such as action scenes involving the character and the actress may be presented in the background as another representative icon,” Disney states in the patent application.

The invention could work on either a cable TV system or an over-the-top video service, Disney notes.

Disney Senior Software Engineer Jing Wang is named as lead inventor on the patent application, titled "Voice searching metadata through media content."

Abstract: Systems and methods for voice searching media content based on metadata or subtitles are provided. Metadata associated with media content can be pre-processed at a media server. Upon receiving a vocal command representative of a search for an aspect of the media content, the media server performs a search for one or more portions of the media content relevant to the aspect of the media content being searched for. The media server performs the search by matching the aspect of the media content being searched for with the pre-processed metadata.
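To give a sense of what that matching step might look like in practice, here is a minimal sketch, assuming scene-level keyword metadata and a simple overlap score; the data model, field names, and scoring are illustrative assumptions, not anything specified in the filing.

```python
# Minimal sketch of the matching step the abstract describes: scene-level
# metadata is pre-processed on the server, and a transcribed voice query is
# matched against it and ranked by relevance. The data model, field names,
# and overlap scoring here are illustrative assumptions, not Disney's design.

from dataclasses import dataclass


@dataclass
class Scene:
    scene_id: int
    start_sec: float
    tags: set[str]  # pre-processed metadata keywords, e.g. derived from subtitles


def preprocess_metadata(raw_scenes: list[dict]) -> list[Scene]:
    """Normalize scene metadata once, ahead of any query (the 'pre-processing' step)."""
    return [
        Scene(s["id"], s["start"], {t.lower() for t in s["tags"]})
        for s in raw_scenes
    ]


def search(query_text: str, scenes: list[Scene]) -> list[tuple[Scene, float]]:
    """Rank scenes by keyword overlap between the transcribed query and scene metadata."""
    query_terms = {w.lower() for w in query_text.split()}
    results = []
    for scene in scenes:
        overlap = len(query_terms & scene.tags)
        if overlap:
            results.append((scene, overlap / len(query_terms)))
    # Most relevant scenes first, echoing the forefront/background presentation.
    return sorted(results, key=lambda r: r[1], reverse=True)


if __name__ == "__main__":
    catalog = preprocess_metadata([
        {"id": 1, "start": 912.0, "tags": ["love", "Anna", "Kristoff"]},
        {"id": 2, "start": 301.5, "tags": ["action", "Anna", "chase"]},
    ])
    for scene, score in search("love scenes with Anna", catalog):
        print(f"scene {scene.scene_id} @ {scene.start_sec}s  relevance={score:.2f}")
```

With the sample data, the love scene surfaces first and the related action scene involving the same character trails it, mirroring the forefront/background behavior described in the quoted passage above.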

Patent Application


Claims: 

1. A computer-implemented method, comprising: receiving vocal input from a user device; searching for at least one portion of media content based on the vocal user input; and providing access to the at least one portion of the media content via the user device.

2. The computer-implemented method of claim 1, wherein the vocal input comprises a search command containing one or more indicia representative of at least one aspect of the at least one portion of the media content.

3. The computer-implemented method of claim 2, wherein the searching for the at least one portion of the media content comprises matching the one or more indicia to one or more pieces of metadata associated with the media content.

4. The computer-implemented method of claim 3, wherein the one or more pieces of metadata are generated by at least one of an originator of the media content and at least one consumer of the media content.

5. The computer-implemented method of claim 1, wherein the searching for the at least one portion of the media content comprises a hierarchical search.

6. The computer-implemented method of claim 1, wherein the media content comprises a movie, and wherein the at least one portion of the media content comprises at least one scene from the movie or group of pictures (GOP) from the movie.

7. The computer-implemented method of claim 6, wherein the at least one scene or GOP is presented in conjunction with additional scenes or GOPs from the movie or from additional thematically-related media content.

8. The computer-implemented method of claim 7, wherein the at least one scene or GOP is presented in conjunction with the additional scenes or GOPs commensurate with the relevance of the at least one scene or GOP to the vocal user input relative to the additional scenes or GOPs from the movie or the additional thematically-related media content.

9. The computer-implemented method of claim 1, wherein the providing of the access comprises presenting visual indicators representative of the at least one portion of the media content via a media player implemented on the user device.

10. The computer-implemented method of claim 9, wherein the visual indicators comprise a heat map based on relevancy of the at least one portion of the media content to the received vocal input.

11. The computer-implemented method of claim 9, wherein the visual indicators comprise thumbnail images.

12. An apparatus, comprising: a content database containing one or more media content files; a speech recognition engine configured to recognize voice commands representative of a search for at least one portion of the one or more media content files; and a search engine configured to search for the at least one portion of the one or more media content files based on the recognized voice commands.

13. The apparatus of claim 12, wherein the speech recognition engine comprises a speech-to-text engine.

14. The apparatus of claim 12, wherein the search engine searches for the at least one portion of the one or more media content files by searching for a metadata match to indicia representative of at least one aspect of the at least one portion of the one or more media content files, the indicia being determined by the translation of the voice commands.

15. The apparatus of claim 12, wherein the search engine is further configured to provide access to the at least one portion of the one or more media content files via a user device remotely located from the apparatus.

16. A device, comprising: a processor; and a memory including computer program code, the memory and the computer program code configured to, with the processor, cause the device to perform at least the following: display a user interface adapted to receive vocal input requesting a search for one or more portions of media content; forward the vocal input to a media content server configured to perform the search for the one or more portions of media content; and receive search results from the media content server for presentation on the device, wherein the search results are presented in a manner commensurate with the level of relevance to the vocal input.

17. The device of claim 16, wherein the vocal input comprises at least one keyword.

18. The device of claim 17, wherein the search results are obtained by attempting to match the at least one keyword with at least one instance of metadata associated with the media content.

19. The device of claim 18, wherein the at least one instance of metadata is associated with at least one thematic aspect of the media content.

20. The device of claim 19, wherein the search results are presented in conjunction with related portions of additional media content determined to be thematically relevant to the search results.
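The presentation claims also describe surfacing results in proportion to their relevance, including a heat map of relevancy (claim 10). Purely as an illustration, and assuming per-scene scores in [0, 1] like those produced by the earlier sketch, a player UI might bucket scores along the movie timeline roughly like this:

```python
# Illustrative only: one way a player UI might turn per-scene relevance scores
# into a simple timeline "heat map", as suggested by claim 10. The bucketing
# scheme and character ramp are assumptions, not taken from the filing.

import math


def heat_map(scene_scores: dict[float, float], duration_sec: float, buckets: int = 40) -> str:
    """Render a text timeline where hotter characters mark more relevant scenes.

    scene_scores maps a scene's start time (seconds) to a relevance score in [0, 1].
    """
    cells = [0.0] * buckets
    for start, score in scene_scores.items():
        idx = min(int(start / duration_sec * buckets), buckets - 1)
        cells[idx] = max(cells[idx], score)  # keep the hottest scene per bucket
    ramp = " .:*#"  # cold -> hot
    return "".join(ramp[min(math.ceil(c * (len(ramp) - 1)), len(ramp) - 1)] for c in cells)


# Example: the two matching scenes from the earlier sketch, in a 100-minute movie.
print("|" + heat_map({912.0: 0.50, 301.5: 0.25}, duration_sec=6000) + "|")
```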