Amazon Developing ‘Smart Zoom’ For 4K, 8K Video

Amazon is developing a “smart zoom” feature that could allow viewers of 4K or 8K programming to magnify a particular object featured on a television screen, and track it throughout a TV show or movie.

Charles Benjamin Franklin Waggoner of Mountain View, Calif., is named as lead inventor on the Amazon patent application, titled "Object Tracking In Zoomed Video," which was published on Thursday.

Abstract: A user can select an object represented in video content in order to set a magnification level with respect to that object. A portion of the video frames containing a representation of the object is selected to maintain a presentation size of the representation corresponding to the magnification level. The selection provides for a “smart zoom” feature enabling an object of interest, such as a face of an actor, to be used in selecting an appropriate portion of each frame to magnify, such that the magnification results in a portion of the frame being selected that includes the one or more objects of interest to the user. Pre-generated tracking data can be provided for some objects, which can enable a user to select an object and then have predetermined portion selections and magnifications applied that can provide for a smoother user experience than for dynamically-determined data.
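The mechanism the abstract describes, cropping each frame so the selected object stays at a chosen presentation size, can be sketched roughly as follows. This is an illustrative reconstruction, not Amazon's implementation: the `Box` type, function names, and clamping choices are assumptions made for the example.

```python
# Sketch of the "smart zoom" idea from the abstract: keep a tracked object
# at a user-chosen presentation size by scaling the crop of each frame.
# The Box type and crop_for_frame function are hypothetical stand-ins.

from dataclasses import dataclass

@dataclass
class Box:
    """Axis-aligned bounding box of the tracked object, in pixels."""
    x: float
    y: float
    w: float
    h: float

def crop_for_frame(obj: Box, target_h: float, frame_w: float, frame_h: float):
    """Return (crop_x, crop_y, crop_w, crop_h) so that, once the crop is
    scaled to fill the display, the object appears target_h pixels tall
    and roughly centered."""
    zoom = target_h / obj.h            # magnification needed for this frame
    zoom = max(zoom, 1.0)              # if the object is already large enough,
                                       # show the full frame without zooming
    crop_w, crop_h = frame_w / zoom, frame_h / zoom
    # Center the crop on the object, clamped to the frame edges.
    cx, cy = obj.x + obj.w / 2, obj.y + obj.h / 2
    crop_x = min(max(cx - crop_w / 2, 0), frame_w - crop_w)
    crop_y = min(max(cy - crop_h / 2, 0), frame_h - crop_h)
    return crop_x, crop_y, crop_w, crop_h
```

The per-frame bounding box would come from one of the tracking techniques the claims enumerate (facial recognition, motion vectors, TLD, and so on), or from pre-generated tracking data shipped with the video.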

Patent Application

Claims: 

1. A computing system, comprising: at least one processor; a touch-sensitive display; memory including instructions that, when executed by the at least one processor, cause the computing system to: display a first frame of a video on the touch-sensitive display; receive a user input selecting a portion of the first frame of the video that includes a representation of an object, wherein the selected portion is less than the displayed first frame of the video; receive a user input identifying a presentation size for the representation of the object; display the representation of the object at the identified presentation size; determine a size of the representation of the object in a second frame of the video; if the determined size of the representation of the object included in the second frame is different than the identified presentation size of the representation of the object, alter a magnification level of a displayed portion of the second frame that includes the representation of the object; and display at least a portion of the second frame including the representation of the object at the altered magnification level so that the representation of the object is displayed at the presentation size.

2. The computing system of claim 1, wherein the instructions when executed further cause the computing system to: determine a position of the representation of the object in the second frame of the video; and wherein the displayed portion of the second frame of the video presents the representation of the object in approximately a center of the touch-sensitive display.

3. The computing system of claim 2, wherein at least one of the size of the representation of the object or the position of the representation of the object are determined based at least in part on an edge detection algorithm, an object recognition algorithm, a facial recognition algorithm, an image tracking algorithm, a motion detection algorithm, a particle tracking algorithm, a tracking learning detection (TLD) algorithm, or a video codec bit stream motion vector algorithm.

4. The computing system of claim 1, wherein the instructions when executed further cause the computing system to: if it is determined that the size of the representation of the object included in the second frame is larger than the identified presentation size of the representation of the object, display the second frame without altering the magnification level.

5. The computing system of claim 1, wherein the instructions when executed further cause the computing system to: determine a position of the representation of the object in the second frame of the video; and if it is determined that the representation of the object is not included in the second frame of the video, present the second frame of the video without altering the magnification level.

6. The computing system of claim 1, wherein the instructions when executed further cause the computing system to: determine a second portion of the second frame that does not include the representation of the object; and display the second portion of the second frame while displaying the at least a portion of the second frame including the representation of the object at the altered magnification level.

7. A computer-implemented method, comprising: causing video content to be displayed on a display; receiving a selection of a representation of an object in the video content; determining a magnification level for display of the video content based at least in part upon the representation of the object corresponding to the selection; determining a portion of the video content to display, the portion corresponding to the determined magnification level and including the representation of the object; determining a movement of the representation of the object in the video content; and updating the portion of the video content to display in response to the determined movement of the representation of the object in the displayed video content, the updating including at least one of adjusting the magnification level proportionate to a change in a size of the representation of the object or adjusting the portion of the video content to display to keep the representation of the object at approximately a center of the displayed portion of the video content.

8. The computer-implemented method of claim 7, further comprising: sending, to a remote computer system, a request indicating the representation of the object corresponding to the selection; and receiving, from the remote computer system, information for use in determining the portion of the video content to display, the information including at least one of magnification level, magnification information, or tracking information corresponding to the representation of the object in the video content.

9. The computer-implemented method of claim 7, further comprising: receiving a magnification level corresponding to the representation of the object, wherein determining the magnification level for the video content includes determining a current size of the representation of the object in the video content and applying the received magnification level.

10. The computer-implemented method of claim 7, further comprising: receiving a magnification level input corresponding to the object, wherein the magnification level input is at least one of a touch-based input received at the display, an audio input received from the user, a gaze input detected from a gaze direction of a user, or a gesture input received from the user.

11. The computer-implemented method of claim 10, wherein the audio input received from the user includes an audible command to alter the magnification level of the object.

12. The computer-implemented method of claim 7, further comprising: detecting two inputs at the display, wherein the selection corresponds to initial locations of the two inputs and the magnification level corresponds to a change in a relative location between the two inputs.

13. The computer-implemented method of claim 7, wherein the selection corresponds to at least one of a gaze input detected from a gaze direction of a user, an audible input from the user, touch-based input received at the display, or a gesture input from the user.

14. The computer-implemented method of claim 7, further comprising: determining that information relating to the representation of the object in the video content has been previously generated, the information relating to at least one of a magnification level for displaying the video content or tracking data for tracking the movement of the representation of the object in the video content; and providing, on the display, an indication that the information is available.

15. The computer-implemented method of claim 7, further comprising: applying at least one smoothing process to the portion of the video content to be displayed in order to limit a rate at which the portion of the video content to be displayed can be modified.

16. The computer-implemented method of claim 7, further comprising: enabling a user to share information about the portion of the video content via at least one social network.

17. The computer-implemented method of claim 7, further comprising: collecting data about portions of the video content selected by a plurality of users; and analyzing the collected data to determine one or more portions of the video content that are selected by a defined percentage of the plurality of users.

18. The computer-implemented method of claim 17, wherein the one or more portions of video content that are selected by a defined percentage of the plurality of users are identified to users when viewing the video content.

19. The computer-implemented method of claim 7, wherein the video content is segmented into a plurality of tiles configured to be concurrently displayed, the computer-implemented method further comprising: determining a subset of the plurality of tiles corresponding to the portion; and requesting the determined subset of the tiles for display.

20. The computer-implemented method of claim 7, further comprising: determining a quality level for the portion of the video content to display; and requesting that the portion of the video content to be displayed be delivered at the determined quality level.

21. A non-transitory computer-readable storage medium storing instructions that, when executed by at least one processor of a computing system, cause the computing system to at least: receive a selection of a representation of an object in a video, the video including a plurality of frames; determine, for at least one of the plurality of frames, a respective portion of the frame to be displayed that includes the selected representation of the object, each respective portion being determined based at least in part upon at least one of a magnification level, a size of the representation of the object in the video, or a relative position of the representation of the object in the video; and display the determined at least one of the plurality of frames without displaying a non-selected portion of the frame.

22. The non-transitory computer-readable storage medium of claim 21, wherein the magnification level is applied to only the representation of the object.

23. The non-transitory computer-readable storage medium of claim 21, wherein the instructions when executed further cause the computing system to at least: determine a change in a size of the representation of the object; and display the representation of the object at the determined size.

24. The non-transitory computer-readable storage medium of claim 21, wherein the instructions when executed further cause the computing system to at least: determine, for each frame of the plurality of frames that does not include the representation of the object, a respective portion of the frame to be displayed.
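The smoothing step in claim 15, which limits how quickly the displayed portion can change so that tracker jitter does not shake the picture, could look something like the sketch below. The per-frame rate limit is an illustrative assumption; the application does not specify one.

```python
# Sketch of claim 15's smoothing process: move each crop parameter
# (x, y, width, height) toward its target by at most max_step pixels
# per frame, so the visible window pans and rescales gradually.
# The max_step value is an assumption for illustration.

def smooth(prev: tuple, target: tuple, max_step: float = 8.0) -> tuple:
    """Rate-limit the change from the previous crop to the target crop."""
    return tuple(
        p + max(-max_step, min(max_step, t - p))
        for p, t in zip(prev, target)
    )
```

Applied every frame, this converges on the tracker's output while capping the on-screen motion, trading a little tracking lag for a steadier image.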