Disney Eyes New Approach To Multiscreen Video Retargeting

ESPN and ABC parent Walt Disney Co. has developed technology that could help it improve the quality of video it streams to devices ranging from smartphones to IP-connected TVs, according to a patent published on Tuesday.

Former Disney lab associate Pierre Greisen is named as lead inventor on the patent, titled "Video retargeting using content-dependent scaling vectors." Disney Enterprises filed the application in October 2012.

Abstract: Techniques are disclosed for retargeting images. The techniques include receiving one or more input images, computing a two-dimensional saliency map based on the input images in order to determine one or more visually important features associated with the input images, projecting the saliency map horizontally and vertically to create at least one of a horizontal and vertical saliency profile, and scaling at least one of the horizontal and vertical saliency profiles. The techniques further include creating an output image based on the scaled saliency profiles. Low saliency areas are scaled non-uniformly while high saliency areas are scaled uniformly. Temporal stability is achieved by filtering the horizontal resampling pattern and the vertical resampling pattern over time. Image retargeting is achieved with greater efficiency and lower compute power, resulting in a retargeting architecture that may be implemented in a circuit suitable for mobile applications such as mobile phones and tablet computers.
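
The abstract compresses a full pipeline into one paragraph: project saliency onto one axis, turn it into per-column widths, and resample. As a rough illustration, here is a minimal NumPy sketch; the function names, the power-law non-linearity, and the nearest-neighbor resampling are illustrative stand-ins (per the claims below, the patent clamps salient regions to uniform scale and renders with EWA splatting):

```python
import numpy as np

def retarget_width(image, saliency, out_w, alpha=2.0):
    """Toy axis-aligned retargeting: squeeze low-saliency columns,
    spend the pixel budget on salient ones.

    image:    (H, W, C) float array
    saliency: (H, W) float array, higher means visually more important
    out_w:    target width in pixels
    alpha:    illustrative exponent controlling the non-linear scaling
    """
    w = saliency.shape[1]
    # 1. Project the 2D saliency map horizontally (column-wise maximum).
    profile = saliency.max(axis=0)                        # (W,)
    # 2. Non-linear scaling: a power law exaggerates salient columns,
    #    then normalization turns weights into per-column output widths.
    weights = (profile + 1e-8) ** alpha
    widths = weights / weights.sum() * out_w              # sums to out_w
    # 3. Resampling grid: cumulative widths place each source column edge
    #    in output coordinates; monotonic because all widths are positive.
    grid = np.concatenate([[0.0], np.cumsum(widths)])     # (W+1,)
    # 4. Render by inverse lookup: for each output column center, find the
    #    source column whose interval contains it. (The patent renders with
    #    EWA splatting instead of this nearest-neighbor shortcut.)
    src_x = np.interp(np.arange(out_w) + 0.5, grid, np.arange(w + 1.0))
    src_x = np.clip(src_x.astype(int), 0, w - 1)
    return image[:, src_x]
```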

Patent

Claims:

1. A computer-implemented method for retargeting an input image, the computer-implemented method comprising: receiving the input image comprising a plurality of pixels; computing a two-dimensional saliency map based on at least a portion of the input image; projecting the two-dimensional saliency map in at least one of a first direction to create a horizontal saliency profile and a second direction to create a vertical saliency profile; scaling at least one of the horizontal saliency profile and the vertical saliency profile non-linearly to create at least one of a scaled horizontal saliency profile and a scaled vertical saliency profile; computing a two-dimensional resampling grid based on at least one of the scaled horizontal saliency profile and the scaled vertical saliency profile; and creating an output image based on the two-dimensional resampling grid.
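
Claim 1's "two-dimensional resampling grid" is separable: each scaled 1D profile is a list of per-column widths (or per-row heights), and running sums of those widths give the output coordinates of every source pixel edge. A hedged sketch, with names of my own choosing:

```python
import numpy as np

def resampling_grid(col_widths, row_heights):
    """Build the separable 2D resampling grid of claim 1 from per-column
    output widths and per-row output heights (the scaled profiles).
    Returns output-space coordinates of every source pixel corner."""
    gx = np.concatenate([[0.0], np.cumsum(col_widths)])   # (W+1,)
    gy = np.concatenate([[0.0], np.cumsum(row_heights)])  # (H+1,)
    # Axis-aligned scaling means the 2D grid is just the cross product of
    # two 1D maps; no per-pixel optimization is required.
    return np.meshgrid(gx, gy)
```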

2. The computer-implemented method of claim 1, wherein the scaling comprises: applying spatiotemporal filtering to the at least one of the scaled horizontal saliency profile and the scaled vertical saliency profile to create at least one of a filtered horizontal saliency profile and a filtered vertical saliency profile.

3. The computer-implemented method of claim 2, wherein applying spatiotemporal filtering comprises adapting one or more filter parameters based on the existence of a scene cut, a global camera motion, or a static background.
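
Claims 2 and 3 describe smoothing the profiles over time so the warp does not jitter from frame to frame, with the filter adapted to scene content. A minimal sketch of that idea, assuming a first-order IIR filter in the form of claim 19's Equation 4, with illustrative decay constants and a hard reset at scene cuts:

```python
import numpy as np

class TemporalProfileFilter:
    """First-order IIR smoothing of per-frame scaling profiles, with the
    filter parameter adapted in the spirit of claim 3: reset on a scene cut
    so the new shot's layout is adopted immediately, smooth heavily when the
    background is static. The decay values are illustrative, not from the
    patent."""

    def __init__(self, a_static=0.95, a_motion=0.7):
        self.a_static = a_static    # heavy smoothing for static backgrounds
        self.a_motion = a_motion    # lighter smoothing under camera motion
        self.state = None

    def __call__(self, profile, scene_cut=False, camera_motion=False):
        if scene_cut or self.state is None:
            self.state = profile.copy()     # adapt instantly at a cut
            return self.state
        a = self.a_motion if camera_motion else self.a_static
        # s_out[k] = a * s_out[k-1] + (1 - a) * s_in[k]   (cf. claim 19)
        self.state = a * self.state + (1.0 - a) * profile
        return self.state
```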

4. The computer-implemented method of claim 1, wherein creating the output image comprises: using an elliptical-weighted-average splatting technique to map the input image, based on the two-dimensional resampling grid, in order to generate the output image.
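
Elliptical-weighted-average (EWA) splatting is a forward-mapping renderer: each source pixel is deposited onto the output grid as a Gaussian footprint, and the accumulated colors are normalized at the end. The sketch below is deliberately reduced, using a fixed isotropic kernel where real EWA shapes each Gaussian by the local Jacobian of the resampling grid:

```python
import numpy as np

def gaussian_splat(image, map_x, map_y, out_shape, sigma=0.8):
    """Reduced forward-splatting renderer in the spirit of claim 4.

    map_x, map_y: (H, W) output-space coordinates of each source pixel.
    out_shape:    (out_h, out_w) of the retargeted image.
    """
    out_h, out_w = out_shape
    acc = np.zeros((out_h, out_w, image.shape[2]))
    norm = np.zeros((out_h, out_w))
    r = int(np.ceil(3 * sigma))                 # kernel support radius
    for y in range(image.shape[0]):
        for x in range(image.shape[1]):
            cx, cy = map_x[y, x], map_y[y, x]
            x0, x1 = max(0, int(cx) - r), min(out_w, int(cx) + r + 1)
            y0, y1 = max(0, int(cy) - r), min(out_h, int(cy) + r + 1)
            if x0 >= x1 or y0 >= y1:
                continue
            xs = np.arange(x0, x1)
            ys = np.arange(y0, y1)[:, None]
            # Isotropic Gaussian footprint centered on the warped position.
            w = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
            acc[y0:y1, x0:x1] += w[..., None] * image[y, x]
            norm[y0:y1, x0:x1] += w
    norm[norm == 0] = 1.0
    return acc / norm[..., None]                # output normalization
```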

5. The computer-implemented method of claim 1, wherein computing the two-dimensional saliency map comprises: applying a quaternion Fourier transform to the at least a portion of the input image; and applying a two-dimensional Gaussian blur to the at least a portion of the input image subsequent to applying the quaternion Fourier transform.
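
The quaternion Fourier transform saliency in claim 5 belongs to the phase-spectrum family of detectors: discard the magnitude of the spectrum, keep the phase, invert, and blur. A single-channel stand-in is shown below; a true QFT jointly transforms color and motion channels, which this simplification ignores:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def phase_saliency(gray, sigma=3.0):
    """Simplified single-channel stand-in for the patent's quaternion-FFT
    saliency map: phase-only reconstruction followed by a 2D Gaussian blur
    (the blur step is explicit in claim 5)."""
    f = np.fft.fft2(gray)
    phase_only = f / (np.abs(f) + 1e-12)   # unit magnitude, original phase
    recon = np.fft.ifft2(phase_only)
    sal = np.abs(recon) ** 2
    return gaussian_filter(sal, sigma)
```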

6. The computer-implemented method of claim 1, further comprising: downsampling the at least a portion of the input image; and upsampling the at least one of the horizontal saliency profile and the vertical saliency profile.
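
Claim 6 is the efficiency lever the abstract alludes to: saliency is computed on a downsampled frame, and only the cheap 1D profiles are brought back to full resolution. Upsampling a profile is a one-liner; the linear interpolation here is an assumption, since the patent does not name the method:

```python
import numpy as np

def upsample_profile(profile, out_len):
    """Upsample a 1D saliency profile computed at reduced resolution
    back to full resolution (claim 6), via linear interpolation."""
    src = np.linspace(0.0, 1.0, num=len(profile))
    dst = np.linspace(0.0, 1.0, num=out_len)
    return np.interp(dst, src, profile)
```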

7. A non-transitory computer-readable storage medium including instructions that, when executed by a processor, cause the processor to perform an operation to retarget an input image, the operation comprising: receiving the input image comprising a plurality of pixels; computing a two-dimensional saliency map based on at least a portion of the input image; projecting the two-dimensional saliency map in at least one of a first direction to create a horizontal saliency profile and a second direction to create a vertical saliency profile; scaling at least one of the horizontal saliency profile and the vertical saliency profile non-linearly to create at least one of a scaled horizontal saliency profile and a scaled vertical saliency profile; computing a two-dimensional resampling grid based on at least one of the scaled horizontal saliency profile and the scaled vertical saliency profile; and creating an output image based on the two-dimensional resampling grid.

8. The non-transitory computer-readable storage medium of claim 7, wherein the scaling comprises: applying spatiotemporal filtering to the at least one of the scaled horizontal saliency profile and the scaled vertical saliency profile to create at least one of a filtered horizontal saliency profile and a filtered vertical saliency profile.

9. The non-transitory computer-readable storage medium of claim 8, wherein applying spatiotemporal filtering comprises adapting one or more filter parameters based on the existence of a scene cut, a global camera motion, or a static background.

10. The non-transitory computer-readable storage medium of claim 7, wherein creating the output image comprises: using an elliptical-weighted-average splatting technique to map the input image, based on the two-dimensional resampling grid, in order to generate the output image.

11. The non-transitory computer-readable storage medium of claim 7, wherein computing the two-dimensional saliency map comprises: applying a quaternion Fourier transform to the at least a portion of the input image; and applying a two-dimensional Gaussian blur to the at least a portion of the input image subsequent to applying the quaternion Fourier transform.

12. The non-transitory computer-readable storage medium of claim 7, wherein the operation further comprises: downsampling the at least a portion of the input image; and upsampling the at least one of the horizontal saliency profile and the vertical saliency profile.

13. A computing system, comprising: a memory that is configured to store instructions for a program; and a processor that is configured to execute the instructions for the program to retarget an input image, by performing an operation that includes: receiving the input image comprising a plurality of pixels; computing a two-dimensional saliency map based on at least a portion of the input image; projecting the two-dimensional saliency map in at least one of a first direction to create a horizontal saliency profile and a second direction to create a vertical saliency profile; scaling at least one of the horizontal saliency profile and the vertical saliency profile non-linearly to create at least one of a scaled horizontal saliency profile and a scaled vertical saliency profile; computing a two-dimensional resampling grid based on at least one of the scaled horizontal saliency profile and the scaled vertical saliency profile; and creating an output image based on the two-dimensional resampling grid.

14. The system of claim 13, wherein the scaling comprises: applying spatiotemporal filtering to the scaled, horizontal and vertical saliency profiles in order to create a filtered horizontal saliency profile and a filtered vertical saliency profile; wherein creating the output image comprises: using an elliptical-weighted-average splatting technique to map the input image, based on the two-dimensional resampling grid, in order to generate the output image; wherein applying spatiotemporal filtering comprises adapting one or more filter parameters based on the existence of each of: (i) a scene cut; (ii) a global camera motion; and (iii) a static background.

15. The system of claim 14, wherein computing the two-dimensional saliency map comprises: applying a quaternion Fourier transform to the at least a portion of the input image; applying a two-dimensional Gaussian blur to the at least a portion of the input image subsequent to applying the quaternion Fourier transform; and applying a quaternion Fourier transform to at least a portion of one or more previous input images; wherein the operation further comprises: downsampling the at least a portion of the input image; and upsampling the at least one of the horizontal saliency profile and the vertical saliency profile.

16. The system of claim 15, wherein the two-dimensional saliency map comprises a plurality of distinct saliency values, wherein each saliency value of the plurality of saliency values represents relative importance of a corresponding pixel in the at least a portion of the input image; wherein the at least a portion of the input image comprises a plurality of rows within the input image, wherein each row within the plurality of rows spans the width of the input image; wherein the operation further comprises: applying a center bias to the horizontal saliency profile and the vertical saliency profile by reducing each saliency value of the plurality of saliency values where the corresponding pixel is near one or more borders of the input image; wherein the program comprises a video retargeting program that includes a saliency estimation unit, a scale estimation unit, and a rendering unit; wherein the saliency estimation unit includes a downsampling unit, a quaternion transformation unit, a plurality of fast Fourier transform cores, a Gaussian kernel, and a Gaussian storage buffer.
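
The center bias in claim 16 attenuates saliency near the frame borders so the warp prefers to distort the periphery. A sketch, assuming a raised-cosine window blended with a floor; the patent specifies neither the window shape nor its strength:

```python
import numpy as np

def center_bias(profile, floor=0.3):
    """Reduce saliency near the image borders (the center bias of claim 16).
    A Hann window blended with a floor attenuates, rather than erases,
    border saliency; both the window and the floor value are assumptions."""
    n = len(profile)
    hann = 0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(n) / max(n - 1, 1))
    return profile * (floor + (1.0 - floor) * hann)
```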

17. The system of claim 16, wherein the two-dimensional saliency map is projected in each of: (i) the first direction to create the horizontal saliency profile and (ii) the second direction to create a vertical saliency profile; wherein the horizontal saliency profile comprises a horizontal saliency vector, wherein the vertical saliency profile comprises a vertical saliency vector; wherein the scaled horizontal saliency profile comprises a horizontal scaling vector, wherein the scaled vertical saliency profile comprises a vertical scaling vector; wherein the output image is created based on the scaled, horizontal and vertical saliency profiles; wherein by using the scaled, horizontal and vertical saliency profiles, the input image is retargeted more efficiently than using linear scaling and more efficiently than using letter-boxing; wherein the scale estimation unit includes a max-block-mean unit, a non-linear scaling unit, a spatiotemporal filter, and an upsampling scaling unit; wherein the rendering unit includes a grid generator component, an elliptical weighted average setup unit, a bounding box stepping unit, a rasterizer component, an accumulation buffer, and an output normalization unit.

18. The system of claim 17, wherein the horizontal saliency profile is given by a first predefined equation; wherein the operation further includes at least one of: minifying a first region of the input image based on a first horizontal scaling vector given by a second predefined equation; and magnifying a second region of the input image based on a second horizontal scaling vector given by a third predefined equation. [The three equations, EQU00008 through EQU00010, are not reproducible from the plain-text extraction of the patent.]

19. The system of claim 18, wherein the spatiotemporal filtering includes applying: (i) an acausal finite impulse response filter in order to filter high-frequency saliency fluctuations, followed by (ii) an infinite impulse response filter in order to filter low-frequency saliency fluctuations; wherein the infinite impulse response filter is applied according to a fourth predefined equation comprising: s_out[k] = a · s_out[k−1] + (1 − a) · s_in[k], 0 ≤ a ≤ 1 (Equation 4); wherein the input image is retargeted using elliptical-weighted-average splatting performed according to a fifth predefined equation; wherein the elliptical-weighted-average splatting includes adaptive anti-aliasing performed according to a sixth predefined equation comprising: Σ̃(n,n) = max(σ_a², σ_l² · J_k²(n,n)), n = 1, 2; wherein the fifth predefined equation is simplifiable to a seventh predefined equation; wherein C_k is given by an eighth predefined equation. [The fifth, seventh, and eighth equations, EQU00011 through EQU00013, are not reproducible from the plain-text extraction of the patent.]

20. The system of claim 13, wherein the operation further includes at least one of: upon determining that a first region of the input image satisfies a criterion pertaining to visual importance, minifying the first region based on a first horizontal scaling vector; and upon determining that a second region of the input image satisfies the criterion pertaining to visual importance, magnifying the second region based on a second horizontal scaling vector; wherein the operation further includes: upon determining that a third region of the input image satisfies the criterion pertaining to visual importance, preserving a size of the third region in the output image and relative to the input image; wherein the first, second, and third regions are distinct regions in the input image.
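
Claim 20 restates the core contract of the scheme: salient regions keep their size while low-saliency regions absorb the shrinking or stretching. One way to express that as a per-column scaling vector, with an illustrative floor and blend rather than values from the patent:

```python
import numpy as np

def scaling_vector(profile, target_ratio, floor=0.25):
    """Per-column scale factors from a 1D saliency profile: salient columns
    get a scale near 1 (size preserved), low-saliency columns sink toward
    the floor and absorb the size change."""
    p = profile / (profile.max() + 1e-8)     # normalize to [0, 1]
    raw = floor + (1.0 - floor) * p          # salient -> ~1, flat -> floor
    # Renormalize so the total output width matches the requested ratio.
    return raw * (len(profile) * target_ratio) / raw.sum()
```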