Cisco Machine Sees Viewer’s Future

Cisco may be looking to supply content owners and distributors with household “guidebooks” that can identify which family member consumes content on any screen, and predict what a viewer will watch next.

Relying on machine learning, the system could generate viewer profiles such as “ESPN_Fan” and “Greys_Anatomy_Fan” by monitoring channel-surfing behavior and hunting for the patterns that emerge.

“That surfing activity might also be subjected to further analysis, as patterns in surfing may help identify individuals (i.e. repeating sequence of channels surfed at times of day, week etc.). Sustained viewing information is extracted from the collected raw data, and is aggregated per household,” Cisco states in a patent application published on Thursday.
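As a rough illustration of the distinction the application draws between surfing and sustained viewing, the sketch below separates short tune-ins from longer ones and aggregates the longer ones per household. The event format, the `split_viewing` name, and the 120-second threshold are assumptions made for this example; the application as quoted here does not specify them.

```python
from collections import defaultdict

# Hypothetical threshold: tune-ins shorter than this count as channel surfing.
SURF_THRESHOLD_SECONDS = 120


def split_viewing(events, threshold=SURF_THRESHOLD_SECONDS):
    """Aggregate sustained viewing per household and keep surfing sequences aside.

    Each event is a (household_id, channel, start_seconds, end_seconds) tuple.
    """
    sustained = defaultdict(lambda: defaultdict(int))  # household -> channel -> seconds
    surfing = defaultdict(list)                        # household -> ordered channels surfed
    # Process each household's events in chronological order.
    for household, channel, start, end in sorted(events, key=lambda e: (e[0], e[2])):
        duration = end - start
        if duration >= threshold:
            sustained[household][channel] += duration
        else:
            surfing[household].append(channel)
    return sustained, surfing


# Made-up sample data for two households.
events = [
    ("hh1", "ESPN", 0, 30),
    ("hh1", "CNN", 30, 50),
    ("hh1", "ESPN", 50, 3650),
    ("hh1", "ABC", 3650, 7250),
    ("hh2", "HBO", 0, 1800),
]
sustained, surfing = split_viewing(events)
```

The surfing sequences are kept rather than discarded because, as the quoted passage notes, repeated channel sequences may themselves help identify individuals.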

Bangalore-based Cisco data scientist Prabhakar Srinivasan is named as lead inventor on the patent application, titled “Audience Segmentation Using Machine-Learning.”

Abstract: A method and system for audience segmentation is described, the method and system including preparing a plurality of guidebooks of prior probability distributions for content items and user profile attributes, the prior probabilities and user profile attributes being extractable from within audience measurement data, receiving raw audience measurement data, analyzing, at a processor, the received raw audience measurement data using the prepared plurality of guidebooks, generating a plurality of clusters of data per user household as a result of the analyzing, correlating viewing activity to each cluster within an identified household, predicting a profile of a viewer corresponding to each cluster within the identified household, applying classifier rules in order to assign viewing preference tags to each predicted profile, and assigning each predicted profile viewing preferences based on the viewing preference tags assigned to that profile. Related systems, methods, and apparatus are also described.
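One way to picture how a guidebook of prior probabilities and a set of classifier rules might combine is sketched below. Everything in it is hypothetical: the `GUIDEBOOK` values, the `TAG_RULES` thresholds, and the share-weighted scoring are assumptions chosen for illustration, not figures or methods taken from the application.

```python
# Hypothetical guidebook: prior probability of a viewer attribute given a content item.
GUIDEBOOK = {
    "ESPN": {"adult": 0.8, "child": 0.2},
    "Grey's Anatomy": {"adult": 0.9, "child": 0.1},
    "Cartoons": {"adult": 0.1, "child": 0.9},
}

# Hypothetical classifier rules: (content item, minimum share of viewing time, tag).
TAG_RULES = [
    ("ESPN", 0.5, "ESPN_Fan"),
    ("Grey's Anatomy", 0.5, "Greys_Anatomy_Fan"),
]


def predict_profile(cluster_seconds):
    """Return (most likely attribute, preference tags) for one viewing cluster.

    cluster_seconds maps a content item to seconds of sustained viewing.
    """
    total = sum(cluster_seconds.values())
    if total == 0:
        return None, []
    shares = {item: secs / total for item, secs in cluster_seconds.items()}

    # Weight each guidebook prior by the share of time spent on that item.
    scores = {}
    for item, share in shares.items():
        for attribute, prior in GUIDEBOOK.get(item, {}).items():
            scores[attribute] = scores.get(attribute, 0.0) + share * prior
    attribute = max(scores, key=scores.get) if scores else None

    # Apply the rules: a tag is assigned when its item dominates the cluster.
    tags = [tag for item, minimum, tag in TAG_RULES if shares.get(item, 0.0) >= minimum]
    return attribute, tags


# Made-up cluster: mostly ESPN, a little cartoon viewing.
cluster = {"ESPN": 5400, "Cartoons": 600}
profile = predict_profile(cluster)
```

In this made-up cluster, ESPN accounts for 90 percent of viewing time, so the sketch labels the viewer an adult and assigns the “ESPN_Fan” tag.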

Patent Application

Claims:

1. A method for audience segmentation, the method comprising: preparing a plurality of guidebooks of prior probability distributions for content items and user profile attributes, the prior probabilities distributions and user profile attributes being extractable from within audience measurement data; receiving raw audience measurement data; analyzing, at a processor, the received raw audience measurement data using the prepared plurality of guidebooks; generating a plurality of clusters of data per user household as a result of the analyzing; correlating viewing activity to each cluster within an identified household; predicting a profile of a viewer corresponding to each cluster within the identified household; applying classifier rules in order to assign viewing preference tags to each predicted profile; and assigning each predicted profile viewing preferences based on the viewing preference tags assigned to that profile.

2. The method according to claim 1 wherein the guidebooks comprise at least: a guidebook comprising prior probabilities per viewer attribute; a guidebook comprising an assignment of viewer preference tags to individual users; and a guidebook comprising a list of probabilities of family types.

3. The method according to claim 1 wherein the generating a plurality of clusters of data per user household comprises: receiving the raw audience measurement data; extracting data concerning viewer habits; sorting the extracted data into categorical data and numerical data; transforming the sorted data into a high-dimensional vector representation of the raw data; detecting outliers in the high-dimensional vector representation; eliminating outliers from the high-dimensional vector representation; and correlating the high-dimensional vector representation into clusters of individuals per household.

4. The method according to claim 3 wherein data concerning the viewer habits comprises: viewing activity; content metadata; user data; user interface navigation data; and frequency response data.

5. The method according to claim 1 wherein the raw audience measurement data comprises, at least in part, collected viewing records of which content was consumed on devices associated with members of a household.

6. The method according to claim 5 wherein the viewing records include at least some of the following: viewing activity records; content metadata of consumed content; user data; user interface navigation data; and frequency response data.

7. The method according to claim 1 wherein the prepared plurality of guidebooks are used to define classifier rules to assign labels to the clusters of data.

8. The method according to claim 1 wherein aggregated sets of viewing activity correlate with an individual’s viewing habits.

9. The method according to claim 1 wherein each user in a household is associated with one of the clusters.

10. The method according to claim 1 wherein the classifier rules are determined based on the prepared plurality of guidebooks.

11. A system for audience segmentation, the system comprising: a plurality of guidebooks of prior probability distributions for content items and user profile attributes, the prior probabilities and user profile attributes being extractable from within audience measurement data; a receiver which receives raw audience measurement data; a processor which analyzes the received raw audience measurement data by using the prepared plurality of guidebooks; a generator which generates a plurality of clusters of data per user household as a result of the analyzing; a processor which correlates viewing activity to each cluster within an identified household; a profile predictor which predicts which profile of each viewer within the identified household corresponds to each cluster; a classifier which applies classifier rules in order to assign viewing preference tags to each predicted profile; and an assigner which assigns each predicted profile viewing preferences based on the viewing preference tags assigned to that profile.

12. The system according to claim 11 wherein the guidebooks comprise at least: a guidebook comprising prior probabilities per viewer attribute; a guidebook comprising an assignment of viewer preference tags to individual users; and a guidebook comprising a list of probabilities of family types.

13. The system according to claim 11 wherein the generator which generates a plurality of clusters of data per user household comprises: a raw audience measurement data receiver; a viewer habits data extractor; a sorter which sorts the extracted data into categorical data and numerical data; a data transformer which transforms the sorted data into a high-dimensional vector representation of the raw data; an outliers detector which detects outliers in the high-dimensional vector representation; an eliminator which eliminates outliers from the high-dimensional vector representation; and a correlater which correlates the high-dimensional vector representation into clusters of individuals per household.

14. The system according to claim 13 wherein data concerning the viewer habits comprises: viewing activity; content metadata; user data; user interface navigation data; and frequency response data.

15. The system according to claim 11 wherein the raw audience measurement data comprises, at least in part, collected viewing records of which content was consumed on devices associated with members of a household.

16. The system according to claim 15 wherein the viewing records include at least some of the following: viewing activity records; content metadata of consumed content; user data; user interface navigation data; and frequency response data.

17. The system according to claim 11 wherein the prepared plurality of guidebooks are used to define classifier rules to assign labels to the clusters of data.

18. The system according to claim 11 wherein aggregated sets of viewing activity correlate with an individual’s viewing habits.

19. The system according to claim 11 wherein each user in a household is associated with one of the clusters.

20. The system according to claim 11 wherein the classifier rules are determined based on the prepared plurality of guidebooks.
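
The clustering steps recited in claims 3 and 13 (sorting data into categorical and numerical fields, building a vector representation, removing outliers, and grouping what remains) can be sketched in a few lines. The claims do not name specific algorithms, so the one-hot encoding, the centroid-distance outlier rule, and the small k-means loop below are stand-ins chosen for illustration, and the field names and sample records are invented.

```python
import math


def to_vector(record, categories):
    """One-hot encode the categorical fields, then append the numerical field."""
    vector = []
    for field, values in categories.items():
        vector.extend(1.0 if record[field] == value else 0.0 for value in values)
    vector.append(record["hours"])
    return vector


def drop_outliers(vectors, max_sigma=2.0):
    """Drop vectors whose distance from the centroid exceeds mean + max_sigma * std."""
    centroid = [sum(col) / len(vectors) for col in zip(*vectors)]
    distances = [math.dist(v, centroid) for v in vectors]
    mean = sum(distances) / len(distances)
    std = math.sqrt(sum((d - mean) ** 2 for d in distances) / len(distances))
    return [v for v, d in zip(vectors, distances) if std == 0 or d <= mean + max_sigma * std]


def kmeans(vectors, k, iterations=20):
    """Plain k-means with a deterministic start (the first k vectors as centers)."""
    centers = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        # Assign every vector to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(v, centers[c])) for v in vectors]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical categorical fields and their allowed values.
CATEGORIES = {"genre": ["sports", "drama"], "daypart": ["evening", "afternoon"]}

# Made-up viewing sessions for a single household; the last one is an anomaly.
records = [
    {"genre": "sports", "daypart": "evening", "hours": 2.0},
    {"genre": "drama", "daypart": "afternoon", "hours": 1.0},
    {"genre": "sports", "daypart": "evening", "hours": 2.5},
    {"genre": "drama", "daypart": "afternoon", "hours": 1.5},
    {"genre": "sports", "daypart": "evening", "hours": 2.2},
    {"genre": "drama", "daypart": "afternoon", "hours": 1.2},
    {"genre": "drama", "daypart": "evening", "hours": 20.0},
]

vectors = [to_vector(r, CATEGORIES) for r in records]
kept = drop_outliers(vectors)
labels = kmeans(kept, k=2)
```

With this sample data, the 20-hour session is discarded as an outlier and the remaining six sessions fall into two clusters, one for evening sports and one for afternoon drama, which the claimed method would then treat as two candidate individuals in the household.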