Verizon Patents System for Tracking Emotions Triggered on Customer Service Calls

In a patent published on Tuesday, Verizon reveals how it could track the emotions of both subscribers and employees by analyzing recordings of calls to its customer service centers.

The telco details how its “aggregation server” could calculate the “number of times a particular keyword (e.g., bill, fee, resolved, etc.) is used with a particular emotion (e.g., anger, agitation, gratitude, etc.).”
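
As a rough illustration of that kind of tally, the snippet below counts keyword-and-emotion pairs across a set of analyzed utterances. The record format is a hypothetical stand-in for illustration only, not a schema disclosed in the patent.

```python
from collections import Counter

# Hypothetical utterance records: each carries the keywords spotted in the
# utterance and the emotion label attributed to it by some upstream analysis.
utterances = [
    {"keywords": ["bill", "fee"], "emotion": "anger"},
    {"keywords": ["resolved"], "emotion": "gratitude"},
    {"keywords": ["bill"], "emotion": "agitation"},
]

# Tally how many times each (keyword, emotion) pair occurs across all utterances,
# the kind of count the patent attributes to its "aggregation server".
counts = Counter(
    (keyword, utt["emotion"])
    for utt in utterances
    for keyword in utt["keywords"]
)

for (keyword, emotion), n in counts.most_common():
    print(f"{keyword!r} spoken with {emotion}: {n} time(s)")
```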

While Verizon notifies its FiOS and Verizon Wireless subscribers that their phone calls may be recorded, it’s not clear whether the technology described in the patent has already been deployed in its call centers. A Verizon spokesperson didn’t immediately respond to a request for comment on Tuesday.

Eric Sylves, manager of data science at Verizon, is named as the inventor on the patent. Sylves notes on his LinkedIn profile that in 2013, the year Verizon applied for the patent, he “coordinated with business management to develop data governance and security enhancements as well as a balanced scorecard for hundreds of Verizon Work Centers.”

The patent published on Tuesday contains examples of scorecards for Verizon customer service centers.

Abstract: A system is configured to receive voice emotion information, related to an audio recording, indicating that a vocal utterance of a speaker is spoken with negative or positive emotion. The system is configured to associate the voice emotion information with attribute information related to the audio recording, and aggregate the associated voice emotion and attribute information with other associated voice emotion and attribute information to form aggregated information. The system is configured to generate a report based on the aggregated information and one or more report parameters, and provide the report.
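
The abstract describes a receive, associate, aggregate, and report flow. As a minimal sketch of how those steps might fit together, the code below joins hypothetical voice emotion records to call attribute records on a shared recording identifier, aggregates per call center, and renders a report filtered by report parameters; the field names and report logic are assumptions, not details taken from the patent.

```python
from collections import defaultdict

# Hypothetical inputs: emotion records and call attributes keyed by recording ID.
emotion_records = [
    {"recording_id": "r1", "phrase": "my bill is wrong", "emotion": "anger"},
    {"recording_id": "r2", "phrase": "thanks, that's resolved", "emotion": "gratitude"},
]
attribute_records = {
    "r1": {"call_center": "Albany", "product": "FiOS"},
    "r2": {"call_center": "Tampa", "product": "Wireless"},
}

def aggregate(emotions, attributes):
    """Join each emotion record to its call attributes, then count emotions per center."""
    aggregated = defaultdict(lambda: defaultdict(int))
    for rec in emotions:
        attrs = attributes.get(rec["recording_id"], {})
        center = attrs.get("call_center", "unknown")
        aggregated[center][rec["emotion"]] += 1
    return aggregated

def generate_report(aggregated, report_params):
    """Render a simple per-center report, filtered by the requested emotions."""
    wanted = set(report_params.get("emotions", []))
    lines = []
    for center, emotion_counts in sorted(aggregated.items()):
        for emotion, n in sorted(emotion_counts.items()):
            if not wanted or emotion in wanted:
                lines.append(f"{center}: {emotion} x{n}")
    return "\n".join(lines)

print(generate_report(aggregate(emotion_records, attribute_records),
                      {"emotions": ["anger"]}))
```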

Claims:

  1. A system comprising: one or more server devices to: receive voice emotion information related to an audio recording, the audio recording containing a first vocal utterance of a first speaker and a second vocal utterance of a second speaker, and the voice emotion information indicating that the first vocal utterance is spoken with a particular emotion; associate a first word or phrase, within the first vocal utterance, with the first speaker and a second word or phrase, within the second vocal utterance, with the second speaker; associate the voice emotion information with attribute information related to the audio recording, the voice emotion information including information regarding at least one of the first word or phrase or the second word or phrase; aggregate the associated voice emotion and attribute information with other associated voice emotion and attribute information to form aggregated information; generate a report based on the aggregated information and one or more report parameters; and provide the report.
  1. The system of claim 1, where the one or more server devices are further to: analyze a vocal prosody characteristic of the first vocal utterance; and detect that a word or a phrase, within the first vocal utterance, is spoken with the particular emotion based on the vocal prosody characteristic, the particular emotion being a negative emotion or a positive emotion, and the voice emotion information including information regarding the detected word or phrase.
  1. The system of claim 1, where the attribute information includes information identifying at least one of: a location of the speaker, a product or service related to the speaker, or a subject of the first vocal utterance.
  1. The system of claim 1, where the one or more server devices, when associating the voice emotion information with the attribute information, are further to: join the voice emotion information with the attribute information by use of a unique identifier related to at least one of: the audio recording, the speaker, a location associated with the speaker, a product associated with the speaker, or a subject associated with the first vocal utterance.
  1. The system of claim 1, where the one or more server devices are further to: receive the one or more report parameters from a user.
  1. The system of claim 1, where the one or more server devices, when generating the report based on the aggregated information and the one or more report parameters, are further to: generate a report including a count of words or phrases, within the aggregated information, associated with the particular emotion, relating to at least one of: the speaker, a location associated with the speaker, a product associated with the speaker, or a subject associated with first vocal utterance.
  1. The system of claim 1, where the one or more server devices are further to: analyze a vocal prosody characteristic of the first vocal utterance; adjust, based on analyzing the vocal prosody characteristic, the vocal prosody characteristic to create an adjusted vocal prosody characteristic; and detect that a word or a phrase, within the first vocal utterance, is spoken with the particular emotion based on the adjusted vocal prosody characteristic.
  1. A non-transitory computer-readable medium storing instructions, the instructions comprising: a plurality of instructions that, when executed by one or more processors, cause the one or more processors to: receive voice emotion information related to an audio recording, the audio recording containing a vocal utterance of a speaker, and the voice emotion information indicating that the vocal utterance relates to a particular emotion; receive attribute information related to the audio recording; associate the voice emotion information with the attribute information; aggregate the associated voice emotion and attribute information with other associated voice emotion and attribute information to form aggregated information; generate a report based on the aggregated information and one or more report parameters, the report including a count of words or phrases, within the aggregated information, associated with the particular emotion and relating to at least one of: the speaker, a location associated with the speaker, a product associated with the speaker, or a subject associated with the vocal utterance; and provide the report.
  1. The computer-readable medium of claim 8, where the plurality of instructions further cause the one or more processors to: analyze a vocal prosody characteristic of the vocal utterance; and detect that a word or a phrase, within the vocal utterance, is spoken with the particular emotion based on the vocal prosody characteristic, the particular emotion being a negative emotion or a positive emotion, and the voice emotion information including information regarding the detected word or phrase.
  1. The computer-readable medium of claim 8, where the speaker is a first speaker, the vocal utterance is a first vocal utterance, and the plurality of instructions further cause the one or more processors to: detect that the audio recording includes the first vocal utterance of the first speaker and a second vocal utterance of a second speaker; associate a first word or phrase, within the first vocal utterance, with the first speaker; and associate a second word or phrase, within the second vocal utterance, with the second speaker, the voice emotion information including information regarding the first word or phrase or the second word or phrase.
  1. The computer-readable medium of claim 8, where the attribute information includes information identifying at least one of: a location of the speaker, a product or service related to the speaker, or a subject of the vocal utterance.
  1. The computer-readable medium of claim 8, where one or more instructions, of the plurality of instructions, that cause the one or more processors to associate the voice emotion information with the attribute information, further cause the one or more processors to: join the voice emotion information with the attribute information by use of a unique identifier related to at least one of: the audio recording, the speaker, a location associated with the speaker, a product associated with the speaker, or a subject associated with the vocal utterance.
  1. The computer-readable medium of claim 8, where one or more instructions, of the plurality of instructions, further cause the one or more processors to: receive the one or more report parameters from a user.
  1. The computer-readable medium of claim 8, where one or more instructions, of the plurality of instructions, further cause the one or more processors to: analyze a vocal prosody characteristic of the vocal utterance; adjust, based on analyzing the vocal prosody characteristic, the vocal prosody characteristic to create an adjusted vocal prosody characteristic; and detect that a word or a phrase, within the vocal utterance, is spoken with the particular emotion based on the adjusted vocal prosody characteristic.
  1. A method comprising: receiving, by one or more processors, voice emotion information related to an audio recording, the audio recording containing a first vocal utterance by a first speaker and a second vocal utterance of a second speaker, and the voice emotion information indicating that the first vocal utterance is spoken with a particular emotion; associating, by one or more processors, a first word or phrase, within the first vocal utterance, with the first speaker and a second word or phrase, within the second vocal utterance, with the second speaker; associating, by one or more processors, the voice emotion information with attribute information, related to the first speaker, within a data structure, the voice emotion information including information regarding at least one of the first word or phrase or the second word or phrase; aggregating, by one or more processors and within the data structure, the associated voice emotion and attribute information with other associated voice emotion and attribute information to form aggregated information; receiving, by one or more processors, one or more report parameters; generating, by one or more processors, a report based on the aggregated information and the one or more report parameters; and outputting, by one or more processors, the report for display.
  1. The method of claim 15, further comprising: analyzing a vocal prosody characteristic of the first vocal utterance; and detecting that a word or a phrase, within the first vocal utterance, is spoken with the particular emotion based on the vocal prosody characteristic, the particular emotion being a negative emotion or a positive emotion, and the voice emotion information including information regarding the detected word or phrase.
  1. The method of claim 15, where the attribute information includes information identifying at least one of: a location of the speaker, a product or service related to the speaker, or a subject of the first vocal utterance.
  1. The method of claim 15, where associating the voice emotion with the attribute information further comprises: joining the voice emotion information with the attribute information by use of a unique identifier related to at least one of: the audio recording, the speaker, a location associated with the speaker, a product associated with the speaker, or a subject associated with the first vocal utterance.
  1. The method of claim 15, where generating the report based on the aggregated information and the one or more report parameters further comprises: generating a report including a count of words or phrases, within the aggregated information, associated with the particular emotion, relating to at least one of: the speaker, a location associated with the speaker, a product associated with the speaker, or a subject associated with the first vocal utterance.
  1. The method of claim 15, further comprising: analyzing a vocal prosody characteristic of the first vocal utterance; adjusting, based on analyzing the vocal prosody characteristic, the vocal prosody characteristic to create an adjusted vocal prosody characteristic; and detecting that a word or a phrase, within the first vocal utterance, is spoken with the particular emotion based on the adjusted vocal prosody characteristic.
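
Several of the claims turn on analyzing a “vocal prosody characteristic,” optionally adjusting it, and then deciding whether a word or phrase was spoken with a negative or positive emotion. The patent does not spell out a classifier, so the sketch below is a purely hypothetical stand-in: it normalizes pitch against a per-speaker baseline (one way to read the “adjusted vocal prosody characteristic”) and applies a crude threshold rule.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Utterance:
    text: str
    pitch_hz: list[float]   # frame-level pitch estimates (assumed pre-extracted)
    energy: list[float]     # frame-level loudness estimates (assumed pre-extracted)

def adjusted_prosody(utterance: Utterance, speaker_baseline_pitch: float) -> float:
    """Return pitch deviation normalized against the speaker's own baseline.

    Loosely corresponds to the claims' "adjusted vocal prosody characteristic":
    normalizing per speaker so a naturally high-pitched voice is not misread
    as agitation. The formula is an assumption for illustration.
    """
    return (mean(utterance.pitch_hz) - speaker_baseline_pitch) / speaker_baseline_pitch

def detect_emotion(utterance: Utterance, speaker_baseline_pitch: float) -> str:
    """Crude threshold classifier: elevated pitch plus high energy -> negative emotion."""
    deviation = adjusted_prosody(utterance, speaker_baseline_pitch)
    loud = mean(utterance.energy) > 0.7
    if deviation > 0.25 and loud:
        return "negative"
    if deviation < -0.05 and not loud:
        return "positive"
    return "neutral"

sample = Utterance("why is there a fee on my bill",
                   pitch_hz=[210, 230, 250], energy=[0.8, 0.9, 0.85])
print(detect_emotion(sample, speaker_baseline_pitch=170.0))  # -> "negative"
```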