Spotify to suggest music based on user emotions identified through speech
Spotify has been granted a patent for technology that would allow the company to analyze speech audio and background noise in order to identify a user’s emotional state, among other personal data, and suggest music accordingly.
The patent, titled “Identification of taste attributes from an audio signal”, was filed in February 2018 and granted on January 12 this year.
The filing discusses a “method for processing a provided audio signal that includes speech content and background noise […], identifying playable content based on the processed audio signal content”.
A common approach to deciding what content a user should be recommended, the filing notes, is “to query the user for basic information such as gender or age, to narrow down the number of possible recommendations”, an approach Spotify regards as outdated.
“One challenge involving the foregoing approach is that it requires significant time and effort on the part of the user. In particular, the user is required to tediously input answers to multiple queries in order for the system to identify the user’s tastes,” it states.
“What is needed is an entirely different approach to collecting taste attributes of a user, particularly one that is rooted in technology so that the above-described human activity (e.g., requiring a user to provide input) is at least partially eliminated and performed more efficiently.”
Instead, Spotify wants to infer users’ emotions from speech characteristics including intonation and rhythm, which it would collect and categorize in order to suggest music accordingly.
“A more basic approach might simply categorize the emotion into happy, angry, afraid, sad or neutral. For example, prosodic information (e.g., intonation, stress, rhythm and the like of units of speech) can be combined and integrated with acoustic information within a hidden Markov model architecture, which allows one to make observations at a rate appropriate for the phenomena to be modeled,” it explains.
“Using this architecture, that prosodic information allows the emotional state of a speaker to be detected and categorized.”
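The patent does not publish an implementation, but the technique it cites (one hidden Markov model per emotion class, trained on prosodic feature sequences, with an utterance labeled by whichever model fits it best) is a long-standing pattern in speech emotion research. A minimal sketch of that pattern, assuming the open-source hmmlearn library and invented per-frame features (pitch, energy, speaking rate), might look like this:

```python
# Hypothetical sketch: classify-by-likelihood over per-emotion HMMs.
# The emotion labels come from the patent's own list; the feature set,
# function names and toy data are invented for illustration.
import numpy as np
from hmmlearn import hmm

EMOTIONS = ["happy", "angry", "afraid", "sad", "neutral"]
N_FEATURES = 3  # e.g. pitch (F0), energy, speaking rate per frame


def train_emotion_models(training_data, n_states=4):
    """training_data: dict mapping emotion -> list of (n_frames, N_FEATURES) arrays."""
    models = {}
    for emotion, sequences in training_data.items():
        X = np.vstack(sequences)                   # stack all frames
        lengths = [len(seq) for seq in sequences]  # frames per utterance
        model = hmm.GaussianHMM(n_components=n_states,
                                covariance_type="diag", n_iter=50)
        model.fit(X, lengths)
        models[emotion] = model
    return models


def classify(models, utterance):
    """Return the emotion whose HMM gives the utterance the highest log-likelihood."""
    return max(models, key=lambda e: models[e].score(utterance))


# Toy demo: random "prosodic" frames, offset per class so the models differ.
rng = np.random.default_rng(0)
data = {emotion: [rng.normal(loc=i, size=(60, N_FEATURES)) for _ in range(5)]
        for i, emotion in enumerate(EMOTIONS)}
models = train_emotion_models(data)
test_utterance = rng.normal(loc=1.0, size=(60, N_FEATURES))  # near the "angry" cluster
print(classify(models, test_utterance))  # likely prints "angry"
```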
The patent also describes determining “at least one of the emotional state, gender, age, or accent” of the user as a factor in recommending particular artists or songs.
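How those attributes would actually steer recommendations is claimed in the abstract rather than spelled out, but the simplest reading is that detected labels filter or re-rank a candidate catalogue. The following sketch is purely hypothetical; the track data, mood tags and mapping are invented:

```python
# Hypothetical illustration: re-rank a catalogue by how well each track's
# mood tags match the emotion label produced by a classifier.
from dataclasses import dataclass


@dataclass
class Track:
    title: str
    artist: str
    mood_tags: frozenset

# Invented mapping from detected emotion to desirable mood tags.
MOODS_FOR_EMOTION = {
    "happy": {"upbeat", "bright"},
    "angry": {"intense"},
    "afraid": {"calm"},
    "sad": {"mellow", "melancholy"},
    "neutral": {"ambient"},
}


def recommend(catalogue, emotion, limit=10):
    wanted = MOODS_FOR_EMOTION.get(emotion, set())
    ranked = sorted(catalogue, key=lambda t: len(t.mood_tags & wanted), reverse=True)
    return ranked[:limit]


catalogue = [
    Track("Song A", "Artist 1", frozenset({"upbeat", "bright"})),
    Track("Song B", "Artist 2", frozenset({"mellow"})),
]
print(recommend(catalogue, "happy", limit=1))  # [Track(title='Song A', ...)]
```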
The technology outlined in US Patent 10,891,948 raises numerous obvious privacy and ethical concerns, and should be met with opposition from digital rights watchdogs given its capacity not only to collect user audio but to formulate and store behavioral data.