Tuesday, 28 May 2013

Who said what now? | Xbox One and Kinect 2.0 Speaker Recognition

Photo courtesy of RockPaperShotgun.com

As the details of Microsoft's latest console, the Xbox One, remain hazy, my flatmate posed a question that I am sure many have considered:

When you are playing, it would be really annoying if someone in the room said "Xbox, go home" and you were taken out of your game to the Xbox One homepage.

My first thought was that surely Microsoft, a multi-billion-dollar company, has had someone point this out... otherwise they are in for a bit of a shock. My next thought was to consider the problem from an audio engineer's point of view (this being my training, not just a hobby). If you think about your own voice and all your friends' voices, it's quite rare for any two of them to sound alike, right? The Kinect should therefore be able to detect which frequencies are emphasised in one person's voice compared with the next.

For example, someone with a nasal-toned voice has a boost in the mid-range frequencies (800 Hz to 2 kHz), whereas his friend with a deep voice has more prominence around the 250 Hz to 500 Hz range.
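To make the idea concrete, here is a minimal sketch of that band comparison in Python. It is not how the Kinect actually works (Microsoft hasn't said); it simply measures the energy in the two bands mentioned above with a naive discrete Fourier transform and labels whichever band dominates. The function names (`band_energy`, `voice_profile`, `describe`) and the use of pure sine tones as stand-in "voices" are my own illustrative assumptions.

```python
import math

def band_energy(samples, sample_rate, low_hz, high_hz):
    """Sum of squared DFT magnitudes for bins whose frequency lies in [low_hz, high_hz].
    Naive DFT (hypothetical helper): only the bins inside the band are evaluated."""
    n = len(samples)
    energy = 0.0
    for k in range(n // 2 + 1):
        freq = k * sample_rate / n
        if not (low_hz <= freq <= high_hz):
            continue
        re = 0.0
        im = 0.0
        for i, x in enumerate(samples):
            angle = 2 * math.pi * k * i / n
            re += x * math.cos(angle)
            im -= x * math.sin(angle)
        energy += re * re + im * im
    return energy

def voice_profile(samples, sample_rate):
    # The two ranges from the example: deep (250-500 Hz) and nasal (800 Hz-2 kHz).
    return {
        "low": band_energy(samples, sample_rate, 250, 500),
        "mid": band_energy(samples, sample_rate, 800, 2000),
    }

def describe(profile):
    # Label the voice by whichever band carries more energy.
    return "nasal (mid-heavy)" if profile["mid"] > profile["low"] else "deep (low-heavy)"

def tone(freq_hz, sample_rate=8000, n=800):
    # Pure sine wave standing in for a voice (an assumption for illustration only).
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

if __name__ == "__main__":
    print(describe(voice_profile(tone(1200), 8000)))
    print(describe(voice_profile(tone(350), 8000)))
```

A 1200 Hz tone lands in the mid band and a 350 Hz tone in the low band. Real voices are far richer than a single tone, so a practical system would compare many bands (or more elaborate features) rather than just these two.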

If the Kinect can analyse these voices as they come in and assign each voice to one of its persona slots, it should be able to keep track of who is saying what. At the Microsoft unveiling we were shown that saying "Xbox On" starts up the Xbox One and signs in the profile of whoever said the command.
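Here is a rough sketch of how "assigning a voice to a persona slot" could solve my flatmate's "Xbox, go home" problem. It assumes each enrolled player has a stored feature vector (here just low-band and mid-band energy, as in the example above). An incoming command is matched to the nearest profile and is only obeyed if it came from the person currently playing. The names `identify_speaker` and `should_obey`, the profile values and the 0.15 threshold are all hypothetical.

```python
import math

def normalise(vec):
    # Scale to proportions so loudness doesn't matter, only the balance between bands.
    total = sum(vec)
    return [v / total for v in vec] if total > 0 else list(vec)

def identify_speaker(features, profiles, threshold):
    """Return the name of the closest enrolled profile, or None if nobody is close enough."""
    query = normalise(features)
    best_name, best_dist = None, float("inf")
    for name, ref in profiles.items():
        dist = math.dist(query, normalise(ref))  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= threshold else None

def should_obey(command_features, active_player, profiles, threshold=0.15):
    # Only act on a voice command if it came from the player currently in the game.
    return identify_speaker(command_features, profiles, threshold) == active_player

if __name__ == "__main__":
    # Hypothetical [low-band, mid-band] energies for two enrolled players.
    profiles = {"Rob": [7.0, 3.0], "Flatmate": [2.0, 8.0]}
    print(should_obey([2.0, 8.0], "Rob", profiles))  # the flatmate says "Xbox, go home"
    print(should_obey([7.0, 3.0], "Rob", profiles))  # Rob says it himself
```

The threshold lets an unrecognised voice be ignored entirely. It does nothing, however, for two enrolled people whose profiles sit very close together; they can simply be confused with each other.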

"The physiological component of voice recognition is related to the physical shape of an individual's vocal tract, which consists of an airway and the soft tissue cavities from which vocal sounds originate. To produce speech, these components work in combination with the physical movement of the jaw, tongue, and larynx and resonances in the nasal passages. The acoustic patterns of speech come from the physical characteristics of the airways." (Biometrics.gov | Speaker Recognition, 2006)
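As a rough illustration of how the shape of the airway sets those acoustic patterns, a common textbook simplification treats the vocal tract as a uniform tube, closed at the glottis and open at the lips. That is a quarter-wave resonator, whose resonances (formants) fall at odd multiples of a base frequency. The values below are assumed typical adult figures (speed of sound c ≈ 350 m/s in warm, moist air; tract length L ≈ 17.5 cm), not measurements:

```latex
F_n = \frac{(2n-1)\,c}{4L}, \qquad n = 1, 2, 3, \dots

\frac{c}{4L} = \frac{350\ \text{m/s}}{4 \times 0.175\ \text{m}} = 500\ \text{Hz}
\;\Rightarrow\; F_1 \approx 500\ \text{Hz},\quad F_2 \approx 1500\ \text{Hz},\quad F_3 \approx 2500\ \text{Hz}
```

Because L sits in the denominator, a longer vocal tract lowers every resonance. That is one physical reason two people's voices emphasise different frequency ranges.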

So, as we look ahead to E3 (11th-13th June 2013), we can expect to see a lot more from Kinect 2.0. I can only assume that some form of speaker recognition will be incorporated, since the new Kinect can track up to six bodies at once and would need a way to tell which of them is speaking. Then again, these things are never perfect: my brother and I have quite similar voices, so there could be some issues there...

Thanks for reading,

Rob Tyler

V | G | A
