Speaker Recognition

Automatic speaker recognition systems identify humans by their voice.

A typical use case is access control in buildings. A person speaks a few words, and the door opens only if that person is authorized to enter. Authorized persons enroll beforehand by speaking a few sentences. Another use case is forensic phonetics, where a suspect is to be identified from a speech recording.
So far, speaker recognition has been based almost exclusively on frequency-domain features. At Empa, we will compute temporal information from the voice and use it to improve automatic speaker recognition.

Segmentation of a speech signal into temporal sections. Speech segments and pause segments are identified first. Within speech segments, voiced and unvoiced sections are then distinguished. Finally, voiced sections may contain stable and unstable regions. (Violet: pause segment; green: unvoiced section; white: stable region; light pink: unstable region.)
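
The first two levels of this segmentation can be illustrated with a minimal sketch. It is not the method used in the project: it assumes a common textbook heuristic in which low short-time energy marks a pause and a high zero-crossing rate marks an unvoiced frame. The function names, the frame sizes (400 and 160 samples, i.e. 25 ms and 10 ms at an assumed 16 kHz sampling rate) and both thresholds are hypothetical values chosen for illustration. The split of voiced sections into stable and unstable regions is not covered.

```python
def frame_signal(samples, frame_len, hop):
    """Split a sequence of samples into overlapping frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def label_frames(samples, frame_len=400, hop=160,
                 energy_thresh=1e-4, zcr_thresh=0.25):
    """Label each frame as 'pause', 'unvoiced' or 'voiced'.

    The thresholds are illustrative assumptions, not tuned values.
    """
    labels = []
    for frame in frame_signal(samples, frame_len, hop):
        if short_time_energy(frame) < energy_thresh:
            labels.append("pause")       # almost no energy: silence
        elif zero_crossing_rate(frame) > zcr_thresh:
            labels.append("unvoiced")    # noise-like, many sign changes
        else:
            labels.append("voiced")      # energetic and slowly varying
    return labels
```

Calling label_frames on a list of samples scaled to the range -1 to 1 returns one label per frame. Consecutive frames with the same label can then be merged into the segments and sections shown in the figure.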

 

Areas of activity

  • Software development
  • Speech processing
  • Temporal information

 

Research and development projects

“VoiceTime: Speaker recognition by temporal information”, Gebert Rüf Stiftung, 2014–2016

 

Project Partner

Prof. Volker Dellwo, Phonetics Laboratory, University of Zurich