Automatic voice feature extraction

Elliott D. Ross and colleagues have long studied how particular right-hemisphere neuropathologies affect affective speech prosody, producing syndromes collectively termed the aprosodias. (See the Song, Speech, and Brain bibliography for details.) If we develop tools for automatically extracting voice features (the same features needed to produce animated synthetic voices), audio recordings of patients speaking could become a routine part of a medical file. These recordings could then be analyzed automatically and their individual voice features extracted. Comparing such baseline recordings with recordings made after a clinical event, such as a stroke, could provide cues to neuropathologies that might otherwise be missed. The same recordings could also be used to analyze and quantify dysarthria and other voice-affecting disorders.

These features surely also play a role in voice identification and verification systems. The problem is to extract them (timing, pitch, phonemes and allophonic variation, timbre) automatically and reliably, and to classify them for analysis and comparison.
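
To make "automatic extraction" concrete for just one of these features, pitch, here is a minimal sketch of the textbook autocorrelation method for estimating the fundamental frequency of a single voiced frame. It is an illustration, not a method proposed in this post: the function name `estimate_pitch`, the 75 to 400 Hz search range, and the 40 ms frame are assumptions. Real speech would also need voicing detection, windowing, and a more robust estimator.

```python
import math

def estimate_pitch(samples, sample_rate, fmin=75.0, fmax=400.0):
    """Estimate the fundamental frequency (Hz) of one voiced frame.

    Hypothetical helper (assumption): `samples` is a short voiced frame,
    e.g. 20-50 ms. Returns sample_rate / lag for the lag that maximizes
    the autocorrelation within the [fmin, fmax] search range.
    """
    n = len(samples)
    mean = sum(samples) / n
    x = [s - mean for s in samples]  # remove any DC offset

    # Candidate periods (in samples) corresponding to fmax..fmin.
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), n - 1)

    best_lag, best_r = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation: longer lags sum fewer terms,
        # which mildly favors the true period over its multiples.
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag
```

For example, 40 ms of a synthetic 200 Hz tone sampled at 16 kHz:

```python
sr = 16000
frame = [math.sin(2 * math.pi * 200 * t / sr) for t in range(640)]
print(estimate_pitch(frame, sr))  # 200.0
```

Tracking this estimate frame by frame over a recording gives a pitch contour, the kind of prosodic measurement a baseline versus post-event comparison would rely on.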

Such systems need not be limited to medicine. Automatic feature extraction could also serve military and intelligence applications: quickly identifying dialects and languages, or recognizing an impostor, such as someone speaking a non-native dialect or language. These systems could also have pedagogical uses, helping learners acquire a near-native accent in a foreign language by clarifying which features native speakers share and by analyzing the learner's production and giving feedback on it. This would be a giant stride beyond the overly simplistic acoustic language-learning tools currently available, which compare the learner's production to the model too literally.

Is anyone working on developing these tools? I’ve heard nothing. Anyone interested?
