My firm, Perceptral LLC, is currently advertising three open research positions in speech technology, to support current and pending research and development contracts. To read the full job announcements, please visit http://www.perceptral.com and click on “Careers”.
Thanks to the diligence of Joseph Bauer (intern at Perceptral LLC), this website is now back up and running with all previous content restored! More to come.
Music and Language II:
A conference in celebration of the 25th Anniversary of Lerdahl and
Jackendoff’s “A Generative Theory of Tonal Music”
July 10-13, 2008
Tufts University Perry and Marty Granoff Music Center
This conference follows the successful conference on Music and Language
held at Cambridge University in summer 2007. The conference will be
hosted by Provost Jamshed Bharucha and the Office of the Provost,
Professor Joseph Auner and the Department of Music, and Professor
Robert Cook and the Department of Psychology. We invite participants
and presenters from all fields (music, psychology, linguistics,
cognitive science, anthropology, etc.).
Paper and poster submissions will be due by December 1, 2007. Further
details about the conference and the paper/poster submission form are
available on our website at: http://musicandlanguage.tufts.edu/
We will also be honoring Ray Jackendoff, Seth Merrin Professor of
Philosophy at Tufts, and Fred Lerdahl, Fritz Reiner Professor of Music
at Columbia University. This year marks the 25th Anniversary of their
seminal work, “A Generative Theory of Tonal Music.”
Invited speakers:
Eric Clarke, Oxford University
Lola Cuddy, Queen’s University
Peter Culicover, Ohio State University
Ray Jackendoff, Tufts University
Fred Lerdahl, Columbia University
Betsy Marvin, Eastman School of Music
Lawrence Parsons, University of Sheffield
Aniruddh Patel, Neurosciences Institute
Isabelle Peretz, University of Montreal
Jamshed Bharucha, Tufts University
Gottfried Schlaug, Harvard University / Beth Israel Hospital
Mark Hauser, Harvard University
Ellen Winner, Boston College
Tod Machover, Massachusetts Institute of Technology
Mark Tramo, Harvard University
If you have any questions, please contact the Office of the Provost at
Bruce Richman of Penn Hills, Pennsylvania, passed away unexpectedly on October 4, 2007. He died of complications related to cardiac bypass surgery while on vacation in the Pacific Northwest. He was 61 years old. A native of New York City, Bruce lived for many years in the San Francisco Bay Area and in Cleveland, Ohio. He attended Princeton University and earned an M.A. in English from Antioch University. Most recently he moved to Pennsylvania to build a new life with his fiancée.
Bruce was a committed father and loyal friend. He is loved by his family and friends for his beautiful mind, passionate soul, non-judgmental acceptance of others, unfailing kindness, and honesty. He filled any room with his booming voice (often in song), big gestures, and his unique, endearing, sometimes quirky personality. An unconventional person, he proudly retained his “hippie sensibilities” throughout his life.
Bruce was a brilliant man whose life’s work focused on the origins of language. He conducted ground-breaking research on the vocalizations of gelada monkeys and theorized that there was a singing stage in the evolution of human speech and language. Bruce’s research was published in a number of professional journals, and he contributed to several books on language development. He loved teaching and was equally comfortable helping his students with English, math, and science. He tutored, taught high school classes, and worked as a community college instructor for most of his life. In recent years he enjoyed teaching English as a Second Language, and he was in the process of writing a textbook that uses songs and poetry to develop conversational English-speaking skills.
Bruce was a broad thinker and had an incredible curiosity about the world around him. He started each day gathering current news from the internet, newspapers, and political talk shows. Before he finished his morning coffee he was filled with conversation topics – and opinions – for the day. And he shared them widely.
For Bruce, intellectual questions weren’t abstract, but charged with passion and full of real-world implications. He delighted in a range of music from Mozart to Fred Astaire to Aretha Franklin. He had the perfect lyrics at hand for nearly any situation life presented him. Though a complex person, Bruce savored the simple pleasures of everyday life. He could also be moved deeply by an opera aria, a newly-discovered section of a city, or a collection of outsider art.
Bruce found refuge in books of all kinds, in the eyes of his daughter, and in the arms of his fiancée. He was passionate about social justice, and he embraced any belief he held or activity he undertook with a full measure of enthusiasm. As a self-described “cockeyed optimist,” he thought that the world could someday become a better place. And he believed in new beginnings and the power of love.
Bruce is survived by his fiancée, Deb Trevellini of Penn Hills, PA and owner of Morninglory in Murrysville, PA; Debbie Pearl and his daughter Susanna Richman of Cleveland, OH; his sister and brother-in-law, Barbara and Ted Turk of Ambler, PA; his nieces and nephews; and life-long friends on the West Coast.
The family is very grateful for the expert and loving care given to Bruce by the Oregon Health and Science University (OHSU) Hospital staff in Portland. It requests that memorial donations be made to a special needs trust for Bruce’s daughter: Trustee, Susanna Richman, SNT 3805 Woodridge Rd, Cleveland Heights, OH, 44121.
Aims of the symposium: A fundamental trait of the communication system of all mammals is to convey emotions. Emotions are transmitted by non-verbal acoustic communication in all mammals. In addition, humans can make use of speech and music to transmit emotions. A central and as yet unresolved question is whether there exists an underlying set of rules holding across species, governing production and perception of acoustically conveyed emotions.
This interdisciplinary symposium provides a framework for discussing ongoing research in the field of behavioral, cognitive and evolutionary neurosciences. It aims at deepening our understanding of shared and unique principles important to reconstruct evolutionary pathways for emotional communication in the acoustic domain.
Organisers of the symposium: Prof. Elke Zimmermann and Dr Sabine Schmidt, both at the Hanover Veterinary University, and Prof. Eckart Altenmüller (Hanover University for Music and Drama).
Abstracts and Registration: Abstract submission deadline is May 31, 2007. For details regarding registration and the scientific programme, please see our webpage at http://www.eec2007.de
A new California chapter of the Applied Voice Input/Output Society is currently forming. If you are interested in becoming involved, or in being informed of upcoming events, please contact jonathan (at) musiclanguage.net.
The Peter Wall Institute for Advanced Studies at the University of British Columbia (www.pwias.ubc.ca) is hosting a 3-day interdisciplinary Exploratory Workshop June 21-23 in conjunction with the Vancouver International Song Institute (www.visi.ca), a new and unique interdisciplinary professional training program for the study of Art Song, at the UBC School of Music June 17-23. The title of the workshop is “Art Song Anima: Ambiguity, Authenticity, Augury”, convened by Professor Rena Sharon, Artistic Director of VISI, and Drs. Eric Vatikiotis-Bateson, Linguistics, and Laurel Fais, Psychology. Its topics flow from an arts/humanities starting point on the first day (ambiguities and specificities in the setting of poetry to music) into discussion on day two of the phenomenology of speech/song intersections, comprising linguistics, vocal physiology, cognition, and neuroscience. The final day will include consideration of song from a biocultural perspective, with presentation of data about the use of song in therapeutic environments such as Alzheimer’s care, and its evolutionary role in the individual development of parent/infant communication and in collective social ritual.
I spoke with a prominent sound designer for animated features last night. He posed a rather intriguing problem: How do you make a talking moose sound organically like a talking moose? How do we create a voice that would represent a talking moose? How do we put the acoustic filters in place to take a voice and make it sound as if the human speech organs were inside the resonant cavity of a moose?
The point is, what’s needed at the moment is to devise for sound the same sort of tool set that computer graphics designers have at their disposal: a tool set for sound manipulation that produces truly organic-sounding results. We don’t need to create the sounds whole cloth. Think of photo-manipulation software: we have material to start with, because we can make the recordings. The problem is how to manipulate the sound without introducing all sorts of digital noise. How do we make the filters that change a moose into a goose into a hedgehog? How do we take a fast-speaking New Yorker and make them sound like a Georgian, or better yet, how do we produce a filter to speak French with a Russian accent?
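As a toy illustration of the kind of building block such a tool set might contain, here is a minimal sketch in Python with NumPy (the function name and approach are my own, not any existing tool): it rescales the resonances of a single audio frame by warping its spectrum, the crudest possible stand-in for changing the size of the resonant cavity.

```python
import numpy as np

def warp_resonances(frame, alpha):
    """Crudely rescale the resonances of one audio frame.

    alpha < 1 shifts spectral energy downward (a larger resonant
    cavity, moose-ward); alpha > 1 shifts it upward (goose-ward).
    A real voice filter would first separate the source (pitch)
    from the vocal-tract filter; this sketch warps everything.
    """
    spectrum = np.fft.rfft(frame)
    bins = np.arange(len(spectrum))
    # Each output bin reads from frequency bin k / alpha of the input;
    # bins warped past the top of the spectrum are filled with zeros.
    mag = np.interp(bins / alpha, bins, np.abs(spectrum), right=0.0)
    phase = np.interp(bins / alpha, bins, np.angle(spectrum), right=0.0)
    return np.fft.irfft(mag * np.exp(1j * phase), len(frame))
```

Run on a frame dominated by a 1000 Hz component with alpha = 0.5, the dominant energy lands at 500 Hz. The artifacts such naive warping introduces are exactly the “digital noise” problem: the hard part is not shifting the spectrum but doing it without destroying the organic quality of the voice.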
It is a problem whose resolution will depend on pulling together the right team of people, from a variety of backgrounds, using a variety of approaches. We need to understand what it is in the sounds that creates the identity of a fast-talking, angry New York cabbie or a slow-talking, treacly Atlanta land salesman. What are the features of a Russian speaking French that differ from those of a native speaker? I’ll give you a hint: it’s not as simple as the phoneme set. So we need some people taking apart the real organic sounds while others work on putting them back together. There’s a great deal of work being done on the latter half, but very little on the former. It’s time to put them together.
This will be done. It’s just a question of who, and when.
A whole new realm of ethical and legal considerations seems likely to arise from the development of synthetic voices based, at least in part, on sampling of natural speech. One easy way to avoid these concerns, I suppose, would be to hire speakers, or to use in-house voices for the immediate needs of production, with all rights waived under contract. But the issues may arise anyway at the capture-and-analysis end. For instance, to analyze a great deal of data from a particular region or dialect, it would be necessary to record a range of speakers. Since the interest is in capturing natural data, this might best be accomplished if the speakers are unaware that they are being recorded. But would such eavesdropping be ethical, and would it be legal?
What if snippets of actual speakers were used for the later development of voices? Would the original speaker be recognizable? It would almost assuredly be possible to modify the resultant sound such that the speaker would not be recognizable. But would the product still be in some way legally tied to the original speaker? Does intellectual property extend to the products of our own voices? What if the sound was captured in public rather than clandestinely? Wouldn’t it be akin to publishing pictures of famous people who appeared in public? That is, would public presentation render moot any claims to intellectual or personal property rights? The problem of course would be ensuring the requisite sound quality under such conditions.
The ethics of this come up even if the voice could be altered to mask the speaker’s identity (e.g., by significantly changing the timbre and other prosodic qualities). I think these issues will have to be dealt with at some point. I recall a composer’s presentation a few years ago in which he admitted to sampling a performance that was later manipulated and modified to the extent that perhaps only he knew the original had been used. Nonetheless, he felt it necessary to say something about it, to acknowledge his guilt in the matter. Just something to think about.
Elliott D. Ross and colleagues have long studied the impact of particular right-hemisphere neuropathologies on affective speech prosody, syndromes collectively termed the aprosodias. (See the Song, Speech, and Brain bibliography for some details.) If we develop the tools for automatic extraction of voice features (the ones that would be necessary to produce animated synthetic voices), one can envision a future in which audio recordings of patients speaking become a normal part of a medical file. These recordings could be subject to automatic analysis and extraction of individual voice features. A comparison of such baseline recordings with post-event recordings could provide cues for identifying neuropathologies that might otherwise be missed. They could also serve as a method for the analysis and quantification of dysarthria and other voice-affecting disorders.
These features as well must certainly play a role in voice identification and verification systems. The problem at hand is finding a way to automatically (and reliably) extract these features of voice (timing, pitch, phonemes/allophonic variation, timbre) and to classify them for analysis and comparison.
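To make the scale of the problem concrete, extracting even the simplest of those features — pitch — already involves choices about search range and peak-picking. A bare-bones sketch in Python with NumPy (function name and defaults are my own, for illustration only) estimates the fundamental frequency of one voiced frame by autocorrelation:

```python
import numpy as np

def estimate_pitch(frame, sr, fmin=60.0, fmax=400.0):
    """Estimate the fundamental frequency (Hz) of one voiced frame.

    Picks the autocorrelation peak between the lags corresponding to
    fmax and fmin. A minimal sketch: production extractors (YIN,
    pYIN, and kin) add normalization, sub-sample interpolation, and
    a voicing decision before trusting the estimate.
    """
    frame = frame - frame.mean()
    # Autocorrelation at non-negative lags.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo = int(sr / fmax)   # shortest plausible period, in samples
    hi = int(sr / fmin)   # longest plausible period, in samples
    period = lo + np.argmax(ac[lo:hi])
    return sr / period
```

And that is only pitch on a single clean frame; timing, allophonic variation, and timbre — and reliability on real, noisy speech — are each harder problems again.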
Such systems could go beyond medical applications as well. There is no reason why automatic feature extraction couldn’t be applied to military and intelligence tasks: quickly identifying dialects and languages, or recognizing an impostor, someone speaking a non-native dialect or language. These systems could also serve pedagogical purposes, helping learners acquire a near-native accent in a foreign language by providing a better understanding of the features common to native speakers, along with analysis of and feedback on the learner’s production. This would be a giant stride beyond the overly simplistic acoustic language-learning tools currently available, which offer too literal a comparison of model to learner.
Is anyone working on developing these tools? I’ve heard nothing. Anyone interested?