Archive for In Progress

How to make a talking moose

I spoke with a prominent sound designer for animated features last night. He posed a rather intriguing problem: How do you make a talking moose sound organically like a talking moose? How do we create a voice that would represent a talking moose? How do we put the acoustic filters in place to take a voice and make it sound as if the human speech organs were inside the resonant cavity of a moose?

The point is, what’s needed at the moment is to devise for sound the same sort of tool set that computer graphic designers have at their disposal. We need to develop a tool set for sound manipulation that produces truly organic-sounding results. We don’t need to create the sounds from whole cloth. Think of photo-manipulating software. We’ve got things to start with. We can make the recordings. The problem is how to manipulate the sound without creating all sorts of digital noise. How do we make the filters that change a moose into a goose into a hedgehog? How do we take a fast-speaking New Yorker and make them sound like a Georgian? Or, better yet, how do we produce a filter to speak French with a Russian accent?

It is a problem whose resolution will depend on pulling together the right team of people, from a variety of backgrounds, using a variety of approaches. We need to understand what goes into the sounds in the first place that creates the identity of a fast-talking, angry, New York cabbie or a slow-talking, treacly Atlanta land salesman. What are the features of a Russian speaking French that differ from those of a native speaker? I’ll give you a hint: It’s not as simple as the phoneme set. So, we need some people to take apart the real organic sounds, while we’ve got others working on putting them back together. There’s a great deal of work being done on the latter half, but very little on the former. It’s time to put them together.

This will be done. It’s just a question of who, and when.

Realistic Voice synthesis and natural speech comprehension

Here is a question out to my readers: Is anyone developing a realistic system of voice synthesis, that takes into account the prosody, especially the melody and rhythm, of natural speech? On the other end, what work is being done to facilitate machine comprehension of natural speech, in particular the meaning of speech prosody?

Infant Sound Environment Project (ISEP)

The Infant Sound Environment Project (ISEP) is a longitudinal study of the sound inputs to infants and the relationship of these inputs to the sound production of these children as they emerge from infancy. Follow-on research will address aspects of perceptual equivalence, to better understand this relationship. While previous studies have addressed the acquisition of words and grammar (how meaning and form emerge in the human mind), the present study will address a different aspect of this experience, namely the melodic and rhythmic elements of human vocal sounds, which play a major part in the expression and comprehension of emotions and attitudes. [1] These aspects of social and communicative behaviours, in language and music, carry a fundamental layer of meaning that has heretofore gone largely unexplored. Their foregrounding in this study will permit us to explore patterning, imitation, and creativity, without unduly prejudicing our assumptions regarding the nature of these vocal sounds.

It is often posited that language is unique to our species, and that what it contributes to our being is nothing less than defining of our nature. [2] The intent of this study is not to challenge this notion directly, but rather to question what fundamentally characterizes language in this regard. If language is the defining element of humanity, what is language? The answer to this question underlies the present research programme. It is quite possible that the “what” that will emerge is not exclusive to the domain of language, but rather more generally applicable to human social and communicative interactions, in the human capacity for pattern recognition within our natural environment. While clearly there are aspects of language that are outside the domain of sound production and perception (visual cues and gesture, as well as sign languages, which are entirely exclusive of sound), it is my contention that the systems of pattern recognition and imitation that will be in evidence through this study are likely generalizable and comparable to other behaviours, rather than of a different nature. [3]

Denoting the Voice: Text and Context in Music and Language

Jonathan G. Secora Pearl
Fellowship proposal, submitted to the NEH

The Problem

Charles Darwin was wrong, at least about music. In “The Descent of Man,” he wrote: “As neither the enjoyment nor the capacity of producing musical notes are faculties of the least use to man in reference to his daily habits of life, they must be ranked amongst the most mysterious with which he is endowed.” (Darwin, C. 2004 [1879]: 636) One might have expected more, knowing his wife Emma was a fine pianist, who in her youth had studied in Paris with Frédéric Chopin. Generations of scholars, from outside the field of music, have compared it to other human behaviors, and found it lacking, a mere artifice, insubstantial, ornamental, irrelevant. Some have dismissed it as a byproduct of something ostensibly more useful to the species, like language. (Pinker, S. 1997: 528) To hold that music is useless, but that language is not, one must understand how they differ. It is a simple thing to claim they are not alike, but far harder in practice to define the ways. Music and language remain twin aspects of civilization, found in all known human cultures, across time and place, embracing us from our earliest days until the ends of our lives. Speaking and singing are found everywhere and everywhen. Wherein lies the distinction?

The greatest difficulty in answering this foundational question is that we are often deceived by written forms of music and language into believing our object dwells within them, rather than in the sounds that inspire them. On the page, they appear far more distinct than they do in sound. Text without context is a world without air; yet context alone remains the unanalyzable chaos of everyday experience. The trick is to find the balance between too much detail and too little. Most important is a self-reflective understanding of the specifics regarding what each system captures and what it leaves out. Standard Western music notation gives preference to pitch classes and length, dealing more with intention than with execution. Written language may highlight phonetic details and word order at the expense of intonation and timing. Comparing music and language in these forms is speaking at cross-purposes.

In Progress updated

The In Progress pages have been updated to include a recent NEH fellowship proposal, Denoting the Voice: Text and Context in Music and Language.

Foreign accent syndrome

[Update pending. Look for review of Kurowski, Blumstein, and Alexander (1996).]

What has been dubbed foreign accent syndrome was first described by Monrad-Krohn in 1947, who presented the case of a woman who suffered a shrapnel wound in WWII that damaged portions of the left hemisphere of her brain. Her ability to produce and comprehend language was mostly spared, except for an odd effect on her speech prosody that others perceived as a foreign accent. In that particular case, sounding German in Oslo just following WWII was not an easy thing.

What must be pointed out, however, is that no one has ever been reported in the neurological literature to have begun speaking a foreign tongue, whether spontaneously or as a result of head injury. The term foreign accent syndrome, as well as some of the descriptions that have accompanied the term, is a bit of a misnomer, in that it implies that patients with FAS somehow acquire the accent of a particular foreign language. Rather, the perception of hearers is that the prosody is somehow off, leading them to entertain the theory that the speaker is non-native in the language.

Priming and polysemy

It has been observed in psychological studies of lexical priming that polysemous words (in English, “bug”, “pit”, “fall”) sometimes force a processing delay, as the mind entertains several meanings. It would be interesting to consider these polysemous words as pivots. In musical modulation, pivot chords often serve similarly ambiguous (polysemous) roles. They are valid (diatonic) chords in the two keys. It is their dual roles that permit them to serve as pivots. A study of polysemous words in discourse as serving such a modulating role, in parallel with the polysemous nature of pivot chords in musical modulation, might be a fruitful avenue to pursue.
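
To make the pivot idea concrete, here is a minimal sketch that lists the triads shared by two keys. It rests on several simplifying assumptions of my own: major keys only, plain triads only, and sharps-only note spelling. The function names (diatonic_triads, chord_name, pivot_chords) are purely illustrative.

```python
# Pitch classes 0-11, spelled with sharps only (a simplification).
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
# Semitone offsets of the major scale degrees from the tonic.
MAJOR_SCALE_STEPS = [0, 2, 4, 5, 7, 9, 11]


def diatonic_triads(tonic):
    """Return the seven diatonic triads of a major key as (root, third, fifth) tuples."""
    root = NOTE_NAMES.index(tonic)
    scale = [(root + step) % 12 for step in MAJOR_SCALE_STEPS]
    # Stack thirds on each scale degree: degree, degree + 2, degree + 4.
    return {
        (scale[d], scale[(d + 2) % 7], scale[(d + 4) % 7])
        for d in range(7)
    }


def chord_name(triad):
    """Name a triad from its root and the sizes of its third and fifth."""
    root, third, fifth = triad
    third_size = (third - root) % 12
    fifth_size = (fifth - root) % 12
    if third_size == 4 and fifth_size == 7:
        suffix = ""      # major
    elif third_size == 3 and fifth_size == 7:
        suffix = "m"     # minor
    else:
        suffix = "dim"   # the only other diatonic triad in a major key
    return NOTE_NAMES[root] + suffix


def pivot_chords(key_a, key_b):
    """Chords that are diatonic in both keys, i.e. candidate pivots."""
    common = diatonic_triads(key_a) & diatonic_triads(key_b)
    return sorted(chord_name(t) for t in common)


print(pivot_chords("C", "G"))   # closely related keys share several chords
print(pivot_chords("C", "F#"))  # distant keys may share none
```

Each chord returned has a different function in each key (A minor is vi in C major but ii in G major), which is the sense in which a pivot is "polysemous".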

Thoughts on child-directed speech

It has been noted that child-directed speech (CDS) is often characterized by higher pitch and wider pitch range. However, these features are not universal. In Mayan society, for example, it has been reported that child-directed speech is characterized by a low, whispery, murmuring quality. What could explain this difference?

One factor that has been largely unconsidered is the influence of sound environment on the specific choices made by speakers. An empirically verifiable theory would be, for instance, that the sound environment of Mayan children is characterized more by sounds within the higher pitched range, thus leading caregivers to modify their vocalisms toward the lower (and whispery) end. It is a question of perceptual salience. The child is naturally equipped with the ability to pick out the human voice from its surroundings (cf. Belin, et al., 2000), likely by means of timbre recognition. Auditory scene analysis permits the child’s mind to pick out these features. However, if the sound environment muddies the soundscape in a particular range of frequencies, caregivers will likely veer in a different direction in order to aid the child in isolating the voice from surrounding sounds.

Reduplication and the use of diminutives are often noted in CDS. Various explanations have been proposed. However, one that has yet to gain prominence regards the value of extra syllables in permitting intonational variety and contrast. What do we gain, in English for example, in transforming dog into doggy; in Czech, by rendering chlapec into chlapeček? Are we not adding greater phonetic complexity? Shouldn’t this be more difficult for the child? But this is in line with Slobin’s proposition that morphemes which contain more than one phoneme are more easily acquired. Further, we cannot discount the value of intonation as contributing to the signal. It can be observed in Taiwanese Mandarin, for instance, that CDS not only contains reduplication, but that the tone is modified as well. An example that was given to me was the word /gu/ (high level tone) meaning brother. The CDS version is often /gu-gu/ (with a low followed by a high tone). Thus there is modification not only in the reduplication, but also in the change in tone, which is not lexically or phonetically determined.

On Methodology page added

Essay On Methodology added to About pages

In progress pages added

Added the pages Melodic ambiguity in song and speech and Anticipating Dolly, describing two new projects, under the heading In Progress.

Evolution

I will be creating a bibliography on the phylogenetic development of skills and capacities both within the hominid line and comparatively within other species.

Nathani, Oller, and Cobo-Lewis (2003)

Nathani, Suneeti, D. Kimbrough Oller, and Alan B. Cobo-Lewis. “Final Syllable Lengthening (FSL) in infant vocalizations,” Journal of Child Language 30 (2003): 3-25.
