Jonathan G. Secora Pearl
Department of Linguistics
University of California, Santa Barbara
Music & Language Studies
7220 N. Rosemead Blvd., Suite 202-10
San Gabriel, CA 91775
The emerging field of music and language studies draws on the traditions and techniques of linguistics and musicology, with an empirical and cognitive bent. The present paper examines the relevance of the Competition Model from psycholinguistics to research that straddles the territories of speech prosody and music, in particular addressing the production and perception of the musical aspects (pitch, timing, amplitude, and timbre) of human vocal sounds.
The Competition Model is an emergentist model of human language. It assumes that human brains develop according to a genetically-specified though plastic plan, which includes certain preferences in computing style that arise in particular regions or pathways of the brain as a result of native architectural and timing mechanisms. This contrasts with nativist theories, which implicitly presume innate representations (of grammar, for instance) at the cortical level. According to proponents of the Competition Model, the evidence for domain-specific language modules is grossly exaggerated; what localization of language processing does exist is largely domain-general in nature, and likely emerges from the interaction between the sensory environment and the brain’s uneven computational playing field, rather than being specified in the genes.
It is argued that although grammar is not given in the world, neither is it provided for in the human genome. This approach in particular explains why brain damage in infants and children does not result in the long-term deficits that appear after analogous damage to adult brains. Adults carry a lifetime of experience that has been neurologically consolidated through Hebbian learning. Children, on the other hand, have less experience from which to have solidified brain connectivity through stimulus/response-style strengthening and weakening; in addition, continuing neurogenesis and synaptogenesis permit greater flexibility in attending to novel experiences, even if the resultant pathways may be computationally less efficient than in normals. For these reasons, maturation and learning are considered two aspects of the same events.
The Competition Model presumes that languages differ in the means by which linguistic information is encoded, and further that such differences are as likely quantitative as qualitative. Languages differ not only in their use of specific linguistic features (e.g., lexical tone, morphological inflections) but also in the degree to which various items bear relevant information for listeners. This shows up in cross-linguistic differences in the relevance weightings and processing costs of particular features when they conflict with one another (for example: word order, animacy, subject-verb agreement, and gender and number markings used in decisions regarding transitivity). In support of the theory, it appears that the features that are most cost-efficient in terms of processing load and relevance (dubbed cue cost and cue validity), which can differ significantly from language to language, are the least susceptible to disturbance under brain damage, suggesting that they are the most likely to be encoded redundantly in the brain. Aphasic syndromes thus differ cross-linguistically in the specific deficits they engender, and these differences reflect the inherent qualitative and quantitative variety among languages; this is taken as evidence that grammar is not innately and universally encoded, but rather grounded in the brain’s experience of the world.
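To make these notions concrete, here is a minimal sketch of how cue validity might be computed, using the decomposition of validity into availability times reliability that is standard in Competition Model work. The corpus counts below are invented placeholders, not data from any study; cue cost, which concerns processing load rather than information value, would have to be measured separately (e.g., through reaction times).

```python
# A minimal sketch of cue validity as operationalized in Competition Model
# work: validity = availability x reliability. All counts are invented.

def cue_validity(n_relevant, n_present, n_correct):
    """Validity of a cue for some interpretive decision (e.g., agenthood).

    n_relevant -- sentences in which the decision arises at all
    n_present  -- of those, sentences in which the cue is available
    n_correct  -- of those, sentences in which the cue points the right way
    """
    availability = n_present / n_relevant   # how often the cue is there
    reliability = n_correct / n_present     # how often it leads you right
    return availability * reliability

# Hypothetical counts for two cues to agenthood in a toy 1,000-sentence
# corpus: word order is nearly always present but only moderately reliable;
# verb agreement is present less often but almost always points correctly.
print("word order:", cue_validity(1000, 980, 750))   # 0.75
print("agreement: ", cue_validity(1000, 600, 590))   # 0.59
```

On this toy weighting, a listener of this hypothetical language should lean on word order before agreement, and the model would predict that word order is also the more damage-resistant cue.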
RELEVANCE TO SPEECH/SONG COMPARISONS
It appears that much of the research involving the aphasias has been grossly flawed by preconceived notions regarding the nature of these deficits, as well as by over-reliance on generative theories of language. Strikingly, the studies in the literature on prosodic and musical deficits are largely built on presumptions drawn from the more abundant literature on the aphasias. If those presumptions are flawed, then a great deal of the latticework upon which studies of neurologically-based deficits in linguistic prosody and the various amusias rest may collapse.
From the stance that any questions regarding the nature of language and music must be empirically tested, how would research regarding speech prosody and song fit into the scheme of the Competition Model? The literature is littered with hasty conclusions and crass simplifications of the nature of music. Music, however, no less than language, appears to be a uniquely human attribute. It is ubiquitous across cultures and throughout known history, and it is perhaps the more phylogenetically primitive of the two. Just as no chimpanzee has spontaneously begun a dialogue on the nature of altruism, no bonobo has ever played so much as a hollow log or a blade of grass. Fruitless analogies between human song and whale or bird song aside, any continuity between human music and the behaviors of other animals is likely to be found in those aspects of human behavior that are common to both music and language. In particular, I would argue that it is in the commonalities between speaking and singing that we are likely to find a large part of the gulf that divides humanity from the rest of nature. And in those features, we will come to understand the cognitive roots that evolutionarily gave rise to both language and culture.
If the adaptations claimed for language are not domain-specific, we are likely to find further evidence of this in attempting to define the difference between speech and song. Both are human vocal behaviors. Both leave an acoustic signature, and provide imperfect data to the perceptual apparatus of listeners. In each case, the behavior is most often directed towards, or for the benefit of, other humans, with an intent to express or communicate ideas or emotions. Further, in both there are cultural differences regarding which cues (e.g., rhythm, melody, divisions of the octave, timbre) carry the most relevant information and can be analyzed and reliably perceived, though in different ways cross-culturally. Each has aspects of grammar and syntax that are more or less clearly definable. Just as the local choice of phoneme sets varies in arbitrary ways, so too do aspects of musical vocabulary vary according to seemingly arbitrary choices. And which features of the acoustic signal mark categorical boundaries varies as much for music as it does for language.
However, there are distinct contrasts between these two domains of human behavior. For instance, language contains a lexicon of semantically-grounded words, whereas music can be, and often is, entirely devoid of propositional meaning. The music in song stands apart from the meaning of the words: sometimes independent, at times reinforcing, often contradicting them. The musical contribution to song serves, in a way, to replace the natural prosody of speech. But the prosodic aspects of speech contain and convey a great deal of information that lies outside the grammar and lexicon of language.
In addition, there is some evidence in the literature for a dissociation between spoken prosody (both lexical and affective) and singing. These studies have used a variety of methodologies (experimental and clinical), and have implicated a multitude of brain regions, from the left frontal lobe for lexical prosody (Monrad-Krohn 1947; Buchanan et al 2000), to right temporoparietal regions for affective prosody (Ross & Mesulam 1979; Ross 1981), to the cerebellum and bilateral motor cortex/posterior inferior frontal gyri for dissociations between speaking and the nonverbal singing of melody and rhythm (Riecker et al 2000). Clearly a great deal of study remains to be done.
POINTS FOR FUTURE RESEARCH
How is meaning altered when speech is sung? How do the musical aspects of song figure into the calculations of a listener? Can cue validity and cue cost be separately defined in musical terms? Might this provide further evidence for the case that language processing is in large part domain-general? Why is it that some aphasics, unable to utter a word of speech, can sing? Is it merely a matter of defining in finer detail the subtle aspects of these deficits? Is there any evidence to sustain dissociations between speaking and singing in comprehension? If there is, I have not yet found it in the literature. If not, it would be rather strange that the production of song, but not its reception, should dissociate from speech.
Likely the anecdotal evidence is skewed by flawed assumptions. Primarily, the issue is confounded by the fact that no one has sufficiently defined the subject matter under investigation. What does it mean to speak, as distinct from what it means to sing? If anecdotal evidence supports the claim that brain-damaged individuals are able to engage in one but not the other of two similar activities (both involving the expression of words by the voice, encoded by manipulating pitch, duration, amplitude, and timbre), then we need to understand better how these two behaviors differ. Are they two ends of a continuum, or is there a disjunction that divides up the otherwise shared behavior space? How can these matters be tested empirically?
Difficulty arises even in the simplest stages of such research. For instance, there is the nativist argument that brain structures have evolved solely for speech. Yet nowhere in the literature is there a clear definition of speech as a solitary act. In fact, speech, like many human behaviors, is a complex of many parts. Without better definitions of the matter under investigation, claims one way or the other are unfalsifiable. Although the necessary distinction between production and perception is normally stipulated, even with that distinction in place the remaining behaviors are not simple acts. The perception of speech, for instance, involves acoustic input to the ears, sent on to the primary auditory cortex; a great deal of calculation must go on, however, before the brain will recognize the auditory input as a meaningful signal. Interestingly, there is evidence that the brain recognizes human vocal sounds as special early in processing (Belin et al 2000), yet this only serves further to link speaking and singing in their uniqueness as stimuli, rather than to distinguish them from each other.
Here is a hypothetical, if entirely speculative, sequence of events: First there is the segmentation of the signal by source (the “cocktail party effect”). The signal will likely include not only other voices but environmental sounds as well, which must be filtered out as irrelevant. Next, the signal is parsed into phonemic units, which are further recalibrated based on context (e.g., coarticulation effects, nasalization). Allowances must be made for dialectal and idiolectal variation, so that these sounds can be properly categorized. In parallel, there will be processing of pitch, intensity, and timing. Calculations must determine which aspects of the pitch are local (some relevant for phonemic categorization, others for lexical prominence) and which are more global, and therefore relevant to affective determinations of attitude or judgments about the encoded meanings. Some allowance must be made for individual differences of voice quality, whether due to speaking style or to physiological issues such as hoarseness or lack of muscular control (dysarthria) caused by aging or disease. It quickly becomes clear that to speak of a speech act is a polite fiction, if the implication is that such an utterance can be easily qualified and quantified.
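As a purely illustrative sketch of one stage in this hypothetical pipeline, the following Python fragment extracts a pitch (F0) contour from a voice signal by simple autocorrelation and then splits it into a global component (the slow declination across the utterance, of the sort relevant to affect) and local excursions (faster movements of the sort relevant to lexical prominence). The frame sizes, thresholds, and the linear model of declination are all assumptions made for the sake of the example, not claims about how brains compute.

```python
# Toy pitch pipeline: estimate F0 frame by frame, then separate a global
# declination trend from local pitch excursions. Parameters are placeholders.
import numpy as np

def f0_autocorr(frame, sr, fmin=75.0, fmax=400.0):
    """Estimate F0 of one frame by autocorrelation; return np.nan if unvoiced."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    if ac[0] < 1e-8:                          # silence counts as unvoiced
        return np.nan
    lo, hi = int(sr / fmax), int(sr / fmin)   # lag range for plausible F0
    lag = lo + int(np.argmax(ac[lo:hi]))
    if ac[lag] < 0.3 * ac[0]:                 # weak periodicity: unvoiced
        return np.nan
    return sr / lag

def pitch_contour(signal, sr, frame_ms=40, hop_ms=10):
    frame, hop = int(sr * frame_ms / 1000), int(sr * hop_ms / 1000)
    return np.array([f0_autocorr(signal[i:i + frame], sr)
                     for i in range(0, len(signal) - frame, hop)])

def split_global_local(f0):
    """Global = linear declination over voiced frames; local = the residual."""
    t = np.arange(len(f0), dtype=float)
    voiced = ~np.isnan(f0)
    slope, intercept = np.polyfit(t[voiced], f0[voiced], 1)
    trend = slope * t + intercept
    return trend, f0 - trend

# Synthetic "utterance": falling declination plus one pitch accent at 0.5 s.
sr = 16000
t = np.arange(sr) / sr
f0_true = 200 - 40 * t + 30 * np.exp(-((t - 0.5) ** 2) / 0.005)
signal = np.sin(2 * np.pi * np.cumsum(f0_true) / sr)

f0 = pitch_contour(signal, sr)
trend, local = split_global_local(f0)
print("max local excursion (Hz):", round(float(np.nanmax(local)), 1))
```

Even this toy makes the point about complexity: a dozen arbitrary parameter choices stand in for decisions a listener’s brain must somehow make, and each is a place where “the same” stimulus could be processed differently.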
For this reason, many of the deficits that appear to affect specific grammatical or lexical processing may in fact result from problems higher along one or another secondary processing pathway. As Bates et al (1998) note: “If we experience two stimuli in exactly the same way, then (by definition) we do not know that they are different” (p. 599). It follows that whatever can be distinguished in normals, or dissociated in pathologies, must somehow differ in terms of brain processing. Surely there are many distinctions that the brain is incapable of noticing (or disinclined to notice). For instance, perception imposes sharp boundaries on graded acoustic events, such as the categorical boundary between the phonemes /b/ and /p/; and as noted in Bates (in press, p. 8), this appears not to be a species-specific phenomenon. The same is likely true for the categorical perception of colors.
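The /b/-/p/ case can be made concrete with a small illustration: identification responses along a graded voice onset time (VOT) continuum, modeled as a logistic function. The boundary location and steepness below are assumed values for the sake of the example, not fitted to any data set.

```python
# Illustration of categorical perception: a physically graded VOT continuum
# yields near-categorical /b/ vs. /p/ labels. Parameter values are assumed.
import math

def p_identify_p(vot_ms, boundary=25.0, steepness=0.5):
    """Probability a listener labels a token /p/, given VOT in milliseconds.

    boundary  -- VOT at which /b/ and /p/ responses are equally likely
    steepness -- how abrupt the category shift is (larger = sharper)
    """
    return 1.0 / (1.0 + math.exp(-steepness * (vot_ms - boundary)))

# Step along the continuum in equal physical increments of 10 ms.
for vot in range(0, 61, 10):
    p = p_identify_p(vot)
    print(f"VOT {vot:2d} ms -> P(/p/) = {p:.2f}  ({'/p/' if p > 0.5 else '/b/'})")
```

Within each category the 10 ms steps barely change the label, while the same physical step across the boundary flips it: the graded continuum is heard as two discrete things.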
The point is this: graded phenomena in the world can be perceived as disjunct by living brains. Where brains fail to make a distinction, the phenomena are for our purposes categorically the same. It is by identifying and quantifying the features used by brains that we will come to understand how seemingly equivalent behaviors do in fact differ, and likewise how apparently different behaviors may utilize shared processes in the brain. The task of specifying dissociations is therefore largely a matter of determining the level of processing at which each dissociation occurs. If these levels are consistent across subjects, they can be viewed as universal brain mechanisms (without regard, at this point, for whether they are innate or emergent). Where they differ, it is likely the result of individual differences (perhaps based in experience or native abilities) or of a failure to specify the stimuli in sufficient detail. In many cases, the technology for such fine-grained distinctions may not yet exist.
This is a contentious point. Some have argued that music is not universally understood and appreciated by individuals across cultures. Others have noted that not all cultures have a native music; Southern Popaluca, for example, has been cited in this regard, since its music is all borrowed from Spanish and popular Mexican traditions. On the one hand, such cases may be the exceptions that prove the rule. More deeply indicative, however, is the question of what features distinguish music from language. Inherent in all spoken languages are manipulations of timing, intonation, and timbre, features shared in common between musical and linguistic phenomena. Even signed languages, while lacking sound, arguably contain similar and analogous features, as Sherman Wilcox among others has argued.
BATES, E. (in press). “On the nature and nurture of language.” In R. Levi-Montalcini, D. Baltimore, R. Dulbecco, & F. Jacob (Series Eds.) & E. Bizzi, P. Calissano, & V. Volterra (Vol. Eds.), Frontiere della biologia [Frontiers of biology]: The brain of homo sapiens. Rome: Giovanni Treccani. [Prepublication version].
BATES, E., DEVESCOVI, A., & WULFECK, B. (2001). “Psycholinguistics: a cross-language perspective.” Annual Review of Psychology 52, 369-396.
BATES, E., et al. (1998). “Innateness and emergentism.” In W. Bechtel & G. Graham (Eds.), A Companion to Cognitive Science (pp. 590-601). Malden, MA and Oxford: Blackwell Publishers.
BELIN, P., et al. (2000). “Voice-selective areas in human auditory cortex.” Nature 403, 309-312.
BUCHANAN, T. W., et al. (2000). “Recognition of emotional prosody and verbal components of spoken language: an fMRI study.” Cognitive Brain Research 9, 227-238.
MONRAD-KROHN, G. H. (1947). “Dysprosody or altered ‘melody of language’.” Brain 70, 405-415.
RIECKER, A., et al. (2000). “Opposite hemispheric lateralization effects during speaking and singing at motor cortex, insula and cerebellum.” NeuroReport 11 (9), 1997-2000.
ROSS, E. D. (1981). “The aprosodias: Functional-anatomic organization of the affective components of language in the right hemisphere.” Archives of Neurology 38, 561-569.
ROSS, E. D., & MESULAM, M.-M. (1979). “Dominant language functions of the right hemisphere? Prosody and emotional gesturing.” Archives of Neurology 36, 144-148.