2.4. The root cause: misallocation of attention

In Section 2.1, I wrote that stuttering is caused by a disruption of auditory feedback in the back part of the speech unit preceding the stuttering event. What causes these disruptions? They may be caused by insufficient processing of auditory feedback; for example, the feedback information may be incompletely transmitted or incompletely retained in working memory. Poor processing, in turn, may result from a habitual misallocation of attention, that is, of perceptual and processing capacities, during speech.

2.4.1. Speech processing requires attention

Behavioral and neuroimaging studies have shown that the processing of heard speech is very limited without sufficient attention to the speech signal. In a dichotic listening experiment, Cherry (1953) presented two different spoken messages simultaneously, one to each ear. The participants were asked to pay attention to just one of the messages and to repeat it aloud. Later, when asked to describe the content of the ignored message, the participants were not only unable to do this; they even appeared not to have noticed a change in language (from English to German) or a change to speech played backward. They only noticed gross physical changes in the stimuli, such as if the unattended speaker switched gender, or if unattended speech was replaced by a 400-Hz tone.

Cherry’s observation has become well known as the ‘cocktail party effect’. It indicates an automatic mechanism of selective attention that allows us to listen to someone and to comprehend his or her speech in a hubbub of conversations: by focusing on the selected voice, we can sift out that individual voice and ignore all the others. Ignored voices are perceived as ambient noise; this acoustic information seems to be processed only at a low level. Conversely, without attention to a speaker’s voice, you don’t understand his or her speech, except for a few words that function as signals and attract your attention, e.g., your name or calls such as “Help!” or “Attention!” (Moray, 1959; Wood & Nelson, 1995).

Cherry’s results have been confirmed by neuroimaging studies. Jäncke, Mirzazade, and Shah (1999) found that activations in the primary and secondary auditory cortex were greatest when the participants’ attention was focused on the stimuli (to detect a specific target syllable in a speech recognition task) and weakest when the participants ignored the stimuli. Likewise, Hugdahl et al. (2003) found increased activations in auditory association areas (middle and superior temporal gyrus) when the participants attentively listened to verbal stimuli, compared to passive listening without focusing attention on the stimuli. The authors conclude that attention may facilitate speech perception by modulating neuronal activation.

Using fMRI, Sabri et al. (2008) found that activations in auditory association areas were significantly greater when participants attended to acoustic stimuli and smaller when the stimuli were ignored. Moreover, some cortical areas that are probably involved in speech processing showed activation only when participants listened to the verbal stimuli, but not when the stimuli were ignored. The authors conclude that the processing of phonetic and lexical-semantic information is very limited without attention to the auditory channel.

One’s own speech is processed like that of others

This might be true not only for the processing of the speech of others, but also for the processing of the auditory feedback of one’s own speech, since both models (Levelt, 1995) and neuroimaging findings (McGuire, Silbersweig, & Frith, 1996; Price et al., 1996) suggest that the auditory feedback of one’s own speech is processed and comprehended by the same speech comprehension system, and in nearly the same way, as the speech of someone else.

This view is supported by Scheerer, Tumber, and Jones (2016), who were the first to investigate the impact of attention on the utilization of auditory feedback, though they only measured the response to a manipulation of pitch feedback, not the impact of attention on phonological or lexical processing. They conclude that their results “suggest that attention is required for the speech motor control system to make optimal use of auditory feedback for the regulation and planning of speech motor commands” (p. 826).




2.4.2. The causal chain of stuttering

We can thus assume that the proper processing of the auditory feedback of speech requires sufficient attention to the auditory channel. When too much attention is directed to speech planning or volitional motor control and too little attention is paid to hearing one’s speech, auditory feedback is poorly processed; this can cause invalid error signals in the monitoring system, which result in stuttering. This causal chain is shown in the figure below.


Figure 7: The causal chain of stuttering. AF = auditory feedback.
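
To make the hypothesized chain more tangible, here is a minimal toy sketch in Python (not part of the theory itself; the attention threshold and all values are purely illustrative assumptions). It only demonstrates the logic that the monitor reacts to any mismatch, whether it stems from a real production error or from poorly processed feedback:

    # Toy model of the causal chain in Figure 7 (illustrative only).
    # The threshold and the attention values are invented, not empirical.
    ATTENTION_NEEDED = 0.5  # assumed minimum attention for valid AF processing

    def monitor_speech_unit(attention_to_feedback, production_error=False):
        # 1. Auditory feedback (AF) is processed reliably only with enough attention.
        feedback_valid = attention_to_feedback >= ATTENTION_NEEDED
        # 2. The monitor registers a mismatch either because of a real error or
        #    because the poorly processed AF deviates from the expectation.
        mismatch = production_error or not feedback_valid
        # 3. A mismatch triggers the monitor's response; an invalid error signal
        #    surfaces as a stuttering-like interruption (block).
        if not mismatch:
            return "fluent"
        return "repair of a real error" if production_error else "invalid error signal -> block"

    print(monitor_speech_unit(attention_to_feedback=0.8))  # fluent
    print(monitor_speech_unit(attention_to_feedback=0.3))  # invalid error signal -> block

The sketch deliberately ignores timing and severity; its only point is that the monitor detects deviations from expectation rather than ‘errors’ as such, so degraded feedback and real errors are indistinguishable to it.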

It is important to note that only speech comprehension depends on attention to the auditory channel; the monitor’s response to a mismatch between expectation and feedback does not. The mismatch-related auditory-evoked brain potential is detectable in the EEG even when the subject’s attention is distracted from the auditory stimuli (Näätänen et al., 2007) (read more). Therefore, such a mismatch can arise from insufficient attention to the auditory channel and, resulting from that, poor processing of the auditory feedback of speech.

Attentional imbalance

In Section 2.1, I explained why disruptions of auditory feedback cannot occur at the onset or in the initial part of a word or phrase without impairing error detection (because the correct expectations necessary for error detection could not be generated; see also Section 1.4). If error detection is to work at all, attention to auditory feedback can only be insufficient at the ends or in the back parts of words or phrases. But why should a speaker behave in this way?

First, there is a normal, automatic modulation of auditory attention that leads to the initial portions of words or familiar phrases being preferentially processed (Astheimer & Sanders, 2009, 2012). That is, attention at the ends of speech units is normally reduced, even in normal speakers. If a general attentional imbalance is added to this, an imbalance in favor of speech planning or motor control at the expense of auditory feedback, then attention (perceptual and processing capacity) at the ends of words and phrases can easily become insufficient.
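
As a back-of-the-envelope illustration of this combination (all values are invented for the sake of the argument, and the 0.5 threshold is the same assumed minimum as in the sketch above):

    # Hypothetical attention to auditory feedback within one word (illustrative numbers).
    onset_attention = 0.9   # onsets are preferentially processed (Astheimer & Sanders)
    end_attention = 0.6     # normally reduced toward the end of the unit
    imbalance = 0.7         # assumed general shift toward planning / motor control

    print(round(onset_attention * imbalance, 2))  # 0.63 -> still above the assumed 0.5 threshold
    print(round(end_attention * imbalance, 2))    # 0.42 -> below the threshold: feedback poorly processed

Neither the normal onset bias nor a moderate general imbalance alone pushes attention below the assumed minimum, but their combination does so precisely at the ends of speech units.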

It may be that, at the ends of words or phrases, attention is already directed forward to speech planning, either because the speaker strongly focuses on the intended message or in an attempt to avoid stuttering by word substitution or rephrasing. Furthermore, uncertainty or worry about the content of an intended utterance, e.g., when speaking to a superior, may take up most of a speaker’s attention and reduce the capacity for feedback processing (read more). In normal speakers, but apparently not in stutterers, attention regulation during speech is resilient enough to prevent such an excessive imbalance.

Empirical evidence

In fact, Lazzari et al. (2024) found that normal speakers could not ignore their auditory feedback in a finger-tapping task, but stutterers could. We can assume that normal speakers always automatically pay sufficient attention to auditory feedback when they produce something audible by movement. By contrast, stutterers can ignore auditory feedback under such conditions, and I think they often do so when speaking. The study further showed that attention to auditory feedback determines whether auditory feedback is integrated into motor control. Read more in the Blog.

Are there any empirical findings suggesting that stutterers direct too little attention to auditory feedback when speaking? In several neuroimaging studies (Braun et al., 1997; Fox et al., 1996; Ingham et al., 2003), the auditory association areas of the cortex, which are responsible for speech monitoring (mainly the left superior temporal gyrus, BA22; see Indefrey, 2011; Indefrey & Levelt, 2004), were found to be less activated during habitual speech in stutterers, compared to normally fluent speakers (see also Table 1). Some researchers have concluded from this that there is a deficit in the self-monitoring of speech in stutterers (read more).

By contrast, in fluency-inducing conditions such as metronome-paced speaking, chorus reading, or singing, reduced stuttering was found to be associated with greater activations in auditory association areas (Braun et al., 1997; Fox et al., 1996; Stager, Jeffries, & Braun, 2003; Toyomura, Fujii, & Kuriki, 2011). Some of the researchers have assumed that the enhanced fluency had to do with improved self-monitoring (read more).

Moreover, a meta-analysis by Budde, Barron, and Fox (2014), which included all relevant previously published neuroimaging studies, revealed that a lack of activation in the auditory cortex was one of the defining characteristics of the brain activation pattern of stutterers. It further revealed that only the auditory areas were consistently more strongly activated with induced speech fluency (in choral reading, metronome-paced speech, singing, etc.) compared to habitual, stuttered speech.

Admittedly, all these findings only suggest reduced perception or processing of the auditory feedback of speech, not necessarily reduced attention to it. But remember the studies reported on above, in which a relationship was found between (1) attention to auditory speech stimuli, (2) activation in auditory cortical areas, and (3) receptive speech processing. It is at least plausible to assume that the increased activation in auditory association areas in fluency-enhancing conditions and after fluency-shaping therapies reflects a more intensive processing of auditory feedback, and that this, in turn, is caused by increased attention to auditory feedback. This view is supported by the fact that it allows us to explain in a unified manner how fluency-enhancing conditions work (see Section 3.1).

In a series of EEG studies using auditory-evoked potentials, Max and Daliri (2019) found that adults who do not stutter consistently showed a modulation of the auditory system prior to speech onset. They called this pre-speech auditory modulation (PSAM). In adults who stutter, PSAM was greatly reduced or absent. Max and Daliri have hypothesized that PSAM plays a role “in engaging or even enhancing processes involved in sensory feedback monitoring” (p. 3074). I think PSAM can be interpreted as a modulation of the allocation of attention that facilitates the processing of the auditory feedback of speech (read more in this blog entry from 2019).

Furthermore, there are some studies in which correlations between auditory-evoked brain potentials—a measure of central auditory processing—and stuttering severity were found (read more). It is, however, unlikely that a (relatively constant) problem with central auditory processing immediately causes stuttering symptoms, which can come and go from one moment to the next. It is more likely that problems with auditory processing in stutterers are related to (result from or cause) problems with auditory attention.

The Vasic and Wijnen experiment

In our context, the experiment conducted by Vasic and Wijnen (2005) is interesting. They proposed a variation of the Covert Repair Hypothesis (Postma & Kolk, 1993; in my view, however, it is more of a derivative of Maraist & Hutton, 1957; see the last section). They hypothesized an overly sensitive monitoring of normal disfluencies in stutterers and suspected that the attempt to avoid these normal disfluencies causes stuttering. This hypothesis is not supported by empirical findings: stutterers exhibited weaker responses to unexpected time-varying perturbations of auditory feedback (Cai et al., 2014) and seem to have rather poor auditory temporal resolution (Devaraju et al., 2020).

Anyway, to test their hypothesis that stuttering is reduced by distracting attention from the monitoring of one’s own speech, Vasic and Wijnen examined two conditions: (1) distraction from overall auditory feedback by playing the computer game Pong (virtual table tennis) while speaking, and (2) distraction from discontinuities in the speech flow by focusing on the lexical aspect of one’s speech (the participants had to monitor the occurrence of a specific function word).

It turned out that Condition 1 indeed reduced the number of stuttering blocks, but the effect was much greater when, in Condition 2, the participants attentively monitored the lexical aspect of their speech. The authors took these results as confirmation of their hypothesis.

However, there is an alternative interpretation of the results. In Condition 1, the acoustic signals associated with each hit and goal in Pong (see video) may have acted as an auditory stimulus that drew the participants’ attention to the auditory channel and, with that, to the auditory feedback of their speech as well. This is likely because even a permanently presented ‘side tone’ (Dayalu et al., 2011) or click sounds evoked by self-timed syllable starts were found to reduce stuttering (Howell & El-Yaniv, 1987). The greater effect of Condition 2 probably resulted from intensive lexical self-monitoring via auditory feedback.

 


Footnotes

Mismatch negativity

Mismatch negativity (MMN) is a component of the event-related brain potential (ERP). It occurs ca. 100–150 ms after stimulus onset as a response to an unexpected stimulus, i.e., a stimulus deviating from a previously formed expectation. MMN indicates a basic stage of sensory processing, namely automatic scanning and change detection, and serves the involuntary orienting of attention to unexpected changes. A mismatch negativity also occurs when an expected stimulus, for instance a beat in a rhythm, is suddenly missing.

The function of the brain process eliciting an MMN is to direct attention to things that do not match a certain expectation. Hence, this process is also useful for drawing attention to errors in an automated sensorimotor sequence. What we, from an observer’s perspective, call an error is an unexpected event, deviating from the expected correct execution of the motor sequence. The fact that the MMN is independent of attention is essential for error detection, since errors are more likely to occur when attention is distracted.

It should now be clearer why, in the self-monitoring of speech, the automatic monitor is unable to distinguish between a real speech error and a deviation from the correct expectation caused by poorly processed (disrupted) auditory feedback. The automatic monitor in the brain doesn’t detect errors at all, but only deviations from what is expected. (return)

Emotions and stuttering

Van Riper (1979) reported the following case: One of his clients was an Italian boy who began to stutter severely after his mother had died and his father had married another woman. After a while, Van Riper gleaned that this woman constantly bullied and abused the boy, and he persuaded the father to divorce her. After the wicked stepmother was gone, the boy’s stuttering disappeared after a short period of treatment.

I think the boy was agitated in his communication with the stepmother and also with his father, who, after all, had married that woman. Probably, the boy considered and planned his utterances carefully to avoid saying anything wrong, provoking the stepmother, or hurting his father. And perhaps he anxiously observed their emotional responses while speaking. Such behavior affects the allocation of attention during speech and may have triggered the boy’s stuttering.

A theory of stuttering should define the ‘interface’ between the physiological pathomechanism, on the one hand, and the psychological and environmental factors that obviously exacerbate the disorder, on the other hand. It is not enough to claim that stress entails higher demands on speech control; some stutterers are fluent when speaking in front of a large audience but stutter when talking to a friend. I think the interface is the allocation of attention during speech; however, I do not assume that psychological and environmental factors alone cause persistent stuttering. A physical predisposition might be necessary. The boy reported on by Van Riper had a slightly younger brother who was bullied and beaten by the stepmother as well, but who did not develop stuttering. (return)
 

Some quotations

Fox et al. (1996) wrote:
“Left superior temporal activations, observed in the controls and attributed to self-monitoring, were virtually absent during stuttering. Deactivations during stuttering were also distinctive. Not only did left superior temporal cortex fail to activate (above), but left posterior temporal cortex (BA22) showed significant deactivations not seen in the controls.” (161) “The neural systems of stuttering have been isolated and include […] lack of normal 'self-monitoring' activations of left, anterior, superior temporal phonological circuits …” (161)

Braun et al. (1997) wrote:
“...our results suggest that when they are dysfluent, stuttering subjects may not be monitoring speech-language output effectively in the same fashion as controls. Perhaps an inability to monitor rapid, spontaneous speech output may be related, at some level, to the production of stuttered speech.” (774) “...the data suggest that, during the production of stuttered speech, there appears to be a functional dissociation between activity in post-rolandic regions, which play a role in perception and decoding of sensory (particularly auditory) information, and anterior forebrain regions, which play a role in the regulation of motor function. Anterior regions were disproportionately active in stuttering subjects while post-rolandic regions were relatively silent. The posterior regions may somehow fail to provide the integrated sensory feedback upon which the anterior regions depend for efficient coordination of speech output.” (780)

Ingham et al. (2003) wrote:
“Thus it may be, [...] that persistent stutterers show poor responsiveness to their own speech signal and probably have an impoverished capacity to monitor their own speech.” (312) (return)
 

Some quotations

Fox et al. (1996) wrote:
“Induced fluency markedly reduced the abnormalities seen in stuttering […] Deactivations of left inferior frontal and left posterior temporal cortex were eliminated, and lack of activation in left superior temporal cortex was substantially reduced.” (161)

Stager, Jeffries, and Braun (2003) wrote:
“...a much wider array of areas that appear to participate in self-monitoring of speech and voice were more active during fluency-evoking than during dysfluency-evoking conditions, in both PWS and control subjects. These regions include, in the right and left hemispheres, both anterior and posterior auditory association areas as well as core and belt areas surrounding the primary auditory cortex. These regions encompass those that are activated by voice and intelligible speech and those that are activated when subjects monitor their speech output under conditions in which auditory verbal feedback is altered.” (332) “… the direct comparison of responses in PWS and controls pinpointed a number of regions in which fluency-evoking conditions evoked a more robust response in stuttering subjects. These included the anterior MTG and anterior STG – regions that appear to be selectively activated by voice and intelligible speech – suggesting that the fluency-evoking conditions may enhance self-monitoring to a greater degree in PWS than in controls.” (333) (return)
 

Auditory processing and stuttering

When speaking a word with natural auditory feedback, the M100 latency (the M100 is an event-related electromagnetic brain potential at about 100 ms after stimulus onset) over the right hemisphere was correlated with stuttering severity (Beal et al., 2010). While speaking the vowel /a/ with natural auditory feedback, the children who stuttered most severely had, on average, the smallest M50 amplitude over the left hemisphere (Beal et al., 2011); however, this correlation was not statistically significant.

Using a sound discrimination task, Jansson-Verkasalo et al. (2014) found that stuttering children, as a group, had a smaller amplitude of the mismatch negativity (MMN) than their normally fluent peers. The MMN amplitude at central scalp positions correlated positively with stuttering severity. The authors concluded that children who stutter may have difficulties receiving sufficient auditory support for speech production.

Maxfield et al. (2010, 2012) found group differences between adult stutterers and controls in the N400 response to verbal stimuli. In their 2010 paper, they conclude that adults who stutter possibly allocated attentional resources differently than controls during the task. In a dual-task experiment, Maxfield et al. (2016) found that, in stutterers but not in controls, higher demands on speech planning (word selection) were associated with a reduced capacity for auditory perception (the P3 was weaker or not detectable).

Liotti et al. (2010) found a significant but moderate correlation between inter-hemispheric imbalance and stuttering frequency in adult stutterers who were listening to the vowel /a/. Blood (1996) did not find a statistically significant correlation, but the participants who stuttered more severely displayed the poorest scores in three tasks of a battery of auditory perception tests. (return)
 
