In Hesse (2024), you will find a concise, peer-reviewed version of my theory about fluency-enhancing conditions. Here, by contrast, I cover the topic more broadly and place it in the context of the research history.
One of the most peculiar and intriguing features of developmental stuttering is the fact that it immediately decreases or even disappears in some specific conditions, which are commonly referred to as fluency-inducing or fluency-enhancing conditions (FCs). Well-known FCs are choral speech, speaking in time with a regular rhythm, or altered auditory feedback; however, as Bloodstein and Bernstein Ratner (2008) stated, “virtually any change that can be made in the way a person normally talks is apt to result in much improved or essentially fluent speech in the majority of stutterers” (p. 268). Furthermore, even a change in the way stutterers hear themselves speak can markedly reduce stuttering. This was demonstrated with delayed auditory feedback (DAF), frequency-altered auditory feedback (FAF), and masking the speaker’s voice with white noise (for an overview, see, e.g., Bloodstein & Bernstein Ratner, 2008).
Many studies were conducted, mainly in the last century, to find out why and how FCs reduce stuttering. An answer to this question could be a key to understanding the nature of stuttering. When we understand why stuttering disappears in those specific conditions, we will probably also better understand why it occurs in normal conditions.
First, I will recall some important theories that aimed to explain the effect of not only one but all, or at least most, FCs. Then, I will propose a new theory and, in the form of an overview, examine if this theory is applicable to all well-known FCs and can be considered a unifying explanation for the effect of FCs.
Distraction has probably been the most common account for the effect of many FCs—but distraction from what? One possibility is that FCs distract from hearing one’s own stutter or from the anticipation of stuttering, thereby preventing responses such as excitement, fear, tension, voluntary motor control, or reformulation. However, anticipation of stuttering and the responses to it are merely secondary symptoms, that is, consequences of the disorder; their reduction by distraction can hardly account for the nearly complete fluency of almost all stutterers in some FCs.
Another possibility is that FCs distract from auditory feedback, that is, from hearing one’s own voice and from the (conscious) self-monitoring of speech. Some observations seem to confirm this view. For instance, stuttering is reduced or disappears when the speaker’s voice is masked with white noise (e.g., Cherry & Sayers, 1956; Maraist & Hutton, 1957; Webster & Lubker, 1968). Furthermore, stuttering is rare in deaf people; it disappeared in adults with lifelong stuttering after hearing loss (see, e.g., Van Riper, 1982, for an overview). These facts seem to suggest that hearing one’s own speech worsens or even causes stuttering.
Some researchers have assumed that the auditory feedback of speech, or its use in speech control, is disturbed in stutterers in any case. For example, Maraist and Hutton (1957) supposed a misevaluation of auditory feedback in the control system: it “finds error where, in reality, no error exists” (p. 385). Webster and Lubker (1968) hypothesized aberrant middle ear muscle activity as a possible cause; however, Howell, Marchbanks, and El-Yaniv (1986) found no difference in middle ear muscle activity between normal speakers and stutterers.
A similar but more specific hypothesis was proposed by Vasic and Wijnen (2005). They assume that over-sensitive monitoring in stutterers takes normal discontinuities in speech flow for errors (more about their study in Section 2.4). However, empirical findings did not support this hypothesis. Adults who stutter exhibited weaker responses to unexpected time-varying perturbations of auditory feedback (Cai et al., 2014a) and seem to have a rather poor auditory temporal resolution (Devaraju et al., 2020).
A further kind of distraction hypothesis was proposed by Max et al. (2004). They assumed that some FCs, particularly altered auditory feedback, help stutterers overcome their over-reliance on auditory feedback in controlling their speaking. However, empirical findings do not support the idea that auditory feedback plays too big a role in stutterers’ speech control. Adults who stutter exhibited weaker and/or delayed compensatory responses to unexpected perturbations of auditory feedback as compared with normally fluent controls (Bauer et al., 2007; Cai et al., 2012, 2014a; Daliri et al., 2018; Loucks, Chon, & Han, 2012; Nudelman et al., 1992; Tourville, Cai, & Guenther, 2013).
Furthermore, neuroimaging studies have shown that secondary auditory areas are mostly underactivated during speech in stutterers, compared with normally fluent controls (see, e.g., meta-analyses by Brown et al., 2005, and by Budde, Baron, & Fox, 2014). These cortical areas, located mainly in the left superior and middle temporal cortex, are assumed to be responsible, among others, for the self-monitoring of speech (Indefrey, 2011; Indefrey & Levelt, 2004; McGuire, Silbersweig, & Frith, 1996; Price et al., 1996).
In other studies, reduced auditory-motor coupling was found in adults who stutter during speech (Kell et al., 2018) and in children who stutter during resting state (Chang & Zhu, 2013). Several authors therefore concluded that stutterers seem to poorly monitor their speech (e.g., Braun et al., 1997; Fox et al., 1996; Ingham et al., 2003; Kell et al., 2018).
In some functional neuroimaging studies, the effects of FCs on brain activation in stutterers were investigated (Braun et al., 1997; Stager, Jeffries, & Braun, 2003; Toyomura, Fujii, & Kuriki, 2011). The secondary auditory areas were found to be more strongly activated in FCs than in normal, stuttering-evoking conditions. Stager and colleagues conclude that “the hypothesis that these maneuvers [choral speech and metronome-timed speech; T. H.] result in decreased attention to vocal output (inducing fluency in this fashion) is likely to be wrong” (p. 332), and “a common fluency-evoking mechanism might relate to more effective coupling of auditory and motor systems—that is, more efficient self-monitoring” (p. 319).
Taken together, distraction, and particularly distraction from the auditory feedback of speech, can hardly be a general explanation for the effect of FCs. Boers-Van Dijk (1973) has already refuted the distraction hypothesis with logical arguments. She referred (1) to the fact that distraction, over time, becomes weaker by adaptation, but no adaptation effect was found with shadowing and with masking noise, and (2) to the fact that choral speech and metronome-paced speech require attention to one’s own speech to monitor the synchronicity with co-speaker(s) or metronome.
The facts about the role of auditory feedback in stuttering appear conflicting. On the one hand, reduced auditory feedback (due to masking noise, whispering, or deafness) seems to reduce stuttering, which suggests that the influence of auditory feedback is adverse or too great in stutterers. On the other hand, neuroimaging and behavioral data suggest that the influence of auditory feedback is too little in stutterers in normal speaking conditions and enhanced or normalized in at least some FCs.
A further explanation for the effect of FCs was inspired by the observation that speaking in FCs is often associated with an increase in phonation and a decrease in rate. This, Wingate (1969b, 1970) argued, could be responsible for the amelioration of fluency. However, the effect of choral speech is neither attributable to an increase in loudness nor to a decrease in speech rate (Adams & Ramig, 1980; Andrews et al., 1982; Ingham & Packman, 1979). Delayed auditory feedback (DAF) typically leads to a slowed speech rate, but it reduces stuttering to nearly the same degree when speaking at a high rate (Kalinowski et al., 1993, 1996; MacLeod et al., 1995). Masking by noise usually leads to an increase in voice intensity, but its fluency-inducing effect is equal or even greater when speaking in a lower voice (Garber & Martin, 1977).
Wingate (1981) proposed that a change in prosody may enhance fluency in choral reading, rhythmic speech, shadowing, and singing, since “a monotone quality is common to all” (p. 102). However, Ingham and Carroll (1977) and Ingham and Packman (1979) found that stutterers’ fluent speech in solo and chorus reading could not always be differentiated by listeners. This suggests that the prosodic change is rather subtle, if it even exists. Taken together, different FCs cause or are associated with different changes in the manner of speaking, but there is no specific change in speaking that is common to all FCs and could be responsible for their effectiveness.
Kalinowski and Dayalu (2002) emphasized the fact that some FCs, e.g., choral speech and altered auditory feedback, induce effortless, natural-sounding fluent speech. This, they assume, is due to the presence of a sensory stimulus, a second speech signal. Based on this assumption, Kalinowski and Saltuklaroglu (2003a, 2003b) and Saltuklaroglu, Kalinowski, and Guntupalli (2004) proposed a theory according to which “all forms of stuttering inhibition primarily occur as a result of sensory stimuli” (Saltuklaroglu et al., 2004, p. 444).
Those sensory stimuli, the authors assume, can come about in two ways. Either they are present externally, as in chorus speech or speaking with altered auditory feedback, or speakers themselves produce them by speaking in a different manner than usual, as in whispering, adopting a foreign dialect, or singing. Such sensory auditory stimuli, the authors propose, activate the mirror neuron system, which supports fluent speech by “the matching of sensory targets via mirror neuronal activity, either to an exogenous model or to an endogenously imposed model” (Saltuklaroglu et al., 2004, p. 445). Saltuklaroglu and colleagues further claim that the effects of FCs without an external second speech signal (e.g., whispering, speaking in a foreign dialect, or singing) result from the fact that an endogenously imposed model is imitated by mirror neuronal activity.
The mirror neuron theory (Kohler et al., 2002) is still a matter of debate; it is unclear whether mirror neurons exist. However, even if they exist, the above hypothesis is not convincing. As is well known, natural-sounding fluent speech can also be induced by the presentation of white noise (see the section about auditory masking); in this FC, neither an external speech signal nor an internal model is available for the mirror neuron system.
Furthermore, the question arises: If the imitation of the internal model of a song or an unusual manner of speaking reduces stuttering—why do stutterers not imitate their undoubtedly existing internal model of normal, fluent speech? Stuttering should be easily treatable this way if the mirror neuron hypothesis were true. Obviously, this hypothesis is not convincing as a unifying account for the effect of FCs. However, I agree with Kalinowski and Dayalu’s basic assumption that the effect of FCs has to do with auditory stimulation.
A new hypothesis should take into account that there are many similarities between seemingly different FCs. For instance, frequency-altered auditory feedback (FAF) can be considered a simulation of choral speech, and delayed auditory feedback (DAF) a simulation of “inverse” shadowing (with the stutterer being the lead speaker); each alteration in the manner of speaking (e.g., altered voice, foreign dialect) produces an unfamiliar auditory feedback; no external auditory feedback is available with complete auditory masking, during silent mouthing, and during inner speech. The manifold relations between FCs and the claim for theoretical parsimony suggest a unifying explanation for the effect of FCs.
Like the distraction theory, I assume that FCs modulate the allocation of the speaker’s attention. However, they do not distract attention from auditory feedback, but draw it to the auditory feedback of speech. This improves the processing of auditory feedback and its integration in speech control, which reduces stuttering. The hypothesis is shown as a causal chain in the figure below.
Figure 12: The way fluency-enhancing conditions work, as a causal chain.
FCs in which external auditory feedback is available draw the speaker’s attention to the external auditory feedback. When external auditory feedback is not available, one’s own speech is ‘heard’ internally (internal auditory feedback). Importantly, internal auditory feedback is available only with sufficient attention; it requires listening to the “voice inside our head”. This ensures proper processing and integration of internal auditory feedback, which explains why most stutterers are totally fluent in these FCs.
FCs in which external auditory feedback is available can be subdivided into those that require active listening to it and those that merely attract the speaker’s attention by altered, odd-sounding auditory feedback or by an additional, unfamiliar acoustic stimulus.
Figure 13: Types of fluency-enhancing conditions.
The figure gives an overview of well-known FCs and the way they work. There is some overlap between the FCs included in the figure; for instance, whispering is also an altered manner of speaking and generates an altered auditory feedback. In singing and prolonged speech, audio-phonatory coupling contributes to their effects. Self-talk has not been included in the figure; it naturally implies listening to oneself. For the first time, inner speech (silent reading and verbal thinking) is included, as I assume that the reason stuttering does not occur in inner speech, silent mouthing, and under complete auditory masking might be the same.
I claim that the proposed explanation is applicable to all FCs. This may appear doubtful, given the diversity of FCs. On the following pages, the explanation is therefore evaluated for each type of FC individually.
The belief that stuttering is caused or triggered by hearing one’s own speech has a long history. In the 19th century, researchers assumed that hearing one’s stutter would excite and alarm the speaker’s mind, and that this would increase and reinforce the stuttering. For example, the German physician L. Sandow, who himself stuttered, called the disorder “sensory echo stuttering” and recommended: “Either plug your ears with cotton wool, or speak lower! In both cases, the acoustic irritant will become weaker, and you will immediately pull the carpet from under that despicable spluttering.” (Sandow, 1898, p. 67).
And more than seventy years later, Van Riper wrote: “Our position is that some of the stutterer’s difficulties seem to originate in the auditory processing systems. We feel that if we can get him to concentrate upon proprioceptive feedback we can bypass these difficulties. Accordingly, we use masking noise, DAF, and other methods for facilitating motor control through proprioception. We want the stutterer to stop listening to the gaps and abnormalities in his speech when they occur and when he expects them.” (Van Riper, 1973, p. 211)
In the late 1950s, the idea came up that auditory feedback could be distorted permanently in stutterers, e.g., due to inter-aural phase disparity or interference between air-conducted and bone-conducted auditory feedback (Stromsta, 1959; 1972; Webster & Lubker, 1968). Apart from the lack of empirical evidence, the problem with such simple physical accounts is that they cannot explain the variability of stuttering, that is, the impact of situation, environment, emotions, or the anticipation of stuttering on certain word-initial sounds.
Margriet Boers-Van Dijk wrote about choral speech:
“… to speak in unison with others, a constant adaptation to the speech manners of the co-speakers is demanded. The stutterer will have to wait when the others wait, and to continue the sentence together with them. He will have to adjust his speech rate to that of the whole chorus. Accordingly, he will have to concentrate on his own speech in order to synchronize his words with those uttered by the chorus. It is impossible to think that he would be distracted from his own speech when on the contrary he has to pay very special attention to it.” (Boers-Van Dijk, 1973, p. 4)