For a long time, some observations have been an obstacle to understanding the relationship between stuttering and auditory feedback: (1) Stuttering does not occur during inner speech (verbal thinking) or when speaking in silent mouth movements (mouthing); (2) Stuttering is very rare in deaf people, and it has disappeared in persistent stutterers who lost their hearing (see, e.g., Van Riper, 1982); (3) Stuttering also mostly disappears or is markedly reduced under total auditory masking (by means of loud noise through headphones; see, e.g., Maraist & Hutton, 1957; Webster & Lubker, 1968); (4) Stuttering often disappears or is reduced when one’s own voice is difficult to hear, e.g., in whispering, with plugged ears, or with partial masking by noise (see Bloodstein & Bernstein Ratner, 2008, for an overview of all these findings).
These facts seem to suggest that the involvement of auditory feedback in speech control increases stuttering, and that the suppression of auditory feedback, e.g., by distraction from hearing one’s own speech, reduces stuttering. That, however, would be diametrically opposed to the present theory. To solve this problem, we must scrutinize the term ‘auditory feedback’: In the Perceptual Loop Theory, Levelt (1995) assumed two feedback loops, an external and an internal loop (see figure). Only one of them, namely the external loop, i.e., hearing one’s own speech through the ears, is usually referred to as auditory feedback. But what about the internal loop?
Via the internal loop, we ‘hear’ our own speech internally, e.g., during inner speech (thinking in words). Inner speech has largely the same features as overt speech, including syllable rhythm, linguistic stress, intonation, and vocal sound: You can ‘speak’ your thoughts internally in a higher or lower pitch, or even in a disguised voice (McGuire et al., 1996; see also the footnote in Section 1.3). In other words: Inner speech is perceived in the auditory modality – even more so than one’s own overt speech, in which tactile feedback may play a certain role in the self-perception of consonants. Additionally, the perception of inner speech seems to be a function of the same auditory association areas that are responsible for the comprehension of overt speech (Brumberg et al., 2016; Kell et al., 2017a; Martin et al., 2014; McGuire et al., 1996; Palmer et al., 2001; Shergill et al., 2002; Tian & Poeppel, 2012; Tian, Zarate, & Poeppel, 2016). Therefore, I refer to the self-perception of inner speech as internal auditory feedback.
As already explained in Section 1.3, I do not assume that both feedback loops work concurrently. Instead, I assume that they work alternately: The internal loop is active only when the external loop is interrupted or nonexistent, which is the case in verbal thinking (inner speech), in speaking in silent mouth movements with held breath (mouthing), under total auditory masking, or in deafness. In these conditions, stuttering usually disappears, and the cause might be that (i) self-produced speech is heard internally, via the internal feedback, and (ii) the internal feedback loop seems to be more stable, so that it cannot be disrupted by a distraction of attention. But why is the internal loop more stable? The cause may be that overt and inner speech depend on self-perception in different ways:
In overt speech, production is largely independent of self-perception: We can speak overtly without perceiving our speech output – at worst, we talk nonsense without becoming aware of it, as do patients suffering from Wernicke’s aphasia or individuals talking in their sleep. In inner speech, by contrast, production and perception depend on each other: We cannot think without attention to our thoughts. If attention is suddenly distracted (e.g., because the phone rings), the stream of thought, i.e., the internal speech flow, breaks off immediately. This interdependence of production and perception makes the internal feedback loop stable as long as inner speech lasts. The stability of the internal feedback loop might be the reason why stuttering disappears when the external feedback is interrupted and self-monitoring shifts to the internal feedback. Overt speech without external feedback is nothing other than inner speech, only accompanied by (for the speaker) inaudible phonation and silent mouth movements – and during inner speech, as we know, stuttering does not occur.
Some further observations have hardly been understood until now: Stuttering is mostly reduced when auditory feedback is altered – delayed (DAF) or frequency-shifted (FAF) – by means of electronic devices. The effect is sometimes explained in a similar way to the effect of masking: It is presupposed that the involvement of auditory feedback in speech control causes stuttering in any case. Because of the delay and/or the alteration of frequency, so it is assumed, auditory feedback becomes useless and is no longer involved in speech control; the speaker pays more attention to proprioception (see, e.g., the quotation from Van Riper, 1973). But I think just the opposite is true.
Foundas et al. (2013) examined the effect of the SpeechEasy, a small in-the-ear device that reduces stuttering by altering auditory feedback (DAF and/or FAF). They found that the gain in fluency with altered auditory feedback was not much greater than in a control condition without feedback alteration (in this condition, auditory feedback was only slightly amplified by the device). The authors conclude that “altered auditory feedback per se may not be solely responsible for some of the observed treatment effects” (p. 148). Similar results were obtained by Unger, Glück, and Cholewa (2012), who examined the effect of two other AAF devices (see also Unger, 2012).
But what may be responsible then? I think hearing one’s own speech in an unfamiliar manner, e.g., with the SpeechEasy device in the ear and with a slight amplification, already suffices to draw the speaker’s attention more to the auditory channel. DAF or FAF may only strengthen this effect because these alterations make the feedback sound even more unfamiliar and irritating. Incidentally, another explanation for the reduction of stuttering by DAF was that speaking is slowed down. In fact, speaking is slowed by DAF, but Kalinowski et al. (1993) and MacLeod et al. (1995) found slowed speech not to be crucial for the fluency-enhancing effect of DAF.
Therefore, I think that a short delay of about 60 ms (the default setting of the SpeechEasy device) or an altered frequency (i.e., a higher or lower pitch) of auditory feedback does not make the feedback useless – in contrast to a DAF of more than 200 ms, which triggers the disfluencies typical of the Lee Effect in stutterers as well. But a short DAF just above the threshold of perception and/or a FAF draws the speaker’s attention to the auditory channel because the feedback sounds odd, and more attention to the auditory channel improves feedback processing (see Section 2.3). Other methods that reduce stuttering by giving speech an unfamiliar sound – speaking in a disguised voice or in an unfamiliar dialect – might be effective for just the same reason. I agree with Bloodstein and Bernstein Ratner (2008), who have assumed “that any changes in stutterer's accustomed way of hearing themselves speak is likely to alleviate their speech difficulty.” (p. 301) But what causes the alleviation? The unfamiliar auditory feedback draws the speaker’s attention to the auditory channel.
A very effective fluency-enhancing method is paced speech, e.g., chorus reading, or speaking synchronously to the rhythm of a metronome (Johnson & Rosen, 1937). Two causes have been assumed to be responsible for the effect: (1) The speaker gets cues for syllable onsets, which is helpful because stutterers are poorly able to generate their own speech rhythm. (2) The speaker must listen to the metronome or to the co-speakers, whereby attention is distracted from the auditory feedback of his/her own speech. I think both assumptions are wrong.
Paced speech does not work by the speaker reacting to each pace signal. If you wait each time until you hear the beat of the metronome, or until you hear the co-speakers starting a syllable, and only then say the syllable yourself, you will never be synchronous but always too late. Instead, you must grasp the given pace so that you can predict and anticipate it. Then you must adjust your own pace to the given pace and continuously monitor whether you are still in sync with the metronome or the co-readers. That means you must attentively listen to both the given pace and your own speech, in order to correct your pace if necessary. Therefore, I think that paced speech makes the speaker listen not only to the given pace but also to his/her own speech, and in this way, it improves the processing of auditory feedback.
This view is supported by an experiment conducted by Howell and El-Yaniv (1987): Adults who stutter read a short story (1) normally, (2) while listening to the clicks of a metronome, and (3) while listening to clicks that occurred at the beginning of every syllable, triggered by the intensity of the speaker’s voice (participants were speaking at their natural rate and rhythm). The third condition reduced stuttering nearly as effectively as the second condition: The mean number of disfluencies in the story was 20.25 in the normal condition, 0.6 with the metronome, and 2.5 with clicks at syllable onset (standard deviations: 9.15, 1.35, and 2.01, respectively), i.e., reductions of roughly 97% and 88% relative to the normal condition. These results clearly show that it is not mainly the rhythm that reduces stuttering in the metronome condition.
The above considerations are consistent with empirical findings of brain imaging studies: Speaking paced by the rhythm of a metronome, speaking with FAF and DAF, and chorus reading were found to be associated with greater activations in the secondary auditory areas of the cortex (Braun et al., 1997; Fox et al., 1996; Ingham et al., 2003; Stager, Jeffries, & Braun, 2003; Toyomura, Fujii, & Kuriki, 2011; see also Table 1). In a meta-analysis of seven brain-imaging studies in which fluency-enhancing techniques like paced speech or chorus reading were applied (Budde, Barron, & Fox, 2014), it became apparent that no areas other than the secondary auditory areas were consistently more strongly activated during induced fluency than during stuttered speech. In the secondary auditory areas (also referred to as auditory association areas), higher stages of perceptive speech processing, including self-monitoring, are localized (Indefrey & Levelt, 2004; Indefrey, 2011); therefore, greater activations in those areas can hardly be explained, e.g., by the perception of metronome beats alone.
Speech shadowing is a further very effective fluency-enhancing method (Cherry & Sayers, 1956; Marland, 1957; MacLaren, 1960; Kelham & McHale, 1966; Kondas, 1967; Shelton, 1975): A therapist speaks words or sentences (or reads aloud a text that is not visible to the client), and the client repeats the words or phrases immediately, i.e., with the short time lag needed for recognizing a word or short phrase, so that therapist and client are speaking at the same time, but not synchronously. Shadowing requires paying attention to the leader’s speech and, at the same time, monitoring whether one’s own speech exactly follows. Therefore, as in paced speech, we cannot assume that the speaker’s attention is distracted from auditory feedback by the task; on the contrary, the speaker must attentively monitor his/her own speech output.
In the 1950s and 1960s, when speech shadowing was applied in the treatment of stuttering, its effect was misinterpreted in the framework of the distraction hypothesis and/or in the framework of learning theory. In the latter case, stuttering was regarded as a learned incorrect pattern of speaking, and the fluency during shadowing was assumed to be caused by the exact imitation of a correct pattern. That’s nonsense. Fluent speech during shadowing might mainly be the result of an improved processing of auditory feedback because shadowing requires attention to the auditory channel.
As mentioned above, stuttering is also often reduced when one’s own speech is difficult to hear, e.g., in whispering, with blocked ears, or with partial masking (by noise at a volume at which one’s own voice is still audible). The same explanation might apply to these conditions as to the effect of DAF and FAF: The speaker’s attention is drawn to the auditory channel. What do we usually do when the speech of someone else is difficult to hear because of a low voice or a noisy environment? We automatically listen more attentively. I think the same happens when one’s own speech is difficult to hear: Attention is automatically drawn to the auditory channel, with the consequence that the processing of auditory feedback is improved. This is consistent with Christoffels et al. (2011), who found greater activations in the secondary auditory areas of the cortex during speech when the quality of auditory feedback was reduced by noise masking; in this case, too, greater activations in these cortical areas can hardly be explained simply by the perception of the noise.
As we know from experience, poor audibility is no obstacle to speech comprehension, at least as far as enhanced attention can compensate for the deficit. Accordingly, I think that poor audibility is no obstacle to the functioning of the external feedback loop, as far as attention can compensate for the deficit. Only if the external feedback is unusable for monitoring (completely masked, considerably distorted or delayed) does attention automatically shift to the internal auditory feedback, i.e., the internal loop gets activated. The latter is consistent with Christoffels, Formisano, and Schiller (2007), who found no lower activation in the left superior temporal gyrus (which roughly corresponds to Wernicke’s area) during speech with complete masking, compared to the control condition with normal auditory feedback.
So, by taking into account the role of attention and the nature of inner speech, we can provide a uniform explanation for the prima facie contradictory observations concerning fluency-enhancing conditions: Stuttering is markedly reduced or disappears when the processing of auditory feedback is improved, which can happen in two ways: Either the speaker’s attention is drawn to the external auditory feedback, or speech monitoring shifts to the internal feedback loop because external feedback is interrupted or unusable.
Figure 10: Fluency-enhancing conditions. * Singing seems to be a special issue: You must listen to your own voice to meet the melody, and additionally, lyrics are committed to memory and reeled off as one speaking program (see Section 1.2). Rhythm may be a further fluency-enhancing factor.
Finally, the astonishing results obtained by Webster and Dorman (1970) become understandable. They found that stuttering was reduced to approximately the same extent (i) by continuous auditory masking, (ii) by masking only during phonation, and (iii) by masking only during breaks in phonation. In the first two conditions, the external feedback loop was interrupted during speech, and the internal loop was active. In the third condition, the unfamiliar noise during the breaks had the same effect as other alterations of auditory feedback: It drew the participants’ attention to the auditory channel.
The belief that stuttering is caused or triggered by hearing one’s own speech has a long history. In the 19th century, researchers assumed that the speaker’s mind would be excited and alarmed by hearing his own stutter, and that, in this way, stuttering would increase or become reinforced. For example, the German physician L. Sandow, who stuttered himself, referred to the disorder as “sensory echoic stuttering” and recommended: “Either, plug your ears with cotton wool, or speak lower! In both cases, the acoustic irritant will be weaker, and you will immediately pull the carpet from under those despicable spluttering.” (Sandow, 1898, p. 67).
And more than seventy years later, Van Riper wrote: “Our position is that some of the stutterer’s difficulties seem to originate in the auditory processing systems. We feel that if we can get him to concentrate upon proprioceptive feedback we can bypass these difficulties. Accordingly, we use masking noise, DAF, and other methods for facilitating motor control through proprioception. We want the stutterer to stop listening to the gaps and abnormalities in his speech when they occur and when he expects them.” (Van Riper, 1973, p. 211)
An explanation claiming that stuttering results from hearing one’s own stutter is obviously circular. In the late 1950s, however, the idea came up that auditory feedback could be distorted permanently in stutterers, e.g., by interaural phase disparity or by interference between air-conducted and bone-conducted feedback (Stromsta, 1959; 1972; Webster & Lubker, 1968). This issue seems not yet to be fully clarified; however, the problem with such simple physical theories is that they cannot account for the variability of stuttering, e.g., for the impact of the speech situation. More recent studies rather suggest subtle deviations in central auditory processing, which seem to be closely related to the control of auditory attention (see Section 3.3 about the predisposition for stuttering).
A modern remake of the distraction hypothesis was proposed by Vasic and Wijnen (2001; 2005), who suspect an overly sensitive monitoring of normal disfluencies to be the primary cause of stuttering.
We can consider verbal thinking as an internal imagination of speech, i.e., as a special kind of motor imagination. An automatized motor behavior, for example, jogging, does not require attention to this behavior (I can look at the landscape or dwell on thoughts while jogging). By contrast, jogging in my imagination is impossible without attention to this imagination – when attention is suddenly distracted, then the imagination of jogging breaks off.
The same is true for the imagination of speech: It does not work without attention to the imagination. Speaking with auditory masking, mouthing, and the speech of deaf people are accompanied by an internal imagination of how the words sound – this internal feedback enables control: When speaking without externally hearing yourself, you must listen to the inner feedback, and that’s why stuttering disappears in these conditions.
Let’s go back once again to the above example of jogging with distracted attention – suppose I’m deep in thought, thinking about a difficult problem while jogging. In this condition, the risk of tripping over a stone or branch might be higher because my brain might process the visual information about the path in front of my feet and the information about my balance more poorly. Now, speaking is a much more complex movement than jogging, so it is not surprising that one stumbles if attention is not appropriately allocated.
When I claim that thinking (inner speech) is impossible without attention to the thoughts, then it is necessary to conceptually distinguish between thoughts and unconscious brain processes. Thoughts are conscious; we are always aware of our thoughts. We are responsible for our thoughts: If a foolish or an evil thought comes up in my mind, I can say to myself: That’s a bad idea, forget it. But we cannot evaluate our unconscious brain processes, and we are not responsible for them in particular – even if we, within certain limits, are responsible for our brain to be able to work well (not to drink too much alcohol, etc.). Naturally, unconscious brain processes might precede our thoughts.
My assumption that stuttering does not occur during inner speech because the internal auditory feedback loop works well is in agreement with an astonishing finding reported by Ingham (2001): In normally fluent speakers, as a group, auditory association areas (BA21/22) were activated in the right hemisphere during overt reading. In stutterers, by contrast, auditory association areas were considerably deactivated (compared to resting state!) bilaterally during overt (stuttered) reading (Fig. 2, p. 504). However – and that’s astonishing – during inner speech (referred to as ‘speech imagery’ by Ingham), auditory deactivations disappeared in the left hemisphere and were reduced in the right one. Thus, auditory association areas in the stutterer group were significantly more activated during inner speech than during overt speech.
Unfortunately, there were two limitations: First, the very small group (4 stutterers, 4 controls). Second, the study was not conducted in order to examine inner speech in stutterers, but to demonstrate that the brain activation pattern associated with stuttering does not depend on overt speech, i.e., on muscle movements – for that reason the stuttering participants were asked to imagine stuttered (!) reading. Results may be different with normal silent reading or silent verbal thinking of stutterers.
A special variant of speech shadowing is simultaneous interpreting. It is, so to speak, shadowing in another language. I personally know two stutterers who are able to interpret simultaneously, one of them from Russian into German, the other between Turkish / Kurdish and German. The latter, a woman, has worked as a professional interpreter for many years. Both do stutter in normal communication, but are fluent when interpreting simultaneously. These examples show that imitation is not the way in which shadowing produces fluency.