Levelt (1995) has formulated the Main Interruption Rule as an operating principle of the self-monitoring of speech: “Stop the flow of speech immediately upon detecting trouble” (p. 478). Here, “trouble” means that the sound sequence heard doesn’t match the expected correct sound sequence. The automatic self-monitoring of speech based on external auditory feedback is a largely unconscious and simple process: it compares two patterns and interprets any mismatch between them as an error.
The detection of an error causes a shift from the mode ‘speaking as planned’ into the mode ‘error repair’. It interrupts the speech flow, and at the same time, the speaker’s attention is drawn to what he or she has just said. After the speaker has realized the error, he or she accepts the interruption and makes a repair.
However, a mismatch between the expected and the perceived phoneme sequence can also arise if the external auditory feedback is temporarily disrupted, for example, if feedback information is not completely processed, transmitted, or kept in working memory. The perceived sound sequence then appears incomplete or noisy and doesn’t match the expectation. Since the monitor only compares patterns, it is unable to distinguish between a mismatch due to a real speech error and a mismatch due to a feedback disruption. In both cases, the monitor responds in the same way and interrupts the speech flow.
This, I think, is the cause of the sudden blockage of speech without any identifiable reason. The speaker, who is unaware of any error, spontaneously tries to overcome the blockage and to continue talking. This natural and automatic behavior causes the observable core symptoms of stuttering.
Figure 5: The detection of a real speech error (left) and stuttering due to an invalid error signal (right). The monitor (in the circle) behaves in the same way in both cases. By the way, this doesn’t mean that stutterers have difficulty detecting and repairing their slips of the tongue (see below).
In other words, if the monitor detects a mismatch because of a speech error, two processes are triggered: first, at the motor level, a blockage of speech, and, second, at the mental level, the recognition of the error and a shift from the ‘speaking as planned’ mode into the ‘error repair’ mode. The monitor’s response to a mismatch due to an invalid error signal, by contrast, elicits only the first process because there is no speech error to find. Despite the blockage at the motor level, the speaker remains in the ‘speaking as planned’ mode and tries to continue, and this causes the stuttering symptoms.
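The two cases in Figure 5 can be made concrete with a minimal Python sketch of the monitor as a pure pattern comparator. This is an illustration under simplifying assumptions, not an established model: sounds are lists of strings, the names `monitor` and `MonitorResponse` are hypothetical, and the `produced` argument (what the articulators actually did) stands in for what the speaker’s shifted attention can later find. The decision to interrupt depends only on `expected` and `perceived`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MonitorResponse:
    interrupt: bool  # motor level: is the speech flow stopped?
    mode: str        # mental level: "speaking as planned" or "error repair"


def monitor(expected: List[str], produced: List[str], perceived: List[str]) -> MonitorResponse:
    """Hypothetical toy comparator: only expected vs. perceived drives the interruption."""
    if perceived == expected:
        return MonitorResponse(interrupt=False, mode="speaking as planned")
    # Any mismatch stops the speech flow; the monitor cannot tell why it occurred.
    if produced != expected:
        # A real slip exists, so the speaker's attention finds it: repair mode.
        return MonitorResponse(interrupt=True, mode="error repair")
    # Invalid error signal: there is no error to find, so the speaker tries to continue.
    return MonitorResponse(interrupt=True, mode="speaking as planned")


plan = ["t", "o", "m"]
print(monitor(plan, plan, plan))                        # fluent speech
print(monitor(plan, ["t", "o", "n"], ["t", "o", "n"]))  # real slip
print(monitor(plan, plan, ["t", "o"]))                  # feedback lost the final sound
```

In the third call, nothing went wrong in articulation, yet the speech flow is interrupted while the speaker stays in ‘speaking as planned’, which is the constellation the text identifies as stuttering.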
We must, however, consider the following problem. For the detection of a speech error, an expectation of the correct sound sequence of the word (the same goes for a familiar phrase) is needed; it is generated based on auditory feedback (see Section 1.4). But if so, how can the monitoring mechanism even work when auditory feedback is disrupted? Nothing suggests that the detection of speech errors is impaired in stutterers.
The answer is that error detection works properly as long as auditory feedback is disrupted only at the end of a word or phrase rather than permanently. The feedback of the initial portion of a familiar word or phrase is sufficient to identify the word or phrase and to predict its correct sound sequence (see Section 1.4).
An error signal (valid or not) occurs at the end of a word or phrase; therefore, the resulting interruption of speech affects the start of the subsequent word or phrase, also because of the reaction time of the monitoring system. In stuttering, the speaker can often still produce the initial sound(s) of a syllable or the first word of a phrase, but then either the motor program tries to start repeatedly but is stopped again and again at the same point, or the program gets caught at the point of blockage. In the latter case, either the sound is prolonged (if phonation is not interrupted), or speaking is completely blocked. In this way, the three core symptoms of stuttering come about: repetition, prolongation, and silent block.
Which kind of symptom occurs at a given moment may depend on three factors: the affected phoneme (stop consonants can’t be prolonged), whether breathing is included in the motor blockage, and the degree of muscle tension. The more tension, the greater the tendency towards tonic symptoms; e.g., “M(ə)-m(ə)-monica” turns into “Mmmonica”, and “P(ə)-p(ə)-peter” turns into “P – – eter” because the lips are pressed together.
Apart from whole-word repetitions, speech is usually blocked at the core of a syllable, that is, at the vowel or diphthong. The cause may be that the production of a syllable is always inhibited in the same way, namely by blocking the vowel. The fact that stuttering complies with certain rules suggests that it isn’t a breakdown but a regular response of the control system.
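As a rough illustration of how the three factors could combine, here is a Python sketch. It is a hypothetical simplification, not a claim from the literature: tension is a number between 0 and 1, the cut-off `tonic_threshold` is invented, the stop-consonant set covers only English oral stops, and a blocked breath is simply assumed to prevent phonation.

```python
STOP_CONSONANTS = set("pbtdkg")  # simplified: English oral stops only


def core_symptom(onset: str, breathing_blocked: bool, tension: float,
                 tonic_threshold: float = 0.5) -> str:
    """Hypothetical mapping of phoneme, breathing, and tension to a core symptom."""
    if tension < tonic_threshold:
        # Low tension: the motor program restarts again and again.
        return "repetition"
    # High tension (tonic): the program gets caught at the point of blockage.
    if breathing_blocked or onset.lower() in STOP_CONSONANTS:
        # No phonation, or a sound that cannot be prolonged.
        return "silent block"
    return "prolongation"


print(core_symptom("M", breathing_blocked=False, tension=0.2))  # like "M(ə)-m(ə)-monica"
print(core_symptom("M", breathing_blocked=False, tension=0.8))  # like "Mmmonica"
print(core_symptom("P", breathing_blocked=False, tension=0.8))  # like "P – – eter"
```

The same blockage thus yields different surface symptoms depending only on the conditions under which it occurs, which fits the view that all three core symptoms share one cause.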
Summarizing, we get a preliminary definition of stuttering:
A stuttering event is the blockage of a speech-motor program by an automatic monitoring mechanism because of an invalid error signal. The invalid error signal is the response to a mismatch between the expected correct sound sequence of a word or phrase and the perceived sound sequence, where the mismatch is caused not by a real speech error but by a temporary disruption of the auditory feedback.
The above definition, which includes elements that are not observable, is part of the proposed stuttering theory. The common definition of stuttering (based on the observable symptoms) still applies and is taken as a premise: stuttering is when someone, obviously against his or her will, repeats words or parts of words, prolongs sounds, or gets stuck. In the proposed theoretical definition, the three core symptoms of stuttering (repetitions, prolongations, and silent blocks) are explained in the same way, namely, as caused by a blockage of a speech-motor program and the speaker’s spontaneous attempt to overcome the blockage. This is in accordance with Dayalu et al. (2001), who postulated “that the production of core stuttering behaviors represents an attempt by the person who stutters to overcome the involuntary block, and regain forward-flowing speech” (p. 111).
The proposed definition clearly distinguishes stuttering events from normal disfluencies caused by delays in speech planning, that is, from unintended pauses that can be filled with word repetitions or fillers such as “hm”. In contrast to such normal speech disfluencies, a stuttering event, as defined above, never occurs because of a delay in speech planning; it can only occur when the relevant speech-motor program is about to start or has just started. That is, the speaker knows what he or she wants to say and is about to articulate, or has begun to articulate, a specific sound sequence. The clear distinction between stuttering and normal speech disfluencies is supported by Jiang et al. (2012), who found different brain activity patterns for stuttering-typical and normal disfluencies; this suggests that they have different causes.
The assumption that the invalid error signals that cause stuttering are elicited at the end of a word or phrase allows us to explain why, in more than 90% of cases, words are stuttered at their onset or on their first syllable. The cause is that the monitoring system needs a reaction time; therefore, the program for the subsequent word or phrase is blocked at its start or in its initial part. If the program for a word has been blocked, the word is usually stuttered on the first syllable. If the program for a phrase has been blocked, sometimes the first word of the phrase, e.g., an article or a preposition, is repeated as a whole.
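The difference between a blocked word program and a blocked phrase program can be sketched in Python as a toy renderer of what a listener hears. The function name `blocked_start` and its parameters are hypothetical; a word is given as a list of syllables, a phrase as a list of words, the number of restarts is chosen freely, and the schwa rule (onset plus “ə”, as in “T(ə)–t(ə)–tom”) is simplified to the first letter, so consonant clusters are not handled.

```python
from typing import List


def blocked_start(program: List[str], is_phrase: bool, restarts: int) -> str:
    """Hypothetical rendering of a program that restarts before running through.

    program: the syllables of a word, or the words of a phrase.
    """
    if is_phrase:
        fragment = program[0]            # the first word is repeated as a whole
        full = " ".join(program)
    else:
        fragment = program[0][0] + "ə"   # onset plus schwa, e.g. "Tə" for "Tom"
        full = "".join(program)
    return "-".join([fragment] * restarts + [full])


print(blocked_start(["Tom"], is_phrase=False, restarts=2))
print(blocked_start(["in", "the", "house"], is_phrase=True, restarts=2))
```

The first call yields a part-word repetition on the first syllable, the second a whole-word repetition of the preposition that opens the phrase.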
Sometimes, a syllable other than the first of a word is stuttered, often a stressed syllable or a syllable beginning with a sound at which the speaker anticipates difficulty. In these cases, the speaker may have disassembled the motor program for the word and started the syllables separately, perhaps in an attempt to articulate them carefully. Most young children fragment words that they find difficult into smaller units (Bloodstein, 1975). Such fragmentation may lead to stuttering on a syllable other than the first of a word. Here, not the motor program for a word, but that for a syllable, is blocked because of an error signal at the end of the preceding syllable.
The opposite of the fragmentation of speech-motor programs is the production of a long speech passage controlled by only one motor program, for instance, when speaking a memorized poem or stage role (see Section 1.2). One reason that some stutterers speak fluently in these situations is that they trust their automatic speech control and don’t fragment the learned motor program, but ‘reel it off’ as a whole. The ‘adaptation effect’ (stuttering is reduced after a text has been read repeatedly; Johnson & Knott, 1937; Van Riper & Hull, 1955) is caused in a similar way: the more familiar the text, the fewer words are fragmented, and the more words get bound together into phrases produced by only one motor program.
For the proposed theory, only phonological (not semantic) monitoring is relevant. Phonological monitoring means evaluating whether the sounds of a word or phrase were spoken completely and in the correct order. The perceived phoneme structure is compared to an expected correct structure; this happens very quickly because it doesn’t require semantic understanding (see Sections 1.3 and 1.4).
It is important to understand that, even if a real speech error is detected, the speech is not interrupted because the speaker has noticed an error. Instead, speech is interrupted because an internal monitor, an unconscious neural network, has detected a mismatch between perception and prediction. In addition to the automatic interruption of speech, the detection of a mismatch triggers a shift in the speaker’s attention to remember what has been said and to decide what repair is necessary.
We believe that we stop talking because we have noticed an error. This might be true for semantic errors, whose detection depends on understanding, but not for structural (e.g., phonological) errors. We become aware of the interruption only after we have noticed the error, but this sequence may result from the shift in attention after the mismatch has been detected.
Empirical studies have shown that stutterers notice and repair their slips of the tongue as well as other people do (Brocklehurst & Corley, 2011; Postma & Kolk, 1992). This was used as an argument against theories claiming that impaired auditory feedback causes stuttering. Therefore, such theories have to explain why the detection of speech errors works well in stutterers.
There are two ways of speech error detection. A conflict monitoring mechanism (independent of auditory feedback) detects and repairs errors prior to articulation (Nozari, Dell, & Schwartz, 2011). Further, speech errors can be detected after articulation by external auditory feedback. Receptive speech processing, including that of auditory feedback, branches into two streams in the brain: a dorsal stream via the superior longitudinal fasciculus and a ventral stream via fiber tracts crossing through the extreme capsule. Phonological processing runs via the dorsal, semantic processing via the ventral stream (Hickok & Poeppel, 2004).
Only the dorsal stream, but not the ventral one, is impaired in stutterers (Kronfeld-Duenias et al., 2016). Since almost all errors in one’s native language have an impact on meaning, they can be detected via the normally working ventral stream. Therefore, the fact that stutterers normally detect their speech errors doesn’t contradict the proposed theory.
One may object that even semantic processing depends on word recognition, which in turn depends on the recognition of the phoneme sequence; therefore, poor phonological processing should still impair semantic processing and error detection. However, according to the analysis-by-synthesis model, word recognition does not depend on complete phonological processing (see Section 1.4).
The analysis-by-synthesis model (see Section 1.4) describes how words are recognized through an interaction between prediction (we could also say: hypotheses) and successively incoming perceptions of sounds. With every additional sound that is perceived via the auditory feedback, the prediction is either confirmed or corrected (Poeppel & Monahan, 2011). Therefore, a mismatch between the current prediction and the perception challenges the prediction, such that it becomes unstable; the system is unsure whether the prediction is correct. This is independent of whether the mismatch is caused by a speech error or by disrupted feedback.
Only after the prediction has been re-stabilized, e.g., by taking context information into account, so that the system is now sure the prediction is correct, can a deviating perception be clearly evaluated as an error. Briefly put, the system needs enough input, i.e., enough time to recognize the word and predict its correct sound sequence, before it can evaluate a deviation from that prediction as an error. Therefore, error signals, as a rule, are generated at the ends of words or phrases, and the interruption of speech affects the start of the following word.
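The core idea (the initial portion of a word suffices to identify it, and the comparison with the prediction takes place at the word’s end) can be illustrated with a toy Python sketch. It is not Poeppel and Monahan’s model: letters stand for sounds, the lexicon is tiny and hypothetical, context information is ignored, the marker `LOST` for a sound dropped from the feedback is invented, and the prediction is assumed to be ‘stable’ as soon as only one candidate word is left.

```python
from typing import List, Optional, Tuple

LOST = "_"  # hypothetical marker for a sound lost from the auditory feedback


def recognize(perceived: str, lexicon: List[str]) -> Tuple[Optional[str], Optional[int]]:
    """Narrow the word hypotheses sound by sound; return the word and its recognition point."""
    candidates = list(lexicon)
    for i, sound in enumerate(perceived):
        if sound != LOST:  # a lost sound gives no evidence either way
            candidates = [w for w in candidates if i < len(w) and w[i] == sound]
        if len(candidates) == 1:
            return candidates[0], i + 1  # the prediction is now stable
    return None, None


def error_signal(perceived: str, lexicon: List[str]) -> bool:
    """At the word's end, compare the whole perceived word with the stable prediction."""
    predicted, _ = recognize(perceived, lexicon)
    if predicted is None:
        return False  # no stable prediction, so no deviation can be evaluated
    return perceived != predicted


LEXICON = ["monica", "monday", "tom", "tim"]
print(recognize("monica", LEXICON))     # identified after "moni"
print(error_signal("monika", LEXICON))  # real slip: error signal
print(error_signal("moni_a", LEXICON))  # lost "c": invalid error signal
```

Both the real slip and the lost sound produce the same error signal at the end of the word, while a sound lost early (e.g. “m_nica”) does not prevent recognition, matching the claim that error detection survives short feedback disruptions.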
In an MEG study, MacGregor et al. (2012) found the earliest difference between the brain response to a word and to a similar-sounding nonword after 50–80 ms, while the participants’ attention was distracted from the stimulus. A Mismatch Negativity (the event-related brain potential indicating the attention-independent brain response to an unexpected acoustic stimulus) usually has a latency of 100–150 ms (Näätänen et al., 2007). We can thus assume such latencies for the detection of a mismatch due to a feedback disruption as well (a word incompletely fed back may appear to the monitoring system like a similar-sounding nonword). The time for the response, the interruption of speech, must be added to get the total reaction time of the speech control system.
One can stutter “T(ə)–t(ə)–tom” (repetition) and “T – – om” (block), but not “Tom(ə)–m(ə)–m” or “Tommm”; that is, the problem is the execution of the syllable, particularly of its core, the vowel. What is repeated is mostly a syllable-like sound sequence, but not a true syllable of the affected word: in the above example, [tə] is not a syllable of the stuttered word “Tom” (the vowel is transformed into a schwa). Some experts have assumed that the syllable is the typical object of stuttering (see, e.g., the Syllable Initiation Theory by Packman, Code, & Onslow, 2007). By contrast, I think that usually the motor program of a word or a phrase is blocked, even though the observable symptoms affect a syllable, mostly the first one.
But why does stuttering always affect the onset or the anterior portion, but not the end of a syllable, or, more precisely, why is the speaker apparently unable to move on to the nucleus of the syllable (Wingate, 1988)? If stuttering is caused by an interruption of speech as the regular brain response to an error signal, there should also be a regular method for the control system to execute the interruption. This method seems to be to inhibit the production of the vowel, probably because (i) almost every syllable contains a vowel, and (ii) all vowels can be inhibited in the same way. There is no difference between a, e, i, o, and u in this regard, and the neural network responsible for the inhibition needs no knowledge of the particular speech-motor program to stop.
In adults who stutter, the proportion of stuttering events occurring on the initial sound of words was 92% according to Johnson and Brown (1935), 98% according to Hahn (1942), 96% according to Sheehan (1974), and 97% according to Taylor (1966). In 3- to 4-year-old children who stutter, Egland (1955) found 91% of all stuttering events in word-initial position, 9% in medial position, and none in final position. Natke et al. (2004) found that in preschool children (mean time of 9 months since the onset of stuttering), 97.8% of stuttering events occurred on the first syllable of words, 1.8% on the second syllable, and 0.4% on further syllables (monosyllabic words included). For an overview, see Bloodstein and Bernstein Ratner (2008) or St. Louis (1979).
Following the classification system proposed by Johnson et al. (1959), Ambrose and Yairi (1999) define ‘stuttering-like disfluencies’ as part-word repetitions, single-syllable word repetitions, and disrhythmic phonation (prolongations, blocks, and broken words). In contrast, interjections, revisions, and multisyllabic word or phrase repetitions are categorized as ‘other disfluencies’.
Most types of stuttering-like disfluencies rarely occur in the speech of normally fluent children, but there is a striking overlap regarding single-syllable word repetitions. However, significant differences between stuttering and non-stuttering children were found even for this kind of disfluency. Stuttering children produce more single-syllable word repetitions, with more iterations, than normally fluent children, who rarely repeat a word more than once. Moreover, stuttering children repeat single-syllable words faster than normally fluent children do, producing shorter silent intervals (Ambrose & Yairi, 1995; van Ark et al., 2004; Niermann Throneburg & Yairi, 1994).
From my experience as a stutterer and from personal observation in stuttering self-help groups, I know that even two-syllable words are sometimes repeated in stuttering (a speaker can assuredly distinguish between a stutter and a repetition to fill a pause). Repetitions of short words are well explicable in the framework of the proposed theory, namely by the existence of motor programs for familiar phrases. When the motor program for a phrase is blocked, an iteration of its first word can occur. The transition to the second word is impossible, but the speaker tries to continue, and the program starts again and again with the first word until the blockage is resolved.
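To show how the Ambrose and Yairi (1999) categories could be applied when coding a speech sample, here is a small Python sketch. It assumes that each disfluency has already been labeled by type; the label strings and the helper names `category` and `tally` are hypothetical and are not an official coding scheme.

```python
from collections import Counter
from typing import Iterable

STUTTERING_LIKE = {
    "part-word repetition",
    "single-syllable word repetition",
    "prolongation",   # disrhythmic phonation
    "block",          # disrhythmic phonation
    "broken word",    # disrhythmic phonation
}
OTHER = {"interjection", "revision", "multisyllabic word repetition", "phrase repetition"}


def category(disfluency: str) -> str:
    """Assign a labeled disfluency to one of the two categories."""
    if disfluency in STUTTERING_LIKE:
        return "stuttering-like"
    if disfluency in OTHER:
        return "other"
    raise ValueError(f"unknown disfluency type: {disfluency!r}")


def tally(disfluencies: Iterable[str]) -> Counter:
    """Count a sample's coded disfluencies per category."""
    return Counter(category(d) for d in disfluencies)


sample = ["part-word repetition", "interjection", "block", "single-syllable word repetition"]
print(tally(sample))
```

Note that such a category count alone would not capture the differences described above for single-syllable word repetitions (number of iterations, length of silent intervals); those would require additional coded measures.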