The effect of rhythmic speech on stuttering has long been known. At least since the middle of the eighteenth century, exercises involving rhythm have been advocated as a means of recovery from stuttering (Wingate, 1969a). Stuttering is eliminated or greatly reduced when stutterers time their speech to a rhythmic beat, such as hand clapping or the clicking of a metronome. Van Dantzig (1940) showed that stutterers immediately became fluent when they timed each syllable with the tapping of a finger. The effect of metronome pacing has been confirmed experimentally in many studies (e.g., Johnson & Rosen, 1937; Martin & Haroldson, 1979). Syllables as well as words were timed with the metronome, with striking results.
Several hypotheses concerning the rhythm effect have been tested. Barber (1940), Fransella and Beech (1965), and Hanna and Morris (1977) investigated the impact of rate. In all cases, stuttering was significantly reduced in rhythmic speech, even at rates faster than usual. This suggests that a reduced rate is not the crucial cause of the rhythm effect. Stager, Jeffries, and Braun (2003) compared stutterers’ voicing duration during metronome-timed speech with that in their natural speech and found no difference. Fransella (1967) tested the rhythm effect in a dual-task condition; from her data, she concluded that the effect of rhythm could not be viewed merely as a distraction. Brady (1969) found that neither rate, nor distraction, nor even rhythmicity caused the reduction of stuttering in metronome-paced speech. He suggested that the beat had a ‘cue’ function, signaling when the next syllable or word is to be spoken.
The cue hypothesis appears tempting, but it has some problems. First, the idea behind the cue hypothesis is that stutterers need such cues because they are unable to produce a proper speech rhythm by themselves. But most stutterers are quite able to produce an even rhythm when clapping their hands or singing. More importantly, stutterers produce a normal speech rhythm when they are fluent in other FCs, e.g., with altered auditory feedback or masking noise. Second, a speaker who waits for each click before starting a syllable is never synchronous with the metronome but always too late because of the reaction time. Instead, the speaker must pick up the metronome rhythm, produce this rhythm independently, and continuously monitor whether it is still in sync with the metronome.
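The reaction-time argument can be illustrated with a toy calculation. The following is only a minimal sketch under assumed values: the 0.5 s click period, the 0.2 s reaction time, the correction gain of 0.5, and all function names are hypothetical and are not taken from any of the cited studies. It contrasts a speaker who waits for each click with a speaker who produces the rhythm internally and corrects it by monitoring the lag between syllable onset and click.

```python
def reactive_onsets(clicks, reaction_time):
    """Speaker waits for each click, then starts the syllable."""
    return [click + reaction_time for click in clicks]


def predictive_onsets(clicks, first_onset, period, gain):
    """Speaker produces the rhythm internally and, after each syllable,
    corrects the next onset by a fraction (gain) of the perceived lag."""
    onsets = [first_onset]
    for previous_click in clicks[:-1]:
        lag = onsets[-1] - previous_click  # positive means "too late"
        onsets.append(onsets[-1] + period - gain * lag)
    return onsets


def asynchronies(onsets, clicks):
    """Syllable onset minus click time, per beat."""
    return [onset - click for onset, click in zip(onsets, clicks)]


# Illustrative, assumed values (seconds).
PERIOD = 0.5
REACTION_TIME = 0.2
clicks = [i * PERIOD for i in range(8)]

reactive_async = asynchronies(reactive_onsets(clicks, REACTION_TIME), clicks)
predictive_async = asynchronies(
    predictive_onsets(clicks, first_onset=REACTION_TIME, period=PERIOD, gain=0.5),
    clicks,
)

print("reactive:  ", [round(a, 3) for a in reactive_async])
print("predictive:", [round(a, 3) for a in predictive_async])
```

In this sketch the reactive speaker stays late by the full reaction time on every beat, whereas the lag of the self-paced, monitoring speaker shrinks from beat to beat. The sketch makes no claim about how speakers actually implement such a correction.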
I propose that speaking in time with a rhythm entails listening to both the metronome (or the clapping of the hands) and one's own speech rhythm. The speaker’s attention is thus drawn not only to the presented rhythm but also to the auditory feedback of speech. This improves the processing and integration of auditory feedback and reduces stuttering.
This account is consistent with the results of an experiment conducted by Howell and El-Yaniv (1987). Stutterers read a short story in three conditions, namely (1) without metronome (control condition), (2) syllable by syllable in time with the rhythmic clicking of a metronome, and (3) while listening to clicks that sounded at the beginning of each syllable, triggered by the intensity of the speaker’s voice. The third condition, in which the participants were speaking in their natural rhythm, reduced stuttering nearly as effectively as the second condition. This indicates that it is not the evenness of the presented rhythm that reduces stuttering in the metronome condition, but rather the fact that the speaker’s attention is drawn to auditory feedback (in Condition 3, due to the unfamiliar “echo” clicks after each syllable start).
The effect of metronome-timed speech on brain activation in stutterers was investigated in some neuroimaging studies (Braun et al., 1997; Stager, Jeffries, & Braun, 2003; Toyomura, Fujii, & Kuriki, 2011). The findings show greater, that is, normalized, activation in auditory association areas in the superior temporal cortex. This was not exclusively due to the metronome signal, since normally fluent controls showed no similar increase in auditory activation during metronome-timed speech. The neuroimaging results support the view that, as Stager, Jeffries, and Braun (2003) concluded, metronome-timed speech reduces stuttering “by enabling more efficient use of auditory information” (p. 333).
It has long been known that most stutterers have no difficulty speaking in unison, for instance, when reciting the Lord’s Prayer in church. The fluency-inducing effect of reading in chorus was confirmed experimentally by Johnson and Rosen (1937). Choral speech is one of the most effective FCs: stuttering is usually reduced by 90–100% (see, e.g., Kiefte & Armson, 2008, for an overview).
Several causes have been proposed for the fluency of stutterers during choral speech. Some of them, such as distraction, engagement of mirror neurons, or a change of prosody, were already discussed in Section 3.1. The effect of choral reading has further been attributed to reduced rate, increased loudness, or more continuous vocalization (e.g., Wingate, 1969b). However, Adams and Ramig (1980) found no increase in loudness in choral reading, and they observed that stutterers even shortened vowel duration. Ingham and Packman (1979) and Andrews et al. (1982) concluded from their studies that the effect of choral reading cannot be attributed to reduced rate.
Pattie and Knight (1944) proposed a cue hypothesis, with the second reader acting as a ‘pacemaker’. However, Cummings (2003) points out that live choral speech is a collaborative activity in which both speakers modify the timing of their speech about equally. Moreover, just as in metronome-timed speech, a speaker who waits each time until the co-speaker starts a syllable will never be in sync with the co-speaker but always too late because of the reaction time.
A hypothesis similar, in some respect, to the cue or pacemaker hypothesis was proposed by Hickok, Houde, and Rong (2011). They assume that stuttering occurs when the sensory system receives inaccurate predictions from an internal feedback mechanism and that, in choral speech, “the sensory system (which is coding the inaccurate prediction) is bombarded with external acoustic input that matches the sensory target and thus washes out and overrides the inaccurate prediction allowing for fluent speech” (p. 417). In other words, stutterers are fluent during choral speech because they hear the words they are speaking at the same time from their co-speaker(s).
This hypothesis, just like a cue or pacemaker hypothesis, could be true only for “classical” choral speech, in which the stutterer and the co-speaker(s) speak the same words in unison. However, a nearly equal reduction in stuttering was observed when the co-speaker read material different from that read by the stutterer (Barber, 1939; Bloodstein, 1950; Cherry & Sayers, 1956) or when the co-speaker changed without warning to gibberish (Cherry & Sayers, 1956). Enhanced fluency was even observed when tape-recorded speech played backward was presented (Cherry & Sayers, 1956; Rami & Diederich, 2005) or a continuous audio signal like /a/ (Dayalu et al., 2011). Therefore, the hypothesis by Hickok, Houde, and Rong (2011) is not convincing.
I assume that, in “classical” choral speech, stutterers must listen to both the co-speaker(s) and their own speech in order to monitor the synchronicity. That is, attention to auditory feedback is task-relevant. This explains the especially powerful effect of choral speech. Task-irrelevant auditory stimuli, such as speech played backward, gibberish, or a continuous “side tone,” can draw the speaker’s attention to the auditory channel (because they sound unfamiliar) and to the auditory feedback of speech (which is mixed with the additional auditory stimulus), but there is no need for active listening. It is therefore not surprising that the fluency-enhancing effect of task-irrelevant auditory stimuli is weaker and varies more between individuals.
For instance, Rami and Diederich (2005), unlike Cherry and Sayers (1956), found no statistically significant effect on stuttering (but a large effect size) when speech played backward was presented. Rami et al. (2005) found that choral reading did not reduce stuttering when the co-speaker’s voice signal was lowpass filtered so that it sounded very deep and words were hardly intelligible. Some auditory stimuli that are experienced as unpleasant or very distracting may even lead stutterers to disregard the auditory channel still more.
These differences in effectiveness are consistent with the view that the reduction of stuttering by a stimulus depends on the effect the stimulus has on the speaker’s attention. The attentional response to a stimulus is influenced by subjective factors, such as attitudes, emotions, and expectations, and by the flexibility or rigidity of the individual’s attention system. This explains why the fluency-enhancing effect of some auditory stimuli is weak, volatile, or absent in some stutterers.
The assumption that choral speech improves the processing of auditory feedback in stutterers is consistent with results of brain research. In an MEG study, Salmelin et al. (1998) found reduced sensitivity in the left auditory cortex in stutterers during solo reading as compared with normally fluent controls, but the left-hemispheric sensitivity was restored in the stutterers during choral reading. In neuroimaging studies, choral reading, compared with solo reading, was found to be associated with greater, that is, normalized, activation in the auditory association areas in stutterers (Fox et al., 1996; Stager, Jeffries, & Braun, 2003; Toyomura, Fujii, & Kuriki, 2011; Wu et al., 1995). This increase in auditory activation was probably not caused by listening to the co-speaker alone, since normally fluent controls showed no similar increase.