I have often considered submitting my theory of stuttering to a journal, e.g., the Journal of Fluency Disorders, but there is a difficulty: my theory, as presented in the main text of this website and in the PDF, consists of two theories: (1) a causal theory of stuttering and (2) a theory about the control of normal fluent speech, on which the stuttering theory depends. The second theory is mainly about the role auditory feedback normally plays in speech control and self-monitoring. A theory which claims that stuttering is caused by insufficient processing of auditory feedback is plausible only if auditory feedback is important for the control of fluent speech. And precisely this, whether or not auditory feedback is important for the control of normal fluent speech, is still a matter of debate in psycholinguistics. Such a general debate, however, has no place in a journal devoted specifically to fluency disorders.
Here, I want to give an impression of this debate, a debate on something that, in my view, is basic for understanding the mysterious role of auditory feedback in stuttering. In my last blog entry, I reported on the paper by Lind et al. (2015), the results of which strongly suggest a crucial role of auditory feedback in the self-monitoring of normal speech. In 2017, Sieb Nooteboom and Hugo Quené published a paper in which they question the results obtained by Lind and colleagues, because the results and conclusions of the two studies appear to be inconsistent. I first briefly report the results and conclusions of Nooteboom and Quené (2017), and then I argue that some of their conclusions do not necessarily follow from their results, such that the results of both studies are quite consistent, and the conclusions of Lind et al. (2015) might be correct.
In their article “Self-monitoring for speech errors: Two-stage detection and repair with and without auditory feedback” in the Journal of Memory and Language, 95, 19–35, Nooteboom and Quené (2017) report two experiments eliciting segmental speech errors and self-repairs. Error frequencies, detection frequencies, error-to-cutoff times and cutoff-to-repair times were assessed with and without auditory feedback, for errors against four types of segmental oppositions. One of their hypotheses was that prearticulatory and postarticulatory detection of errors is reflected in a bimodal distribution of error-to-cutoff times. The results showed the expected bimodal distribution of error-to-cutoff times. Further, they tested the hypothesis that postarticulatory error detection depends on auditory feedback (see below for details), and they found that it does not. The authors conclude that error detection does not depend on audition, but on prearticulatory monitoring.
I think this conclusion is not correct, and the results of Nooteboom and Quené’s experiments are quite consistent with the results obtained by Lind et al. (2015) and their conclusions. As far as I can see, there are two main problems regarding the consistency:
(1) If all errors in overt speech are detected on the basis of auditory feedback (the proprioception of articulatory movements may be supportive), then there is (virtually) no reason to expect Nooteboom and Quené’s finding of a bimodal distribution in the error-to-cutoff times.
(2) The position that the self-monitoring of overt speech strongly depends on auditory feedback is (virtually) inconsistent with Nooteboom and Quené’s finding that the detection of errors against the voiced/voiceless opposition and of errors against vowels was not impaired by masking noise.
Regarding the first problem, the bimodal distribution in the error-to-cutoff times might reflect two stages of processing, but not necessarily an ‘internal’ and an ‘external’ stage. From event-related brain potentials we know that error detection on the basis of semantic processing takes more time than error detection on the basis of non-semantic, i.e., formal or structural, processing. Semantic errors elicit a response in a listener’s brain after ca. 400 ms (the N400), whereas syntax errors (phrase structure violations), for example, elicit a response already after ca. 120 ms (Friederici, 1999).
I think we can generally assume that error detection that does not require semantic processing takes less time than error detection that does, and we can further assume that the detection of a ‘structural’ error results in an automatic, immediate interruption, whereas, when a semantic error is detected, the speaker voluntarily decides to interrupt himself or herself to make a repair.
The errors evoked by Nooteboom and Quené’s experimental design, namely confusions of phonemes, could easily be detected without semantic processing: the correct target word pair was presented long enough, and participants might often have expected to confuse the initial consonants or vowels, respectively, of the target word pair. Thus most of the errors were detected already at a non-semantic stage of processing, and only in fewer cases did the higher stage of semantic processing result in error detection and self-interruption. This would explain the bimodal distribution in the error-to-cutoff times, even if all errors were detected on the basis of auditory feedback.
Regarding the second problem: The internal feedback loop in Levelt’s model also provides feedback in the auditory modality. When I’m speaking internally, e.g., in verbal thinking, I hear my voice internally. I can speak in a higher or lower pitch or even in the voice of someone else, and my internal speech includes stress and prosody (see also Sections 1.3 and 3.1 in the main text).
It is therefore not surprising that all kinds of errors were detected via internal ‘auditory’ feedback (with masking noise) as well as via external auditory feedback in silence. The question is whether the internal loop is also active when we hear ourselves speaking externally. I think it is not; otherwise we would hear ourselves twice, as the two feedback loops might have different delays (Lackner & Tuller, 1979). This, however, means that there is no pre-articulatory self-monitoring in spontaneous speech.
As far as I know, two arguments have been put forward for pre-articulatory monitoring: the ‘lexical bias’ in speech errors and the spontaneous suppression of ‘taboo words’.
The lexical bias can easily be explained in the framework of the model proposed by Engelkamp and Rummer (1999), which they derived from results of aphasia research: The articulatory movement sequence of each familiar word is represented in the brain by a speech motor program, but there are no such programs for nonwords. Thus the production of a nonword requires the fragmentation and/or combination of motor programs, which might be more effortful for the brain and therefore less likely than the confusion of the complete motor programs of two words.
The suppression of taboo words can be explained in the same way as the spontaneous suppression of other socially inappropriate behaviors: A mechanism of fear saves us from doing dangerous or awkward things without requiring incessant self-monitoring. The ‘somatic markers’ posited by Antonio Damasio (1994) may be such a mechanism: Mental representations of concepts are linked to feelings, i.e., to mental representations of somatic states. When a concept is activated in the brain, the linked feeling is co-activated at the same time, providing an emotional evaluation of the concept, which helps us to control our behavior. I don’t say the taboo word because the word itself is associated with the anticipation of embarrassment.
So I think there is no reason to adhere any further to the idea of pre-articulatory monitoring in spontaneous speech. By contrast, auditory feedback is the basis of self-comprehension and, with that, of lexical self-monitoring. Since the comprehension of self-produced words depends on the comprehension of self-produced phonemes, we can assume that phonological self-monitoring also works on the basis of auditory feedback. And this is a basic assumption of my theory of stuttering.
When I started pondering the cause of stuttering in 2011, I soon realized that the disorder can hardly be explained in the framework of the standard model of speech production and self-monitoring (see here in the main text). Thus I developed a simple alternative model without pre-articulatory monitoring in spontaneous speech, but with an important role of auditory feedback (Section 1.4 in the main text).
In November last year, I found the paper by Andreas Lind and colleagues (2015) on the web, and I was happy when I read it. The study provides evidence for an important part of my model: the assumptions (i) that auditory feedback is crucial for the self-monitoring of speech, and (ii) that the idea of a pre-articulatory monitoring in spontaneous speech is superfluous.
The paper has a long title: “Auditory feedback is used for self-comprehension: When we hear ourselves saying something other than what we said, we believe we said what we hear.” The authors covertly manipulated their (normal fluent) participants’ auditory feedback in real time so that they said one thing but heard themselves saying something else. In 85% of all cases in which the exchange went undetected, the inserted words were experienced as self-produced.
So the authors demonstrate how much normal fluent speakers rely on auditory feedback. The results suggest: When we are speaking, we do have an idea of the message we are going to convey, but it is the auditory feedback which informs us about what exactly we have actually said.
The authors write that their real-time speech-exchange method could be used to study cases in which aberrant feedback processing has been implicated, such as in stuttering. So the question arises: Can their method be useful to test my theory of stuttering?
The core idea of my theory is that invalid error signals occur in the monitoring system because auditory feedback (in some cases also the proprioception of breathing) is insufficiently processed by the brain. The cause of the insufficient processing is a misallocation of attention during speech, that is, a misallocation of perceptual and processing capacity.
Of course, it would be great to test whether stuttering-like disfluencies, or at least involuntary interruptions of speech flow, can be caused in normal fluent speakers by transient disruption of auditory feedback. For example, the end of a word, or a short unstressed function word (ideally one preceding the main content word of an utterance), could be distorted or replaced by noise. Importantly, participants’ attention should be distracted from auditory feedback so that they do not become aware of the manipulation.
However, I’m skeptical whether stuttering would be elicited in such an experiment. There is a difference between a distortion outside and a distortion inside the brain. In the experiment, the participants would hear their own speech with disruptions. In the mechanism of stuttering assumed in my theory, the speaker hears his or her own speech without disruption; disruptions occur in the brain, on the way from the inner ear to the network responsible for the self-monitoring of speech. In the experiment, the participants’ attention would be drawn to the auditory channel precisely because of the manipulation, that is, because their own speech sounds somehow odd to them...