
Categorical Perception of Speech

Consider the following representation of twelve sounds. Each sound differs from its neighbours by the same amount as any other sound, at least when difference is measured by frequency. Most people would not be able to discriminate two adjacent sounds ...
except for two special cases (one around -3 to -1 and one around +1 to +3) where the discrimination is easier; here people hear the sound change from da to ga or from ga to ba.
So although these sounds are, from an acoustic point of view, no more different than ...
... these sounds, the second pair of sounds are easy to discriminate.
This pattern of heightened discrimination is the defining characteristic of categorical perception as it is usually operationally defined. Small changes to stimuli can make large differences to perception, large changes to stimuli can make small differences to perception, and the stimuli can be ordered and sorted into categories such that discriminating nearby pairs of stimuli on either side of a category boundary is dramatically easier than discriminating pairs from within a category.
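This pattern can be made concrete with a toy simulation (all values are hypothetical, chosen only to mirror the twelve-sound continuum described above): perception is assumed to depend only on which category a stimulus falls into, so equally spaced pairs are discriminable only when they straddle a category boundary.

```python
# Toy model of categorical perception (all values hypothetical).
# Twelve equally spaced stimuli on an acoustic continuum; assumed
# category boundaries near -2 and +2 divide them into da / ga / ba.

def category(x):
    """Map a continuum position to a perceived syllable."""
    if x < -2:
        return 'da'
    elif x < 2:
        return 'ga'
    else:
        return 'ba'

def discriminable(x, y):
    """On this toy model, two stimuli are discriminated only if
    they are perceived as members of different categories."""
    return category(x) != category(y)

stimuli = list(range(-6, 6))  # twelve equally spaced sounds
for a, b in zip(stimuli, stimuli[1:]):
    if discriminable(a, b):
        print(f"{a} vs {b}: easy ({category(a)} -> {category(b)})")
```

Running the loop picks out exactly two adjacent pairs, one at each boundary, even though every adjacent pair is acoustically equidistant.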

What determines where the category boundaries fall?

It is perhaps tempting to think that categorical perception of speech is just a matter of categorising sounds. But this is not straightforward.
For one thing, the existence of these category boundaries is specific to speech perception as opposed to auditory perception generally. When special tricks are used to make subjects perceive a stimulus first as speech and then as non-speech, the locations of boundaries differ between the two types of perception \citep[p.~20--1]{Liberman:1985bn}.
phonetic context
coarticulation
The location of the category boundaries changes depending on contextual factors such as the speaker’s dialect or the rate at which the speaker talks; both factors dramatically affect which sounds are produced. This means that between two different contexts, different stimuli may result in the same perceptions and the same stimulus may result in different perceptions.
So which features of the stimuli best predict category membership?
Liberman and Mattingly argue that, in the case of speech, category boundaries typically correspond to differences between intended phonic gestures. The existence of category boundaries and their correspondence to intended phonic gestures needs explaining.
Following Liberman and Mattingly, we can explain this by postulating a module for speech perception. Anything which is potentially speech (including both auditory and visual stimuli) is passed to the module which attempts to interpret it as speech. It does this by attempting to replicate stimuli by issuing the same gestures that are also used for producing speech (this is the ‘motor’ in ‘motor theory’). Where a replication is possible, the stimuli are perceived as speech, further auditory or visual processing is partially suppressed, and the module identifies the stimuli as composed of the gestures that were used in the successful replication. Accordingly we can say that the stimuli are perceived as a sequence of phonic gestures.
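The module just described can be caricatured as an analysis-by-synthesis loop. The sketch below is hypothetical (the gesture inventory, the synthesis function, and the matching criterion are all stand-ins, not Liberman and Mattingly’s proposal): candidate gesture sequences are used to synthesise a signal, and the stimulus is perceived as whichever gestures successfully replicate it.

```python
# Hypothetical analysis-by-synthesis sketch in the spirit of the
# motor theory. Gestures, synthesis and matching are all stand-ins.
from itertools import product

GESTURES = ['b', 'd', 'g']          # toy inventory of phonic gestures

def synthesise(gestures):
    """Stand-in for producing an acoustic signal from gestures:
    here, simply the concatenated gesture labels."""
    return ''.join(gestures)

def perceive(stimulus, max_len=3):
    """Search for a gesture sequence whose synthesised signal
    replicates the stimulus; where replication succeeds, the
    stimulus is perceived as that sequence of gestures."""
    for n in range(1, max_len + 1):
        for seq in product(GESTURES, repeat=n):
            if synthesise(seq) == stimulus:
                return list(seq)    # perceived as these gestures
    return None                     # not perceived as speech

print(perceive('bdg'))  # ['b', 'd', 'g']
```

On this caricature, the objects of perception are the gestures returned by the replication search, not the acoustic signal itself.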
One line of response to this argument involves attempting to show that the category boundaries correspond to some acoustic property of speech at least as well as to intended phonic gestures. If such a correspondence were found, it might be possible to give an explanation better than Liberman and Mattingly’s for the existence of categories corresponding to intended phonic gestures. Reasons for doubting any such explanation exists include the constancy effects already mentioned and also coarticulation, the fact that phonic gestures overlap (this is what makes fast speech possible).
In outline Liberman and Mattingly’s argument for the claim that the objects of speech perception are intended phonic gestures has this form:

(1) There are category boundaries … .

(2) … which correspond to phonic gestures.

What is a phonic gesture?
In speaking we produce an overlapping sequence of articulatory gestures, which are motor actions involving coordinated movements of the lips, tongue, velum and larynx. These gestures are the units in terms of which we plan utterances (Browman and Goldstein 1992; Goldstein, Fowler, et al. 2003).

(3) Facts (1) and (2) stand in need of explanation.

(4) The best explanation of (1) and (2) involves the claim that the objects of speech perception are phonic gestures.

This illustrates how we might establish claims about the objects of perception.
But why accept that the best explanation of (1) and (2) involves this claim? Part of the reason concerns relations between speech production and speech perception ...

‘word listening produces a phoneme specific activation of speech motor centres’ \citep{Fadiga:2002kl}

‘Phonemes that require in production a strong activation of tongue muscles, automatically produce, when heard, an activation of the listener's motor centres controlling tongue muscles.’ \citep{Fadiga:2002kl}


Good, but this stops short of showing that the motor activations actually facilitate speech recognition ...

Fadiga et al (2002)

D'Ausilio et al (2009, figure 1)

‘Double TMS pulses were applied just prior to stimuli presentation to selectively prime the cortical activity specifically in the lip (LipM1) or tongue (TongueM1) area’ \citep[p.~381]{dausilio:2009_motor}
‘We hypothesized that focal stimulation would facilitate the perception of the concordant phonemes ([d] and [t] with TMS to TongueM1), but that there would be inhibition of perception of the discordant items ([b] and [p] in this case). Behavioral effects were measured via reaction times (RTs) and error rates.’ \citep[p.~382]{dausilio:2009_motor}

D'Ausilio et al (2009, figure 1)

‘Effect of TMS on RTs show a double dissociation between stimulation site (TongueM1 and LipM1) and discrimination performance between class of stimuli (dental and labial). The y axis represents the amount of RT change induced by the TMS stimulation. Bars depict SEM. Asterisks indicate significance (p < 0.05) at the post-hoc (Newman-Keuls) comparison.’ \citep{dausilio:2009_motor}
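The predicted double dissociation can be stated compactly in code. The encoding below captures effect directions only (it is not D’Ausilio et al.’s data): stimulating a motor area is predicted to speed discrimination of the concordant phoneme class and slow the discordant one.

```python
# Hypothetical encoding of the predicted double dissociation
# (effect directions only; no real reaction-time data).

CONCORDANT = {'TongueM1': 'dental', 'LipM1': 'labial'}

def predicted_rt_effect(site, stimulus_class):
    """Predicted direction of the TMS-induced RT change for a
    given stimulation site and class of stimuli."""
    return 'faster' if CONCORDANT[site] == stimulus_class else 'slower'

for site in ('TongueM1', 'LipM1'):
    for cls in ('dental', 'labial'):
        print(site, cls, predicted_rt_effect(site, cls))
```

The crossover (each site speeds one class and slows the other) is what makes the result a double dissociation rather than a general effect of stimulation.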

(1) There are category boundaries … .

(2) … which correspond to phonic gestures.

(3) Facts (1) and (2) stand in need of explanation.

(4) The best explanation of (1) and (2) involves the claim that the objects of speech perception are phonic gestures.