BCS Course Materials

BCS 561: Speech Perception and Recognition

Spring 2006
Instructor: Richard Aslin
Wednesdays 1:15 - 3:15 PM
Meliora 418

Schedule of topics and list of readings

February 1: Historical perspective on speech perception

Discussion leader: Dick

Readings:
Liberman, A. M. (1996). Introduction: Some assumptions about speech and how they changed. In A. M. Liberman (Ed.), Speech: A special code. Cambridge, MA: MIT Press.
Liberman, A. M., Cooper, F. S., Shankweiler, D. P., and Studdert-Kennedy, M. (1967). Perception of the speech code. Psychological Review, 74, 431-461.
Liberman, A. M. and Whalen, D. H. (2000). On the relation of speech to language.  Trends in Cognitive Sciences, 4, 187-196.

February 8: Categorical perception and gradiency

Discussion leaders: Meghan and Dick

Readings:
Pisoni, D. B. and Tash, J. (1974). Reaction times to comparisons within and across phonetic categories. Perception & Psychophysics, 15(2), 285-290.
Miller, J. L. (1997). Internal structure of phonetic categories. Language and Cognitive Processes, 12, 865-869.
McMurray, B., Tanenhaus, M., and Aslin, R. (2002). Gradient effects of within-category phonetic variation on lexical access. Cognition, 86(2), B33-B42.
Kuhl, P. K. (1991). Human adults and human infants show a 'perceptual magnet effect' for the prototypes of speech categories, but monkeys do not.  Perception & Psychophysics, 50, 93-107 (NOTE: only Experiments 1 and 2).
Lotto, A. J., Kluender, K. R., and Holt, L. L. (1998). Depolarizing the perceptual magnet effect. Journal of the Acoustical Society of America, 103, 3648-3655.

Optional readings:
Emmorey, K., McCullough, S. and Brentari, D. (2003). Categorical perception in American Sign Language. Language and Cognitive Processes, 18(1), 21-45.
Gerrits, E., and Schouten, M.E.H. (2004). Categorical perception depends on the discrimination task.  Perception & Psychophysics, 66(3), 363-376.
Harnad, S. (1987). Categorical Perception: The Groundwork of Cognition. New York: Cambridge University Press. [No pdf available]
Pisoni, D.B. (1973). Auditory and phonetic memory codes in the discrimination of consonants and vowels. Perception & Psychophysics, 13(2), 253-260.

February 15: Motor and gesture theories

Discussion leaders: Michael B. and Joyce

Readings:
Fowler, C. A. and Saltzman, E. (1993). Coordination and coarticulation in speech production.  Language and Speech, 36, 171-195.
Goldstein, L. and Fowler, C. A. (2003). Articulatory phonology: A phonology for public language use. In N. O. Schiller and A. Meyer (Eds.), Phonetics and Phonology in Language Comprehension and Production: Differences and Similarities (pp. 159-207). Berlin: Mouton de Gruyter.
Lindblom, B. (1996). Role of articulation in speech perception: Clues from production.  Journal of the Acoustical Society of America, 99, 1683-1692.
Remez, R. E. (1996). Critique: Auditory form and gestural topology in the perception of speech.    Journal of the Acoustical Society of America, 99, 1695-1698.
Ohala, J. J. (1996). Speech perception is hearing sounds, not tongues.  Journal of the Acoustical Society of America, 99, 1718-1725.
Fowler, C. A. (1996). Listeners do hear sounds, not tongues.  Journal of the Acoustical Society of America, 99, 1730-1741.

Optional readings:
Rizzolatti, G. and Arbib, M. A. (1998). Language within our grasp. Trends in Neurosciences, 21, 188-194.
Fadiga, L., Craighero, L., Buccino, G., and Rizzolatti, G. (2002). Speech listening specifically modulates the excitability of tongue muscles: a TMS study. European Journal of Neuroscience, 15, 399-402.

February 22: Speech perception by non-humans

Discussion leaders: Alison and Natalie

Readings:
Fitch, W. T. (2000). The evolution of speech: a comparative review.  Trends in Cognitive Sciences, 4,  258-267.
Kuhl, P. K. and Miller, J. D. (1975). Speech perception by the Chinchilla: Voiced-voiceless distinction in alveolar plosive consonants.  Science, 190, 69-72.
Kuhl, P. K. (1986). Theoretical contributions of tests on animals to the special-mechanisms debate in speech. Experimental Biology, 45, 233-265. [Skip pp. 253-265] [No pdf available]
Kluender, K. R., Diehl, R. L., and Killeen, P. R. (1987). Japanese quail can learn phonetic categories.  Science, 237, 1195-1197.
Lotto, A. J., Kluender, K. R., and Holt, L. L. (1997). Perceptual compensation for coarticulation by Japanese quail (Coturnix coturnix japonica).  Journal of the Acoustical Society of America, 102, 1134-1140.
Dent, M. L., Brittan-Powell, E. F., Dooling, R. J., and Pierce, A. (1997). Perception of synthetic /ba/-/wa/ speech continuum by budgerigars (Melopsittacus undulatus).   Journal of the Acoustical Society of America, 102, 1891-1897.
Sinnott, J. M. and Gilmore, C. S. (2004). Perception of place-of-articulation information in natural speech by monkeys versus humans.  Perception and Psychophysics, 66, 1341-1350.

Optional Readings:
Trout, J. D. (2001). The biological basis of speech: What to infer from talking to the animals.  Psychological Review, 108, 523-549.
Dooling, R. J., Okanoya, K., and Brown, S. D. (1989). Speech perception by budgerigars (Melopsittacus undulatus).  Perception and Psychophysics, 46, 65-71. [No pdf available]

March 1: Speech perception by infants

Discussion leaders: Kyle and Dick

Readings:
Eimas, P. D., Siqueland, E. R., Jusczyk, P. W. and Vigorito, J. (1971). Speech perception in infants.  Science, 171, 303-306.
Werker, J. F. and Tees, R. C. (1984). Cross-language speech perception: Evidence for perceptual reorganization during the first year of life.  Infant Behavior and Development, 7, 49-63.
Kuhl, P. K., Williams, K. A., Lacerda, F., Stevens, K. N., and Lindblom, B. (1992). Linguistic experience alters phonetic perception in infants by six months of age. Science, 255, 606-608.
Rosenblum, L. D., Schmuckler, M. A., and Johnson, J. A. (1997). The McGurk effect in infants.  Perception & Psychophysics, 59, 347-357.
Maye, J., Werker, J. F., and Gerken, L. (2002). Infant sensitivity to distributional information can affect phonetic discrimination. Cognition, 82, B101-B111.
Kuhl, P. K., Tsao, F-M., and Liu, H-M. (2003). Foreign-language experience in infancy: Effects of short-term exposure and social interaction on phonetic learning.  Proceedings of the National Academy of Sciences, 100, 9096-9101.
Vouloumanos, A. and Werker, J. F. (2004). Tuned to the signal: the privileged status of speech for young infants.  Developmental Science, 7, 270-276.
McMurray, B. and Aslin, R. N. (2005). Infants are sensitive to within-category variation in speech perception. Cognition, 95, B15-B26.
Werker, J. F. and Yeung, H. H. (2005). Infant speech perception bootstraps word learning.  Trends in Cognitive Sciences, 9, 519-527.

Optional Readings:
Jusczyk, P. W. (1997). The discovery of spoken language. MIT Press.
Werker, J. F. and Curtin, S. (2005). PRIMIR: a developmental framework of infant speech processing.  Language Learning and Development, 1, 197-234.

March 8: Brain mechanisms of speech perception

Discussion leaders: Andrea, Kate, and Neil

Readings:
Scott, S. K. and Johnsrude, I. S. (2003). The neuroanatomical and functional organization of speech perception.  Trends in Neurosciences, 26, 100-107.
Price, C., Thierry, G., and Griffiths, T. (2005). Speech-specific auditory processing: where is it?  Trends in Cognitive Sciences, 9, 271-276.
Benson, R. R., Whalen, D. H., Richardson, M., Swainson, B., Clark, V. P., Lai, S. and Liberman, A. M. (2001). Parametrically dissociating speech and nonspeech perception in the brain using fMRI.  Brain and Language, 78, 364-396.
Vouloumanos, A., Kiehl, K. A., Werker, J. F. and Liddle, P. F. (2001). Detection of sounds in the auditory stream: Event-related fMRI evidence for differential activation to speech and nonspeech.  Journal of Cognitive Neuroscience, 13, 994-1005.
Liebenthal, E., Binder, J. R., Piorkowski, R. L. and Remez, R. E. (2003). Short-term reorganization of auditory analysis induced by phonetic experience.  Journal of Cognitive Neuroscience, 15, 549-558.
Dehaene-Lambertz, G., Pallier, C., Serniclaes, W., Sprenger-Charolles, L., Jobert, A. and Dehaene, S. (2005). Neural correlates of switching from auditory to speech perception.  NeuroImage, 24, 21-33.
Liebenthal, E., Binder, J. R., Spitzer, S. M., Possing, E. T. and Medler, D. A. (2005). Neural substrates of phonemic perception.  Cerebral Cortex, 15, 1621-1631.
Blumstein, S. E., Myers, E. B. and Rissman, J. (2005). The perception of voice onset time: An fMRI investigation of phonetic category structure.  Journal of Cognitive Neuroscience, 17, 1353-1366.

Optional Readings:
Scott, S. K. and Wise, R. J. S. (2004). The functional neuroanatomy of prelexical processing in speech perception.  Cognition, 92, 13-45.
Zatorre, R. J., Belin, P. and Penhune, V. B. (2002). Structure and function of auditory cortex: music and speech. Trends in Cognitive Sciences, 6, 37-46.

March 22: Cue reliability and trading relations

Discussion leaders: Meghan, Alison, and Dick

Readings:
Bradlow, A. R., Pisoni, D. B., Akahane-Yamada, R., and Tohkura, Y. (1997). Training Japanese listeners to identify English /r/ and /l/: IV. Some effects of perceptual learning on speech production. Journal of the Acoustical Society of America, 101, 2299-2310.
Ernst, M. O. and Bülthoff, H. H. (2004). Merging the senses into a robust percept. Trends in Cognitive Sciences, 8, 162-169.
Francis, A. L., Baldwin, K. and Nusbaum, H. C. (2000). Effects of training on attention to acoustic cues.  Perception & Psychophysics, 62, 1668-1680.  [HARDCOPY ONLY]
Guion, S. G., Flege, J. E., Akahane-Yamada, R., and Pruitt, J. C. (2000). An investigation of current models of second language speech perception: The case of Japanese adults' perception of English consonants. Journal of the Acoustical Society of America, 107, 2711-2724.
Mayo, C. and Turk, A. (2004). Adult-child differences in acoustic cue weighting are influenced by segmental context: Children are not always perceptually biased toward transitions. Journal of the Acoustical Society of America, 115, 3184-3194.
McCandliss, B. D., Fiez, J. A., Protopapas, A., Conway, M., and McClelland, J. L. (2002). Success and failure in teaching the [r]-[l] contrast to Japanese adults: Tests of a Hebbian model of plasticity and stabilization in spoken language perception.  Cognitive, Affective, & Behavioral Neuroscience, 2, 89-108.
Repp, B. (1982). Phonetic trading relations and context effects: New experimental evidence for a speech mode of perception. Psychological Bulletin, 92, 81-110. [HARDCOPY ONLY]

Optional Readings:
Francis, A. L., & Nusbaum, H. C. (2002). Selective attention and the acquisition of new phonetic categories. Journal of Experimental Psychology: Human Perception and Performance, 28, 349-366.
Mitterer, H., Csepe, V., Honbolygo, F., and Blomert, L. (in press). The recognition of phonologically assimilated words does not depend on specific language experience.  Cognitive Science.

March 29: Effects of indexical variables (talker, rate, dialect) on speech perception

Discussion leaders: Natalie and Daphna

Readings:
Nygaard, L. C. (2005). Perceptual integration of linguistic and nonlinguistic properties of speech. In D. B. Pisoni & R. E. Remez (Eds.), The handbook of speech perception, pp. 390-413. Blackwell.
Allen, J. S., & Miller, J. L. (2004). Listener sensitivity to individual talker differences in voice-onset-time. Journal of the Acoustical Society of America, 115, 3171-3183.
McLennan, C. T. & Luce, P. A. (2005). Examining the time course of indexical specificity effects in spoken word recognition. Journal of Experimental Psychology: Learning, Memory & Cognition, 31, 306-321.
Clarke, C. M., & Garrett, M. (2004). Rapid adaptation to foreign accented speech. Journal of the Acoustical Society of America, 116, 3647-3658.
Kraljic, T. & Samuel, A. G. (In press). How general is perceptual learning for speech? Psychonomic Bulletin & Review.
Evans, B. G., & Iverson, P. (2004). Vowel normalization for accent: An investigation of best exemplar locations in northern and southern British English sentences. Journal of the Acoustical Society of America, 115, 352-361.

Optional Readings:
Goldinger, S. (1998). Echoes of echoes: An episodic theory of lexical access.  Psychological Review, 105, 251-279.
Nygaard, L. C., & Pisoni, D. B. (1998). Talker-specific learning in speech perception. Perception & Psychophysics, 60, 355-376. [HARDCOPY ONLY]
Bradlow, A. R., & Pisoni, D. B. (1999). Recognition of spoken words by native and non-native listeners: Talker-, listener- and item-related factors. Journal of the Acoustical Society of America, 106(4), 2074-2085.

April 5: Perception of non-speech signals (Remez guest speaker)

Discussion leaders: Neil and Dick

Readings:
Pisoni, D. B. (1977). Identification and discrimination of the relative onset time of two component tones: Implications for voicing perception in stops.  Journal of the Acoustical Society of America, 61, 1352-1361. [Hardcopy]
Holt, L. L. (2005). Temporally nonadjacent sounds affect speech categorization.  Psychological Science, 16, 305-312.
Remez, R. E., Rubin, P. E., Pisoni, D. B., and Carrell, T. D. (1981). Speech perception without traditional speech cues. Science, 212, 947-950. [Hardcopy]
Remez, R. E., Pardo, J. S., Piorkowski, R. L., and Rubin, P. E. (2001). On the bistability of sinewave analogues of speech. Psychological Science, 12, 24-29.
Remez, R. E. (2005). Perceptual organization of speech. In D. B. Pisoni & R. E. Remez (Eds.), The handbook of speech perception. Blackwell (pp. 28-50). [Hardcopy]
Shannon, R. V., Zeng, F-G., Kamath, V., Wygonski, J., and Ekelid, M. (1995). Speech recognition with primarily temporal cues. Science, 270, 303-304. [Hardcopy]
Wade, T. and Holt, L. L. (2005). Incidental categorization of spectrally complex non-invariant auditory stimuli in a computer game task.  Journal of the Acoustical Society of America, 118, 2618-2633.

Optional Readings:
Jusczyk, P. W., Pisoni, D. B., Walley, A., and Murray, J. (1980). Discrimination of relative onset time of two-component tones by infants.   Journal of the Acoustical Society of America, 67, 262-270. [Hardcopy]
Remez, R. E., Rubin, P. E., Berns, S. M., Pardo, J. S., and Lang, J. M. (1994). On the perceptual organization of speech.  Psychological Review, 101, 129-156.
Holt, L. L., Lotto, A. J., and Diehl, R. L. (2004). Auditory discontinuities interact with categorization: Implications for speech perception.   Journal of the Acoustical Society of America, 116, 1763-1773.
Mirman, D., Holt, L. L., and McClelland, J. L. (2004). Categorization and discrimination of nonspeech sounds: Differences between steady-state and rapidly-changing acoustic cues.   Journal of the Acoustical Society of America, 116, 1198-1207.
Barker, J. and Cooke, M. (1999). Is the sine-wave speech cocktail party worth attending?  Speech Communication, 27, 159-174.
Davis, M. H., Johnsrude, I. S., Hervais-Adelman, A., Taylor, K. & McGettigan, C. (2005). Lexical information drives perceptual learning of distorted speech: Evidence from the comprehension of noise-vocoded sentences.  Journal of Experimental Psychology: General, 134, 222-241.
Keidel, J. L. (unpublished ms). Perceptual change induced by learning of novel auditory categories. University of Wisconsin.

April 12: Levels of analysis and processing of speech signals

Discussion leaders: Austin and Mike T.

Readings:
Lahiri, A. & Marslen-Wilson, W. (1991). The mental representation of lexical form: A phonological approach to the recognition lexicon. Cognition, 38, 245-294.
Pierrehumbert, J. (2003). Probabilistic Phonology: Discrimination and Robustness. In R. Bod, J. Hay and S. Jannedy (Eds.), Probabilistic Linguistics. MIT Press.
Magnuson, J.S., Tanenhaus, M.K., Aslin, R.N. (under review). Which words compete? Dynamic similarity during spoken word recognition.
Salverda, A., Dahan, D., Tanenhaus, M., Crosswhite, K., Masharov, M., & McDonough, J. (under review). Effects of prosodically-modulated sub-phonetic variation on lexical neighborhoods.
Vitevitch, M.S. (2002). The influence of phonological similarity neighborhoods on speech production.  Journal of Experimental Psychology: Learning, Memory and Cognition, 28, 735-747.
Ferreira, V. S. & Griffin, Z. M. (2003). Phonological influences on lexical (mis)selection. Psychological Science, 14, 86-90.

Optional Readings:
Wheeldon, L. & Waksler, R. (2004). Phonological underspecification and mapping mechanisms in the speech recognition lexicon.  Brain and Language, 90, 401-412.
Stevens, K. N. (2005). Features in Speech Perception and Lexical Access. In D. B. Pisoni & R. E. Remez (Eds.), The Handbook of Speech Perception. Cambridge, MA: Blackwell, pp. 125-155. [Hardcopy]

April 19: Speech perception via cochlear implants and adaptive plasticity

Discussion leaders: Neil and Kate

Readings:
Pisoni, D. B. (2005). Speech perception in deaf children with cochlear implants. In D. B. Pisoni & R. E. Remez (Eds.), The handbook of speech perception. Blackwell (pp. 494-523). [Hardcopy]
Shannon, R. V., Zeng, F-G., and Wygonski, J. (1998). Speech recognition with altered spectral distribution of envelope cues.  Journal of the Acoustical Society of America, 104, 2467-2476.
Rosen, S., Faulkner, A., and Wilkinson, L. (1999). Adaptation by normal listeners to upward spectral shifts of speech: Implications for cochlear implants.  Journal of the Acoustical Society of America, 106, 3629-3636.
Eisenberg, L. S., Shannon, R. V., Martinez, A. S., Wygonski, J., and Boothroyd, A. (2000). Speech recognition with reduced spectral cues as a function of age.  Journal of the Acoustical Society of America, 107, 2704-2710.
Burkholder, R. A., Pisoni, D. B., and Svirsky, M. A. (in press). Perceptual learning and nonword repetition using a cochlear implant simulation.  JEP:HPP. [Progress Report No. 26, Indiana University, Speech Research Laboratory]

Optional reading:
Loizou, P. C. (1998). Mimicking the human ear.  IEEE Signal Processing Magazine, September, 101-130. [Hardcopy]

IN ADDITION, we did not have time last week to discuss Wade & Holt (2005) and Keidel (unpublished ms) on learning to perceive non-speech as "phonetic". Let's be sure to discuss these two articles as well.

April 26: Automatic speech recognition devices

Discussion leaders: Joyce and Michael B.

Readings:
Scharenborg, O., Norris, D., Bosch, L., and McQueen, J. M. (2005). How should a speech recognizer work?  Cognitive Science, 29, 867-918.
Zue, V. (2004). Eighty challenges facing speech input/output technologies. Paper presented at a conference "From Sound to Sense", MIT.
de Wet, F., Weber, K., Boves, L., Cranen, B., Bengio, S., and Bourlard, H. (2004). Evaluation of formant-like features on an automatic vowel classification task. Journal of the Acoustical Society of America, 116, 1781-1792.

Optional Background:
Knill, K. and Young, S. (1997). Hidden Markov models in speech and language processing. In S. Young & G. Bloothooft (Eds.), Corpus-based methods in language and speech processing. Kluwer (pp. 27-68). [Hardcopy only]
