The Weekly Newsletter of MIT Linguistics

Phonology Circle 11/8 - Natalie Boll-Avetisyan

Speaker: Natalie Boll-Avetisyan (Utrecht/Potsdam)
Title: Does the lexicon bootstrap phonotactics or vice versa?
Time: Monday Nov 8, 5pm, 32-D831

Speech segmentation is a prerequisite to lexical acquisition. The distribution of phonemes in speech provides important cues to word boundaries: Co-occurrence probabilities of phonemes are generally higher within words than across word boundaries. Infants rely on phonotactic cues in segmentation (e.g. Mattys, Jusczyk, Luce, & Morgan, 1999), but where does this phonotactic knowledge come from?

It has been generally assumed that phonotactics is derived from lexical knowledge (Jusczyk, Luce, & Charles-Luce, 1994). This theory has advantages: Learners would only need to acquire which sound sequences typically occur within words. When listening to continuous speech, infants would merely need to attend to cues indicating which sequences should be chunked, and the word boundaries would fall out by themselves (e.g. Perruchet & Pacton, 2006). From an infant’s perspective, however, a paradox arises: Phonotactic cues that facilitate segmentation could only be acquired after the onset of lexical acquisition, which itself presupposes segmentation. To avoid this paradox, phoneme distributions in continuous speech can be proposed as an alternative source of phonotactic knowledge. This source has the additional advantage of containing not only chunking information but also information about low-probability sequences, which should be split (Adriaans & Kager, 2010).

We hypothesize that the prior source of phonotactics is continuous speech, and that infants use knowledge of both over- and under-representations of consonant co-occurrences as a cue for speech segmentation. We focus on infants’ knowledge of the probabilities of non-adjacent pairs of phonemes. Non-adjacent dependencies are cross-linguistically common (e.g. OCP, McCarthy, 1986) and have been found to influence segmentation in infants (vowel harmony, Van Kampen, Parmaksiz, van de Vijver, & Höhle, 2008). Given that non-adjacent dependencies are more difficult to acquire than adjacent dependencies, and that learning them might require additional cues, we used dependencies between identical consonants for our study. In Dutch infant-directed speech (van de Weijer, 1998), some CVC sequences with identical Cs (e.g. /pVp/) are over-represented, and others (e.g. /sVs/) are under-represented. We predict that Dutch infants chunk /pVp/ but split /sVs/ in segmentation.
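One common way to quantify over- and under-representation is an observed/expected (O/E) ratio. The sketch below is only an illustration under that assumption: the abstract does not say how the corpus counts were actually computed, and the function name, the vowel inventory, and the one-character-per-phoneme input format are all hypothetical. A ratio above 1 would mark a sequence like /pVp/ as over-represented, a ratio below 1 as under-represented.

```python
from collections import Counter

# Assumption: one character per phoneme, and these five symbols are the vowels.
VOWELS = set("aeiou")

def cvc_oe_ratios(utterances):
    """Return {consonant: O/E} for C-V-C triples whose two consonants are identical.

    Utterances are unsegmented strings of phoneme symbols (no spaces),
    mimicking continuous speech. Expected counts assume that the first and
    second consonant of a CVC triple are chosen independently.
    """
    first, second, pairs = Counter(), Counter(), Counter()
    total = 0
    for utt in utterances:
        # Slide a three-phoneme window over the utterance.
        for c1, v, c2 in zip(utt, utt[1:], utt[2:]):
            if c1 not in VOWELS and v in VOWELS and c2 not in VOWELS:
                first[c1] += 1
                second[c2] += 1
                pairs[(c1, c2)] += 1
                total += 1
    ratios = {}
    # Only consonants seen in both positions have a non-zero expected count.
    for c in set(first) & set(second):
        expected = first[c] * second[c] / total
        ratios[c] = pairs[(c, c)] / expected
    return ratios

# Toy input, not real corpus data: the CVC triples are p-a-p and s-a-t.
print(cvc_oe_ratios(["papsat"]))  # {'p': 2.0}
```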

This was tested in two artificial language (AL) segmentation experiments using the head-turn preference procedure. In Experiment 1, 9- and 15-month-olds were familiarized with an AL that employed six syllables, of which four started with /p/ (p1={pe, po}, p2={pa, pe}) and two with /t/ (t={ta, to}), each assigned to a fixed slot and concatenated into a speech stream without pauses (…p1p2tp1p2tp1p2t…). Transitional probabilities between syllables were held constant, so that the stream allowed three possible segmentations: ptp, ppt, or tpp. If over-representation of /pVp/ is used for segmentation, ptp should be dispreferred. Unexpectedly, infants did not exhibit a looking preference for either ptp-words (e.g. /patape/) or ppt-words (e.g. /popata/).
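The design logic of such a stream can be sketched in a few lines. This is a hedged illustration, not the actual stimulus script: the syllable labels are placeholders (two distinct syllables per slot), and `make_stream` and `transitional_probabilities` are hypothetical helper names. The point is that every syllable-to-syllable transition, including the one from the t slot back to the p1 slot, has roughly the same probability, so transitional probabilities alone do not favor ptp, ppt, or tpp.

```python
import random
from collections import Counter

# Placeholder labels, NOT the real stimuli: two syllables for each of the
# three fixed slots (p1, p2, t).
SLOTS = [("p1a", "p1b"), ("p2a", "p2b"), ("ta", "tb")]

def make_stream(n_cycles, seed=0):
    """Concatenate n_cycles of p1-p2-t, picking one syllable per slot at random."""
    rng = random.Random(seed)
    return [rng.choice(slot) for _ in range(n_cycles) for slot in SLOTS]

def transitional_probabilities(stream):
    """Estimate TP(b | a) = count(a followed by b) / count(a) over the stream."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}

stream = make_stream(3000)
tps = transitional_probabilities(stream)
# All estimated TPs should lie close to 0.5, whichever slot boundary they span.
print(min(tps.values()), max(tps.values()))
```

Because no transition stands out, any segmentation preference infants show after familiarization has to come from knowledge they bring to the task, such as the distribution of /pVp/ in their native-language input.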

Experiment 2 tested whether 15-month-olds use the under-representation of /sVs/ as a segmentation cue in a similar AL, with /p/ replaced by /s/, and /t/ by /x/ (…s1s2xs1s2xs1s2x…). Here, infants showed a novelty preference for ssx-words (e.g. /sosaxa/), suggesting that during familiarization they had split /sVs/ and consequently heard sxs-words (e.g. /saxase/). Furthermore, there was an interaction between the experiments testing /sVs/ and /pVp/. This indicates that the distribution of specific sequences in the input affects segmentation, rather than some innate bias for either grouping or splitting identical consonants.

The results indicate that during infancy, splitting cues might be more relevant than chunking cues in speech segmentation. This suggests that the source of phonotactic knowledge is continuous speech rather than the lexicon.

Adriaans, F. & Kager, R. (2010). Adding generalization to statistical learning: The induction of phonotactics from continuous speech. Journal of Memory and Language, 62, 311-331.
Jusczyk, P.W., Luce, P.A., & Charles-Luce, J. (1994). Infants' sensitivity to phonotactic patterns in the native language. Journal of Memory and Language, 33, 630-645.
Mattys, S., Jusczyk, P., Luce, P., & Morgan, J. (1999). Phonotactic and prosodic effects on word segmentation in infants. Cognitive Psychology, 38, 465-494.
McCarthy, J. (1986). OCP effects: Gemination and antigemination. Linguistic Inquiry, 17, 207-263.
Perruchet, P. & Pacton, S. (2006). Implicit learning and statistical learning: One phenomenon, two approaches. Trends in Cognitive Sciences, 10(5), 233-238.
Van Kampen, A., Parmaksiz, G., van de Vijver, R., & Höhle, B. (2008). Metrical and statistical cues for word segmentation: Vowel harmony and word stress as cues to word boundaries by 6- and 9-month-old Turkish learners. In A. Gavarró & M. J. Freitas (eds.), Language Acquisition and Development. Newcastle: Cambridge Scholars Publishing, 313-324.
Weijer, J. van de (1998). Language-Input for Word Discovery. Ph.D. Thesis. Max-Planck Series in Psycholinguistics 9.

Upcoming talks:
Nov 15: Michael Kenstowicz (MIT)
Nov 29: RUMMIT Practice talks
Dec 6: Suyeon Yun (MIT)

You can view the current, up-to-date version of the schedule here (click ‘agenda’ to see the schedule as a list), or subscribe via iCal here.