Speaker: Adam Albright (MIT)
Title: Speakers avoid saying improbable words, but not exceptional words
Time: Monday, March 2nd, 5pm – 6:30pm
Location: 32-D831

Abstract: Numerous studies over the past two decades have documented cases in which restrictions that are obeyed categorically in some languages are obeyed gradiently in others, and have shown that speakers are aware of such gradient restrictions. Such facts suggest that gradient restrictions are included in speakers’ grammars, even if they are not enforced absolutely. An implication of this approach is that lexicons may be rife with exceptions to gradient restrictions. This is not necessarily a problem, as long as learners have adequate evidence to learn both the restriction and the exceptions. For a learner to learn that a particular morpheme is exceptional, the morpheme must have sufficiently high token frequency. This effect is seen very clearly in the domain of irregular morphology, where irregular items tend to be skewed towards high token frequency. Phonological alternations show a similar effect: items that run counter to general lexical trends tend to have higher token frequency. The current study tests whether a similar skewing is found for words that are exceptions to static phonotactic trends, using data from English and Korean. The hypothesis is that exceptional items should likewise require the support of high token frequency.

In order to examine the distribution of grammatically exceptional forms, I employed the UCLA Phonotactic Learner to discover gradient phonotactic restrictions in English and Korean. I examined three lexicons: 4,657 English monosyllabic lemmas, 15,386 Korean mono- and disyllabic nouns, and 3,750 Korean verbs. For each lexicon, the model was used to discover 500 constraints, which were assigned weights to form a maximum entropy grammar. In order to examine the frequency distribution of exceptions, I selected constraints with at least modestly high weights, and reasonably many (>50) exceptions. For each constraint, the frequency density distributions of regular vs. exceptional forms were then compared, testing for a skew towards high frequency among exceptions.
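
As a rough illustration of how a weighted-constraint (maximum entropy) grammar separates regular from exceptional forms, the sketch below sums weight times violation count for each form and converts the sum to a maxent value. It is not the UCLA Phonotactic Learner's code: the constraint names, the weights, and the helper names `penalty` and `maxent_value` are hypothetical and chosen only for illustration.

```python
import math

def penalty(violations, weights):
    """Weighted sum of constraint violations for one form."""
    return sum(weights[name] * count for name, count in violations.items())

def maxent_value(violations, weights):
    """Unnormalized maxent value exp(-penalty); lower means less well-formed."""
    return math.exp(-penalty(violations, weights))

# Hypothetical constraints and weights (assumptions, not results from the study).
weights = {"*ŋ[+coronal]": 2.0, "*#ŋ": 5.0}

# Violation profiles: a form violating no listed constraint, and a form that
# violates the hypothetical ŋ-before-coronal restriction once.
regular = {}
exceptional = {"*ŋ[+coronal]": 1}

print(penalty(regular, weights), penalty(exceptional, weights))
print(maxent_value(regular, weights) > maxent_value(exceptional, weights))
```

In this toy setup, a form counts as an exception to a constraint when its violation count for that constraint is nonzero; the frequency distributions of the two groups could then be compared constraint by constraint.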

For some constraints, exceptional items are indeed skewed towards higher token frequency. For example, English exhibits a gradient restriction against [ŋ] followed by coronals, and exceptions show a slight skewing towards higher frequencies. This effect is small compared to the effect for morphological irregulars, however, and most constraints do not show such a skewing at all. The same general pattern holds for Korean nouns and verbs: a few constraints show a slight trend for exceptions to have higher frequency, but most constraints do not.

In order to test the relation between phonotactic probability and frequency more generally, I also calculated bigram transitional probability of existing items in each of the three datasets, and compared bigram probability to token frequency. For all three data sets, generalized linear models show that even when segment count is controlled for, lemma frequency is positively correlated with bigram probability. Thus, phonotactically unusual words tend to have lower frequency, not higher frequency. In the domain of static phonotactics, grammatically exceptional words do not require high token frequency to maintain their exceptionality. I conclude that phonotactically exceptional words are reliably learned, but speakers tend to avoid using them, lowering their token frequency (Martin 2007).
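
The bigram measure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the study: each character stands in for one segment, `#` marks word boundaries, the toy lexicon is invented, and the names `train_bigrams` and `log_prob` are hypothetical. The regression against token frequency with segment count as a control is not shown.

```python
import math
from collections import Counter

BOUNDARY = "#"  # assumed word-boundary symbol

def train_bigrams(lexicon):
    """Estimate transitional probabilities P(next | current) from word types."""
    context_counts = Counter()
    bigram_counts = Counter()
    for word in lexicon:
        segs = [BOUNDARY] + list(word) + [BOUNDARY]
        for a, b in zip(segs, segs[1:]):
            context_counts[a] += 1
            bigram_counts[(a, b)] += 1
    return {pair: n / context_counts[pair[0]] for pair, n in bigram_counts.items()}

def log_prob(word, probs):
    """Sum of log transitional probabilities; -inf if any bigram is unattested."""
    segs = [BOUNDARY] + list(word) + [BOUNDARY]
    total = 0.0
    for pair in zip(segs, segs[1:]):
        p = probs.get(pair)
        if p is None:
            return float("-inf")
        total += math.log(p)
    return total

# Invented toy lexicon (an assumption for illustration only).
toy_lexicon = ["kat", "kit", "tak", "tik", "sat"]
probs = train_bigrams(toy_lexicon)

for word in toy_lexicon:
    print(word, round(log_prob(word, probs), 3))
```

Because longer words accumulate more log-probability terms, a raw score of this kind falls with word length, which is one reason to control for segment count when relating it to frequency.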