Friday, Sept. 19, 3:30 PM

32D-141

Jason Riggle

University of Chicago

“Learning and Linguistic Typology”

In this talk, I present results on the learnability of phonological grammars for two constraint-based models, Harmonic Grammar (HG; Legendre, Miyata, and Smolensky 1990) and Optimality Theory (OT; Prince and Smolensky 1993). I first establish that grammars in these models are learnable from reasonably sized samples of data and then present a learning algorithm for OT that is guaranteed to make no more than k log₂ k mistakes when learning grammars with k constraints, which is only logarithmically worse than the best possible learning algorithm for OT.

The proposed learning algorithm calculates the number of rankings that are consistent with a set of data. This makes possible a simple and effective Bayesian heuristic for learning: all else equal, choose candidates that are preferred by the largest number of rankings consistent with previous observations. This general strategy can be applied to HG or to any parameterized model of grammar, and it associates with each language generated by the theory an abstract quantity, its p-volume, measuring the fraction of the parameter space corresponding to grammars that generate that language.
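The counting idea can be illustrated with a small brute-force sketch in Python. This is not the algorithm from the talk, which is not specified here; it simply enumerates all k! rankings, so it is only practical for a handful of constraints. The representation is an assumption: each candidate is a tuple of violation counts, one per constraint, each datum is a (winner, loser) pair, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def prefers(ranking, a, b):
    """True if `ranking` (constraint indices, highest-ranked first) prefers a over b.

    The highest-ranked constraint on which the two candidates differ decides.
    """
    for c in ranking:
        if a[c] != b[c]:
            return a[c] < b[c]
    return False  # identical violation profiles: no strict preference


def consistent_rankings(k, data):
    """All rankings of k constraints under which every winner beats its loser."""
    return [r for r in permutations(range(k))
            if all(prefers(r, w, l) for w, l in data)]


def consistent_fraction(k, data):
    """Fraction of the k! rankings that are consistent with the data."""
    return Fraction(len(consistent_rankings(k, data)), factorial(k))


def vote_share(k, data, candidate, rivals):
    """Fraction of the consistent rankings that prefer `candidate` to every rival."""
    rankings = consistent_rankings(k, data)
    if not rankings:
        raise ValueError("no ranking is consistent with the data")
    wins = sum(all(prefers(r, candidate, o) for o in rivals) for r in rankings)
    return Fraction(wins, len(rankings))


# One observation over three constraints: the winner violates constraint 1,
# the loser violates constraint 0, so constraint 0 must outrank constraint 1.
data = [((0, 1, 0), (1, 0, 0))]
print(len(consistent_rankings(3, data)))             # 3
print(consistent_fraction(3, data))                  # 1/2
print(vote_share(3, data, (0, 0, 1), [(0, 1, 0)]))   # 1/3
```

In the last call, only one of the three consistent rankings prefers the candidate (0, 0, 1), so under the heuristic described above a learner would choose its rival (0, 1, 0) instead.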

The p-volume seems to encode ‘restrictiveness’ in a way similar to Tesar and Prince’s (1999) r-measure. Preliminary investigations indicate that p-volume is significantly correlated with typological frequency (cf. Bane and Riggle 2008). This fact is neatly explained if language learners use a strategy that is sometimes called a Gibbs learner, wherein they keep track of the region of the parameter space consistent with previous observations but make guesses according to a single hypothesis grammar randomly selected from that region. Upon making an error, the Gibbs learner updates the parameter region and randomly selects a new hypothesis grammar from that region. Following this strategy, learners will be predisposed towards grammars with large p-volume in cases where the hypotheses are underdetermined by the data. Moreover, priors other than the ‘flat’ distribution over rankings can be included to implement models of ranking bias.
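A minimal Python sketch of a Gibbs-style learner over OT rankings may make the strategy concrete. It rests on several assumptions that are not from the talk: the parameter region is stored as an explicit list of rankings (feasible only for small k), candidates are tuples of violation counts, the prior is flat, the data are mutually consistent, and the region is narrowed on every observation while the hypothesis is resampled only after an error. The names are hypothetical.

```python
import random
from itertools import permutations


def prefers(ranking, a, b):
    """True if `ranking` (constraint indices, highest-ranked first) prefers a over b."""
    for c in ranking:
        if a[c] != b[c]:
            return a[c] < b[c]
    return False


def gibbs_learn(k, observations, rng):
    """Run a Gibbs-style learner over a sequence of (winner, loser) pairs.

    Returns the final hypothesis, the remaining consistent region,
    and the number of errors made along the way.
    """
    region = list(permutations(range(k)))   # flat prior: every ranking allowed
    hypothesis = rng.choice(region)
    mistakes = 0
    for winner, loser in observations:
        # Keep only the rankings that agree with this observation.
        region = [r for r in region if prefers(r, winner, loser)]
        if not prefers(hypothesis, winner, loser):
            # Error: draw a fresh hypothesis uniformly from the region.
            # (rng.choice raises IndexError if the data are inconsistent.)
            mistakes += 1
            hypothesis = rng.choice(region)
    return hypothesis, region, mistakes


# Two observations that together force constraint 0 over 1 over 2.
observations = [((0, 1, 0), (1, 0, 0)), ((0, 0, 1), (0, 1, 0))]
hypothesis, region, mistakes = gibbs_learn(3, observations, random.Random(0))
print(hypothesis, len(region))   # (0, 1, 2) 1
```

Because each new hypothesis is drawn uniformly from the consistent region, a language compatible with more of the remaining rankings is proportionally more likely to be the learner's guess, which is the bias towards large p-volume described above. A non-flat prior could be sketched by passing weights to `rng.choices` in place of the uniform `rng.choice`.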

One of the primary assets of this strategy is that it allows linguistic theory to be informed by the relative frequencies of patterns in linguistic typologies rather than only by the boolean distinction of whether or not a pattern is attested. Though some of the frequency asymmetries surely come from non-linguistic historical accidents, a model of learning that is able to account for some of the frequency variance is clearly of interest and makes a range of predictions that can be tested in experimental settings.