The Language @MIT series returns this week, featuring a talk by Regina Barzilay.
Title: Learning to Model Text Structure
Speaker: Regina Barzilay, CSAIL
When: Wednesday Dec 3, 3-4:30pm
Where: 26-310
Discourse models capture relations across different sentences in a document. These models are crucial in applications where it is important to generate coherent text. Traditionally, rule-based approaches have been predominant in discourse research. However, these models are hard to incorporate as-is into modern systems: they rely on handcrafted rules that are valid only for limited domains, with no guarantee of scalability or portability.
In this talk, I will present discourse models that can be effectively learned from a collection of unannotated texts. The key premise of our work is that the distribution of entities in coherent texts exhibits certain regularities. The models I will be presenting operate over an automatically computed representation that reflects distributional, syntactic, and referential information about discourse entities. This representation allows us to induce the properties of coherent texts from a given corpus, without recourse to manual annotation or a predefined knowledge base. To conclude my talk, I will show how these models can be effectively integrated into statistical generation and summarization systems.
This is joint work with Mirella Lapata and Lillian Lee.
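For readers curious what such an entity-based representation might look like in practice, below is a minimal sketch of an entity grid, one common way to encode the syntactic roles that discourse entities take across sentences. This is an illustration, not the speaker's implementation: the input data, function names, and role inventory (S/O/X/-) are assumptions, and a real system would derive the (entity, role) pairs from a syntactic parser and coreference resolution rather than hand-written lists.

```python
from collections import defaultdict

# Entity-grid sketch: rows are sentences, columns are entities, and each
# cell records the grammatical role an entity takes in that sentence
# (S = subject, O = object, - = absent). Coherence regularities can then
# be learned from the distribution of role transitions down each column.

# Hypothetical pre-parsed input: each sentence is a list of
# (entity, grammatical_role) pairs.
sentences = [
    [("Microsoft", "S"), ("browser", "O")],
    [("Microsoft", "S"), ("market", "O")],
    [("court", "S"), ("Microsoft", "O")],
]

def build_entity_grid(sentences):
    """Map each entity to its per-sentence role sequence, e.g. ['S', 'S', 'O']."""
    entities = sorted({e for sent in sentences for e, _ in sent})
    grid = {e: [] for e in entities}
    for sent in sentences:
        roles = dict(sent)
        for e in entities:
            grid[e].append(roles.get(e, "-"))
    return grid

def transition_counts(grid, n=2):
    """Count length-n role transitions (e.g. S -> O) across all entity columns."""
    counts = defaultdict(int)
    for roles in grid.values():
        for i in range(len(roles) - n + 1):
            counts[tuple(roles[i : i + n])] += 1
    return counts

if __name__ == "__main__":
    grid = build_entity_grid(sentences)
    for entity, roles in grid.items():
        print(f"{entity:10s} {' '.join(roles)}")
    print(dict(transition_counts(grid)))
```

The intuition the sketch tries to capture: in coherent text, salient entities tend to recur in prominent roles (a subject often stays a subject), so the statistics of these column-wise transitions, gathered from unannotated text, can serve as features for a learned coherence model.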