Speaker: David Alvarez Melis (MIT CSAIL)
Title: Interpretability for black-box sequence-to-sequence models
Date/Time: Thursday, November 16th, 5:00-6:30pm
Location: 46-5165
Abstract:
Most current state-of-the-art models for sequence-to-sequence NLP tasks have complex architectures and millions, if not billions, of parameters, making them practically black-box systems. This lack of transparency can limit their applicability to certain domains and can hamper our ability to diagnose and correct their flaws. Popular black-box interpretability approaches are inapplicable in this context since they assume scalar (or categorical) outputs. In this work, we propose a method to interpret the predictions of any black-box structured-input, structured-output model around a specific input-output pair. Our method returns an “explanation” consisting of groups of input-output tokens that are causally related. These dependencies are inferred by querying the black-box model with perturbed inputs, generating a graph over tokens from the responses, and solving a partitioning problem to select the most relevant components. We apply this general approach to sequence-to-sequence problems, adopting a variational autoencoder to yield meaningful input perturbations. We test our method across several NLP sequence generation tasks.
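
To make the perturb, query, graph, and partition pipeline concrete, here is a minimal Python sketch. It is not the speaker's implementation. Every name in it (`TOY_LEXICON`, `black_box`, `dependency_graph`, `explain`) is hypothetical. Two simplifications are swapped in: leave-one-out token deletion stands in for the variational-autoencoder perturbations, and thresholded connected components stand in for the partitioning problem. The "black box" is a toy word-by-word lookup translator.

```python
from collections import defaultdict

# Hypothetical toy "black box": a word-by-word lookup translator.
# The explainer below only ever calls it; it never inspects the lexicon.
TOY_LEXICON = {"the": "le", "cat": "chat", "sleeps": "dort"}


def black_box(tokens):
    """Map each known input token to its output token, skipping unknown ones."""
    return [TOY_LEXICON[t] for t in tokens if t in TOY_LEXICON]


def dependency_graph(model, source):
    """Build a bipartite graph of (input index, output index) -> weight.

    Assumption: perturbations are leave-one-out deletions, a crude stand-in
    for the VAE-generated perturbations mentioned in the abstract.
    """
    target = model(source)
    edges = {}
    for i in range(len(source)):
        perturbed = source[:i] + source[i + 1:]
        response = set(model(perturbed))
        for j, out_tok in enumerate(target):
            # An output token that vanishes when input i is removed
            # is treated as depending on input i.
            if out_tok not in response:
                edges[(i, j)] = 1.0
    return target, edges


def explain(model, source, threshold=0.5):
    """Group input and output tokens linked by edges at or above the threshold.

    Assumption: connected components replace the partitioning step.
    """
    target, edges = dependency_graph(model, source)
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), weight in edges.items():
        if weight >= threshold:
            parent[find(("in", i))] = find(("out", j))

    groups = defaultdict(lambda: ([], []))
    for side, idx in sorted(parent):
        root = find((side, idx))
        if side == "in":
            groups[root][0].append(source[idx])
        else:
            groups[root][1].append(target[idx])
    return sorted(groups.values())


if __name__ == "__main__":
    for inputs, outputs in explain(black_box, ["the", "cat", "sleeps"]):
        print(inputs, "->", outputs)
```

On the toy translator, each input word ends up grouped with the single output word it produces. A real sequence-to-sequence model would need many sampled perturbations and soft, estimated edge weights rather than the 0/1 weights used here.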