Speaker: Hao Tang (MIT CSAIL)
Title: Automatic speech recognition without linguistic knowledge?
Date and time: Thursday, 11/15, 5-6pm
Location: 46-5165
Abstract:
Building state-of-the-art speech recognizers, besides a large corpus of transcribed speech, require several additional ingredients, such as a phoneme inventory, a lexicon, and a language model. These ingredients carry linguistic constraints to make training more feasible and more sample efficient. Recently, there has been a push towards building a speech recognizer end to end, i.e., using few or even none of the aforementioned ingredients. This raises a fundamental question: is it possible to train a speech recognizer without any linguistic constraint?
How much data do we need to make it possible? What linguistic constraints are necessary for building a speech recognizer? In this talk, I will review the inner workings of conventional and end-to-end speech recognizers, and to help answer some of those questions, I will present empirical results in training end-to-end speech recognizers without any linguistic constraints.