[This is extracted from the Spring 2022 version on Canvas, so some links/formatting may be broken.]
This week, we’re building on last week’s core ideas about speech recognition. We’ll start by discussing biases in speech recognition and how to overcome them, based on the Scientific American article as well as two others I’m posting here on machine learning biases. Next, we’ll talk about a simple but conceptually useful algorithm, which can help us design a simple speech recognition model: the nearest-neighbors algorithm. Those readings will help us shore up our phonetic model, and the last topic of this week’s class will be starting in on the language model. We’ll go back to the textbook and examine how spell checkers and autocorrect systems work, and what information they use to infer what you meant when the input data is noisy or erroneous.
We’ll start by talking about the short article from Scientific American that examines which dialects of English are actually captured by current speech recognition technology. This is the same article as last week’s reading, so hopefully you’ve already read it.
I wanted to add a little more context for how biases emerge in machine learning/AI algorithms like speech recognition systems, and I think these two articles are pretty good for that. The first, from the MIT Technology Review (HTML), is a brief summary of some of the common sources of AI bias and why fixing them is nontrivial.
The second, a Medium post by a data scientist (HTML), digs deeper into how we can quantify biases and argues that the very process of machine learning is a biased perception of data, so addressing bias in machine learning is even more complex than it initially seems. (I’ll confess I’m not entirely won over by this argument, which feels a little too hand-wavy, but I think the idea is worth ruminating on.)
You might be wondering how we actually implement speech recognition. We talked in class about the features that different sounds have, but how does an algorithm classify them? I’ve written up some notes on a simple classification algorithm, known as Nearest Neighbors. This algorithm trains on labelled examples of different sounds in a language or dialect (e.g., a bunch of examples of people producing a specified vowel) and classifies a new sound based on which labelled examples it’s most similar to. While modern speech recognition systems use more complex algorithms than this one, nearest neighbors is a nice tradeoff between effectiveness and ease of implementation. We’ll discuss the algorithm and how to apply it to speech recognition and other linguistic tasks in class.
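To make the idea concrete before class, here is a minimal sketch of nearest neighbors in Python. It is not the code from my notes, and it makes several assumptions for illustration: each sound is reduced to two numbers (its first and second formant frequencies, F1 and F2, in Hz), the training values are invented toy numbers rather than real measurements, similarity is plain straight-line distance, and the names `TRAINING`, `distance`, and `classify` are made up for this example.

```python
import math
from collections import Counter

# Toy labelled examples: ((F1, F2) in Hz, vowel label).
# ASSUMPTION: these numbers are invented for illustration, not measured data.
TRAINING = [
    ((280, 2250), "i"),
    ((300, 2300), "i"),
    ((310, 2200), "i"),
    ((700, 1100), "a"),
    ((730, 1090), "a"),
    ((750, 1150), "a"),
    ((300, 870), "u"),
    ((320, 900), "u"),
    ((340, 850), "u"),
]


def distance(p, q):
    """Straight-line (Euclidean) distance between two feature tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def classify(sound, training=TRAINING, k=3):
    """Label a new sound by majority vote among its k closest labelled examples."""
    # Sort every labelled example by how far it is from the new sound,
    # then keep only the k closest ones.
    neighbors = sorted(training, key=lambda example: distance(sound, example[0]))[:k]
    # Count the labels among those neighbors and return the most common one.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # A new, unlabelled sound with a low F1 and a high F2.
    print(classify((290, 2280)))
```

With these toy numbers, the new sound lands closest to the three "i" examples, so that is the label it gets. Notice that the classifier can only ever be as good as its labelled examples: if a dialect's vowels are missing from the training data, its speakers' sounds get assigned to whatever happens to be nearby, which connects back to the bias readings above.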
Lastly, please read Sections 2.1-2.3 of the textbook, excluding “Under the Hood 3: Dynamic programming.” These sections cover the basics of spell checking/autocorrect and give us our first exposure to trigram models, which will pop up a few more times through the semester. We’ll cover the rest of the chapter next week, including the dynamic programming section, so read ahead if you’re interested.