Ling 354: Week 6

[This is extracted from the Spring 2022 version on Canvas, so some links/formatting may be broken.]

This week, we’re digging deeper into spell checking and language models. We’ll focus on simple models first, like n-grams, which we talked about at the end of class. That will require us to spend a little time talking about probabilities (especially conditional probabilities) as well.

Textbook reading

First, please read the remainder of Chapter 2 of the textbook. Don’t panic if you’re having trouble with Section 2.4.1; it’s a too-brief overview of syntax, a quite complex part of linguistics. We’ll go into more depth on syntax next week, once we understand simpler language models like n-grams, but getting some familiarity with the concepts now will help when we come back to it.

Additional articles, etc.

I have some additional notes that should help with understanding the concepts in this chapter. The first is a basic overview of probability theory for linguistics (PDF).  This is optional reading, but if you’re not familiar with probabilities or hate math, I hope you’ll find it an accessible introduction to the topic, which will come up a few times this semester. I originally developed it for my Ling 502 class, but the concepts apply equally well in this class. We’ll talk about conditional probability and Bayes’ Rule as ways of working with n-gram models this week.
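To make that concrete before you get to the notes: an n-gram model is built out of conditional probabilities estimated from counts. Here’s a minimal Python sketch of the bigram case; the toy corpus is invented purely for illustration.

    from collections import Counter

    # Tiny toy corpus, invented for illustration; a real model needs far more data.
    corpus = "the cat sat on the mat the cat ran".split()

    unigram_counts = Counter(corpus)
    bigram_counts = Counter(zip(corpus, corpus[1:]))

    def p_given(w2, w1):
        """Estimate P(w2 | w1) as count(w1 w2) / count(w1)."""
        return bigram_counts[(w1, w2)] / unigram_counts[w1]

    print(p_given("cat", "the"))  # 2/3: "the" is followed by "cat" in 2 of its 3 uses

Bayes’ Rule then lets us flip these conditionals around, e.g., to ask which intended word is most probable given an observed misspelling; that’s the basic move behind probabilistic spell checking.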

The second set of notes looks at how we collect and use linguistic data to try to build better language models (PDF). In particular, you might find it interesting to play around on Google Books N-grams to look at real-world usage data and see how increased context changes the probabilities of certain words.

The last set of notes (PDF) digs into the topic of Under the Hood 3: dynamic programming. I don’t think the book does a great job of explaining how dynamic programming (in the form of topological orderings on directed acyclic graphs) works, so I worked through a few examples.
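If you’d like something runnable to poke at alongside those notes, here’s a minimal Python sketch of the classic dynamic programming computation behind spell checking: minimum edit distance. (This is my own illustrative example, not code from the book; the book’s DAG/topological-ordering presentation is another way of organizing the same fill-in-the-table idea.)

    def edit_distance(source, target):
        """Minimum number of insertions, deletions, and substitutions
        needed to turn `source` into `target` (Levenshtein distance).
        Each cell d[i][j] depends only on cells already filled in --
        the same acyclic dependency structure the notes describe."""
        m, n = len(source), len(target)
        d = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(m + 1):
            d[i][0] = i          # delete all of source[:i]
        for j in range(n + 1):
            d[0][j] = j          # insert all of target[:j]
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                cost = 0 if source[i - 1] == target[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1,         # insertion
                              d[i - 1][j - 1] + cost)  # substitution or match
        return d[m][n]

    print(edit_distance("graffe", "giraffe"))  # 1: insert the missing "i"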

Finally, let’s wrap up with a look at how spell checkers succeed and fail in practice. First, here are two blogposts on the Cupertino effect, an unintended consequence of early automatic spelling correction systems (link, link). Second, here’s a blogpost from the team at Microsoft that worked on Office 2007’s spell checker, discussing how they chose to trade off between high precision (if it labels something an error, it’s probably right, but it also misses some errors) and high recall (it catches most errors, but also flags a lot of non-errors). I found their discussion of user preferences really interesting, and I’d like us to talk on Thursday about user-centered design in these kinds of systems.
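If the precision/recall terminology is new to you, the arithmetic behind it is simple. Here’s a tiny Python illustration; the numbers are invented, but they show the kind of high-precision, lower-recall behavior the Microsoft team describes.

    # Hypothetical evaluation: a text contains 100 genuine spelling errors,
    # and the checker flags 60 words, 50 of which really are errors.
    true_positives = 50   # flagged words that really are errors
    false_positives = 10  # correct words wrongly flagged
    false_negatives = 50  # genuine errors the checker missed

    precision = true_positives / (true_positives + false_positives)  # 50/60 ~ 0.83
    recall = true_positives / (true_positives + false_negatives)     # 50/100 = 0.50

    print(f"precision = {precision:.2f}, recall = {recall:.2f}")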

Ling 354: Week 5

[This is extracted from the Spring 2022 version on Canvas, so some links/formatting may be broken.]

This week, we’re building on last week’s core ideas about speech recognition. We’ll start by discussing biases in speech recognition and how to overcome them, based on the Scientific American article as well as two other pieces on machine-learning bias that I’m posting here. Next, we’ll talk about a simple but conceptually useful algorithm that can help us design a basic speech recognition model: nearest neighbors. Those readings will help us shore up our phonetic model, and the last topic of this week’s class will be starting in on the language model: we’ll go back to the textbook and examine how spell checkers and autocorrect systems work, and what information they use to infer what you meant when the input data is noisy or erroneous.

Additional articles/videos

We’ll start by talking about the short article from Scientific American that examines which dialects of English are actually captured by current speech recognition technology. This is the same article as last week’s reading, so hopefully you’ve already read it.

I wanted to add a little more context for how biases emerge in machine learning/AI algorithms like speech recognition systems, and I think these two articles are pretty good. The first, from the MIT Technology Review (HTML), is a brief summary of some common sources of AI bias and why fixing them is nontrivial.

The second, a Medium post by a data scientist (HTML), digs deeper into how we can quantify biases and argues that the very process of machine learning is a biased perception of data, so addressing bias in machine learning is even more complex than it initially seems. (I’ll confess I’m not entirely won over by this argument, which feels a little too hand-wavy, but I think the idea is worth ruminating on.)

You might be wondering how we actually implement speech recognition. We talked in class about the features that different sounds have, but how does an algorithm classify them? I’ve written up some notes on a simple classification algorithm known as nearest neighbors. This algorithm trains on labelled examples of different sounds in a language or dialect (e.g., a bunch of examples of people producing a specified vowel) and classifies a new sound based on which labelled examples it’s most similar to. While modern speech recognition systems use more complex algorithms, nearest neighbors offers a nice tradeoff between effectiveness and ease of implementation. We’ll discuss the algorithm and how to apply it to speech recognition and other linguistic tasks in class.
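As a preview of those notes, here’s a minimal one-nearest-neighbor sketch in Python. The (F1, F2) formant values are rough, invented numbers for illustration; real systems use richer acoustic features and usually consult more than one neighbor.

    import math

    # Toy training data: (F1, F2) vowel formant frequencies in Hz, with labels.
    # The numbers are invented for illustration.
    training = [
        ((300, 2300), "i"),  # roughly the vowel in "beet"
        ((700, 1200), "a"),  # roughly the vowel in "father"
        ((300, 900),  "u"),  # roughly the vowel in "boot"
    ]

    def classify(sound):
        """Label a new (F1, F2) point with the label of the single closest
        training example (1-nearest-neighbor, Euclidean distance)."""
        return min(training, key=lambda example: math.dist(sound, example[0]))[1]

    print(classify((350, 2100)))  # "i": nearest to the "beet" example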

Textbook reading

Lastly, please read Sections 2.1-2.3 of the textbook, excluding “Under the Hood 3: Dynamic programming.” These sections cover the basics of spell checking/autocorrect and give us our first exposure to trigram models, which will pop up a few more times throughout the semester. We’ll cover the rest of the chapter next week, including the dynamic programming section, so read ahead if you’re interested.
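If you want a sneak peek at what a trigram model boils down to, here’s a minimal Python sketch (toy corpus again invented for illustration): predict each word from the two words before it.

    from collections import Counter

    # Toy corpus; a real model would be trained on millions of words.
    words = "i am sam sam i am i do not like green eggs".split()

    trigram_counts = Counter(zip(words, words[1:], words[2:]))
    bigram_counts = Counter(zip(words, words[1:]))

    def p_next(w3, w1, w2):
        """Estimate P(w3 | w1 w2) as count(w1 w2 w3) / count(w1 w2)."""
        return trigram_counts[(w1, w2, w3)] / bigram_counts[(w1, w2)]

    print(p_next("sam", "i", "am"))  # 0.5: "i am" is followed by "sam" 1 of 2 times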

Ling 354: Week 4

[This is extracted from the Spring 2022 version on Canvas, so some links/formatting may be broken.]

This week, we’re turning to speech recognition. How do Alexa, Siri, Google, and other voice-activated assistants work? What causes difficulties for them, and how can we overcome those difficulties? For that matter, how does human speech recognition work? Why do baristas screw up your name at cafes? Why do so many people think my name is “Dave”? We’ll try to get to the bottom of these mysteries this week and next.

Textbook reading

Read Section 1.4. It covers the basics of speech recognition from a computer’s perspective, as well as a quick overview of the sound patterns of human language, which are covered in much more detail in the reading below.

Additional articles/videos

Read through Sections 2.1, 2.2, 2.3 and 2.6 of Language Files. This provides more detail, from a linguistic perspective, on how linguistic sounds are produced (2.1-2.3) and perceived (2.6). Any reasonable speech recognition system will need to incorporate this sort of information to accurately determine what sounds people are making.

Also, the discussion of syllable structure should help clarify the nature of syllabaries and abugidas from our discussion of writing systems.  (Sections 2.4 and 2.5 are less important for English speech recognition, so you can skip them. But in case you’re interested in the structure of language more generally, I left them in the file for you.)

Since that’s pretty dense reading, I want to wrap up the week with one short article from Scientific American that examines which dialects of English are actually captured by current speech recognition technology. Think about cases where you or your friends are misunderstood, whether by humans or computers, and we’ll talk about how these failures arise and can be countered.

(One last thing, and strictly optional, but the Proceedings of the National Academy of Sciences article that forms the basis of the SA article is pretty good, and worth a look if you have the time/interest.)

Ling 354: Week 3

[This is extracted from the Spring 2022 version on Canvas, so some links/formatting may be broken.]

In our Week 3 meeting, we’ll first wrap up emoji, digging a bit deeper into how widely shared our understanding of them really is. We’ll then turn to the QWERTY effect, research arguing that the way we type language has a subtle but significant influence on our perception of it.

Textbook reading

No textbook reading for this week.  

Additional articles/videos

We’ll start class by discussing the Bai et al 2019 paper (A Systematic Review of Emoji: Current Research and Future Perspectives) that I’d meant to get to last class. Hopefully you’ve already read it, but here’s the link again in case it’s helpful (HTML).

A couple people in the pre-class discussion had questions about a point that Bai et al made, which is that emoji are prone to “inefficiency” and “misunderstanding”. I’ll be honest: I was also confused by Bai et al’s discussion on this point. So I went back to the papers they cited, and I found one that both clarifies this point and is interesting in its own right: Tigwell & Flatla 2016. We’ll discuss this paper alongside the Bai et al one, and talk more generally about how messages are understood and misunderstood. (Optionally, if you’re interested in these issues, you may also want to read this paper.)

For the QWERTY effect, we’ll be reading an original research paper: Jasmin and Casasanto 2012. The statistical analysis in this paper may be a little tough if you’re not familiar with such things, so if you’re feeling stuck, focus on the higher-level concepts over the specific results. What is the QWERTY effect supposed to be? What do J&C think might cause it? How do they propose testing it? Do you find their methods convincing? How could you adapt this work to investigate languages/cultures with other keyboards and other writing systems?