Data and Society
Bias in Word Embeddings
As a first example of why we may need to change our technical choices in order to build fair algorithms, we will look at the
Word embeddings are a technique for transforming text into matrices by turning each word into a vector with real entries. The goal is to design word embeddings so that words with similar meanings are mapped to vectors that are close to each other (as measured by the angle or distance between them). Many word embeddings are based on co-occurrence counts: which words frequently appear together in the same document? The idea behind this design is that words that occur together in a given document tend to share similar meaning. Word embeddings are useful because they can streamline search engines (when people search for the same topic using different but similar words) and help infer meaning from texts.
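To make the co-occurrence idea concrete, here is a minimal sketch that builds word vectors from document-level co-occurrence counts and compares them by the cosine of their angle. The toy corpus and the function names `cooccurrence_vectors` and `cosine_similarity` are hypothetical and chosen only for illustration. Embeddings used in practice are trained on far larger corpora with more sophisticated methods, so this should be read as a simplified picture of the idea rather than how any particular embedding is built.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: each string is treated as one "document".
documents = [
    "doctor nurse hospital",
    "doctor hospital patient",
    "nurse hospital patient",
    "guitar piano music",
    "guitar music concert",
]


def cooccurrence_vectors(docs):
    """Map each word to a vector of document-level co-occurrence counts."""
    tokenized = [doc.split() for doc in docs]
    vocab = sorted({word for doc in tokenized for word in doc})
    counts = Counter()
    for doc in tokenized:
        # Count each unordered pair of distinct words once per document.
        for a, b in combinations(sorted(set(doc)), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    # Entry j of a word's vector is how often it co-occurs with vocab[j].
    return {w: [counts[(w, other)] for other in vocab] for w in vocab}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


vectors = cooccurrence_vectors(documents)
sim_related = cosine_similarity(vectors["doctor"], vectors["nurse"])
sim_unrelated = cosine_similarity(vectors["doctor"], vectors["guitar"])

print(f"doctor vs nurse:  {sim_related:.2f}")
print(f"doctor vs guitar: {sim_unrelated:.2f}")
```

In this toy corpus, "doctor" and "nurse" appear alongside the same words, so their vectors point in nearly the same direction, while "doctor" and "guitar" never share a context and their similarity is zero. The same mechanism explains how bias can enter: if the training text pairs certain occupations mostly with one gender, the resulting vectors will encode that pattern too.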
To understand the impact of biased word embeddings, we will look at two types of harm that algorithms can cause: representational and allocative. A representational harm reinforces existing stereotypes about people and may therefore amplify existing social biases. An allocative harm occurs when a decision-making system allocates opportunities, goods, or services in a biased manner.
Biased word embeddings are a(n)
With the following exercise, we aim to accomplish three goals: (i) learn more about biases in word embeddings, (ii) learn how to read technical papers, and (iii) begin to write brief summaries for a general, non-expert audience. The tasks are, first, to read the word-embeddings paper referenced above and, second, to write a 300-500 word article that summarizes its content for a general audience. Upload your article via this Google Form.
Reading Technical Papers
Graduate school requires reading large amounts of material. Working in a fast-moving field like data science requires the same reading skill to stay current in one's area. There are some useful techniques that make reading scientific and technical papers easier. Since this is a new skill for most graduate students, faculty have written guides on how to read for their discipline, including computer science. Data science methods are often published in computer science-related venues, so a computer science guide fits well.
Open the reading-guide paper referenced above and read the headings.
It recommends reading a paper in
Now read sections 1 and 2 of the tutorial paper more thoroughly.
This multistage approach lets you look for a small amount of information in each pass, with different information targeted in each step. The reading-guide paper is aimed at PhD students, who will come across many papers that are potentially related to their work but will never have enough time to read them all. It also goes into a level of detail that may not be practical here. We recommend a slightly modified set of three passes that follow the same basic goals as the paper.
Use that to put the three steps in order:
For projects, you may need to do some more extensive reading. If you plan to pursue a PhD later, you will definitely need to write literature reviews and eventually review papers.
Now try reading the word-embeddings paper, taking note of its key points and of what someone who is not a data scientist should know about it.
Writing a newspaper-style article might also be new. This piece should use less formal language than academic publishing and be accessible to an adult with a secondary (high school) education. Below are a number of (optional) resources to help you understand what we are looking for.
Your article should summarize the paper. Here is a guide to summarizing a psychology research paper as an example. It contains some psychology-specific points (e.g., APA is the American Psychological Association), but it is a good guide overall. Here is a second guide to summarizing a research paper.
As an example, see this Science News article, "AI acquired humanlike 'numbersense' on its own". It is a stand-alone summary of the finding, which is what you should aim for in your article. As a second example, see "When should banks chase debts? New method could help them decide". Note that this one is based on an interview with the researchers and includes quotes. We're not asking for that, but its style and level of complexity are appropriate.
If you're looking for more inspiration, Massive Science tabulates publicly accessible science articles written by other scientists. Below are a couple of examples you might read to get a feel for the style of writing. These are longer and more detailed than we expect for this assignment, but have the right style.
- Being a Pokémon Master in childhood permanently alters your brain
- Artificial intelligence isn’t a ‘black box.’ It’s a key to studying the brain
Hopefully now you feel prepared to write and submit your summary of the paper.