Is the news realistically portraying gender?
This interactive visualization looks at 3 billion words used in English news articles shared through Google News in 2019, and plots how closely each word is associated with the male or female gender.
Completed Individually (1 month) with D3, Python
Words carry many biases, embedded through the way we use them every day. U.S. adults spend more than half their day consuming media, so it is paramount that we recognize which gender labels are being perpetuated in the news and on the internet. 300 billion usage associations of words from Google News were converted to vectors and plotted by their relatedness to gender. This kind of vectorization is a common preprocessing step in text-based machine learning. Unfortunately, models learn to capture the biases present in the real-life data on which we train them. When machine learning models are trained on embeddings like these, a recruiter searching for "programmers" can end up with women's resumes at the bottom of the pile.
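As a rough illustration of what "relatedness to gender" can mean for a word vector, here is a minimal sketch. It assumes the score is the difference between a word's cosine similarity to a male anchor word and to a female anchor word. The anchors "he" and "she", the `gender_score` helper, and the 3-dimensional toy vectors are all invented for this example and are not taken from the project's data.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def gender_score(word_vec, male_vec, female_vec):
    """Hypothetical score: positive leans male, negative leans female."""
    return cosine(word_vec, male_vec) - cosine(word_vec, female_vec)

# Toy 3-dimensional vectors, made up for illustration only.
# Real embeddings have hundreds of dimensions.
toy = {
    "he": np.array([1.0, 0.0, 0.2]),
    "she": np.array([0.0, 1.0, 0.2]),
    "programmer": np.array([0.8, 0.3, 0.5]),
    "nurse": np.array([0.2, 0.9, 0.4]),
}

scores = {
    word: gender_score(vec, toy["he"], toy["she"])
    for word, vec in toy.items()
    if word not in ("he", "she")
}

for word, score in scores.items():
    print(f"{word}: {score:+.3f}")
```

With these toy vectors, "programmer" gets a positive score and "nurse" a negative one, which is the kind of skew the visualization is meant to expose.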
Out of the 3 million words indexed, this cluster shows the 5,000 deep word embeddings most related to "relationship".
My algorithm pulled the 100 words closest to a content category and ordered them by their vector distance to male and female pronouns.
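The two steps described above (find the nearest words to a category, then order them by gender lean) might look roughly like the sketch below. It is not the project's actual code: the function names, the choice of "he" and "she" as anchor words, and the tiny toy vocabulary are assumptions made for illustration, and cosine similarity stands in for whatever distance measure was really used.

```python
import numpy as np

def unit(v):
    """Scale a vector to length 1 so dot products equal cosine similarity."""
    return v / np.linalg.norm(v)

def closest_words(vectors, category, k):
    """Return the k words most similar to the category word."""
    target = unit(vectors[category])
    sims = {
        word: float(np.dot(unit(vec), target))
        for word, vec in vectors.items()
        if word != category
    }
    return sorted(sims, key=sims.get, reverse=True)[:k]

def order_by_gender(vectors, words, male="he", female="she"):
    """Sort words from most female-leaning to most male-leaning."""
    m, f = unit(vectors[male]), unit(vectors[female])

    def lean(word):
        v = unit(vectors[word])
        return float(np.dot(v, m) - np.dot(v, f))

    return sorted(words, key=lean)

# Toy vocabulary, invented for illustration only.
vectors = {
    "he": np.array([1.0, 0.0, 0.0]),
    "she": np.array([0.0, 1.0, 0.0]),
    "relationship": np.array([0.1, 0.1, 1.0]),
    "husband": np.array([0.7, 0.1, 0.7]),
    "wife": np.array([0.1, 0.7, 0.7]),
    "partner": np.array([0.3, 0.3, 0.9]),
    "carburetor": np.array([0.5, 0.0, -0.8]),
}

top = closest_words(vectors, "relationship", k=3)
ordered = order_by_gender(vectors, top)
print(ordered)
```

In the project itself, `k` would be 100 and the vectors would come from the pretrained Google News embeddings rather than a hand-written dictionary.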