Biased Data

Beneficial Intelligence

Apr 9 2021 • 7 mins

In this episode of Beneficial Intelligence, I discuss biased data. Machine learning depends on large data sets, and unless you take care, ML algorithms will perpetuate any bias in the data they learn from.

The famous ImageNet database contains 14 million labeled images. However, 6% of these have the wrong label. The labels are provided by humans paid very little per image, so they work very fast. Unfortunately, as Nobel Prize winner Daniel Kahneman has shown, when humans work fast, they rely on their fast System 1 thinking, which is very prone to bias. Thus, a woman in hospital scrubs is likely to be labeled "nurse" and a man in the same clothes is likely to be labeled "doctor."

Google Translate showed its bias when translating from Hungarian. Hungarian has only a gender-neutral pronoun, but the English translation was given a gendered one. The original gender-neutral phrases became "she does the dishes" and "he reads" in English.

As CIO or CTO, you need to make sure somebody is accountable for the quality of the data you use to train your machine learning algorithms. If you don't have a Chief Data Officer, maybe you have a Data Protection Officer who could reasonably be given this purview. But you cannot foist this responsibility on individual development teams under deadline pressure. It is your responsibility to ensure that any machine learning system is learning from clean, unbiased data.

Beneficial Intelligence is a weekly podcast with stories and pragmatic advice for CIOs, CTOs, and other IT leaders. To get in touch, please contact me at sten@vesterli.com
