Diving Deep into Synthetic Data with Alex Watson of Gretel.ai

Machine Learning Engineered

Apr 20 2021 • 1 hr 19 mins

Alex Watson is the co-founder and CEO of Gretel.ai (http://Gretel.ai), a startup that offers APIs for creating anonymized and synthetic datasets. Previously he was the founder of Harvest.ai (http://Harvest.ai), whose product Macie, an analytics platform that protects against data breaches, was acquired by AWS.

Timestamps:
02:15 Introducing Alex Watson
03:45 How Alex was first exposed to programming
05:00 Alex's experience starting Harvest.ai, getting acquired by AWS, and integrating their product at massive scale
21:20 How Alex first saw the opportunity for Gretel.ai
24:20 The most exciting use cases for synthetic data
28:55 Theoretical guarantees of anonymized data with differential privacy
36:40 Combining pre-training with synthetic data
38:40 When to anonymize data and when to synthesize it
41:25 How Gretel's synthetic data engine works
44:50 Requirements of a dataset to create a synthetic version
49:25 Augmenting datasets with synthetic examples to address representation bias
52:45 How Alex recommends teams get started with Gretel.ai
59:00 Expected accuracy loss from training models on synthetic data
01:03:15 Biggest surprises from building Gretel.ai
01:05:25 Organizational patterns for protecting sensitive data
01:07:40 Alex's vision for Gretel's data catalog
01:11:15 Rapid fire questions

Links:
https://gretel.ai/blog (Gretel.ai Blog)
https://www.wired.com/2010/03/netflix-cancels-contest/ (Netflix Cancels Recommendation Contest After Privacy Lawsuit)
https://greylock.com/portfolio-news/the-github-of-data/ (Greylock - The Github of Data)
https://gretel.ai/blog/improving-massively-imbalanced-datasets-in-machine-learning-with-synthetic-data (Improving massively imbalanced datasets in machine learning with synthetic data)
https://gretel.ai/blog/deep-dive-on-generating-synthetic-data-for-healthcare (Deep dive on generating synthetic data for Healthcare)
https://medium.com/gretel-ai/synthetic-data-performance-report-e5a3cd6b1e6d (Gretel's New Synthetic Performance Report)
https://www.goodreads.com/book/show/18007564-the-martian (The Martian)
https://www.penguinrandomhouse.com/books/172832/snow-crash-by-neal-stephenson/ (Snow Crash)
https://us.macmillan.com/series/themurderbotdiaries/ (The Murderbot Diaries)