Datacast

James Le

Datacast follows the narrative journey of founders, operators, and investors in the data and AI infrastructure space to unpack the careers that they have built. James Le hosts the show.

Episode 111: Astrophysics, Visualization Recommendation, and Scalable Data Science with Doris Lee
1w ago
Show Notes
(01:30) Doris walked through her time doing research in physics and astrophysics at UC Berkeley and getting involved with data science.
(04:11) Doris reflected on her decision to pursue the Ph.D. program in computer science at the University of Illinois, Urbana-Champaign.
(05:53) Doris discussed her development of no-code, interactive visualization interfaces accelerating users toward data insight discovery.
(10:37) Doris explained how the RISE Lab and I School at UC Berkeley helped shape her thinking around working with end-users and building something to serve the data science community.
(16:05) Doris unpacked the focus of her Ph.D. dissertation - which is to make data exploration and visualization easier and more accessible through automation.
(17:27) Doris shared the motivation and high-level design behind the development of Lux, a general-purpose visual exploration assistant situated within a computational notebook.
(21:25) Doris revealed the recipe for open-source community engagement and roadmap prioritization with Lux.
(26:17) Doris shared the founding story of Ponder, whose mission is to improve data science productivity by empowering users to do data science at all scales.
(31:02) Doris explained how Ponder helps solve the fragmentation challenges across the data stack.
(34:27) Doris provided a brief overview of Modin, which improves the scalability of data frames.
(38:41) Doris discussed Ponder's go-to-market strategy to drive more enterprise interest toward the product.
(41:23) Doris discussed her team's challenges in finding early design partners across various industries.
(44:16) Doris shared valuable hiring lessons to attract the right people who are excited about Ponder's mission.
(47:42) Doris shared fundraising advice to founders who are seeking the right investors for their startups.
(49:33) Doris highlighted the difference between being a researcher and a founder.
(51:06) Closing segment.

Doris' Contact Info
Website | Twitter | LinkedIn | GitHub

Ponder's Resources
Website | Twitter | LinkedIn | Slack
Modin | Lux
Events

Mentioned Content
Publications
The Case for a Visual Discovery Assistant: A Holistic Solution for Accelerating Visual Data Exploration (IEEE Data Bulletin 2018)
Understanding Sense-making in Visual Query Systems (IEEE Visual Analytics Science and Tech 2019)
Deconstructing Categorization in Visualization Recommendation: A Taxonomy and Comparative Study (IEEE Transactions on Visualization and Computer Graphics 2021)
Lux: Always-On Visualization Recommendation for Exploratory Data Science (Dec 2021)
Blog Posts
Insight Machines: The Past, Present, and Future of Visualization Recommendation (Multiple Views, Feb 2020)
Announcing Ponder (March 2022)
How we parallelized 600+ pandas functions with Modin (March 2022)
Using Lux to visualize your pandas dataframes with zero effort (March 2022)
Ph.D. Alum Doris Lee Wants to Democratize Data Science Tools (March 2022)
People
Chip Huyen
Shreya Shankar
Parul Pandey

Notes
My conversation with Doris was recorded back in May 2022. Earlier this year, Ponder developed first-of-its-kind technology that allows anyone to run their pandas code directly in their data warehouse, be it Snowflake, BigQuery, or Redshift. With Ponder, you get the same pandas-native experience that you love, but with the power and scalability of cloud-native data warehouses. More details are in this blog post.
Additionally, you can run NumPy commands on your data warehouse as well. This means you can work with the NumPy API to build data and ML pipelines, and let Snowflake / BigQuery / Redshift take care of scaling, security, and compliance. More details are in this blog post.
If you are interested in trying these new capabilities out, sign up here!

About the show
Datacast features long-form, in-depth conversations with practitioners and researchers in the data community to walk through their professional journeys and unpack the lessons learned along the way. I invite guests coming from a wide range of career paths, from scientists and analysts to founders and investors, to analyze the case for using data in the real world and extract their mental models (“the WHY and the HOW”) behind their pursuits. Hopefully, these conversations can serve as valuable tools for early-stage data professionals as they navigate their own careers in the exciting data universe.
Datacast is produced and edited by James Le. For inquiries about sponsoring the podcast, email khanhle.1013@gmail.com.
Subscribe by searching for Datacast wherever you get podcasts, or click one of the links below:
Listen on Spotify | Listen on Apple Podcasts | Listen on Google Podcasts
If you’re new, see the podcast homepage for the most recent episodes to listen to, or browse the full guest list.
Episode 110: Wisdom in Building Data Infrastructure, Lessons From Open-Source Development, The Missing README, and The Future of Data Engineering with Chris Riccomini
Mar 14 2023
Show Notes
(01:47) Chris reflected on his educational experience at Santa Clara University in the mid-2000s, where he also interned at NeoMagic and Intacct Corporation.
(07:31) Chris recalled valuable lessons from his first job as a software engineer at PayPal, researching new fraud prevention techniques.
(11:28) Chris shared the technical and operational challenges associated with his work at LinkedIn as a data scientist - scaling LinkedIn's Hadoop cluster, improving LinkedIn's "People You May Know" algorithm, and delivering the next generation of LinkedIn's "Who's Viewed My Profile" product.
(22:00) Chris provided criteria that his team relied on when choosing their big data solutions (which include Aster Data, Greenplum, and Hadoop).
(25:22) Chris gave advice to early-stage startups that want to start adopting best practices in observability and deployment.
(28:02) Chris expanded on his concept that models and microservices should be running on the same continuous delivery stack.
(30:52) Chris discussed his strategy to become a better interviewer - as he performed ~1,500 interviews at LinkedIn and WePay.
(37:39) Chris explained the motivation behind the creation of Apache Samza (LinkedIn's streaming system infrastructure built on top of Apache Kafka) and discussed its high-level design philosophy.
(46:19) Chris shared lessons learned from evangelizing Samza to the broader open-source community outside of LinkedIn.
(52:44) Chris talked about his decision to join the Data Infrastructure team at WePay as a principal software engineer after 7 years at LinkedIn.
(01:00:53) Chris shared the technical details behind the evolution of WePay's data infrastructure throughout his time there.
(01:12:40) Chris shared an insider perspective on the adoption of Apache Airflow from his experience as a Project Committee Member.
(01:20:15) Chris discussed the fundamental design principles that make Apache Kafka such a powerful technology.
(01:25:40) Chris reflected on his experience building out WePay's engineering team.
(01:27:14) Chris shared the story behind the writing journey of the "Missing README" - which he co-authored with Dmitriy Ryaboy.
(01:38:16) Chris revisited his predictions in a 2019 post called "The Future of Data Engineering" and discussed key trends such as real-time data warehouses, data mesh, and headless BI.
(01:44:27) Chris gave advice to a smart, driven engineer who wants to explore angel investing - given his experience as a strategic investor and advisor for startups in the data space since 2015.
(01:48:17) Chris shared advice on hiring engineers and navigating open-source product strategies for companies he invested in.
(01:53:57) Chris reflected on his consistency in adding value to the relationships he has formed over the years.
(01:58:00) Closing segment.

Chris's Contact Info
Website | Twitter | LinkedIn | GitHub | AngelList

Mentioned Content
Blog Posts
Joel Spolsky's Blog
Models and microservices should be running on the same continuous delivery stack (Oct 2018)
Using checksums to verify syncing 100M database records (Napkin Math, Jan 2021)
Datacast episode with Jeremiah Lowin, CEO of Prefect (March 2022)
Kafka CDC breaks database encapsulation (Nov 2018)
Kafka provides data portability and infrastructure agility (Jan 2019)
The Future of Data Engineering (July 2019)
Work For Two Companies (Nov 2021)
People
Will Larson
Maxime Beauchemin
Julia Evans
Gunnar Morling
Coda Hale
Books
Google's Site Reliability Engineering Books
"On Writing Well"
"The Missing README"
"Empire of Light: Tesla, Edison, Westinghouse, and the Race to Electrify the World"

Notes
My conversation with Chris was recorded back in May 2022. Earlier this year, Chris released Recap, a dead simple data catalog for engineers, written in Python. Recap makes it easy for engineers to build infrastructure and tools that need metadata. Check out his blog post and get started with Recap's documentation!
Episode 109: Developer Productivity, Real-Time Data Infrastructure, and The Fat-Tailed Nature of Enterprise Software with Nnamdi Iregbulem
Feb 21 2023
Show Notes
(01:32) Nnamdi shared formative experiences of his upbringing, where he spent countless hours building computers, coding up websites, and finding ways to game Google search.
(04:54) Nnamdi described his undergraduate experience studying Economics at Yale University and interning at McKinsey and J.P. Morgan.
(08:10) Nnamdi reflected on the decline of the investment banking industry - given his one year working for the technology, media, and telecommunications group at J.P. Morgan in New York.
(12:52) Nnamdi discussed his career transition into venture investing at ICONIQ Capital, where he deployed over $500 million into high-growth technology companies.
(15:00) Nnamdi reflected on his proudest accomplishments during his four formative years at ICONIQ.
(17:35) Nnamdi talked about his excitement for GitLab, one of his investments.
(21:27) Nnamdi touched on his time getting an MBA from the Stanford Graduate School of Business.
(24:21) Nnamdi also completed coursework in Stanford's Computer Science department (such as CS 231N and CS 224N) during his MBA.
(26:37) Nnamdi explained the venture ecosystem at Stanford, given his experience serving as the Co-President and Vice President of the Venture Capital and Tech Clubs, respectively.
(28:57) Nnamdi unpacked his experience working at Confluent as a product manager and conducting independent research on trends in developer productivity.
(32:23) Nnamdi reflected on his decision to join Lightspeed Venture Partners in mid-2020, investing in early-stage software startups to enhance the productivity of technical knowledge workers.
(34:17) Nnamdi shared how he proved his value upfront in potential deals and started forming his investment theses as a new investor at Lightspeed.
(36:24) Nnamdi dissected the key factors that triggered him to make investments in the seed rounds of Ponder and Voltron Data (in the domain of developer tools).
(40:36) Nnamdi explained his Series A investment in Redpanda and Materialize (in the domain of real-time data infrastructure).
(45:45) Nnamdi shared advice he had been giving his portfolio companies in hiring decisions and navigating growth strategy.
(49:07) Nnamdi walked through his 3-part series on major industry trends, top strategic priorities, and biggest challenges for software and infrastructure startups pushing the developer productivity frontier.
(52:37) Nnamdi shared advice to startups thinking about scaling their developer relations, given the challenge of hiring developer advocates for dev-focused startups.
(56:27) Nnamdi unpacked his 3-part series on the developer productivity manifesto that introduces the developer productivity flywheel, explains how more developers lead to lower productivity, and argues that we are leaving $670B of software on the table by not maximizing developer employment and developer productivity.
(01:01:26) Nnamdi examined his obsession with the fat-tailed nature of high-growth startups, such as why VCs don't index-invest, why SaaS monetization is concentrated on the tails, and why product-market fit gets harder to achieve the longer you search for it.
(01:04:26) Nnamdi explained his new and improved SaaS metric called Weighted ACV, which is the weight of the revenue that a customer represents and tells founders where to look if they want to best understand the revenue of their businesses.
(01:07:53) Nnamdi thought about his recognition as equal to his credibility as an investor on a mission to increase total software output by investing in technical tools for technical people.
(01:11:03) Closing segment.

Nnamdi's Contact Info
Website | Lightspeed Profile | LinkedIn | Twitter | GitHub | Medium

Lightspeed's Resources
Website | Twitter | LinkedIn
Global Presence
Medium Blog

Mentioned Content
Articles
Six Trends Shaping Developer Productivity
Top Three Strategic Priorities of Developer Productivity Startups
Four Challenges Facing Developer Productivity Startups
Awesome Developer Advocates Are Hiding in Plain Sight
The Developer Productivity Manifesto Part 1: The Flywheel
The Developer Productivity Manifesto Part 2: More (Developers) Isn’t Always More
The Developer Productivity Manifesto Part 3: Leaving Software on the Table
You Don't Understand Compound Growth
Funding Simply Shifts the Bottleneck
Why Don't VCs Index Invest?
Enterprise Software Monetization is Fat-Tailed
Product-Market Fit is Lindy
Introducing a New and Improved SaaS Metric: Weighted ACV
People
Mike Volpi (Index Ventures)
Keith Rabois (Founders Fund)
Books
Nassim Taleb's Incerto Series:
Fooled By Randomness
The Black Swan
The Bed of Procrustes
Antifragile
Skin In The Game

Notes
My conversation with Nnamdi was recorded in May 2022. Since then, many things have happened. I'd recommend checking out:
Lightspeed's announcement of its three new funds last year
Nnamdi’s new series on software valuations (1, 2, 3)
Nnamdi’s recent posts on the reality of tech layoffs and the need for more startups
Nnamdi’s recent investment in Select Star
Episode 108: Computer Vision, Product Management, and Enterprise Investing with Tom Rikert
Feb 16 2023
Show Notes
(02:00) Tom shared formative experiences of his upbringing.
(04:17) Tom described his educational experience at MIT and his research thesis in Computer Vision.
(07:08) Tom talked about his interest in computer vision and computational neuroscience.
(08:38) Tom recalled lessons from his first job out of school as a software engineer at Silicon Graphics, building high-performance visualization systems.
(11:32) Tom reflected on his time as a product lead at Autodesk, launching a location services platform for global wireless carriers and a developer ecosystem for GIS applications.
(13:48) Tom reflected on his MBA experience at Harvard Business School.
(16:54) Tom reflected on his first stint at Google - leading a sales operations team in AdWords and building YouTube's monetization systems.
(20:06) Tom recalled lessons learned as a first-time founder of a social commerce startup called Renown Labs.
(23:11) Tom walked through his time as the Director of Product at the SaaS social media marketing startup Wildfire (which was acquired by Google in August 2012).
(27:00) Tom explained his career transition from tech operating into venture investing - after joining the enterprise investment team at a16z as a partner in 2012.
(29:55) Tom revisited his thesis, discussing the rise of Enterprise Hackers back in 2013.
(32:25) Tom talked about his decision to join NextWorld Capital as a partner in 2014, leading investments across enterprise applications, the Internet of Things, and AI.
(34:50) Tom unpacked his investment thesis on enterprise technology that helps the blue-collar working class.
(36:50) Tom shared his mental checklist used to evaluate investment opportunities in enterprise AI at NextWorld.
(40:39) Tom shared the founding story of Masterful AI - where he has been a co-founder and CEO since 2019.
(43:08) Tom expanded upon the 2-year incubation period from the inception to the announcement of the Masterful AI platform.
(45:17) Tom unpacked major inefficiencies of ML development and explained how the Masterful platform works at a high level.
(48:18) Tom shared exciting initiatives in Masterful's product roadmap.
(50:09) Tom highlighted the principles that stood the test of time in computer vision over the past two decades.
(51:41) Tom shared valuable hiring lessons to attract the right people who are excited about Masterful AI's mission.
(55:07) Tom discussed his team's challenges in finding early design partners across various industries.
(58:24) Tom shared fundraising advice to founders who are seeking the right investors for their startups.
(01:01:01) Tom reflected on his career spanning product management, venture capital, and startup founding.
(01:05:20) Closing segment.

Tom's Contact Info
LinkedIn | Twitter | Medium | Website

Masterful AI's Resources
Website | Twitter | LinkedIn
Docs | Slack Community
"Building Things with Machine Learning" Podcast

Mentioned Content
Articles
"The Enterprise Hacker Rises" (a16z Blog, Dec 2013)
"Joining NextWorld Capital" (Personal Blog, Nov 2014)
"The Next Big Opportunity In Enterprise Starts In The Field" (TechCrunch, July 2015)
"My visit to the Obama White House: AI, the future of jobs, and a VC’s Letter to the next administration" (NextWorld Insights, Jan 2017)
"AI hype has peaked so what’s next?" (TechCrunch, Sep 2017)
"AI is bringing superpowers to the specialist" (LinkedIn, Oct 2018)
"Introducing Masterful AI" (Masterful Blog, Nov 2021)
People
Andrew Ng (Founder of DeepLearning.AI, Founder and CEO of Landing AI, Co-Founder of Coursera)
Chris Dixon (General Partner at a16z)
Book
AI Superpowers (by Kai-Fu Lee)

Note
My conversation with Tom was recorded back in May 2022. Here is the note from Tom regarding updates with Masterful:
The latest at Masterful AI is that we’re launching a new generative AI product. We saw a need to make generative models more customizable and more reliable, so companies can trust them for real business applications. We’re starting by enabling companies to tell a more vivid and personalized story about their products at scale.
Episode 107: Investing At The Nexus of Computational Sciences with Grace Isford
Feb 10 2023
Show Notes
(02:02) Grace shared formative experiences of her upbringing - being heavily influenced by the financial sector from growing up near New York and getting an appreciation for diverse perspectives from studying abroad in Tokyo.
(03:47) Grace described her college experience at Stanford studying Management Science and Engineering.
(08:04) Grace talked about her participation in the Mayfield Fellowship and her service as a Co-President of Stanford Women in Business.
(12:34) Grace walked through her internship experiences as an investor at the Stanford Management Company, in product at ed-tech startup Handshake, and in growth equity at Stripes Group.
(16:20) Grace reflected on her time at Canvas Ventures - where she joined as a campus scout while still a student.
(19:11) Grace shared three approaches to prove her value upfront in potential deals and to form her investment thesis as a new VC associate.
(23:07) Grace dissected her Series A investment in Vendia, a blockchain-powered, real-time data-sharing platform that solves the growing inter-organization data collaboration problem.
(25:17) Grace examined her Series A investment in Robocorp, which offers the first cloud-native, open-source automation stack and orchestration platform to power any automation process.
(26:43) Grace shared three pieces of advice in hiring decisions, navigating go-to-market strategy, and growing product offerings that she had given her portfolio companies.
(29:49) Grace shared trends in the API-first economy that she is most excited about in the upcoming years.
(31:56) Grace unpacked key takeaways from her article "The Mindset of a Data Leader."
(34:02) Grace discussed under-hyped and over-hyped trends in Web3 - taken from her incredibly detailed deck on the Web3 World.
(36:31) Grace dissected the major categories of the Web3 infrastructure, including Decentralized Finance, Decentralized Apps, DAOs, NFTs, and Guild Education/Reskilling.
(41:43) Grace walked through her decision to join Lux Capital, a firm that invests in emerging science and technology ventures at the outermost edges of what is possible, as a principal investor in early 2022.
(45:12) Grace shared her mental checklist to evaluate entrepreneurs and make investment decisions at the nexus of web3, data infrastructure, and applications of AI/ML.
(47:10) Grace talked about her community-building work to promote women's voices in tech.
(48:59) Grace reflected on her consistency in adding value to every conversation with people in her community.
(51:26) Closing segment.

Grace's Contact Info
Website | Lux Profile | LinkedIn | Twitter

Lux Capital
Website | Twitter | LinkedIn
Securities (Podcast & Newsletter)

Mentioned Resources
Articles
"The Third-Party API Economy: Part I" (Sep 2020)
"The Third-Party API Economy: Part II" (Feb 2021)
"The Mindset of a Data Leader" (Nov 2020)
"The Web3 World" (Jan 2022)
"Welcoming our newest investor Grace Isford to Lux Capital" (Feb 2022)
People
Fred Wilson (Union Square Ventures)
Matt Huang and Fred Ehrsam (Paradigm Ventures)
Katie Haun (Haun Ventures)
Book
"Wanting" (by Luke Burgis)

Notes
My conversation with Grace was recorded back in April 2022. Since then, many things have happened. I'd recommend:
Listening to her chats with Tina Seelig and Christian Catalini on the Securities podcast
Reading her thoughts on building the next AI/ML infrastructure stack
Checking out her reflections on the intersection of AI and creativity
Additionally, Grace invested in the Series C of RunwayML, a pioneer in the Generative AI space. If you are in NYC, be sure to stop by the upcoming first annual AI film festival powered by Runway!
Episode 106: Advancing AI Adoption with Dânia Meira
Jan 4 2023
Show Notes
(01:32) Dânia shared her upbringing in Brazil and her college experience studying Applied Mathematics at the University of Campinas.
(05:58) Dânia touched on her early career working in marketing intelligence in Brazil.
(10:38) Dânia described her thesis on scalable implementations of the Alternating Least Squares algorithm for Collaborative Filtering recommendation, conducted during her Master's degree in Computer Science from the University of Fluminense.
(16:10) Dânia recalled her hustling phase working and getting a Master's degree simultaneously.
(24:19) Dânia reflected on her move to Berlin to work as a data scientist in several startups.
(31:00) Dânia looked back at her time working at MYTOYS GROUP's Analytics team, responsible for Predictive Analytics and Machine Learning Modeling.
(34:12) Dânia compared doing data science to practicing mixed martial arts.
(38:35) Dânia reflected on her involvement with Data Science for Social Good Berlin as a data ambassador and Data Science Retreat as a SQL Masterclass Teacher.
(43:14) Dânia shared the founding story of AI Guild - the go-to community for data and business professionals advancing AI adoption - where she is a founding member.
(47:36) Dânia gave her thoughts on barriers preventing more women from entering the data field.
(51:21) Dânia discussed the #datalift initiative, which pushes to productionize more data analytics and machine learning solutions.
(58:27) Dânia explained her work supporting the advancement of #datacareer talents and experts.
(01:01:22) Dânia gave her take on the evolution of the data field over the past decade.
(01:03:16) Closing segment.

Dânia's Contact Info
LinkedIn | Twitter | Website | GitHub | Medium

AI Guild's Resources
Website | LinkedIn | YouTube
Join As A Member
#datalift
#datacareer

Mentioned Content
People
Andrew Ng: Founder of deeplearning.ai, co-founder of Coursera
Alessandra Sala: President of Women in AI, Sr. Director of Artificial Intelligence and Data Science at Shutterstock
Joy Buolamwini: Founder and Executive Director of The Algorithmic Justice League and maker of the "Coded Bias" documentary, available on Netflix
Book
Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy by Cathy O'Neil
Episode 105: Building The Next-Generation Spreadsheet, Being A Curious Analyst, and Engineering Entrepreneurship with Bobby Pinero
Dec 20 2022
Show Notes
(01:33) Bobby shared his upbringing in DC and high-school experience at St. Albans School.
(04:10) Bobby described his academic experience at Stanford studying Management Science and Engineering.
(07:39) Bobby recalled valuable career lessons learned working as a Finance Analyst at IBM and Inflection.
(09:56) Bobby reflected on his rationale for joining Intercom as one of the company's early employees right after its Series A financing in 2013.
(14:16) Bobby unpacked his 2016 talk "Scaling Analytics at Intercom," which explained the analytics journey at Intercom.
(18:46) Bobby shared a few metrics that are fundamental to the health of a startup across its growth stages (read his Intercom blog about the data points that startups should measure).
(22:50) Bobby shared the founding story of Equals.
(27:33) Bobby explained his decision to choose Ben McRedmond as his co-founder.
(29:35) Bobby expanded on the appealing traits of using spreadsheets.
(31:54) Bobby described the evolution of spreadsheet-like products and how the Equals product works at a high level.
(34:35) Bobby gave his take on how the concept of a next-generation spreadsheet fits into the quickly evolving modern data stack.
(38:31) Bobby shared valuable hiring lessons to attract the right people who are excited about Equals' mission.
(44:34) Bobby shared the challenges of finding Equals' early design partners and lighthouse customers.
(47:17) Bobby recapped key lessons about hiring financial analysts at Intercom.
(51:45) Bobby shared advice to a smart, driven finance operator looking to get more influence within a startup environment.
(56:26) Bobby emphasized the valuable skills acquired from his analyst career for his current founder journey.
(58:45) Closing segment.

Bobby's Contact Info
LinkedIn | Twitter

Equals Resources
Website | Twitter | LinkedIn
Spreadsheet Templates
Insights In Action interview series
Introducing Pivot Tables for Equals (Aug 2022)
Equals raises $16M Series A from a16z to replace Excel (Nov 2022)
Equals is hiring across Engineering, Design, Growth, and an Executive Assistant. Reach out to Bobby if you are interested!

Mentioned Content
Articles + Talk
23 SaaS Metrics for Fundraising + Optimization (March 2015)
Scaling Analytics at Intercom (Intercom Analytics Meetup, April 2016)
Data Points: What Should Your Startup Measure? (Oct 2017)
Every analyst is a finance analyst (May 2021)
The only question that matters when interviewing analysts (May 2021)
When to make your first finance hire (May 2021)
The hardest leap to make as a scaling finance leader (June 2021)
Finance and describing product-market fit (Sep 2021)
The curious analyst (Sep 2021)
The less lonely finance leader (Sep 2021)
Why every scaling finance team is understaffed (Nov 2021)
Revenue is the best North Star metric (March 2022)
People
Karen Church (VP of Research and Data Science at Intercom, Founder of HER+Data)
Noah Goodman (President at DataCRT)
Peter Fishman (Co-Founder of Mozart Data)
Episode 104: Streamlining Machine Learning In Production with Ran Romano
Dec 9 2022
Show Notes(01:34) Ran reflected on his time working as a Technical Product Manager in Israeli military intelligence.(04:07) Ran recalled his favorite classes on Machine Learning and Computer Graphics during his education in Computer Science at Reichman University.(05:24) Ran talked about a valuable lesson learned as a Software Engineer at VMware's Cloud Provider Software Business Unit.(08:07) Ran shared his thoughts on how engineers could be more impactful in startup organizations.(09:52) Ran talked about his decision to join Wix.com to work as a software engineer focusing on data infrastructure.(12:48) Ran explained the motivation for building Wix's internal ML platform, designed to address the end-to-end ML workflow.(16:48) Ran discussed the main components of Wix's ML platform: feature store, CI/CD mechanism, UI management console, and API prediction service.(18:51) Ran unpacked the virtual feature store and the CI/CD components of Wix's ML platform.(24:41) Ran expanded on the distinction between virtual and materialized feature stores.(27:01) Ran provided three key lessons for organizations looking to build an internal ML platform (as presented in his 2020 talk on Wix's ML platform).(31:43) Ran shared the essential attributes of exceptional data and ML engineering talent.(33:54) Ran shared the founding story of Qwak, which aims to build an end-to-end ML engineering platform to automate MLOps processes.(37:07) Ran talked about his responsibilities as the VP of Engineering at Qwak.(38:45) Ran dissected the key capabilities that are baked into the Qwak platform - a Build System, a Serving layer, a Data Lake, a Feature Store, and Automation capabilities.(44:05) Ran explained the major engineering challenges teams face when building an in-house feature store and envisioned the future of the feature store ecosystem in the coming years.(47:45) Ran shared valuable hiring lessons to attract the right people who are excited about Qwak's mission.(50:22) Ran
reflected on the challenges for Qwak to find the early design partners.(52:43) Ran described the state of the ML Engineering community in Israel.(54:53) Closing segment.Ran's Contact InfoLinkedInQwak's ResourcesWebsite | Twitter | LinkedInWhy QwakBlogMentioned ContentTalks"Overview of Wix's Machine Learning Platform" (2020)"Feature Stores - Unified Data Pipelines for ML" (2022)PeopleAndrew NgMatei ZahariaBarr MosesBook"Principles" (by Ray Dalio)
Episode 103: Computational Economics, Statistical Arbitrage, and Adaptable Data Consolidation with Eric Daimler
Nov 28 2022
Show Notes(02:15) Eric reflected on his early interest in computer science and his decision to study at Carnegie Mellon University in the early 90s.(05:40) Eric recalled his academic and overall college experience, emphasizing the importance of the people he was surrounded with.(08:22) Eric talked about his time working as a quant analyst early in his career, the moment he encountered the birth of the Mosaic browser, and his decision to join the tech industry.(13:01) Eric imparted wisdom learned from venture investing during the dot-com boom.(18:02) Eric talked about the next phase of his academic career - earning a Ph.D. in Computer Science from Carnegie Mellon and dropping out of a Ph.D. program at Stanford.(21:06) Eric discussed his academic research on Computational Economics for corporate malfeasance during his time as a Ph.D. student.(27:39) Eric shared different initiatives he worked on with Carnegie Mellon University - serving as the Assistant Dean and Assistant Professor of Software Engineering, launching CMU's Silicon Valley Campus, and founding CMU's Entrepreneurial Management program.(31:54) Eric described his journey in founding Hg Analytics, a hedge fund focused on statistical arbitrage, alongside other CMU's Computer Science PhDs.(37:36) Eric revisited his passion for AI and robotics, which eventually led to serving as a Presidential Innovation Fellow during the Obama Administration with the White House Office of Science and Technology Policy.(42:54) Eric shared his perspective on the role of AI in geopolitics and highlighted the challenges with data integration.(47:29) Eric explained his company Conexus, which develops a technology spin-off from MIT's Mathematics department using a branch of math called Category Theory.(50:55) Eric went over a customer case study that uses Conexus's solution to guarantee the semantics of data integrity during data transformation.(54:20) Eric showed his enthusiasm for the concept of data relationships.(56:59) Eric 
provided a sneak peek of his forthcoming book, "The Coming Composability: The roadmap for using technology to solve society's biggest problems."(58:38) Closing segment.Eric's Contact InfoTwitterLinkedInConexus' ResourcesWebsite | ResourcesMentioned ContentPeopleKai-Fu LeeAndrew NgEric XingBook"ReCulturing: Design Your Company Culture to Connect with Strategy and Purpose for Lasting Success" (by Melissa Daimler)
Episode 102: Early-Stage Investing, Modern Venture Capital, and Trends in Enterprise Infrastructure with Astasia Myers
Nov 23 2022
Show Notes(01:56) Astasia shared her childhood growing up in Silicon Valley.(05:12) Astasia reflected on her undergraduate education at Stanford - studying Political Science and International Relations.(06:35) Astasia discussed her research at the Graduate Business School with Professor Condoleezza Rice on a case study called "San Leon Energy: Hydraulic Fracturing in Poland" - which explores how to manage the political risks of using a controversial energy extraction technology in the European Union.(09:26) Astasia talked about her year in the UK getting a Master's in Technology Policy at the University of Cambridge's Judge Business School.(12:52) Astasia recalled her experience as an Equity Research Analyst at Baird and Co.(17:49) Astasia mentioned her work at Cisco Investments, driving their cloud-infrastructure M&A and venture investments.(20:58) Astasia shared her thoughts on different M&A frameworks she learned from Cisco.(23:27) Astasia reflected on her decision to join Redpoint Ventures in early 2017, leading investments across developer tools, cloud infrastructure, data/ML infrastructure, AI applications, and cybersecurity.(25:44) Astasia debunked misconceptions about the venture industry.(29:30) Astasia discussed ways to prove her value upfront in potential deals and start forming her investment theses as a new investor.(33:01) Astasia dissected the key factors that triggered her to invest in the Series A of Solo.io and the Series B of LaunchDarkly (in the domain of cloud infrastructure).(38:48) Astasia explained her Series A investment in Hex and Series B investment in Preset (in the domain of data infrastructure).(44:12) Astasia shared advice she had given her portfolio companies in hiring decisions, pricing products, and navigating go-to-market strategy while at Redpoint.(47:36) Astasia walked through her process of writing comprehensive research primers in her Medium blog Memory Leak on wide-ranging topics - from data science notebooks and data 
orchestration to data pipelining and ML data management.(51:19) Astasia shared the typical challenges she has seen in companies looking to incorporate Product-Led Growth into their go-to-market motion.(54:10) Astasia discussed building a community as a fuel for product-led growth and shared advice to startups thinking about starting their community initiatives.(56:40) Astasia shared advice for hiring good DevRel practitioners.(01:00:15) Astasia shared advice for a smart, driven operator who wants to explore angel investing.(01:03:26) Astasia talked about her current journey as the Founding Partner at Quiet Capital, sitting on its early-stage enterprise team and leading opportunities across pre-seed, seed, Series A, and Series B.(01:05:13) Astasia expanded upon her typical mental checklist to evaluate entrepreneurs and make investment decisions.(01:07:36) Astasia briefly touched on LP fundraising for Quiet Capital to become a "modern venture firm."(01:09:59) Astasia emphasized her enthusiasm for the Data-Centric ML movement.(01:13:41) Closing segment.Astasia's Contact InfoLinkedInMediumTwitterQuiet CapitalWebsiteLinkedInTwitterMentioned ResourcesContentJohn Gannon BlogPeopleSatish Dharmaraj (Redpoint Ventures)Scott Raney (Redpoint Ventures)Amanda Robson (Cowboy Ventures)NotesMy conversation with Astasia was recorded back in April 2022. Since then, many things have happened. I'd recommend:Signing up for her Memory Leak newsletterBrowsing through Quiet Capital's new portfolio careers pageListening to Astasia's appearance on the Data Stack ShowChecking out Quiet Capital's investments in Edge Delta, Diagrid, and OmniLooking at her real-time infrastructure landscape
Episode 101: Scaling Data Engineering, Building Data Teams, and Managed Data Stack with Tarush Aggarwal
Nov 7 2022
Show Notes(02:24) Tarush shared his upbringing in India and his decision to study abroad in the US.(03:51) Tarush walked through his college experience studying Computer Engineering at Carnegie Mellon University.(06:24) Tarush described the non-existent state of data infrastructure at Salesforce when he joined as the first data engineer in 2012.(11:21) Tarush went over his contribution to the automation and benchmarking frameworks over his tenure at Salesforce.(15:50) Tarush recalled lessons learned from building and managing a data team as a Data Manager at Wyng.(19:54) Tarush explained how a data team can serve other functional units more efficiently.(22:37) Tarush elaborated on his decision to adopt Looker for Wyng's Business Intelligence needs.(26:30) Tarush talked about his decision to join WeWork as their Director of Data Engineering in 2016.(30:39) Tarush went over the origin and evolution of Marquez - WeWork’s first open-source project around data lineage - during his time as the director of WeWork’s Data Platform team.(33:49) Tarush highlighted the main challenges of building an internal data platform.(35:43) Tarush recalled his move to China to help establish WeWork’s Asia operations and focus on the hyper-growing Chinese market.(39:01) Tarush shared the founding story of 5x during his sabbatical in 2020.(42:39) Tarush explained the industry's need for a managed data stack.(45:20) Tarush went over 5x’s process of sourcing, interviewing, and onboarding data engineers who are pre-trained on the modern data stack.(48:37) Tarush talked about finding the right vendors that make up the modern data stack to partner with.(50:06) Tarush walked through his production process to put together a lot of good videos to explain what 5x does and raise awareness about the company.(51:52) Closing segment.Tarush's Contact InfoLinkedInTwitterMedium5x ResourcesWebsite | LinkedIn | Twitter | YouTube | Instagram5x Explained in 2 MinutesManaged Data PlatformOn-Demand Data 
Engineering ServicesIntegrationsMentioned ContentPeopleGeorge Fraser and Taylor Brown (Founders of Fivetran)Prukalpa Sankar (Co-Founder and CEO of Atlan)Frank Slootman (CEO and Chairman of Snowflake)BooksStealing Fire (by Steven Kotler and Jamie Wheal)The 5 AM Club (by Robin Sharma)
Episode 100: Data-Centric Computer Vision, Productizing AI, and Scaling a Global Startup with Hyun Kim
Oct 28 2022
Show Notes(01:59) Hyun shared his upbringing and experience living in Korea, Singapore, and the US.(04:18) Hyun described his undergraduate experience at Duke University.(08:21) Hyun shared how he got a real taste of the game-changing potential of deep learning from the experience of bringing ML to diagnose Parkinson’s disease with brain MRI scans.(10:54) Hyun talked about his journey of leveling up coding and ML knowledge.(12:13) Hyun reflected on his motivation to pursue a Ph.D. program in computer science at Duke.(15:22) Hyun talked about his participation in the 2016 Amazon Robotics Challenge as the “Team Duke” leader and its Motion Planning function.(17:25) Hyun reflected on his decision to take a leave of absence from his Ph.D. program and return to Korea to work as an ML Research Engineer at the AI Research Lab of SK Telecom, a major Korean conglomerate.(19:46) Hyun discussed his research on game AI and synthetic image generation during his time with SK Telecom.(22:57) Hyun shared the founding story of Superb AI.(27:11) Hyun described going through the Y Combinator Winter 2019 batch.(32:25) Hyun unpacked the evolution of Superb AI’s Labeling platform since its inception.(34:47) Hyun walked through the process of prioritizing the product roadmap.(36:54) Hyun zoomed in on Superb AI’s automated labeling feature, Custom Auto-Label, which automatically detects and labels common or niche objects in images and videos.(40:21) Hyun touched on challenges with manually reviewing and auditing labels.(42:25) Hyun dissected the data-centric problems in computer vision that the newly released Superb DataOps platform is built to solve.(46:46) Hyun hinted at Superb AI’s product roadmap, judging from current industry-wide pain points.(48:53) Hyun highlighted a customer use case of Superb AI product offerings.(51:42) Hyun shared his vision of where Superb AI fits into the quickly evolving AI Infrastructure ecosystem.(54:15) Hyun shared valuable hiring lessons to attract people 
who are excited about Superb AI’s mission.(58:01) Hyun expanded his perspectives on defining and scaling a global company culture.(01:00:06) Hyun reflected on the challenges of running a remote-first company.(01:01:54) Hyun shared fundraising advice for founders seeking the right investors for their startups.(01:03:35) Hyun highlighted the difference between being a researcher and a founder.(01:05:08) Closing segment.Hyun’s Contact InfoLinkedInTwitterSuperb AI ResourcesWebsite | LinkedIn | Twitter | YouTube | GitHub | DocsSuperb AI Suite Labeling PlatformSuperb AI DataOps PlatformThe Ground Truth NewsletterSuperb AI AcademyMentioned ContentPeopleAndrew NgAndrej KarpathyIan GoodfellowBookZero To One (by Peter Thiel)
Episode 99: Data Mobility, Enterprise GTM, and Tech Leadership with Gary Hagmueller
Aug 30 2022
Show Notes(01:45) Gary walked through his academic experience getting a Bachelor’s degree in Business Administration at Arizona State University and an MBA in Finance at USC — Marshall School of Business.(04:52) Gary recalled the most valuable lesson from leading a business development team in the enterprise offerings group at Verizon.(07:45) Gary recalled the challenges of bringing a company public during his time as the Director of Corporate Development at NorthPoint Communications.(12:18) Gary shared his learnings while holding a COO role at Vinfolio — an innovator in the wine Industry.(15:19) Gary talked about his responsibilities in the Chief Financial Officer roles at KnowNow and Zuora.(19:06) Gary gave advice to founders seeking the right investors for their startups.(23:51) Gary walked through the learning curves while serving as the CFO, CRO, and COO of enterprise AI pioneer Ayasdi.(31:06) Gary shared his playbook on building a well-oiled sales operations machine.(33:46) Gary shared his journey as a first-time CEO at CLARA Analytics.(36:37) Gary talked about his proudest accomplishments while driving significant growth for CLARA.(37:52) Gary discussed the go-to-market motions implemented at CLARA.(41:07) Gary walked through his brief stint as an Entrepreneur-In-Residence at Redpoint Ventures, a top-tier VC firm focused on early-stage investing.(44:14) Gary rationalized his decision to become the CEO of Arcion Labs in December 2021.(49:39) Gary explained the high-level architectural design of Arcion’s data mobility platform.(54:19) Gary discussed strategies for finding the right technology partners to collaborate with.(57:42) Gary highlighted a few customer use cases of Arcion.(01:01:48) Gary shared valuable hiring lessons to attract the right people who are excited about Arcion’s mission.(01:04:28) Gary distilled lessons learned while building a high-performance team at Arcion.(01:09:14) Gary described the benefits of adopting usage-based pricing in 
enterprise technology.(01:11:41) Closing segment.Gary’s Contact InfoLinkedInTwitterCrunchbaseArcion’s ResourcesWebsite | LinkedIn | Twitter | YouTube | Docs | Slack“Dawn of the Data Mobility Era” (Feb 2022)“Arcion lands $13M to help companies replicate data across platforms” (Venture Beat, Feb 2022)Mentioned ContentContentThe Network Effects Bible (by James Currier of NFX)Blog by Tomasz Tunguz of Redpoint VenturesPeopleGurjeet Singh (Co-Founder and CEO of Oma Robotics, Ex-CEO/Co-Founder of Ayasdi)Satish Dharmaraj (Managing Director at Redpoint Ventures)NotesMy conversation with Gary was recorded back in March 2022. Since then, many things have happened at Arcion. I’d recommend checking out:The introduction of Arcion Cloud.This article about data mobility on The New Stack.This article about change data capture on Venture Beat.This big product launch on Oracle log reader availability featured by VentureBeatThe article about the missing piece for the Modern Data Stack featured by CrunchbaseArcion is launched with Databricks Partner Connect, featured by DatanamiArcion is a proud sponsor of the Oracle Cloud World 2022 in Las Vegas, Oct 17–20. If any data professionals are attending the conference, they should stop by the Arcion booth to say hi!
Episode 98: Building Developer Tools, Managing Platform Products, Fostering Diversity, and Enabling Real-Time Data Applications with DeVaris Brown
Aug 17 2022
Show Notes(01:43) DeVaris reflected on his upbringing on the south side of Chicago and college experience at UIUC, studying Mathematics and Computer Science in the early 2000s.(06:46) DeVaris shared his journey of learning how to program, make computers, and dive into the Internet.(09:35) DeVaris recalled valuable lessons from interning at Intel and Cisco Systems.(15:49) DeVaris shared his proudest accomplishments during his five years at Microsoft — first as a system engineer and then as an academic developer evangelist.(22:06) DeVaris recalled his experience working in the gaming and music space as the Chief Developer Evangelist at Marmalade and the Chief Product Officer at Klick Push, respectively.(27:49) DeVaris provided his perspective on the startup acquisition process.(29:13) DeVaris unpacked his two years as a platform product manager at Zendesk, where he drove the adoption of the Zendesk Developer Platform for developers to create unique customer experiences.(35:43) DeVaris revealed the challenges of building a technical community, given his experience at Zendesk.(38:25) DeVaris recalled his time working for a year as the Lead Product Manager at VSCO — a startup that builds digital tools for the modern creative.(45:12) DeVaris went over the challenges of building software for brand ambassadors and children’s playtime, given his time as the Head of Product Management at Slyce.io and the CTO at Super Heroic.(49:39) DeVaris reflected on his desire to scratch his entrepreneurial itch.(52:00) DeVaris gave advice for early-career technologists on evaluating startup opportunities.(55:51) DeVaris unpacked the product challenges he encountered while building tools for developers as the Director of Product Management at Heroku.(58:57) DeVaris touched on his one year as the first platform engineering PM hire at Twitter.(01:02:18) DeVaris shared the founding story of Meroxa.(01:04:28) DeVaris dissected how Meroxa’s platform architecture is designed at a high level — 
including a change data capture service, schema registry, event streaming service, API proxy, and incident automation framework.(01:06:06) DeVaris explained the technical challenges associated with creating connections between data sources and destinations in real time.(01:08:37) DeVaris zoomed into Conduit — Meroxa’s open-source, single-binary data integration tool written in Golang that provides developer-friendly streaming data orchestration.(01:12:32) DeVaris highlighted a few customer use cases of Meroxa.(01:16:16) DeVaris shared valuable hiring lessons to attract the right people who are excited about Meroxa’s mission and fit with Meroxa’s cultural values.(01:18:37) DeVaris shared challenges to finding the early design partners & lighthouse customers for Meroxa.(01:20:24) DeVaris gave advice to founders seeking the right investors for their startups.(01:22:58) DeVaris gave advice to smart, driven operators looking to explore angel investing.(01:25:17) DeVaris discussed the remaining barriers that prevent minorities from pursuing a technology career.(01:30:42) DeVaris imparted lessons from photography and DJ that benefited his career in product.(01:32:26) Closing segment.DeVaris’ Contact InfoLinkedInTwitterWebsiteGitHubMeroxa’s ResourcesWebsite | LinkedIn | Twitter | YouTubeCareers | Medium BlogDocumentationConduit (GitHub | Discord | Twitter | Docs)Mentioned ContentArticles“Hello World, Meroxa Style” (April 2021)“Streaming Your Database Changes with Change Data Capture” (Part 1 + Part 2)“Conduit: Streaming Data Integration for Developers” (Jan 2022)“Why Conduit? 
An Evolutionary Leap Forward for Real-Time Data Integration” (Feb 2022)“Hello Meroxa 2.0” (April 2022)Resources for minoritiesKura Labs (A free training and job placement academy for Infrastructure Computing, DevOps, and SRE for students from underserved communities)Free Code Camp (Learn to code - for free)BooksZero To One (by Peter Thiel)The Hard Thing About Hard Things (by Ben Horowitz)PeopleTristan Handy (Co-Founder and CEO of dbt Labs)Arjun Narayan (Co-Founder and CEO of Materialize)Benn Stancil (Chief Analytics Officer at Mode Analytics)Chad Sanderson (Head of Data Platform at Convoy)NotesMy conversation with DeVaris was recorded back in April 2022. Since then, many things have happened at Meroxa. I’d recommend checking out:The introduction of Meroxa 2.0 and Turbine.This interview on data-driven work culture.New CDC Connectors built into Conduit.Meroxa is a recipient of DoD funding to help the US Space Force monitor aircraft health in real-time.
Episode 97: Escaping Poverty, Embracing Digital Learning, Benchmarking ML Systems, and Advancing Data-Centric AI with Cody Coleman
Aug 2 2022
Show Notes
(01:49) Cody shared his upbringing in New Jersey, his childhood interest in science and technology, and the few people who have made big differences in his story.
(09:35) Cody went over his academic experience studying Electrical Engineering and Computer Science at MIT.
(17:51) Cody recalled his favorite classes taken at MIT.
(22:43) Cody talked about serving as the president of MIT’s chapter of the Eta Kappa Nu Honor Society and advancing online education at the MIT Office of Digital Learning.
(31:25) Cody is bullish on the future of digital learning.
(35:43) Cody expanded on his internships with Google throughout his time at MIT, working on local search quality and YouTube analytics.
(42:31) Cody described the challenges of dealing with high-frequency trading data during his year working as a junior data scientist in the Vendor Data Group of Jump Trading in Chicago.
(46:50) Cody reflected on his decision to embark on a Ph.D. journey in Computer Science at Stanford University.
(51:54) Cody mentioned his participation in the DAWN project, specifically DAWNBench, an end-to-end deep learning benchmark and competition.
(54:21) Cody unpacked the evolution of MLPerf, an industry-standard benchmark for the training and inference performance of ML models.
(56:52) Cody walked through the motivation and empirical work in his paper “Selection via Proxy: Efficient Data Selection for Deep Learning.”
(59:34) Cody discussed his paper “Similarity Search for Efficient Active Learning and Search of Rare Concepts.”
(01:06:32) Cody shared his learnings about bringing ML from research to industry from his advisors, Matei Zaharia and Peter Bailis, who were both academics and startup founders simultaneously.
(01:09:19) Cody went over key trends in the emerging Data-Centric AI community, given his involvement with the Data-Centric AI workshop at NeurIPS 2021 and the DataPerf benchmark suite.
(01:12:19) Cody shared lessons learned about finding product-market fit as the founder of Coactive AI, which brings unstructured data into the world of SQL and the big data tools that teams already love.
(01:15:34) Cody emphasized the importance of focusing on the HR function and defining cultural guiding principles for any early-stage startup founder.
(01:21:05) Cody provided his perspective on the differences and similarities between being a researcher and a founder.
(01:23:47) Closing segment.

Cody’s Contact Info: Website, Twitter, LinkedIn, Google Scholar
Coactive AI’s Resources: Website, Twitter, LinkedIn, Culture Values

Mentioned Content
Talks
“Digging Deeper: How a Few Extra Moments Can Change Lives” (TEDxStanford 2017)
“Data Selection for Data-Centric AI” (Stanford MLSys 2022)
Research
“Probabilistic Use Cases: Discovering Behavioral Patterns for Predicting Certification” (2015)
“DAWNBench: An End-to-End Deep Learning Benchmark and Competition” (Dec 2017)
“MLPerf: An Industry Standard Benchmark Suite for Machine Learning Performance” (Feb 2020)
“Selection via Proxy: Efficient Data Selection for Deep Learning” (Oct 2020)
“Similarity Search for Efficient Active Learning and Search of Rare Concepts” (July 2021)
DataPerf, a new benchmark suite for machine learning datasets and data-centric algorithms (Dec 2021)
People
Matei Zaharia (Cody’s Ph.D. Advisor, Co-Creator of Apache Spark, Co-Founder of Databricks)
Fei-Fei Li (Professor of Computer Science at Stanford, Creator of ImageNet Dataset)
Michael Bernstein (Professor of Computer Science at Stanford with a focus on Human-Computer Interaction)
Books
“No Rules Rules: Netflix and the Culture of Reinvention” (by Reed Hastings and Erin Meyer)
“What You Do Is Who You Are: How to Create Your Business Culture” (by Ben Horowitz)
“The Inner Game of Tennis: The Classic Guide to the Mental Side of Peak Performance” (by Timothy Gallwey)

Notes
My conversation with Cody was recorded back in January 2022. Since then, many things have happened at Coactive AI. I’d recommend:
Attending Cody’s upcoming talk at Snorkel’s The Future of Data-Centric AI.
Reviewing the DataPerf workshop at ICML 2022.
Reading the Coactive AI blog post on bringing UI props to MLOps.
Watching Cody’s CBS News interview from February 2022.

About the show
Datacast features long-form, in-depth conversations with practitioners and researchers in the data community to walk through their professional journeys and unpack the lessons learned along the way. I invite guests coming from a wide range of career paths, from scientists and analysts to founders and investors, to analyze the case for using data in the real world and extract their mental models (“the WHY and the HOW”) behind their pursuits. Hopefully, these conversations can serve as valuable tools for early-stage data professionals as they navigate their own careers in the exciting data universe.
Datacast is produced and edited by James Le. Get in touch with feedback or guest suggestions by emailing khanhle.1013@gmail.com.
Subscribe by searching for Datacast wherever you get podcasts or click one of the links below:
Listen on Spotify
Listen on Apple Podcasts
Listen on Google Podcasts
If you’re new, see the podcast homepage for the most recent episodes to listen to, or browse the full guest list.
Episode 96: Data Science Training and The Power of Education with Merav Yuravlivker
Jul 14 2022
Show Notes
(02:18) Merav talked about her undergraduate experience at McGill University studying Psychology and Sociology.
(04:33) Merav discussed important attributes of an exceptional teacher, given her two years teaching elementary special education in NYC public schools through the Teach For America program.
(08:19) Merav commented on her time working at the International Baccalaureate Organization and working as a Kaplan GRE instructor.
(10:57) Merav shared the backstory behind the founding of Data Society, a predictive analytics training and consulting company (co-founded with Dmitri Adler and John Nader).
(14:15) Merav reflected on her journey into programming.
(17:16) Merav explained why data science training should be industry-tailored for maximum success.
(20:57) Merav talked about how Data Society creates and evaluates its training curriculum.
(23:59) Merav provided an example of how Data Society provides customized AI solutions to inform decisions, automate time-consuming manual processes, and solve complex data challenges for its clients.
(27:38) Merav brought up challenges that hinder the adoption of data science in the government sector.
(29:49) Merav unpacked the six different steps for organizations to start moving up the data analytics maturity model.
(33:07) Merav dissected meldR, Data Society’s internal product built for Learning and Development teams in healthcare.
(36:24) Merav reflected on bootstrapping Data Society in the early days (see the 2016 Kickstarter campaign).
(39:48) Merav discussed the shift from a B2C to a B2B model for Data Society and scoring partnerships with Fortune 500 companies and federal agencies.
(42:47) Merav shared valuable hiring lessons to attract the right people who are excited about the mission of Data Society.
(45:22) Merav shared her experience shaping the remote work culture.
(49:05) Merav touched on initiatives at Data Society to bring more goodness to the world.
(50:28) Merav provided different ways to engage more women in data science (via the Women Data Scientists DC Meetup and DCFemTech).
(53:17) Merav predicted the evolution of education in the next 3 to 5 years.
(55:29) Closing segment.

Merav’s Contact Info: LinkedIn, Twitter
Data Society’s Resources: Website, Twitter, LinkedIn

Mentioned Content
Articles
“Is Your Enterprise Data-Driven?” (May 2021)
“Why Data Science Training Should Be Industry-Tailored for Maximum Success” (August 2021)
“Female Founders: Merav Yuravlivker of Data Society On The Five Things You Need To Thrive and Succeed as a Woman Founder” (Sep 2021)
People
DJ Patil (The first Chief Data Scientist of the US)
Hilary Mason (Co-Founder of Hidden Door)
Avriel Epps-Darling (Ph.D. candidate, Ford fellow, and Presidential Scholar at Harvard University)
Book
“Weapons of Math Destruction” (by Cathy O’Neil)

Notes
My conversation with Merav was recorded back in December 2021. Since then, many things have happened at Data Society. I’d recommend:
Reading Merav’s articles on Forbes about creating a culture of data sharing, assessing data literacy, and communication in the learning process.
Reading Data Society’s white papers about data science in research and data science in healthcare.
Checking out the Camelsback product for risk assessment in financial services.
Trying out the Data DNA assessment tool for organizations’ data maturity.
Finally, Merav was also just recognized as one of the DC region's 40 Under 40. The awards are given annually to recognize the outstanding achievements of young leaders in the Washington, DC, area who lead the community forward through hard work, philanthropy, and community engagement.
Episode 95: Open-Source DataOps, Building In Public, and Remote Work Culture with Douwe Maan
Jul 1 2022
Show Notes
(01:46) Douwe went over formative experiences catching the programming bug at the age of 9, combining high school with freelance web development, and studying Computer Science at Utrecht University in college.
(03:55) Douwe shared the story behind founding a startup called Stinngo, which led him to join GitLab in 2015 as employee number 10.
(05:29) Douwe provided insights on attributes of exceptional engineering talent, given his time hiring developers and eventually becoming GitLab's first Development Lead.
(08:28) Douwe unpacked the evolution of his engineering career at GitLab.
(11:11) Douwe discussed the motivation behind the creation of the Meltano project in August 2018 to help GitLab's internal data team address the gaps that prevented them from understanding the effectiveness of business operations.
(14:38) Douwe reflected on his decision in 2019 to leave GitLab’s engineering organization and join the then five-person Meltano team full-time.
(20:24) Douwe shared the details about Meltano's product development journey from its Version 1 to its pivot.
(26:18) Douwe reflected on the mental aspect of being the sole person whom Meltano depended on for a while.
(29:20) Douwe explained the positioning of Meltano as an open-source self-hosted platform for running data integration and transformation pipelines.
(34:54) Douwe shared details of Meltano's ideal customer profiles.
(37:45) Douwe provided a quick tour of the Meltano project, which represents the single source of truth regarding one's ELT pipelines: how data should be integrated and transformed, how the pipelines should be orchestrated, and how the various plugins that make up the pipelines should be configured.
(40:39) Douwe unpacked different components of Meltano's product strategy, including Meltano SDK, Meltano Hub, and Meltano Labs.
(45:05) Douwe discussed prioritizing Meltano's product roadmap in order to bring DataOps functionality to every step of the entire data lifecycle.
(48:53) Douwe shared the story behind spinning Meltano out of GitLab in June 2021 and raising a $4.2M Seed funding round led by GV to bring the benefits of open source data integration and DataOps to a wider audience.
(52:19) Douwe shared his thoughts on engaging open-source contributors in a way that can generate valuable product feedback for Meltano.
(55:43) Douwe shared valuable hiring lessons to attract the right people who align with Meltano's values.
(59:04) Douwe shared advice to startup CEOs who are experimenting with the remote work culture in our “new-normal” virtual working environments.
(01:04:10) Douwe unpacked Meltano's mission and vision as outlined in this blog post.
(01:06:40) Closing segment.

Douwe's Contact Info: GitLab, LinkedIn, Twitter, GitHub, Website
Meltano's Resources
Website | Twitter | LinkedIn | GitHub | YouTube
Meltano Documentation | Product | DataOps
Meltano SDK | Meltano Hub | Meltano Labs
Company Handbook | Community | Values | Careers

Mentioned Content
Articles
Hey, data teams - We're working on a tool just for you (Aug 2018)
To-do zero, inbox zero, calendar zero: I think that means I'm done (Sep 2019)
Meltano graduates to Version 1.0 (Oct 2019)
Revisiting the Meltano strategy: a return to our roots (May 2020)
Why we are building an open-source platform for ELT pipelines (May 2020)
Meltano spins out of GitLab, raises seed funding to bring data integration into the DataOps era (June 2021)
Meltano: The strategic foundation of the ideal data stack (Oct 2021)
Introducing your DataOps platform infrastructure: Our strategy for the future of data (Nov 2021)
Our next step for building the infrastructure for your Modern Data Stack (Dec 2021)
People
Maxime Beauchemin (Founder and CEO of Preset, Creator of Apache Airflow and Apache Superset, Angel Investor in Meltano)
Benn Stancil (Chief Analytics Officer at Mode Analytics, Well-Known Substack Writer)
The entire team at dbt Labs

Notes
My conversation with Douwe was recorded back in November 2021. Since then, many things have happened at Meltano. I'd recommend:
Checking out their updated company values
Reading Douwe's article about the DataOps Operating System on The New Stack
Examining Douwe's blog post about moving Meltano to GitHub
Looking over the announcement of Meltano 2.0 and the additional seed funding
Episode 94: Modern Metadata Management, Open-Source Adoption, and Early-Stage Culture with Mars Lan
Jun 20 2022
Show Notes
(01:41) Mars walked through his education studying Computer Systems Engineering at The University of Auckland in New Zealand.
(03:16) Mars reflected on his overall Ph.D. experience in Computer Science at UCLA.
(05:55) Mars discussed his early research paper on a robust and scalable lane departure warning system for smartphones.
(07:13) Mars described his work on SmartFall, an automatic fall detection system to help prevent the elderly from falling.
(08:34) Mars explained his project WANDA, an end-to-end remote health monitoring and analytics system designed for heart failure patients.
(10:06) Mars recalled learnings from interning as a software engineer at Google during his Ph.D.
(14:54) Mars discussed engineering challenges while working on PHP for Google App Engine and Gboard personalization during his subsequent four years at Google.
(19:05) Mars rationalized his decision to join LinkedIn to lead an engineering team that builds the core metadata infrastructure for the entire organization.
(21:15) Mars discussed the motivation behind the creation of LinkedIn’s generalized metadata search and discovery tool, DataHub, later open-sourced in 2020.
(25:21) Mars dissected the architecture of DataHub, which is designed to address scalability challenges in four areas: modeling, ingestion, serving, and indexing.
(28:50) Mars expressed the challenges of finding DataHub’s early adopters internally at LinkedIn and externally later on at other companies.
(35:22) Mars shared the story behind the founding of Metaphor Data, which he co-founded with Pardhu Gunnam and Seyi Adebajo and where he currently serves as the CTO.
(41:55) Mars unpacked how Metaphor’s modern metadata platform serves as a system of record for any organization’s data ecosystem.
(48:07) Mars described new challenges with metadata management since the introduction of the modern data stack and key features of a great modern metadata platform (as brought up in his in-depth blog post with Ben Lorica).
(53:55) Mars explained how a modern metadata platform fits within the broader data ecosystem.
(58:30) Mars shared the hurdles to finding Metaphor Data’s early design partners and lighthouse customers.
(01:04:33) Mars shared valuable hiring lessons to attract the right people who are excited about Metaphor’s mission.
(01:07:28) Mars shared important culture-building lessons to build out a high-performing team at Metaphor.
(01:10:45) Mars shared fundraising advice for founders currently seeking the right investors for their startups.
(01:13:22) Closing segment.

Mars’ Contact Info: Twitter, LinkedIn, Google Scholar, GitHub
Metaphor Data
Website | Twitter | LinkedIn
Careers | About Page
Data Documentation | Data Collaboration

Mentioned Content
Articles
DataHub: A generalized metadata search and discovery tool (Aug 2019)
Open-sourcing DataHub: LinkedIn’s metadata search and discovery platform (Feb 2020)
Founding Metaphor Data (Dec 2020)
Metaphor and Soda partner to unify the modern data stack with trusted data (Dec 2021)
Introducing Metaphor: The Modern Metadata Platform (Nov 2021)
The Modern Metadata Platform: What, Why, and How? (Jan 2022)
Papers
SmartLDWS: A robust and scalable lane departure warning system for the smartphones (Oct 2009)
SmartFall: An automatic fall detection system based on subsequence matching for the SmartCane (April 2009)
WANDA: An end-to-end remote health monitoring and analytics system for heart failure patients (Oct 2012)
People
Benn Stancil (Chief Analytics Officer at Mode Analytics, Well-Known Substack Writer)
Tristan Handy (Co-Founder and CEO of dbt Labs, Writer of The Analytics Engineering Roundup)
Andy Pavlo (Associate Professor of Database at Carnegie Mellon University)
Books
“Working In Public” (by Nadia Eghbal)
“The Mom Test” (by Rob Fitzpatrick)
“A Thousand Brains” (by Jeff Hawkins)
“The Scout Mindset” (by Julia Galef)

Notes
My conversation with Mars was recorded back in January 2022. Since then, many things have happened at Metaphor Data. I’d recommend:
Visiting their brand new website
Reading the 3-part “Data Documentation” series on their blog (part 1, part 2, and part 3)
Looking over the Trusted Data landing page
Episode 93: Open-Source Development, Human-Centric AI, and Modern ML Infrastructure with Ville Tuulos
Jun 8 2022
Show Notes
(01:35) Ville recalled his education getting degrees in Computer Science from the University of Helsinki in Finland.
(04:35) Ville walked through his time working at a startup called Gurusoft that planned to commercialize self-organizing maps, a particular kind of artificial neural network.
(07:17) Ville reflected on his four years as a researcher at Nokia, working on big data infrastructure, analytics, and ML open-source projects (such as Disco and Ringo).
(11:56) Ville shared the story of co-founding Bitdeli with his brother, a startup that built a novel scriptable data platform, and not finding product-market fit.
(13:58) Ville walked through AdRoll’s acquisition of Bitdeli in June 2013.
(15:49) Ville discussed the engineering challenges associated with his work at AdRoll: AdRoll Prospecting and traildb.io.
(19:33) Ville mentioned the product and leadership/management lessons during his time being AdRoll’s Head of Data and leading various data/ML efforts.
(24:43) Ville rationalized his decision to join the ML Infrastructure team at Netflix in 2017.
(27:26) Ville discussed the motivation behind the creation of Netflix’s human-centric ML infrastructure, Metaflow, later open-sourced in 2019.
(30:21) Ville unpacked the key design principles that summarize the philosophy of Metaflow, which is influenced by the unique culture at Netflix.
(35:00) Ville talked about his well-known diagram on the data infrastructure’s hierarchy of needs.
(37:33) Ville examined the technical details behind Metaflow’s integration with AWS to make it easy for users to move back and forth between their local and remote modes of development and execution.
(40:58) Ville expressed the challenges of finding Metaflow’s early adopters internally at Netflix and externally later on at other companies.
(45:13) Ville went over the strategy around prioritizing features for Metaflow’s future roadmap.
(52:22) Ville shared the story behind the founding of Outerbounds, which he co-founded with Savin Goyal and Oleg Avdeev.
(55:03) Ville shared his thoughts on engaging Metaflow’s contributors in a way that can generate valuable product feedback for Outerbounds.
(58:30) Ville shared valuable hiring lessons to attract the right people who are excited about Outerbounds’ mission.
(01:01:28) Ville shared upcoming initiatives that he is most excited about for Outerbounds.
(01:04:05) Ville walked through his writing process for an upcoming technical book with Manning called “Effective Data Science Infrastructure,” a hands-on guide to assembling infrastructure for data science and machine learning applications.
(01:06:34) Ville unpacked his O’Reilly article that digs deep into the fundamentals of ML as an engineering discipline.
(01:11:03) Closing segment.

Ville’s Contact Info: LinkedIn, Twitter, GitHub
Outerbounds
Website | Twitter | LinkedIn | GitHub | YouTube
Metaflow GitHub | Metaflow Docs
Slack Community
Careers
Metaflow Resources for Data Science
Metaflow Resources for Engineering

Mentioned Content
Talks
SF Data Mining Meetup: TrailDB - Processing Trillions of Events at AdRoll (July 2016)
QConSF 2018: Human-Centric Machine Learning Infrastructure @Netflix (Feb 2019)
AWS re:Invent 2019: More Data Science with Less Engineering - ML Infrastructure at Netflix (Dec 2019)
Scale By The Bay 2019: Human-Centric ML Infrastructure at Netflix (Jan 2020)
AICamp: Metaflow - The ML Infrastructure at Netflix (Aug 2021)
Articles
Open-Sourcing Metaflow, a Human-Centric Framework for Data Science (Netflix Tech Blog, Dec 2019)
Unbundling Data Science Workflows with Metaflow and AWS Step Functions (Netflix Tech Blog, July 2020)
MLOps and DevOps: Why Data Makes It Different (O’Reilly, Oct 2021)
People
Michael Jordan (Distinguished Professor in EECS and Statistics at UC Berkeley)
Matthew Honnibal and Ines Montani (Creators of open-source NLP library spaCy)
Hadley Wickham (Chief Scientist at RStudio and Adjunct Professor of Statistics at Rice University)
Book
“The Mom Test” (by Rob Fitzpatrick)

Notes
My conversation with Ville was recorded back in October 2021. Since then, many things have happened at Outerbounds. I’d recommend:
Visiting Outerbounds’ new website with Metaflow resources for Data Science and Engineering
Watching Ville’s recent talk at Data Council Austin about the Modern Stack for ML Infrastructure
Buying Ville’s newly released book “Effective Data Science Infrastructure”
Episode 92: Analytics Engineering, Locally Optimistic, and Marketing-Mix Modeling with Michael Kaminsky
May 29 2022
Show Notes
(01:48) Mike recalled his undergraduate experience studying Economics at Arizona State University and doing research on statistics/econometrics.
(04:59) Mike reflected on his three years working as an analyst in the Boston office of the Analysis Group.
(09:08) Mike discussed how he leveled up his programming skills at work.
(11:05) Mike shared his learnings about building effective data-driven products while working as a data scientist at Case Commons.
(17:20) Mike revisited his transition to a new role as the Director of Analytics at Harry’s, the men’s grooming brand, where he started a new data team from scratch.
(23:04) Mike unpacked analytics and infrastructure challenges during his time at Harry’s: developing the data warehouse, an internal marketing attribution tool, and a fleet of systems for automated decision-making to improve efficiency.
(27:21) Mike explained his move to Mexico City, where he spent time practicing Spanish, among other things.
(32:22) Mike talked about his journey of starting a new consulting practice to help companies get more value out of their data, which was primarily shaped by his network.
(36:30) Mike shared the founding story behind Recast, whose mission is to help modern brands improve the effectiveness of their marketing dollars.
(42:09) Mike dissected the core technical problem that Recast is addressing: performing media mix modeling in the context of “programmatic” channels.
(46:14) Mike shared the story behind the inception and evolution of Locally Optimistic, a community for current and aspiring data analytics leaders.
(49:29) Mike walked through his 3-part blog series on Agile Analytics, discussing the good aspects, the bad aspects, and the adjustments needed for analytics teams to adopt the Scrum methodology.
(53:25) Mike unpacked his post “A Culture of Partnership,” which discusses the three key activities that can help an analytics team identify the most important opportunities in the business and work effectively with key stakeholders and partner teams to drive value.
(57:25) Mike examined his seminal piece “The Analytics Engineer,” which generated much attention from the analytics community and argues that the analytics engineer can provide a multiplier effect on the output of an analytics team.
(01:03:24) Mike shared the motivation and pedagogical philosophy behind the Analytics Engineers Club (co-founded with Claire Carroll), which provides a training course for data analysts looking to improve their engineering skills.
(01:07:57) Mike anticipated the evolution of the quickly evolving modern data stack (read his Fivetran article “The Modern Data Science Stack”).
(01:09:22) Mike unpacked how organizations can build, start, and maintain the data quality flywheel (read his Datafold article “The Data Quality Flywheel”).
(01:11:40) Mike shared his thoughts regarding the challenge of sharing complex analyses.
(01:13:15) Closing segment.

Mike’s Contact Info: Twitter, Website, LinkedIn, GitHub
Further Resources: Recast, Locally Optimistic, Analytics Engineers Club

Mentioned Content
Articles
“Learning a language is hard” (Personal Blog, Jan 2020)
“Modern Media Mix Modeling” (Recast Blog)
“Agile Analytics, Part 1: The Good Stuff” (Locally Optimistic Blog, May 2018)
“Agile Analytics, Part 2: The Bad Stuff” (Locally Optimistic Blog, June 2018)
“Agile Analytics, Part 3: The Adjustments” (Locally Optimistic Blog, July 2018)
“A Culture of Partnership” (Locally Optimistic Blog, March 2019)
“The Analytics Engineer” (Locally Optimistic Blog, Jan 2019)
“Data Education Is Broken” (Analytics Engineers Club, June 2021)
“Teaching The Real Tools” (Analytics Engineers Club, Aug 2021)
“The Modern Data Science Stack” (Fivetran Blog, Oct 2020)
“The Data Quality Flywheel” (Datafold Blog, Nov 2020)
“Knowledge Sharing” (Personal Blog, Sep 2020)
“TDD for ELT” (Personal Blog, Sep 2020)
“Are Data Catalogs Curing the Symptom or the Disease?” (Personal Blog, Dec 2020)
People
Claire Carroll (Co-Instructor of Analytics Engineers Club, Product Manager at Hex, former Community Manager at dbt Labs)
Drew Banin (Head of Product at dbt Labs)
Barry McCardel (Co-Founder and CEO of Hex)

Notes
My conversation with Michael was recorded back in October 2021. Since then, Michael has been active in his work projects. I’d recommend:
Following the Analytics Engineers Club for upcoming sessions (they are currently teaching their second summer cohort)
Reading his collaboration blog post with Reforge on the attribution stack
Consuming his Recast content explaining why marketing-mix modeling is hard and laying out the checklist for evaluating an MMM vendor