Data-centric NLP in the era of LLMs

D4 Data Podcast

Jan 20 2023 • 32 mins

In this episode, we dive into data-centric AI and NLP, exploring the ethical considerations, limitations, and trade-offs that come with this approach. We start with data-centric AI and the ethical considerations it raises in the context of NLP. We then turn to the limitations of data-centric approaches in NLP and the trade-offs between a larger dataset and a smaller, curated one. We also touch on concept drift and the risk of generating biased or misleading data during data augmentation. Finally, we discuss why the quality of crowdsourced labels matters and how to ensure it.

Daniel Vila Suero shares why Argilla is open source and how it is adapting to the rapidly changing landscape of the AI industry. Join us as we explore the challenges and opportunities in data-centric NLP and discover how Argilla is pushing the boundaries of what's possible with LLMs.

Daniel Vila Suero is the CEO and co-founder of Argilla. During his PhD in Artificial Intelligence (UPM, 2016), Daniel designed, developed, and deployed the end-to-end framework powering the service from the National Library of Spain. He left academia in 2017 to found Argilla.

