Streaming Audio: A Confluent podcast about Apache Kafka®

Confluent, original creators of Apache Kafka®

Streaming Audio is a podcast from Confluent, the team that originally built Apache Kafka. Host Tim Berglund (Senior Director of Developer Advocacy, Confluent) and guests unpack a variety of topics surrounding Apache Kafka, event stream processing, and real-time data. The show covers frequently asked questions and comments about the Confluent and Kafka ecosystems—from Kafka connectors to distributed systems, data integration, Kafka deployment, and managed Apache Kafka as a service—on Twitter, YouTube, and elsewhere. Apache®️, Apache Kafka, Kafka, and the Kafka logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.
From Batch to Real-Time: Tips for Streaming Data Pipelines with Apache Kafka ft. Danica Fine
Implementing an event-driven data pipeline can be challenging, but doing so within the context of a legacy architecture is even more complex. Having spent three years building a streaming data infrastructure and being on the first team at a financial organization to implement Apache Kafka® event-driven data pipelines, Danica Fine (Senior Developer Advocate, Confluent) shares about the development process and how ksqlDB and Kafka Connect became instrumental to the implementation.By moving away from batch processing to streaming data pipelines with Kafka, data can be distributed with increased data scalability and resiliency. Kafka decouples the source from the target systems, so you can react to data as it changes while ensuring accurate data in the target system. In order to transition from monolithic micro-batching applications to real-time microservices that can integrate with a legacy system that has been around for decades, Danica and her team started developing Kafka connectors to connect to various sources and target systems. Kafka connectors: Building two major connectors for the data pipeline, including a source connector to connect the legacy data source to stream data into Kafka, and another target connector to pipe data from Kafka back into the legacy architecture. Algorithm: Implementing Kafka Streams applications to migrate data from a monolithic architecture to a stream processing architecture. Data join: Leveraging Kafka Connect and the JDBC source connector to bring in all data streams to complete the pipeline.Streams join: Using ksqlDB to join streams—the legacy data system continues to produce streams while the Kafka data pipeline is another stream of data. As a final tip, Danica suggests breaking algorithms into process steps. She also describes how her experience relates to the data pipelines course on Confluent Developer and encourages anyone who is interested in learning more to check it out. EPISODE LINKSData Pipelines courseGuided Exercise on Building Streaming Data PipelinesMigrating from a Legacy System to Kafka StreamsWatch the video version of this podcastJoin the Confluent CommunityLearn more with Kafka tutorials, resources, and guides at Confluent DeveloperLive demo: Intro to Event-Driven Microservices with ConfluentUse PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)
5d ago
29 mins
Real-Time Change Data Capture and Data Integration with Apache Kafka and Qlik
Getting data from a database management system (DBMS) into Apache Kafka® in real time is a subject of ongoing innovation. John Neal (Principal Solution Architect, Qlik) and Adam Mayer (Senior Technical Producer Marketing Manager, Qlik) explain how leveraging change data capture (CDC) for data ingestion into Kafka enables real-time data-driven insights. It can be challenging to ingest data in real time. It is even more challenging when you have multiple data sources, including both traditional databases and mainframes, such as SAP and Oracle. Extracting data in batch for transfer and replication purposes is slow, and often incurs significant performance penalties. However, analytical queries are often even more resource intensive and are prohibitively expensive to run on production transactional databases. CDC enables the capture of source operations as a sequence of incrementing events, converting the data into events to be written to Kafka. Once this data is available in the Kafka topics, it can be used for both analytical and operational use cases. Data can be consumed and modeled for analytics by individual groups across your organization. Meanwhile, the same Kafka topics can be used to help power microservice applications and help ensure data governance without impacting your production data source. Kafka makes it easy to integrate your CDC data into your data warehouses, data lake, NoSQL database, microservices, and any other system. Adam and John highlight a few use cases where they see real-time Kafka data ingestion, processing, and analytics moving the needle—including real-time customer predictions, supply chain optimizations, and operational reporting. Finally, Adam and John cap it off with a discussion on how capturing and tracking data changes are critical for your machine learning model to enrich data quality. EPISODE LINKSFast Track Business Insights with Data in MotionWatch the video version of this podcastJoin the Confluent CommunityLearn more with Kafka tutorials, resources, and guides at Confluent DeveloperLive demo: Intro to Event-Driven Microservices with ConfluentUse PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)
Jan 6 2022
34 mins
Modernizing Banking Architectures with Apache Kafka ft. Fotios Filacouris
It’s been said that financial services organizations have been early Apache Kafka® adopters due to the strong delivery guarantees and scalability that Kafka provides. With experience working and designing architectural solutions for financial services, Fotios Filacouris (Senior Solutions Engineer, Enterprise Solutions Engineering, Confluent) joins Tim to discuss how Kafka and Confluent help banks build modern architectures, highlighting key emerging use cases from the sector. Previously, Kafka was often viewed as a simple pipe that connected databases together, which allows for easy and scalable data migration. As the Kafka ecosystem evolves with added components like ksqlDB, Kafka Streams, and Kafka Connect, the implementation of Kafka goes beyond being just a pipe—it’s an intelligent pipe that enables real-time, actionable data insights.Fotios shares a couple of use cases showcasing how Kafka solves the problems that many banks are facing today. One of his customers transformed retail banking by using Kafka as the architectural base for storing all data permanently and indefinitely. This approach enables data in motion and a better user experience for frontend users while scrolling through their transaction history by eliminating the need to download old statements that have been offloaded in the cloud or a data lake. Kafka also provides the best of both worlds with increased scalability and strong message delivery guarantees that are comparable to queuing middleware like IBM MQ and TIBCO. In addition to use cases, Tim and Fotios talk about deploying Kafka for banks within the cloud and drill into the profession of being a solutions engineer. EPISODE LINKSWatch the video version of this podcastJoin the Confluent CommunityLearn more with Kafka tutorials, resources, and guides at Confluent DeveloperLive demo: Intro to Event-Driven Microservices with ConfluentUse PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)
Dec 28 2021
34 mins
Running Hundreds of Stream Processing Applications with Apache Kafka at Wise
What’s it like building a stream processing platform with around 300 stateful stream processing applications based on Kafka Streams? Levani Kokhreidze (Principal Engineer, Wise) shares his experience building such a platform that the business depends on for multi-currency movements across the globe. He explains how his team uses Kafka Streams for real-time money transfers at Wise, a fintech organization that facilitates international currency transfers for 11 million customers. Getting to this point and expanding the stream processing platform is not, however, without its challenges. One of the major challenges at Wise is to aggregate, join, and process real-time event streams to transfer currency instantly. To accomplish this, the Wise relies on Apache Kafka® as an event broker, as well as Kafka Streams, the accompanying Java stream processing library. Kafka Streams lets you build event-driven microservices for processing streams, which can then be deployed alongside the Kafka cluster of your choice. Wise also uses the Interactive Queries feature in Kafka streams, to query internal application state at runtime. The Wise stream processing platform has gradually moved them away from a monolithic architecture to an event-driven microservices model with around 400 total microservices working together. This has given Wise the ability to independently shape and scale each service to better serve evolving business needs. Their stream processing platform includes a domain-specific language (DSL) that provides libraries and tooling, such as Docker images for building your own stream processing applications with governance. With this approach, Wise is able to store 50 TB of stateful data based on Kafka Streams running in Kubernetes. Levani shares his own experiences in this journey with you and provides you with guidance that may help you follow in Wise’s footsteps. He covers how to properly delegate ownership and responsibilities for sourcing events from existing data stores, and outlines some of the pitfalls they encountered along the way. To cap it all off, Levani also shares some important lessons in organization and technology, with some best practices to keep in mind. EPISODE LINKSKafka Streams 101 courseReal-Time Stream Processing with Kafka Streams ft. Bill BejeckWatch the video version of this podcastJoin the Confluent CommunityLearn more with Kafka tutorials, resources, and guides at Confluent DeveloperLive demo: Intro to Event-Driven Microservices with ConfluentUse PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)
Dec 21 2021
31 mins
Lessons Learned From Designing Serverless Apache Kafka ft. Prachetaa Raghavan
You might call building and operating Apache Kafka® as a cloud-native data service synonymous with a serverless experience. Prachetaa Raghavan (Staff Software Developer I, Confluent) spends his days focused on this very thing. In this podcast, he shares his learnings from implementing a serverless architecture on Confluent Cloud using Kubernetes Operator. Serverless is a cloud execution model that abstracts away server management, letting you run code on a pay-per-use basis without infrastructure concerns. Confluent Cloud's major design goal was to create a serverless Kafka solution, including handling its distributed state, its performance requirements, and seamlessly operating and scaling the Kafka brokers and Zookeeper. The serverless offering is built on top of an event-driven microservices architecture that allows you to deploy services independently with your own release cadence and maintained at the team level.There are 4 subjects that help create the serverless event streaming experience with Kafka:Confluent Cloud control plane: This Kafka-based control plane provisions resources required to run the application. It automatically scales resources for services, such as managed Kafka, managed ksqlDB, and managed connectors. The control plane and data plane are decoupled—if a single data plane has issues, it doesn’t affect the control plane or other data planes. Kubernetes Operator: The operator is an application-specific controller that extends the functionality of the Kubernetes API to create, configure, and manage instances of complex applications on behalf of Kubernetes users. The operator looks at Kafka metrics before upgrading a broker at a time. It also updates the status on cluster rebalancing and on shrink to rebalance data onto the remaining brokers. Self-Balancing Clusters: Cluster balance is measured on several dimensions, including replica counts, leader counts, disk usage, and network usage. In addition to storage rebalancing, Self-Balancing Clusters are essential to making sure that the amount of available disk and network capability is satisfied during any balancing decisions. Infinite Storage: Enabled by Tiered Storage, Infinite Storage rebalances data fast and efficiently—the most recently written data is stored directly on Kafka brokers, while older segments are moved off into a separate storage tier.  This has the added bonus of reducing the shuffling of data due to regular broker operations, like partition rebalancing. EPISODE LINKSMaking Apache Kafka Serverless: Lessons From Confluent CloudCloud-Native Apache KafkaJoin the Confluent CommunityLearn more with Kafka tutorials, resources, and guides at Confluent DeveloperLive demo: Intro to Event-Driven Microservices with ConfluentUse PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)Watch the video version of this podcast
Dec 14 2021
28 mins
Using Apache Kafka as Cloud-Native Data System ft. Gwen Shapira
What does cloud native mean, and what are some design considerations when implementing cloud-native data services? Gwen Shapira (Apache Kafka® Committer and Principal Engineer II, Confluent) addresses these questions in today’s episode. She shares her learnings by discussing a series of technical papers published by her team, which explains what they’ve done to expand Kafka’s cloud-native capabilities on Confluent Cloud. Gwen leads the Cloud-Native Kafka team, which focuses on developing new features to evolve Kafka to its next stage as a fully managed cloud data platform. Turning Kafka into a self-service platform is not entirely straightforward, however, Kafka’s early day investment in elasticity, scalability, and multi-tenancy to run at a company-wide scale served as the North Star in taking Kafka to its next stage—a fully managed cloud service where users will just need to send us their workloads and everything else will magically work. Through examining modern cloud-native data services, such as Aurora, Amazon S3, Snowflake, Amazon DynamoDB, and BigQuery, there are seven capabilities that you can expect to see in modern cloud data systems, including: Elasticity: Adapt to workload changes to scale up and down with a click or APIs—cloud-native Kafka omits the requirement to install REST Proxy for using Kafka APIsInfinite scale: Kafka has the ability to elastic scale with a behind-the-scene process for capacity planning Resiliency: Ensures high availability to minimize downtown and disaster recovery Multi-tenancy: Cloud-native infrastructure needs to have isolations—data, namespaces, and performance, which Kafka is designed to supportPay per use: Pay for resources based on usageCost-effectiveness: Cloud deployment has notably lower costs than self-managed services, which also decreases adoption time Global: Connect to Kafka from around the globe and consume data locallyBuilding around these key requirements, a fully managed Kafka as a service provides an enhanced user experience that is scalable and flexible with reduced infrastructure management costs. Based on their experience building cloud-native Kafka, Gwen and her team published a four-part thesis that shares insights on user expectations for modern cloud data services as well as technical implementation considerations to help you develop your own cloud-native data system. EPISODE LINKSCloud-Native Apache KafkaDesign Considerations for Cloud-Native Data SystemsSoftware Engineer, Cloud Native KafkaJoin the Confluent CommunityLearn more with Kafka tutorials, resources, and guides at Confluent DeveloperLive demo: Intro to Event-Driven Microservices with ConfluentUse PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)Watch the video version of this podcast
Dec 7 2021
33 mins
ksqlDB Fundamentals: How Apache Kafka, SQL, and ksqlDB Work Together ft. Simon Aubury
What is ksqlDB and how does Simon Aubury (Principal Data Engineer, Thoughtworks) use it to track down the plane that wakes his cat Snowy in the morning? Experienced in building real-time applications with ksqlDB since its genesis, Simon provides an introduction to ksqlDB by sharing some of his projects and use cases. ksqlDB is a database purpose-built for stream processing applications and lets you build real-time data streaming applications with SQL syntax. ksqlDB reduces the complexity of having to code with Java, making it easier to achieve outcomes through declarative programming, as opposed to procedural programming. Before ksqlDB, you could use the producer and consumer APIs to get data in and out of Apache Kafka®; however, when it comes to data enrichment, such as joining, filtering, mapping, and aggregating data, you would have to use the Kafka Streams API—a robust and scalable programming interface influenced by the JVM ecosystem that requires Java programming knowledge. This presented scaling challenges for Simon, who was at a multinational insurance company that needed to stream loads of data from disparate systems with a small team to scale and enrich data for meaningful insights. Simon recalls discovering ksqlDB during a practice fire drill, and he considers it as a memorable moment for turning a challenge into an opportunity.Leveraging your familiarity with relational databases, ksqlDB abstracts away complex programming that is required for real-time operations both for stream processing and data integration, making it easy to read, write, and process streaming data in real time.Simon is passionate about ksqlDB and Kafka Streams as well as getting other people inspired by the technology. He’s been using ksqlDB for projects, such as taking a stream of information and enriching it with static data. One of Simon’s first ksqlDB projects was using Raspberry Pi and a software-defined radio to process aircraft movements in real time to determine which plane wakes his cat Snowy up every morning. Simon highlights additional ksqlDB use cases, including e-commerce checkout interaction to identify where people are dropping out of a sales funnel. EPISODE LINKSksqlDB 101 courseA Guide to ksqlDB Fundamentals and Stream Processing ConceptsksqlDB 101 Training with Live Walkthrough ExerciseKSQL-ops! Running ksqlDB in the WildArticles from Simon AuburyWatch the video version of this podcastJoin the Confluent CommunityLearn more with Kafka tutorials, resources, and guides at Confluent DeveloperLive demo: Intro to Event-Driven Microservices with ConfluentUse PODCAST100 to get $100 of free Confluent Cloud usage (details)
Dec 1 2021
30 mins
Explaining Stream Processing and Apache Kafka ft. Eugene Meidinger
Many of us find ourselves in the position of equipping others to use Apache Kafka® after we’ve gained an understanding of what Kafka is used for. But how do you communicate and teach others event streaming concepts effectively? As a Pluralsight instructor and business intelligence consultant, Eugene Meidinger shares tips for creating consumable training materials for conveying event streaming concepts to developers and IT administrators, who are trying to get on board with Kafka and stream processing. Eugene’s background as a database administrator (DBA) and immense knowledge of event streaming architecture and data processing shows as he reveals his learnings from years of working with Microsoft Power BI, Azure Event Hubs, data processing, and event streaming with ksqlDB and Kafka Streams. Eugene mentions the importance of understanding your audience, their pain points, and their questions, such as why was Kafka invented? Why does ksqlDB matter? It also helps to use metaphors where appropriate. For example, when explaining what is processing typology for Kafka Streams, Eugene uses the analogy of a highway where people are getting on a bus as the blocking operations, after the grace period, the bus will leave even without passengers, meaning after the window session, the processor will continue even without events. He also likes to inject a sense of humor in his training and keeps empathy in mind. Here is the structure that Eugene uses when building courses:The first module is usually fundamentals, which lays out the groundwork and the objectives of the courseIt's critical to repeat and summarize core concepts or major points; for example, a key capability of Kafka is the ability to decouple data in both network space and in time Provide variety and different modalities that allow people to consume content through multiple avenues, such as screencasts, slides, and demos, wherever it makes senseEPISODE LINKSBuilding ETL Pipelines from Streaming Data with Kafka and ksqlDBDon't Make Me Think | Steve KrugDesign for How People Learn | Julie Dirksen Watch the video version of this podcastJoin the Confluent CommunityLearn more with Kafka tutorials, resources, and guides at Confluent DeveloperLive demo: Intro to Event-Driven Microservices with ConfluentUse PODCAST100 to get $100 of free Confluent Cloud usage (details)
Nov 23 2021
29 mins
Handling Message Errors and Dead Letter Queues in Apache Kafka ft. Jason Bell
If you ever wondered what exactly dead letter queues (DLQs) are and how to use them, Jason Bell (Senior DataOps Engineer, Digitalis) has an answer for you. Dead letter queues are a feature of Kafka Connect that acts as the destination for failed messages due to errors like improper message deserialization and improper message formatting. Lots of Jason’s work is around Kafka Connect and the Kafka Streams API, and in this episode, he explains the fundamentals of dead letter queues, how to use them, and the parameters around them. For example, when deserializing an Avro message, the deserialization could fail if the message passed through is not Avro or in a value that doesn’t match the expected wire format, at which point, the message will be rerouted into the dead letter queue for reprocessing. The Apache Kafka® topic will reprocess the message with the appropriate converter and send it back onto the sink. For a JSON error message, you’ll need another JSON connector to process the message out of the dead letter queue before it can be sent back to the sink. Dead letter queue is configurable for handling a deserialization exception or a producer exception. When deciding if this topic is necessary, consider if the messages are important and if there’s a plan to read into and investigate why the error occurs. In some scenarios, it’s important to handle the messages manually or have a manual process in place to handle error messages if reprocessing continues to fail. For example, payment messages should be dealt with in parallel for a better customer experience. Jason also shares some key takeaways on the dead letter queue: If the message is important, such as a payment, you need to deal with the message if it goes into the dead letter queue To minimize message routing into the dead letter queue, it’s important to ensure successful data serialization at the sourceWhen implementing a dead letter queue, you need a process to consume the message and investigate the errors EPISODE LINKS: Kafka Connect 101: Error Handling and Dead Letter QueuesCapacity Planning your Kafka ClusterTales from the Frontline of Apache Kafka DevOps ft. Jason BellTweet: Morning morning (yes, I have tea)Tweet: Kafka dead letter queues Watch the video version of this podcastJoin the Confluent CommunityLearn more with Kafka tutorials, resources, and guides at Confluent DeveloperLive demo: Intro to Event-Driven Microservices with ConfluentUse PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)
Nov 16 2021
37 mins
Confluent Platform 7.0: New Features + Updates
Confluent Platform 7.0 has launched and includes Apache Kafka® 3.0, plus new features introduced by KIP-630: Kafka Raft Snapshot, KIP-745: Connect API to restart connector and task, and KIP-695: Further improve Kafka Streams timestamp synchronization. Reporting from Dubai, Tim Berglund (Senior Director, Developer Advocacy, Confluent) provides a summary of new features, updates, and improvements to the 7.0 release, including the ability to create a real-time bridge from on-premises environments to the cloud with Cluster Linking. Cluster Linking allows you to create a single cluster link between multiple environments from Confluent Platform to Confluent Cloud, which is available on public clouds like AWS, Google Cloud, and Microsoft Azure, removing the need for numerous point-to-point connections. Consumers reading from a topic in one environment can read from the same topic in a different environment without risks of reprocessing or missing critical messages. This provides operators the flexibility to make changes to topic replication smoothly and byte for byte without data loss. Additionally, Cluster Linking eliminates any need to deploy MirrorMaker2 for replication management while ensuring offsets are preserved. Furthermore, the release of Confluent for Kubernetes 2.2 allows you to build your own private cloud in Kafka. It completes the declarative API by adding cloud-native management of connectors, schemas, and cluster links to reduce the operational burden and manual processes so that you can instead focus on high-level declarations. Confluent for Kubernetes 2.2 also enhances elastic scaling through the Shrink API.  Following ZooKeeper’s removal in Apache Kafka 3.0, Confluent Platform 7.0 introduces KRaft in preview to make it easier to monitor and scale Kafka clusters to millions of partitions. There are also several ksqlDB enhancements in this release, including foreign-key table joins and the support of new data types—DATE and TIME— to account for time values that aren’t TIMESTAMP. This results in consistent data ingestion from the source without having to convert data types.EPISODE LINKSDownload Confluent Platform 7.0Check out the release notesRead the Confluent Platform 7.0 blog postWatch the video version of this podcastJoin the Confluent CommunityLearn more with Kafka tutorials, resources, and guides at Confluent DeveloperLive demo: Intro to Event-Driven Microservices with ConfluentUse PODCAST100 to get $100 of free Confluent Cloud usage (details)
Nov 9 2021
12 mins
Real-Time Stream Processing with Kafka Streams ft. Bill Bejeck
Kafka Streams is a native streaming library for Apache Kafka® that consumes messages from Kafka to perform operations like filtering a topic’s message and producing output back into Kafka. After working as a developer in stream processing, Bill Bejeck (Apache Kafka Committer and Integration Architect, Confluent) has found his calling in sharing knowledge and authoring his book, “Kafka Streams in Action.” As a Kafka Streams expert, Bill is also the author of the Kafka Streams 101 course on Confluent Developer, where he delves into what Kafka Streams is, how to use it, and how it works. Kafka Streams provides the abstraction over Kafka consumers and producers by minimizing administrative details like the need to code and manage frameworks required when using plain Kafka consumers and producers to process streams. Kafka Streams is declarative—you can state what you want to do, rather than how to do it. Kafka Streams leverages the KafkaConsumer protocol internally; it inherits its dynamic scaling properties and the consumer group protocol to dynamically redistribute the workload. When Kafka Streams applications are deployed separately but have the same application.id, they are logically still one application. Kafka Streams has two processing APIs, the declarative API or domain-specific language (DSL)  is a high-level language that enables you to build anything needed with a processor topology, whereas the Processor API lets you specify a processor typology node by node, providing the ultimate flexibility. To underline the differences between the two APIs, Bill says it’s almost like using the object-relational mapping framework (ORM) versus SQL. The Kafka Streams 101 course is designed to get you started with Kafka Streams and to help you learn the fundamentals of: How streams and tables work How stateless and stateful operations work How to handle time windows and out of order dataHow to deploy Kafka StreamsEPISODE LINKSKafka Streams 101 courseA Guide to Kafka Streams and Its UsesYour First Kafka Streams ApplicationKafka Streams 101 meetupWatch the video version of this podcastJoin the Confluent CommunityLearn more with Kafka tutorials, resources, and guides at Confluent DeveloperLive demo: Intro to Event-Driven Microservices with ConfluentUse podcon19 to get 40% off "Kafka Streams in Action"Use podcon19 to get 40% off "Event Streaming with Kafka Streams and ksqlDB"Use PODCAST100 to get $100 of free Confluent Cloud usage (details)
Nov 4 2021
35 mins
Automating Infrastructure as Code with Apache Kafka and Confluent ft. Rosemary Wang
Managing infrastructure as code (IaC) instead of using manual processes makes it easy to scale systems and minimize errors. Rosemary Wang (Developer Advocate, HashiCorp, and author of “Essential Infrastructure as Code: Patterns and Practices”) is an infrastructure engineer at heart and an aspiring software developer who is passionate about teaching patterns for infrastructure as code to simplify processes for system admins and software engineers familiar with Python, provisioning tools like Terraform, and cloud service providers. The definition of infrastructure has expanded to include anything that delivers or deploys applications. Infrastructure as software or infrastructure as configuration, according to Rosemary, are ideas grouped behind infrastructure as code—the process of automating infrastructure changes in a codified manner, which also applies to DevOps practices, including version controls, continuous integration, continuous delivery, and continuous deployment. Whether you’re using a domain-specific language or a programming language, the practices used to collaborate between you, your team, and your organization are the same—create one application and scale systems.The ultimate result and benefit of infrastructure as code is automation. Many developers take advantage of managed offerings like Confluent Cloud—fully managed Kafka as a service—to remove the operational burden and configuration layer. Still, as long as complex topologies like connecting to another server on a cloud provider to external databases exist, there is great value to standardizing infrastructure practices. Rosemary shares four characteristics that every infrastructure system should have: ResilienceSelf-serviceSecurityCost reductionIn addition, Rosemary and Tim discuss updating infrastructure with blue-green deployment techniques, immutable infrastructure, and developer advocacy. EPISODE LINKS: Use PODCAST100 to get $100 of free Confluent Cloud usage (details)Use podcon19 to get 40% off “Essential Infrastructure as Code: Patterns and Practices”Watch the video version of this podcastJoin the Confluent CommunityLearn more with Kafka tutorials, resources, and guides at Confluent DeveloperLive demo: Intro to Event-Driven Microservices with Confluent
Oct 26 2021
30 mins
Getting Started with Spring for Apache Kafka ft. Viktor Gamov
What’s the distinction between the Spring Framework and Spring Boot? If you are building a car, the Spring Framework is the engine while Spring Boot gives you the vehicle that you ride in. With experience teaching and answering questions on how to use Spring and Apache Kafka® together, Viktor Gamov (Principal Developer Advocate, Kong) designed a free course on Confluent Developer and previews it in this episode. Not only this, but he also explains why the opinionated Spring Framework would be a good hero in Marvel. Spring is an ever-evolving framework that embraces modern, cloud-native technologies with cross-language options, such as Kotlin integration. Unlike its predecessors, the Spring Framework supports a modern version of Java and the requirements of the Twelve-Factor App manifesto for you to move an application between environments without changing the code. With that engine in place, Spring Boot introduces a microservices architecture. Spring Boot contains databases and messaging systems integrations, reducing development time and increasing overall productivity. Spring for Apache Kafka applies best practices of the Spring community to the Kafka ecosystem, including features that abstract away infrastructure code for you to focus on programming logic that is important for your application. Spring for Apache Kafka provides a wrapper around the producer and consumer to ease Kafka configuration with APIs, including KafkaTemplate, MessageListenerContainer, @KafkaListener, and TopicBuilder.The Spring Framework and Apache Kafka course will equip you with the knowledge you need in order to build event-driven microservices using Spring and Kafka on Confluent Cloud. Tim and Viktor also discuss Spring Cloud Stream as well as Spring Boot integration with Kafka Streams and more. EPISODE LINKSSpring Framework and Apache Kafka courseSpring for Apache Kafka 101Bootiful Stream Processing with Spring and KafkaLiveStreams with Viktor GamovUse kafkaa35 to get 30% off "Kafka in Action"Watch the video version of this podcastJoin the Confluent CommunityLearn more with Kafka tutorials, resources, and guides at Confluent DeveloperLive demo: Intro to Event-Driven Microservices with ConfluentUse PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)
Oct 19 2021
32 mins
Powering Event-Driven Architectures on Microsoft Azure with Confluent
When you order a pizza, what if you knew every step of the process from the moment it goes in the oven to being delivered to your doorstep? Event-Driven Architecture is a modern, data-driven approach that describes “events” (i.e., something that just happened). A real-time data infrastructure enables you to provide such event-driven data insights in real time. Israel Ekpo (Principal Cloud Solutions Architect, Microsoft Global Partner Solutions, Microsoft) and Alicia Moniz (Cloud Partner Solutions Architect, Confluent) discuss use cases on leveraging Confluent Cloud and Microsoft Azure to power real-time, event-driven architectures. As an Apache Kafka® community stalwart, Israel focuses on helping customers and independent software vendor (ISV) partners build solutions for the cloud and use open source databases and architecture solutions like Kafka, Kubernetes, Apache Flink, MySQL, and PostgreSQL on Microsoft Azure. He’s worked with retailers and those in the IoT space to help them adopt processes for inventory management with Confluent. Having a cloud-native, real-time architecture that can keep an accurate record of supply and demand is important in keeping up with the inventory and customer satisfaction. Israel has also worked with customers that use Confluent to integrate with Cosmos DB, Microsoft SQL Server, Azure Cognitive Search, and other integrations within the Azure ecosystem. Another important use case is enabling real-time data accessibility in the public sector and healthcare while ensuring data security and regulatory compliance like HIPAA. Alicia has a background in AI, and she expresses the importance of moving away from the monolithic, centralized data warehouse to a more flexible and scalable architecture like Kafka. Building a data pipeline leveraging Kafka helps ensure data security and consistency with minimized risk.The Confluent and Azure integration enables quick Kafka deployment with out-of-the-box solutions within the Kafka ecosystem. Confluent Schema Registry captures event streams with a consistent data structure, ksqlDB enables the development of real-time ETL pipelines, and Kafka Connect enables the streaming of data to multiple Azure services.EPISODE LINKSConfluent on Azure: Why You Should Add Confluent to Your Azure ToolkitIzzyAcademy Kafka on Azure Learning Series by Alicia MonizWatch the video version of this podcastJoin the Confluent CommunityLearn more with Kafka tutorials, resources, and guides at Confluent DeveloperLive demo: Intro to Event-Driven Microservices with ConfluentUse PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)
Oct 14 2021
38 mins
Automating DevOps for Apache Kafka and Confluent ft. Pere Urbón-Bayes
Autonomy is key in building a sustainable and motivated team, and this core principle also applies to DevOps. Building self-serve Apache Kafka® and Confluent Platform deployments require a streamlined process with unrestricted tools—a centralized processing tool that allows teams in large or mid-sized organizations to automate infrastructure changes while ensuring shared standards are met. With more than 15 years of engineering and technology consulting experience, Pere Urbón-Bayes (Senior Solution Architect, Professional Services, Confluent) built an open source solution—JulieOps—to enable a self-serve Kafka platform as a service with data governance. JulieOps is one of the first solutions available to realize self-service for Kafka and Confluent with automation. Development, operations, security teams often face hurdles when deploying Kafka. How can a user request the topics that they need for their applications? How can the operations team ensure compliance and role-based access controls? How can schemas be standardized and structured across environments? Manual processes can be cumbersome with long cycle times. Automation reduces unnecessary interactions and shortens processing time, enabling teams to be more agile and autonomous in solving problems from a localized team level. Similar to Terraform, JulieOps is declarative. It's a centralized agent that uses the GitOps philosophy, focusing on a developer-centric experience with tools that developers are already familiar with, to provide abstractions to each product personas. All changes are documented and approved within the change management process to streamline deployments with timely and effective audits, as well as ensure security and compliance across environments.  The implementation of a central software agent, such as JulieOps, helps you automate the management of topics, configuration, access controls, Confluent Schema Registry, and more within Kafka. It’s multi tenant out of the box and supports on-premises clusters and the cloud with CI/CD practices. Tim and Pere also discuss the steps necessary to build a self-service Kafka with an automatic Jenkins process that will empower development teams to be autonomous.EPISODE LINKSJulieOps on GitHubJulieOps documentationBuilding a Self-Service Kafka Platform as a Service with GitOps with Pere Urbón-BayesOpen Service Broker APIDrive | Daniel H. Pink Watch the video version of this podcastJoin the Confluent CommunityLearn more with Kafka tutorials, resources, and guides at Confluent DeveloperLive demo: Intro to Event-Driven Microservices with ConfluentUse PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)
Oct 7 2021
26 mins
Intro to Kafka Connect: Core Components and Architecture ft. Robin Moffatt
Kafka Connect is a streaming integration framework between Apache Kafka® and external systems, such as databases and cloud services. With expertise in ksqlDB and Kafka Connect, Robin Moffatt (Staff Developer Advocate, Confluent) helps and supports the developer community in understanding Kafka and its ecosystem. Recently, Robin authored a Kafka Connect 101 course that will help you understand the basic concepts of Kafka Connect, its key features, and how it works.What’s Kafka Connect, and how does it work with Kafka and brokers? Robin explains that Kafka Connect is a Kafka API that runs separately from the Kafka brokers, running on its own Java virtual machine (JVM) process known as the Kafka Connect worker. Kafka Connect is essential for streaming data from different sources into Kafka and from Kafka to various targets. With Connect, you don’t have to write programs using Java and instead specify your pipeline using configuration. Kafka Connect.As a pluggable framework, Kafka Connect has a broad set of more than 200 different connectors available on Confluent Hub, including but not limited to:NoSQL and document stores (Elasticsearch, MongoDB, and Cassandra)RDBMS (Oracle, SQL Server, DB2, PostgreSQL, and MySQL)Cloud object stores (Amazon S3, Azure Blob Storage, and Google Cloud Storage),Message queues (ActiveMQ, IBM MQ, and RabbitMQ)Robin and Tim also discuss single message transform (SMTs), as well as distributed and standalone deployment modes Kafka Connect. Tune in to learn more about Kafka Connect, and get a preview of the Kafka Connect 101 course.EPISODE LINKSKafka Connect 101 courseKafka Connect Fundamentals: What is Kafka Connect?Meetup: From Zero to Hero with Kafka ConnectConfluent Hub: Discover Kafka connectors and more12 Days of SMTsWhy Kafka Connect? ft. Robin MoffattWatch the video version of this podcastJoin the Confluent CommunityLearn more with Kafka tutorials, resources, and guides at Confluent DeveloperLive demo: Intro to Event-Driven Microservices with ConfluentUse PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)
Sep 28 2021
31 mins
Designing a Cluster Rollout Management System for Apache Kafka ft. Twesha Modi
As one of the top coders of her Java coding class in high school, Twesha Modi is continuing to follow her passion for computer science as a senior at Cornell University, where she has proven to be one of the top programmers. During Twesha's summer internship at Confluent, she contributed to designing a new service to automate Apache Kafka® cluster rollout management—a process that releases the latest Kafka versions to customer’s clusters in Confluent Cloud.During Twesha’s internship, she was part of the Platform team, which designed a cluster management rollout service—capable of automating cluster rollout and generating rollout plans that streamline Kafka operations in the cloud. The pre-existing manual process worked well in scenarios involving just a couple hundred clusters, but with growth and the need to upgrade a significantly larger cluster fleet to target versions in the cloud, the process needed to be automated in order to accelerate feature releases while ensuring security. Under the mentorship of Pablo Berton (Staff Software Engineer I, Product Infrastructure, Confluent), Nikhil Bhatia (Principal Engineer I, Product Infrastructure, Confluent), and Vaibhav Desai (Staff Software Engineer I, Confluent), Twesha supported the design of the rollouts management process from scratch. Twesha’s 12-week internship helped her learn more about software architecture and the design process that goes into software as a service and beyond. Not only did she acquire new skills and knowledge, but she also met mentors who are willing to teach, share their experiences, and help her succeed along the way. Tim and Twesha also talk about the importance of asking questions during internships for the best learning experience, in addition to discussing the Vert.x, Java, Spring, and Kubernetes APIs. EPISODE LINKSMulti-Cluster Apache Kafka with Cluster Linking ft. Nikhil BhatiaWatch the video version of this podcastJoin the Confluent CommunityLearn more with Kafka tutorials, resources, and guides at Confluent DeveloperLive demo: Intro to Event-Driven Microservices with ConfluentUse PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)
Sep 23 2021
30 mins
Apache Kafka 3.0 - Improving KRaft and an Overview of New Features
Apache Kafka® 3.0 is out! To spotlight major enhancements in this release, Tim Berglund (Apache Kafka Developer Advocate) provides a summary of what’s new in the Kafka 3.0 release from Krakow, Poland, including API changes and improvements to the early-access Kafka Raft (KRaft). KRaft is a built-in Kafka consensus mechanism that’s replacing Apache ZooKeeper going forward. It is recommended to try out new KRaft features in a development environment, as KRaft is not advised for production yet. One of the major features in Kafka 3.0 is the efficiency for KRaft controllers and brokers to store, load, and replicate snapshots into a Kafka cluster for metadata topic partitioning. The Kafka controller is now responsible for generating a Kafka producer ID in both ZooKeeper and KRaft, easing the transition from ZooKeeper to KRaft on the Kafka 3.X version line. This update also moves us closer to the ZooKeeper-to-KRaft bridge release. Additionally, this release includes metadata improvements, exactly-once semantics, and KRaft reassignments. To enable a stronger record delivery guarantee, Kafka producers turn on by default idempotency, together with acknowledgment delivery by all the replicas. This release also comprises enhancements to Kafka Connect task restarts, Kafka Streams timestamp based synchronization and more flexible configuration options for MirrorMaker2 (MM2). The first version of MirrorMaker has been deprecated, and MirrorMaker2 will be the focus for future developments. Besides that, this release drops support for older message formats, V0 and V1, as well as initiates the removal of Java 8 and Scala 2.12 across all components in Apache Kafka. The universal Java 8 and Scala 2.12 deprecation is anticipated to complete in the future Apache Kafka 4.0 release.Apache Kafka 3.0 is a major release and step forward for the Apache Kafka project!EPISODE LINKSApache Kafka 3.0 release notes Read the blog to learn moreDownload Apache Kafka 3.0Watch the video version of this podcastJoin the Confluent Community Slack
Sep 21 2021
15 mins
How to Build a Strong Developer Community with Global Engagement ft. Robin Moffatt and Ale Murray
A developer community brings people with shared interests and purpose together. The fundamental elements of a community are to gather, learn, support, and create opportunities for collaboration. A developer community is also an effective and efficient instrument for exploring and solving problems together. The power of a community is its endless advantages, from knowledge sharing to support, interesting discussions, and much more. Tim Berglund invites Ale Murray (Global Community Manager, Confluent) and Robin Moffatt (Staff Developer Advocate, Confluent) on the show to discuss the art of Q&A in a global community, share tips for building a vibrant developer community, and highlight the five strategic pillars for running a successful global community:MeetupsConferencesMVP program (e.g., Confluent Community Catalysts)Community hackathonsDigital platforms Digital platforms, such as a community Slack and forum, often consist of members who are well versed on topics of interest. As a leader in the Apache Kafka® and Confluent communities, Robin expresses the importance of being respectful when asking questions and providing details to the problem at hand. A well-formulated and focused question will more likely lead to a helpful answer. Oftentimes, the cognitive process of composing the question actually helps iron out the problem and draw out a solution. This process is also known as the rubber duck debugging theory. In a global community with diverse cultures and languages, being kind and having empathy is crucial. The tone and meaning of words can sometimes get lost in translation. Using emojis can help transcend language barriers by adding another layer of tone to plain text. Ale and Robin also discuss the pros and cons of a community forum vs. a Slack group. Tune in to find out more tips and best practices on building and engaging a developer community.EPISODE LINKSUse PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)How to Ask Good QuestionsWhy We Launched a ForumGrowing the Event Streaming Community During COVID-19 ft. Ale MurrayMeetup HubAnnouncing the Confluent Community ForumWatch the video version of this podcastJoin the Confluent CommunityLearn more with Kafka tutorials, resources, and guides at Confluent DeveloperLive demo: Intro to Event-Driven Microservices with Confluent
Sep 14 2021
35 mins
What Is Data Mesh, and How Does it Work? ft. Zhamak Dehghani
The data mesh architectural paradigm shift is all about moving analytical data away from a monolithic data warehouse or data lake into a distributed architecture—allowing data to be shared for analytical purposes in real time, right at the point of origin. The idea of data mesh was introduced by Zhamak Dehghani (Director of Emerging Technologies, Thoughtworks) in 2019.  Here, she provides an introduction to data mesh and the fundamental problems that it’s trying to solve. Zhamak describes that the complexity and ambition to use data have grown in today’s industry. But what is data mesh? For over half a century, we’ve been trying to democratize data to deliver value and provide better analytic insights. With the ever-growing number of distributed domain data sets, diverse information arrives in increasing volumes and with high velocity. To remove the friction and serve the requirement for data to be consumed by operational needs in various use cases, the best way is to mesh the data. This means connecting data through a peer-to-peer fashion and liberating data for analytics, machine learning, serving up data-intensive applications across the organization, and more. Data mesh tackles the deficiency of the traditional, centralized data lake and data warehouse platform architecture. The data mesh paradigm is founded on four principles: Domain-oriented ownershipData as a productData available everywhere in a self-serve data infrastructureData standardization governanceA decentralized, agnostic data structure enables you to synthesize data and innovate. The starting point is embracing the ideology that data can be anywhere. Source-aligned data should serve as a product available for people across the organization to combine, explore, and drive actionable insights. Zhamak and Tim also discuss the next steps we need to take in order to bring data mesh to life at the industry level.To learn more about the topic, you can visit the all-new Confluent Developer course: Data Mesh 101. Confluent Developer is a single destination with resources to begin your Kafka journey.  EPISODE LINKSZhamak Dehghani: How to Build the Data Mesh FoundationData Mesh 101 CourseSaxo Bank’s Best Practices for a Distributed Domain-Driven Architecture Founded on the Data MeshPlacing Apache Kafka at the Heart of a Data Revolution at Saxo BankWatch the video version of this podcastJoin the Confluent CommunityLearn Kafka on Confluent DeveloperLive demo: Event-Driven Microservices with ConfluentUse PODCAST100 to get $100 of Confluent Cloud usage (details)
Sep 9 2021
34 mins