Kafka Training Course (3 days) 

Note: this outline is our proposal. The training can be tailored to your specific requirements if you request changes ahead of the proposed course date.

Why Learn Kafka?

Apache Kafka is a real-time message broker that allows you to publish and subscribe to message streams. It is a distributed streaming platform built for working with very large volumes of data. A single Kafka broker can handle hundreds of megabytes of reads and writes per second from a large number of clients. Kafka is highly scalable and has exceptionally high throughput, making it well suited to enterprises working on Big Data problems that involve messaging systems.

Topics covered in this training course include the Kafka API, creating Kafka clusters, and integrating Kafka with the Big Data Hadoop ecosystem as well as with Spark, Storm and Maven.

At the end of this training, you will be able to understand:

  • Kafka characteristics and salient features

  • Kafka cluster deployment on Hadoop and YARN

  • Real-time Kafka streaming

  • The fundamentals of the Kafka API

  • Storing records in Kafka in a fault-tolerant way

  • Producing and consuming messages from feeds like Twitter

  • Solving Big Data problems in messaging systems

  • Kafka high throughput, scalability, durability and fault-tolerance

  • Deploying Kafka in real-world business scenarios

Audience

  • Big Data Hadoop Developers, Architects and other professionals

  • Testing Professionals, Project Managers, and Messaging and Queuing System professionals

Requirements

  • Knowledge of Java would be an advantage

Course details

The agenda covers both fundamentals and advanced topics.

The final training outline will be designed depending on your particular requirements.

Practical exercises make up a large part of the course, alongside demonstrations and theoretical presentations. Questions and discussion are welcome throughout the course.

Course Outline

Introduction

  • understanding what Apache Kafka is

  • various components and use cases of Kafka

  • implementing Kafka on a single node

Multi Broker Kafka Implementation

  • Kafka terminology

  • deploying single-node Kafka with an independent ZooKeeper

  • adding replication in Kafka

  • working with Partitioning and Brokers

  • understanding Kafka consumers

  • the Kafka Writes terminology

  • various failure handling scenarios in Kafka

Multi Node Cluster Setup

  • introduction to multi node cluster setup in Kafka

  • the various administration commands

  • leadership balancing and partition rebalancing

  • graceful shutdown of Kafka Brokers and tasks

  • working with the Partition Reassignment Tool

  • cluster expansion

  • assigning Custom Partitions

  • removing a Broker

  • increasing the Replication Factor of Partitions

Integrate Flume with Kafka

  • understanding the need for Kafka Integration

  • integrating Kafka with Apache Flume

  • steps for integrating Flume with Kafka as a Source

Kafka API

  • detailed understanding of the Kafka and Flume Integration

  • deploying Kafka as a Sink and as a Channel

  • introduction to PyKafka API

  • setting up the PyKafka Environment

Producers & Consumers

  • connecting to Kafka using PyKafka

  • writing your own Kafka Producers and Consumers

  • writing a random JSON Producer

  • writing a Consumer to read the messages from a topic

  • writing and working with a File Reader Producer

  • writing a Consumer to store topic data in a file
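To give a flavour of the Producers & Consumers exercises, here is a minimal sketch of a random JSON Producer and a Consumer that stores messages in a file. It is an illustration under stated assumptions, not the course material itself: the broker address 127.0.0.1:9092, the topic name "test", the output file name, the record fields and the helper function names are all hypothetical. The helpers only rely on an object with a produce(bytes) method and on messages with a .value attribute, which is how PyKafka producers and consumers behave.

```python
import json
import random


def random_json_message(rng=random):
    """Build one random JSON record (hypothetical fields) as UTF-8 bytes."""
    record = {
        "id": rng.randint(1, 1000000),
        "value": round(rng.uniform(0.0, 100.0), 2),
    }
    return json.dumps(record).encode("utf-8")


def produce_random_messages(producer, count):
    """Send `count` random JSON messages via any object with produce(bytes)."""
    for _ in range(count):
        producer.produce(random_json_message())


def store_messages(messages, path):
    """Append each message's value (bytes) to `path`, one message per line."""
    written = 0
    with open(path, "ab") as out:
        for message in messages:
            if message is not None:
                out.write(message.value + b"\n")
                written += 1
    return written


def main():
    # Imported here so the helpers above can be used without PyKafka installed.
    from pykafka import KafkaClient

    client = KafkaClient(hosts="127.0.0.1:9092")  # assumed broker address
    topic = client.topics[b"test"]                # hypothetical topic name

    # Producer side: publish ten random JSON records.
    with topic.get_sync_producer() as producer:
        produce_random_messages(producer, 10)

    # Consumer side: read until no message arrives for 5 seconds.
    consumer = topic.get_simple_consumer(consumer_timeout_ms=5000)
    store_messages(consumer, "messages.txt")      # hypothetical output file
```

With a Kafka broker running at the assumed address, calling main() publishes the records and then writes whatever the consumer reads to messages.txt. Keeping the Kafka connection inside main() means the helper functions can be tried out with a stand-in producer before a cluster is available.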