In this blog post we explain how to set up a multi-node Kafka and ZooKeeper cluster in a distributed environment.
What is Apache Kafka?
Apache Kafka is a high-throughput distributed messaging system. It is designed to allow a single cluster to serve as the central data backbone for a large organization. It can be elastically and transparently expanded without downtime. Data streams are partitioned and spread over a cluster of machines to allow data streams larger than the capability of any single machine and to allow clusters of coordinated consumers.
What is ZooKeeper?
ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used in some form or another by distributed applications. Each time they are implemented, a lot of work goes into fixing the bugs and race conditions that are inevitable. Because of the difficulty of implementing these kinds of services, applications usually skimp on them at first, which makes them brittle in the presence of change and difficult to manage. Even when done correctly, different implementations of these services lead to management complexity when the applications are deployed.
Learn more about ZooKeeper on the ZooKeeper Wiki.
- Install Java on every instance if you do not have it already.
- Kafka binary files: http://kafka.apache.org/downloads.html
- First, download the Kafka tarball (binaries) on all of your instances and extract it:
$ tar -xzvf kafka_2.11-0.9.0.1.tgz
$ mv kafka_2.11-0.9.0.1 kafka
- On every instance, only two properties files need to be changed: zookeeper.properties and server.properties.
Let's start by editing “zookeeper.properties” on all the instances:
$ vi ~/kafka/config/zookeeper.properties
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# zoo servers
server.1=x.x.x.x:2888:3888
server.2=x.x.x.x:2888:3888
server.3=x.x.x.x:2888:3888
# add more servers here if you want
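The server.N entries have to be the same on every instance, so it can help to generate them from a single host list instead of typing them on each machine. The following is a minimal sketch, assuming bash; the function name `zk_server_lines` and the example addresses are hypothetical and not part of Kafka or ZooKeeper.

```shell
#!/usr/bin/env bash
# Sketch: print one "server.N=host:2888:3888" line per ZooKeeper host.
# zk_server_lines is a hypothetical helper name. 2888 and 3888 are the
# ports used in the zookeeper.properties listing above.
zk_server_lines() {
  local id=1 host
  for host in "$@"; do
    printf 'server.%d=%s:2888:3888\n' "$id" "$host"
    id=$((id + 1))
  done
}

# Example (placeholder addresses), appending to the config file:
# zk_server_lines 10.0.0.1 10.0.0.2 10.0.0.3 >> ~/kafka/config/zookeeper.properties
```

The order of the arguments decides the IDs, so the same host order should be used on every instance.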
Now edit “server.properties” on all instances and update the following settings:
$ vi ~/kafka/config/server.properties
# Must be unique per broker: increase by one for each node (1, 2, 3, ...)
broker.id=1
# IP address of the current node
host.name=x.x.x.x
zookeeper.connect=x.x.x.x:2181,x.x.x.x:2181,x.x.x.x:2181
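Only broker.id and host.name differ between nodes, while zookeeper.connect is the same everywhere. A small helper can print the three settings for a given node so they can be reviewed before editing the file. This is a sketch, assuming bash and the default ZooKeeper client port 2181 shown above; `broker_settings` and the example addresses are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the per-node settings for server.properties.
# broker_settings is a hypothetical helper. Arguments:
#   $1 = broker id, $2 = this node's IP, $3... = ZooKeeper hosts
broker_settings() {
  local id="$1" ip="$2"
  shift 2
  local zk="" host
  for host in "$@"; do
    # Join hosts with commas, adding the client port to each one
    zk="${zk:+$zk,}$host:2181"
  done
  printf 'broker.id=%s\nhost.name=%s\nzookeeper.connect=%s\n' "$id" "$ip" "$zk"
}

# Example for the second node (placeholder addresses):
# broker_settings 2 10.0.0.2 10.0.0.1 10.0.0.2 10.0.0.3
```

The output is meant to be pasted over the existing broker.id and zookeeper.connect entries in server.properties.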
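- After this, go to /tmp on every instance and create the following. ZooKeeper reads its server ID from a file named myid in its data directory, which is /tmp/zookeeper in the zookeeper.properties shipped with Kafka: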
$ cd /tmp/
$ mkdir zookeeper    # ZooKeeper data dir
$ cd zookeeper
$ touch myid         # ZooKeeper server ID file
$ echo '1' > myid    # Server ID for this instance: 1 for server.1, 2 for server.2, etc.
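The myid file should contain exactly one ID, so it is worth making this step safe to repeat. A minimal sketch, assuming bash; `write_myid` is a hypothetical helper, and the directory defaults to the /tmp/zookeeper path used above.

```shell
#!/usr/bin/env bash
# Sketch: write the ZooKeeper server ID for this instance.
# write_myid is a hypothetical helper. Arguments:
#   $1 = server ID, $2 = data dir (defaults to /tmp/zookeeper)
write_myid() {
  local id="$1" dir="${2:-/tmp/zookeeper}"
  mkdir -p "$dir" || return 1
  # ">" overwrites the file, so re-running never leaves a second ID in it
  printf '%s\n' "$id" > "$dir/myid"
}

# Example on the instance listed as server.2:
# write_myid 2
```

Keep in mind that many systems clear /tmp on reboot, in which case the file has to be recreated before ZooKeeper is started again.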
- Everything is now configured. Start ZooKeeper first and then the Kafka server on all instances. Each command stays in the foreground, so run them in separate terminals:
$ cd ~/kafka
$ bin/zookeeper-server-start.sh ~/kafka/config/zookeeper.properties
$ bin/kafka-server-start.sh ~/kafka/config/server.properties
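Once all brokers are up, one way to check that they joined the same cluster is to create a topic replicated across three brokers and then describe it. The sketch below assumes the Kafka 0.9 tooling (kafka-topics.sh with the --zookeeper option), a three-broker cluster, and the ~/kafka install path used above. The helper name, topic name, and ZooKeeper address are hypothetical. The function only prints the commands so they can be reviewed before being run.

```shell
#!/usr/bin/env bash
# Sketch: print (not run) two commands for a quick cluster check.
# smoke_test_commands is a hypothetical helper. Arguments:
#   $1 = ZooKeeper address (host:port), $2 = topic name (optional)
# KAFKA_HOME is an assumed variable pointing at the extracted Kafka directory.
KAFKA_HOME="${KAFKA_HOME:-$HOME/kafka}"

smoke_test_commands() {
  local zk="$1" topic="${2:-smoke-test}"
  local tool="$KAFKA_HOME/bin/kafka-topics.sh"
  # Replication factor 3 assumes three running brokers
  echo "$tool --create --zookeeper $zk --replication-factor 3 --partitions 3 --topic $topic"
  echo "$tool --describe --zookeeper $zk --topic $topic"
}

# Example (placeholder address):
# smoke_test_commands 10.0.0.1:2181
```

If the cluster formed correctly, the describe output should list replicas on different broker IDs for the topic.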
We will keep looking at how to provide more useful tutorials and will add more content over time. If you have any suggestions, feel free to share them with us 🙂 Stay tuned.