Data ingestion from Google spreadsheet to Elasticsearch

In this blog we elaborate on how to ingest data from a Google Spreadsheet into Elasticsearch.

There are seven steps to ingest data from a Google Spreadsheet into Elasticsearch. Please follow along:

Step – 1) Log in to your Google account.

Step – 2) Open the spreadsheet:

Open the spreadsheet, click on Add-ons, and type "elasticsearch" in the search box. You would see the screen below.

[Screenshot: searching the add-ons store for Elasticsearch]

 

Now click to add the Elasticsearch plugin. After adding it, you have to grant it permission. Once permission is granted, the Elasticsearch plugin will be added to your account.

 

Step – 3) Add the Elasticsearch plugin:

– Now click on Add-ons. You would see the screen below.

[Screenshot: the Add-ons menu]

 

Step – 4) Fill in cluster information:

Click on Send to Cluster. You would see the screen below.

[Screenshot: the host, port, and credentials form]

Here, on the right-hand side, you have to enter the Host and Port along with the Username and Password of your Elasticsearch cluster.
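As a purely hypothetical example (your cluster's values will differ), the form might be filled in like this:

Host: https://my-cluster.example.com
Port: 9243
Username: elastic
Password: <your password>

Port 9243 is the HTTPS port commonly used by hosted Elasticsearch services; a self-managed cluster typically listens on 9200.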

Step – 5) Test the connection:

After filling in everything, click on Test to check the connection to Elasticsearch. You should see the message "Successfully connected to your cluster". Click Save, then click Edit Data Details.
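If you want to double-check connectivity outside the add-on, a quick way is to hit the cluster's root endpoint with curl (the host, port, and credentials below are placeholders for your own values):

$ curl -u elastic:<password> 'https://my-cluster.example.com:9243/'

A healthy cluster responds with a small JSON document containing its name and version.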

Step – 6) Edit details:

After clicking Edit Data Details, select the ID column and enter the index name and type name into which you want to ingest this spreadsheet data. You would see the screen below.

[Screenshot: the Edit Data Details form]

 

Step – 7) Push to cluster:

After filling in everything, click on Push to Cluster. You would see the screen below.

[Screenshot: data successfully ingested into Elasticsearch]

 

After the data is pushed to the cluster, you should see the message "Success! The data is accessible here".

Now click the "here" link in that message to see your ingested data in Elasticsearch.
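You can also verify the ingestion from the command line. Assuming a hypothetical index name of spreadsheet-data (use whatever index name you chose in Edit Data Details):

$ curl -u elastic:<password> 'https://my-cluster.example.com:9243/spreadsheet-data/_search?pretty'

The _search endpoint returns the ingested rows as JSON documents. The _cat/indices endpoint is another quick way to confirm the index exists and see its document count:

$ curl -u elastic:<password> 'https://my-cluster.example.com:9243/_cat/indices?v'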

 


Kafka & ZooKeeper | Multi-Node Cluster Setup


In this blog we explain how to set up a multi-node Kafka & ZooKeeper cluster in a distributed environment.

What is Apache Kafka?

Apache Kafka is a high-throughput distributed messaging system designed to allow a single cluster to serve as the central data backbone for a large organization. It can be elastically and transparently expanded without downtime. Data streams are partitioned and spread over a cluster of machines, which allows streams larger than the capability of any single machine and allows clusters of coordinated consumers.
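To make the partitioning idea concrete: once the cluster from the steps below is running, you could create a topic whose data is split into three partitions, each replicated across the three brokers. The topic name my-stream is just an example; run these from the ~/kafka directory:

$ bin/kafka-topics.sh --create --zookeeper x.x.x.x:2181 \
    --replication-factor 3 --partitions 3 --topic my-stream
$ bin/kafka-topics.sh --describe --zookeeper x.x.x.x:2181 --topic my-stream

The describe output shows which broker leads each partition, which is exactly how Kafka spreads a stream over the cluster.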

What is ZooKeeper?

ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used in some form or another by distributed applications. Each time they are implemented, there is a lot of work that goes into fixing the bugs and race conditions that are inevitable. Because of the difficulty of implementing these kinds of services, applications initially usually skimp on them, which makes them brittle in the presence of change and difficult to manage. Even when done correctly, different implementations of these services lead to management complexity when the applications are deployed.
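As a small illustration of the "centralized naming" idea: once the cluster below is up, Kafka registers each broker as a znode under /brokers/ids, and you can inspect these with the zookeeper-shell tool that ships with Kafka (run from the ~/kafka directory):

$ bin/zookeeper-shell.sh x.x.x.x:2181 ls /brokers/ids

You can also ask a ZooKeeper node for its status with the stat four-letter command:

$ echo stat | nc x.x.x.x 2181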

Learn more about ZooKeeper on the ZooKeeper Wiki.

Prerequisites

  1. Install Java if you do not have it already.
  2. Download the Kafka binaries from http://kafka.apache.org/downloads.html (see the snippet below).
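For example, on each instance you could verify Java and fetch the 0.9.0.1 release used later in this post (adjust the version and URL for the release you want):

$ java -version
$ wget https://archive.apache.org/dist/kafka/0.9.0.1/kafka_2.11-0.9.0.1.tgz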

Installation

  • First, download the Kafka tarball on all your instances and extract it:
$ tar -xzvf kafka_2.11-0.9.0.1.tgz
$ mv kafka_2.11-0.9.0.1 kafka
  • On all the instances, only two files need to be changed: zookeeper.properties & server.properties.

Let's start by editing "zookeeper.properties" on all the instances:

$ vi ~/kafka/config/zookeeper.properties
# The directory where ZooKeeper stores its data
# (the myid file created below must live here)
dataDir=/tmp/zookeeper

# The port at which clients will connect
clientPort=2181

# The number of milliseconds of each tick
tickTime=2000

# The number of ticks that the initial synchronization phase can take
# (10 ticks x 2000 ms = 20 seconds)
initLimit=10

# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5

# The quorum members: server.<id>=<ip>:<peer-port>:<leader-election-port>
server.1=x.x.x.x:2888:3888
server.2=x.x.x.x:2888:3888
server.3=x.x.x.x:2888:3888
# add more servers here if you want

Now edit "server.properties" on all the instances and update the following:

$ vi ~/kafka/config/server.properties
# Unique broker ID: increase by one for each node
broker.id=1
# IP address of the current node
host.name=x.x.x.x
# All ZooKeeper quorum members
zookeeper.connect=x.x.x.x:2181,x.x.x.x:2181,x.x.x.x:2181
  • After this, go to /tmp on every instance and create the ZooKeeper data directory and myid file:
$ cd /tmp/
$ mkdir zookeeper   # ZooKeeper data dir (the dataDir set above)
$ cd zookeeper
$ echo '1' > myid   # Server ID for the respective instance, i.e. '1' on server.1, '2' on server.2, etc.
  • Now that everything is in place, start ZooKeeper and then the Kafka server on all instances:

$ cd ~/kafka
$ bin/zookeeper-server-start.sh config/zookeeper.properties

$ bin/kafka-server-start.sh config/server.properties
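To verify the cluster end to end, you can run a quick produce/consume smoke test from any instance (test is just a placeholder topic name; the x.x.x.x addresses are your own nodes):

$ bin/kafka-topics.sh --create --zookeeper x.x.x.x:2181 \
    --replication-factor 3 --partitions 1 --topic test
$ bin/kafka-console-producer.sh --broker-list x.x.x.x:9092 --topic test
$ bin/kafka-console-consumer.sh --zookeeper x.x.x.x:2181 --topic test --from-beginning

Type a few messages into the producer; the consumer should print them back, which confirms that the brokers and ZooKeeper are talking to each other.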

We will keep adding more useful tutorials like this one. If you have any suggestions, feel free to share them 🙂 Stay tuned.