Hadoop on a Multi-Node Cluster

Step 1: Installing Java

Java is the primary requirement for running Hadoop, so make sure Java is installed on your system by checking its version with the following command:

$ java -version

If you don't have Java installed on your system, install it first.
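
For example, on a CentOS/RHEL system (matching the root prompts used later in this guide), an OpenJDK package can usually be installed with yum; the exact package name depends on your distribution and the Java version you want:

[root@rklick01 ~]# yum install -y java-1.8.0-openjdk-devel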

Step 2: Creating a Hadoop User

We recommend creating a normal (non-root) account for running Hadoop. Create the account using the following commands (as root):

$ adduser hadoop
$ passwd hadoop

2.1. Generate SSH Keys

After creating the account, you also need to set up key-based SSH access to the account itself. To do this, execute the following commands:

[root@rklick01 ~]# su hadoop
[hadoop@rklick01 root]$ cd
[hadoop@rklick01 ~]$
[hadoop@rklick01 ~]$ ssh-keygen -t rsa

You should see output like this; follow the prompts:
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa): 
/home/hadoop/.ssh/id_rsa already exists.
Overwrite (y/n)? y
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
f2:fb:77:5a:e3:e3:9d:b6:03:40:04:ad:2a:be:c9:37 hadoop@rklick01
The key's randomart image is:
+--[ RSA 2048]----+
|         .+.     |
|           o     |
|          o      |
|         . .     |
|      . S   .    |
|     . +     .   |
|    . . .     +  |
|    ...E .  .oo=.|
|     +o o....++++|
+-----------------+
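
If no key exists yet and you prefer to skip the prompts, the key pair can also be generated non-interactively with an empty passphrase:

[hadoop@rklick01 ~]$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
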
To access the worker nodes via SSH without providing a password, copy the SSH key to the first node:

$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@rklick01

You should see output like this:

Now try logging into the machine, with "ssh 'hadoop@rklick01'", and 
check in:
  .ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.
Copy the SSH key to the second node:

$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@rklick02

You should see output like this:

hadoop@rklick02's password: 
Now try logging into the machine, with "ssh 'hadoop@rklick02'", and
check in:
  .ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.
Copy the SSH key to the third node:

$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@rklick03

You should see output like this:

hadoop@rklick03's password: 
Now try logging into the machine, with "ssh 'hadoop@rklick03'", and check 
in:
  .ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.

How to check that the SSH keys are working

[hadoop@rklick01 ~]$ ssh 'hadoop@rklick02'

You should see output like this:

Last login: Thu May  5 05:22:56 2016 from rklick01
[hadoop@rklick02 ~]$ 
[hadoop@rklick02 ~]$ 
[hadoop@rklick02 ~]$ 
[hadoop@rklick02 ~]$ 
[hadoop@rklick02 ~]$ exit
logout
Connection to rklick02 closed.
[hadoop@rklick01 ~]$ 

That is, once SSH is set up successfully, you can reach rklick02 without a password. After the exit command, you are back on rklick01.
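
A quick way to check all three nodes at once is a small shell loop; assuming the hostnames resolve from the master, each node should print its hostname without asking for a password:

[hadoop@rklick01 ~]$ for host in rklick01 rklick02 rklick03; do ssh hadoop@$host hostname; done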

Step 3. Downloading Hadoop 2.6.0

$ wget http://apache.claz.org/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
$ tar xzf hadoop-2.6.0.tar.gz

Now rename the hadoop-2.6.0 directory to hadoop:

$ mv hadoop-2.6.0 hadoop
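
Optionally, verify the extracted layout; the hadoop directory should contain, among others, the bin, sbin, etc and share subdirectories:

$ ls hadoop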

Step 4. Configure Hadoop

4.1. Set Up Environment Variables

4.1.1. Edit the .bashrc file

First we need to set the environment variables used by Hadoop. Edit the ~/.bashrc file and append the following lines at the end of the file.

$ vi ~/.bashrc

Add these lines to .bashrc:

export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

Now apply the changes to the current running environment:

$ source ~/.bashrc
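
To confirm the variables took effect in the current shell, check that HADOOP_HOME is set and that the hadoop command now resolves from the new PATH:

$ echo $HADOOP_HOME
$ which hadoop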

4.1.2. Edit the hadoop-env.sh file

Now edit the $HADOOP_HOME/etc/hadoop/hadoop-env.sh file and set the JAVA_HOME environment variable. Change the Java path to match the installation on your system.

$ vi $HADOOP_HOME/etc/hadoop/hadoop-env.sh

Add this line to the file:

export JAVA_HOME=/usr/
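
The value /usr works on this particular system because the java binary lives at /usr/bin/java; in general, JAVA_HOME should point at your JDK installation directory. You can locate the real path of the java binary, and then run a quick sanity check, roughly like this:

$ readlink -f $(which java)
$ hadoop version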

4.2. Edit Configuration Files

Hadoop has many configuration files, which need to be configured according to the requirements of your Hadoop infrastructure. Let's start with a basic cluster setup. First, navigate to the location below:

$ cd $HADOOP_HOME/etc/hadoop

4.2.1. Edit core-site.xml

[hadoop@rklick01 hadoop]$ vi core-site.xml

Edit it like this:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://23.227.167.180:9000</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>

After updating, the file looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://23.227.167.180:9000</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>

4.2.2. Edit hdfs-site.xml

[hadoop@rklick01 hadoop]$ vi hdfs-site.xml

Edit it like this:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>

  <property>
    <name>dfs.name.dir</name>
    <value>file:///home/hadoop/hadoopdata/hdfs/namenode</value>
  </property>

  <property>
    <name>dfs.data.dir</name>
    <value>file:///home/hadoop/hadoopdata/hdfs/datanode</value>
  </property>
</configuration>

After updating, the file looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>

  <property>
    <name>dfs.name.dir</name>
    <value>file:///home/hadoop/hadoopdata/hdfs/namenode</value>
  </property>

  <property>
    <name>dfs.data.dir</name>
    <value>file:///home/hadoop/hadoopdata/hdfs/datanode</value>
  </property>
</configuration>

4.2.3. Edit mapred-site.xml

[hadoop@rklick01 hadoop]$ vi mapred-site.xml
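
Note: a stock Hadoop 2.6.0 distribution typically ships only mapred-site.xml.template; if mapred-site.xml does not exist yet on your system, create it from the template first:

[hadoop@rklick01 hadoop]$ cp mapred-site.xml.template mapred-site.xml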

Edit it like this:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

After updating, the file looks like this:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

4.2.4. Edit yarn-site.xml

[hadoop@rklick01 hadoop]$ vi yarn-site.xml

Edit it like this:

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>

After updating, the file looks like this:

<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>

<!-- Site specific YARN configuration properties -->

  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>

4.3. Copy the Configuration to All Other Nodes

Copy the Hadoop directory, including all configuration, to rklick02:

[hadoop@rklick01 ~]$ scp -r hadoop rklick02:/home/hadoop/

Copy the Hadoop directory, including all configuration, to rklick03:

[hadoop@rklick01 ~]$ scp -r hadoop rklick03:/home/hadoop/

You should see output like this:

LICENSE.txt                                                                                                   100%   15KB  15.1KB/s   00:00    
README.txt                                                                                                    100% 1366     1.3KB/s   00:00    
libhadoop.so                                                                                                  100%  787KB 787.1KB/s   00:00    
....     
ETC
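
If Hadoop has already been extracted on the worker nodes, an alternative is to copy only the configuration directory instead of the whole installation, for example:

[hadoop@rklick01 ~]$ scp -r hadoop/etc/hadoop rklick02:/home/hadoop/hadoop/etc/
[hadoop@rklick01 ~]$ scp -r hadoop/etc/hadoop rklick03:/home/hadoop/hadoop/etc/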

4.4. Copy the SSH Key to All Nodes

Start with rklick01:

[hadoop@rklick01 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@rklick01

You should see output like this:

Now try logging into the machine, with "ssh 'hadoop@rklick01'", and
check in:
  .ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.


Then rklick02:

[hadoop@rklick01 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@rklick02

You should see output like this:

Now try logging into the machine, with "ssh 'hadoop@rklick02'", and
check in:
  .ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.

Then rklick03:

[hadoop@rklick01 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@rklick03

You should see output like this:

Now try logging into the machine, with "ssh 'hadoop@rklick03'", and
check in:
  .ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.

4.5. Set Permissions on the Authorized Keys

[hadoop@rklick01 ~]$ chmod 0600 ~/.ssh/authorized_keys
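
It is also good practice to make sure the .ssh directory itself is accessible only by the hadoop user:

[hadoop@rklick01 ~]$ chmod 700 ~/.ssh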

How to test that SSH is set up successfully:
[hadoop@rklick01 ~]$ ssh 'hadoop@rklick02'
Last login: Thu May  5 05:22:56 2016 from rklick01
[hadoop@rklick02 ~]$ 
[hadoop@rklick02 ~]$ 
[hadoop@rklick02 ~]$ 
[hadoop@rklick02 ~]$ 
[hadoop@rklick02 ~]$ exit
logout
Connection to rklick02 closed.
[hadoop@rklick01 ~]$

4.6. Format the NameNode

Now format the NameNode using the following command; check in the output that the storage directory configured in hdfs-site.xml is reported as successfully formatted.

[hadoop@rklick01 hadoop]$ hdfs namenode -format

You should see output like this:

16/05/05 05:35:08 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = rklick01/24.111.123.456
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.6.0
STARTUP_MSG:   classpath = /home/hadoop/hadoop/etc/hadoop:
/home/hadoop/hadoop/share/hadoop/common/lib/htrace-core-3.0.4.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/jasper-runtime-5.5.23.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/commons-logging-1.1.3.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/jackson-mapper-asl-1.9.13.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/hamcrest-core-1.3.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/jetty-util-6.1.26.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/zookeeper-3.4.6.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/jersey-core-1.9.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/xmlenc-0.52.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/commons-cli-1.2.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/guava-11.0.2.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/jackson-jaxrs-1.9.13.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/asm-3.2.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/jets3t-0.9.0.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/log4j-1.2.17.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/servlet-api-2.5.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/slf4j-api-1.7.5.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/mockito-all-1.8.5.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/java-xmlbuilder-0.4.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/httpclient-4.2.5.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/avro-1.7.4.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/activation-1.1.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/jersey-json-1.9.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/jaxb-api-2.2.2.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/jsr305-1.3.9.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/snappy-java-1.0.4.1.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/junit-4.11.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/hadoop-auth-2.6.0.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/hadoop-annotations-2.6.0.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/jackson-xc-1.9.13.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/jetty-6.1.26.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/jsp-api-2.1.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/paranamer-2.3.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/commons-beanutils-1.7.0.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/jackson-core-asl-1.9.13.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/jaxb-impl-2.2.3-1.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/commons-lang-2.6.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/curator-recipes-2.6.0.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/netty-3.6.2.Final.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/commons-collections-3.2.1.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/commons-compress-1.4.1.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/httpcore-4.2.5.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/commons-beanutils-core-1.8.0.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/gson-2.2.4.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/commons-io-2.4.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/commons-digester-1.8.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/protobuf-java-2.5.0.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/curator-client-2.6.0.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/commons-codec-1.4.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/stax-api-1.0-2.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/apacheds-i18n-2.0.0-M15.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/jasper-compiler-5.5.23.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/xz-1.0.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/api-util-1.0.0-M20.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/commons-httpclient-3.1.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/api-asn1-api-1.0.0-M20.jar:/
home/hadoop/hadoop/share/hadoop/common/lib/jersey-server-1.9.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/jettison-1.1.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/jsch-0.1.42.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/commons-el-1.0.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/curator-framework-2.6.0.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/apacheds-kerberos-codec-2.0.0-M15.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/commons-configuration-1.6.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/commons-net-3.1.jar:
/home/hadoop/hadoop/share/hadoop/common/lib/commons-math3-3.1.1.jar:
/home/hadoop/hadoop/share/hadoop/common/hadoop-common-2.6.0.jar:
/home/hadoop/hadoop/share/hadoop/common/hadoop-nfs-2.6.0.jar:
/home/hadoop/hadoop/share/hadoop/common/hadoop-common-2.6.0-tests.jar:
/home/hadoop/hadoop/share/hadoop/hdfs:/home/hadoop/hadoop/share/hadoop/hdfs/lib/htrace-core-3.0.4.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/jasper-runtime-5.5.23.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/commons-logging-1.1.3.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/jackson-mapper-asl-1.9.13.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/jetty-util-6.1.26.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/jersey-core-1.9.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/xmlenc-0.52.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/commons-cli-1.2.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/guava-11.0.2.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/asm-3.2.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/commons-daemon-1.0.13.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/log4j-1.2.17.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/servlet-api-2.5.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/jsr305-1.3.9.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/jetty-6.1.26.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/jsp-api-2.1.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/xml-apis-1.3.04.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/jackson-core-asl-1.9.13.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/commons-lang-2.6.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/netty-3.6.2.Final.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/commons-io-2.4.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/protobuf-java-2.5.0.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/xercesImpl-2.9.1.jar:
/home/hadoop/hadoop/share/hadoop/hdfs/lib/commons-codec-1.4.jar:/home/hadoop/hadoop/share/ha

ETC 
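
As a quick check that the format succeeded, the NameNode storage directory configured in hdfs-site.xml should now contain a current subdirectory with a VERSION file and an initial fsimage:

[hadoop@rklick01 hadoop]$ ls /home/hadoop/hadoopdata/hdfs/namenode/current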

Step 5. Start the Hadoop Cluster

Let's start the Hadoop cluster using the scripts provided by Hadoop. Navigate to your Hadoop home directory and run the scripts one by one.

$ cd $HADOOP_HOME

Now run the start-dfs.sh script:

[hadoop@rklick01 hadoop]$ sbin/start-dfs.sh

Now run the start-yarn.sh script:

[hadoop@rklick01 hadoop]$ sbin/start-yarn.sh
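
To verify that the daemons came up, you can run jps (it ships with the JDK) on each node, assuming jps is on the PATH of the remote shells as well. Exactly which daemons run where depends on your configuration, but typically the master runs the NameNode, SecondaryNameNode and ResourceManager, while the workers run a DataNode and a NodeManager:

[hadoop@rklick01 hadoop]$ jps
[hadoop@rklick01 hadoop]$ ssh hadoop@rklick02 jps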

Step 6. Access Hadoop Services in Browser

The Hadoop NameNode web UI listens on port 50070 by default. Access your server on port 50070 in your favorite web browser:

http://24.111.123.456:50070/ 

Now access port 8088 to get information about the cluster and all applications:

http://24.111.123.456:8088/ 

Access port 50090 to get details about the secondary NameNode:

http://24.111.123.456:50090/ 

Access port 50075 to get details about the DataNode:

http://24.111.123.456:50075/ 
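
The cluster state can also be checked from the command line; hdfs dfsadmin -report prints capacity information and the DataNodes that have registered with the NameNode:

[hadoop@rklick01 ~]$ hdfs dfsadmin -report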

EXTRA POINTS

1. How to change the Hadoop user's password

[root@rklick01 ~]# passwd hadoop
Changing password for user hadoop.
New password: 
BAD PASSWORD: it is based on a dictionary word
BAD PASSWORD: is too simple
Retype new password: 
passwd: all authentication tokens updated successfully.

2. Extra step after deleting and re-creating the hadoop user

Delete the hadoopdata folder before formatting the NameNode.

That is, before running this command:

[hadoop@rklick01 hadoop]$ hdfs namenode -format

you should delete the hadoopdata folder.
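
Assuming the data directories configured in hdfs-site.xml above, that means removing this directory on the master (and on each worker as well, if you are resetting the whole cluster):

[hadoop@rklick01 ~]$ rm -rf /home/hadoop/hadoopdata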

For reference, a path of this form works for HDFS (Scala string interpolation):

val hPath = s"hdfs://$host:$port$path/$filename.$format"

We will keep growing this guide and adding more useful content to it over time. If you have any suggestions, feel free to share them :) Stay tuned.
