Thursday, 2 June 2011

Fully distributed mode Cloudera Hadoop installation and configuration


Cloudera Hadoop distribution installation guide for a cluster



Note: all quoted commands should be typed into the terminal without the quotes.

For all machines (master + slaves):

1. Download the JDK1.6.0_*.rpm.bin file from Oracle's Sun Java site and
    install it. Verify a successful installation by typing "java -version"
    in the terminal. Reboot.
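
    A minimal sketch of the install step, assuming the downloaded file is
    named jdk-6u25-linux-x64-rpm.bin (substitute your actual file name):

         chmod +x jdk-6u25-linux-x64-rpm.bin    # make the self-extracting binary executable
         sudo ./jdk-6u25-linux-x64-rpm.bin      # unpacks and installs the JDK RPMs
         java -version                          # verify the install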

2. Install Cloudera's Hadoop distribution by typing
            "yum install hadoop-0.20",
    then verify by typing "hadoop version"; it should show the Hadoop
    version currently installed. Reboot.
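
    The yum install assumes Cloudera's CDH repository is already configured
    on the machine; if it is not, a sketch of adding it (the repo URL below
    is an assumption from the CDH3 era, check Cloudera's site for the
    current one):

         # drop Cloudera's repo definition into yum's config directory
         sudo wget -O /etc/yum.repos.d/cloudera-cdh3.repo \
              http://archive.cloudera.com/redhat/cdh/cloudera-cdh3.repo
         sudo yum install hadoop-0.20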

3. Type "jps" to verify that all the daemons are running. This confirms that
    Hadoop is running in pseudo-distributed mode.
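
    If everything is running, jps (run it with sudo if the daemons run under
    separate users, as they do in CDH) should list all five daemons plus
    itself; the PIDs below are only illustrative:

         4825 NameNode
         4932 DataNode
         5041 SecondaryNameNode
         5150 JobTracker
         5263 TaskTracker
         5377 Jps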

4. Now stop all the daemons by typing,
           "for service in /etc/init.d/hadoop-0.20-*; do sudo $service stop; done"

5. Edit the hosts file by typing "vi /etc/hosts" and append the following to it.
    Note that /etc/hosts expects the IP address first, then the hostname:
  
       192.168.1.34     master    (or whatever the master IP is)
       192.168.1.67     slave1    (or whatever the slave1 IP is)
       192.168.1.68     slave2    (or whatever the slave2 IP is)
       ….
       ….
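
    A quick sanity check from the master that the new names resolve:

         ping -c 1 slave1    # should resolve to 192.168.1.67 and get a reply
         ping -c 1 slave2
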
6. Change directory to /usr/lib/hadoop-0.20/conf and edit the following files.
    
      Modify only the following property in core-site.xml for the time being,

     <property>
     <name>fs.default.name</name>
     <value>hdfs://master:8020</value>
     </property>
  
     Modify only the following property in mapred-site.xml for the time being,

     <property>
     <name>mapred.job.tracker</name>
     <value>master:8021</value>
     </property>
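
     These two properties must point at the master on every machine in the
     cluster; one way to push the edited files from the master to each slave
     (assuming the same install path everywhere and suitable permissions):

         scp core-site.xml mapred-site.xml slave1:/usr/lib/hadoop-0.20/conf/
         scp core-site.xml mapred-site.xml slave2:/usr/lib/hadoop-0.20/conf/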

     Now, modify the masters and slaves files on each machine respectively:
    
       In the masters file write,
                                      master
       In the slaves file write,
                                      slave1
                                      slave2
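
       The two files can also be written from a shell; from
       /usr/lib/hadoop-0.20/conf on each machine:

          echo "master" | sudo tee masters              # masters file: the master's hostname
          printf "slave1\nslave2\n" | sudo tee slaves   # slaves file: one worker hostname per line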
                                     
                                     

     Setting up password-less ssh from master to all slaves

      On the namenode machine, i.e. the master in our case, do the following:

        ssh-keygen -t dsa -P "" -f ~/.ssh/id_dsa
 
        cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

        chmod go-w $HOME $HOME/.ssh
        chmod 600 $HOME/.ssh/authorized_keys
        chown `whoami` $HOME/.ssh/authorized_keys

      Copy the generated id_dsa.pub key to all slave machines using,

         scp ~/.ssh/id_dsa.pub slave1:~/.ssh/master.pub (provide password 
         of slave1 when asked)
         scp ~/.ssh/id_dsa.pub slave2:~/.ssh/master.pub (provide password 
         of slave2 when asked)
        ………
        ………
         Do the following on all slaves (by any means): first append the
         copied key to authorized_keys, then tighten the permissions,

          cat ~/.ssh/master.pub >> ~/.ssh/authorized_keys
          chmod go-w $HOME $HOME/.ssh
          chmod 600 $HOME/.ssh/authorized_keys
          chown `whoami` $HOME/.ssh/authorized_keys

  
         Now, typing "ssh slaveN", where N = 1, 2, …, n (the slaves), should
         result in a password-less login.
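
         A quick loop from the master confirms this for the two-slave
         example above:

          for host in slave1 slave2; do
              ssh $host hostname    # should print each hostname with no password prompt
          done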

  
      7.  To format the namenode, type the following on the master machine:
               
                         "sudo -u hdfs hadoop namenode -format"
 
      8.  Start the following daemons on the master,
 
                sudo service hadoop-0.20-namenode start
                sudo service hadoop-0.20-jobtracker start
                sudo service hadoop-0.20-secondarynamenode start
 
       9. Start the following daemons on all slaves,
 
                sudo service hadoop-0.20-tasktracker start
                sudo service hadoop-0.20-datanode start
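
        On each slave, "sudo jps" should now show the two worker daemons
        (PIDs are illustrative):

                2231 DataNode
                2342 TaskTracker
                2450 Jps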
 

      10. Verify by opening http://IPAddress:50070 (the namenode web UI) and
          http://IPAddress:50030 (the jobtracker web UI) in a browser, where
          IPAddress is the master's IP.
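
      As a final smoke test, run one of the bundled example jobs from the
      master; the jar path below is an assumption and may differ between
      CDH releases:

                # estimate pi with 2 maps of 100 samples each
                hadoop jar /usr/lib/hadoop-0.20/hadoop-examples.jar pi 2 100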
