Thursday, 2 June 2011

Fully distributed mode Cloudera Hadoop installation and configuration


Cloudera Hadoop distribution installation guide for a cluster



Note: all quoted commands should be typed into the terminal without the quotes.

For all machines (master + slaves):

1. Download the JDK1.6.0_*.rpm.bin file from Oracle's Sun Java site and
    install it. Verify a successful installation by typing "java -version"
    in the terminal. Reboot.
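
    A minimal sketch of the install step, assuming the downloaded file is
    named jdk-6u25-linux-x64-rpm.bin (substitute your actual file name):

         chmod +x jdk-6u25-linux-x64-rpm.bin    # make the self-extracting binary executable
         sudo ./jdk-6u25-linux-x64-rpm.bin      # unpacks and installs the JDK RPMs
         java -version                          # verify the install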

2. Install Cloudera's Hadoop distribution by typing
            "yum install hadoop-0.20",
    then verify by typing "hadoop version"; it should show the Hadoop
    version currently installed. Reboot.
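
    The yum install assumes Cloudera's CDH repository is already configured
    on the machine; if it is not, a sketch of adding it (the repo URL below
    is an assumption from the CDH3 era, check Cloudera's site for the
    current one):

         # drop Cloudera's repo definition into yum's config directory
         sudo wget -O /etc/yum.repos.d/cloudera-cdh3.repo \
              http://archive.cloudera.com/redhat/cdh/cloudera-cdh3.repo
         sudo yum install hadoop-0.20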

3. Type "jps" to verify that all the daemons are running. This confirms that
    Hadoop is running in pseudo-distributed mode.
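
    If everything is running, jps (run it with sudo if the daemons run under
    separate users, as they do in CDH) should list all five daemons plus
    itself; the PIDs below are only illustrative:

         4825 NameNode
         4932 DataNode
         5041 SecondaryNameNode
         5150 JobTracker
         5263 TaskTracker
         5377 Jps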

4. Now stop all the daemons by typing,
           "for service in /etc/init.d/hadoop-0.20-*; do sudo $service stop; done"

5. Edit the hosts file by typing "vi /etc/hosts" and append the following to it.
    Note that /etc/hosts expects the IP address first, then the hostname:
  
       192.168.1.34     master    (or whatever the master IP is)
       192.168.1.67     slave1    (or whatever the slave1 IP is)
       192.168.1.68     slave2    (or whatever the slave2 IP is)
       ….
       ….
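
    A quick sanity check from the master that the new names resolve:

         ping -c 1 slave1    # should resolve to 192.168.1.67 and get a reply
         ping -c 1 slave2
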
6. Change directory to /usr/lib/hadoop-0.20/conf and edit the following files.
    
      Modify only the following property in core-site.xml for the time being,

     <property>
     <name>fs.default.name</name>
     <value>hdfs://master:8020</value>
     </property>
  
     Modify only the following property in mapred-site.xml for the time being,

     <property>
     <name>mapred.job.tracker</name>
     <value>master:8021</value>
     </property>
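
     These two properties must point at the master on every machine in the
     cluster; one way to push the edited files from the master to each slave
     (assuming the same install path everywhere and suitable permissions):

         scp core-site.xml mapred-site.xml slave1:/usr/lib/hadoop-0.20/conf/
         scp core-site.xml mapred-site.xml slave2:/usr/lib/hadoop-0.20/conf/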

     Now, modify the masters and slaves files on each machine respectively:
    
       In the masters file write,
                                      master
       In the slaves file write,
                                      slave1
                                      slave2
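
       The two files can also be written from a shell; from
       /usr/lib/hadoop-0.20/conf on each machine:

          echo "master" | sudo tee masters              # masters file: the master's hostname
          printf "slave1\nslave2\n" | sudo tee slaves   # slaves file: one worker hostname per line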
                                     
                                     

     Setting up password-less ssh from master to all slaves

      On the namenode machine, i.e. the master in our case, do the following:

        ssh-keygen -t dsa -P "" -f ~/.ssh/id_dsa
 
        cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

        chmod go-w $HOME $HOME/.ssh
        chmod 600 $HOME/.ssh/authorized_keys
        chown `whoami` $HOME/.ssh/authorized_keys

      Copy the generated id_dsa.pub key to all slave machines using,

         scp ~/.ssh/id_dsa.pub slave1:~/.ssh/master.pub (provide password 
         of slave1 when asked)
         scp ~/.ssh/id_dsa.pub slave2:~/.ssh/master.pub (provide password 
         of slave2 when asked)
        ………
        ………
         Do the following on all slaves (by any means): first append the
         copied key to authorized_keys, then tighten the permissions,

          cat ~/.ssh/master.pub >> ~/.ssh/authorized_keys
          chmod go-w $HOME $HOME/.ssh
          chmod 600 $HOME/.ssh/authorized_keys
          chown `whoami` $HOME/.ssh/authorized_keys

  
         Now, typing "ssh slaveN", where N = 1, 2, …, n (the slaves), should
         result in a password-less login.
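
         A quick loop from the master confirms this for the two-slave
         example above:

          for host in slave1 slave2; do
              ssh $host hostname    # should print each hostname with no password prompt
          done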

  
      7.  To format the namenode, type the following on the master machine:
               
                         "sudo -u hdfs hadoop namenode -format"
 
      8.  Start the following daemons on the master,
 
                sudo service hadoop-0.20-namenode start
                sudo service hadoop-0.20-jobtracker start
                sudo service hadoop-0.20-secondarynamenode start
 
       9. Start the following daemons on all slaves,
 
                sudo service hadoop-0.20-tasktracker start
                sudo service hadoop-0.20-datanode start
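
        On each slave, "sudo jps" should now show the two worker daemons
        (PIDs are illustrative):

                2231 DataNode
                2342 TaskTracker
                2450 Jps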
 

      10. Verify by opening http://IPAddress:50070 (the namenode web UI) and
          http://IPAddress:50030 (the jobtracker web UI) in a browser, where
          IPAddress is the master's IP.
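
      As a final smoke test, run one of the bundled example jobs from the
      master; the jar path below is an assumption and may differ between
      CDH releases:

                # estimate pi with 2 maps of 100 samples each
                hadoop jar /usr/lib/hadoop-0.20/hadoop-examples.jar pi 2 100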
