Steps to install Hadoop in a cluster (two machines - master and slave), as attempted by me
1. I installed Hadoop on machines where the master and the slave had the same username, 'sunayan'.
Make sure the JAVA_HOME environment variable is set and that the PATH environment variable contains JAVA_HOME/bin.
Verify by typing 'java -version' in a terminal; it should show the Java version.
2. Make sure Hadoop is installed under the same directory path on all nodes (machines).
3. Make sure the /home/user_name/HadoopTmpDir/dfs path is the same on all the machines.
4. Set up SSH on both the master and the slave.
On each machine, edit /etc/ssh/sshd_config to the following:
# Package generated configuration file
# See the sshd(8) manpage for details
# What ports, IPs and protocols we listen for
Port 22
# Use these options to restrict which interfaces/protocols sshd will bind to
#ListenAddress ::
#ListenAddress 0.0.0.0
Protocol 2
# HostKeys for protocol version 2
HostKey /etc/ssh/ssh_host_rsa_key
HostKey /etc/ssh/ssh_host_dsa_key
#Privilege Separation is turned on for security
UsePrivilegeSeparation yes
# Lifetime and size of ephemeral version 1 server key
KeyRegenerationInterval 3600
ServerKeyBits 768
# Logging
SyslogFacility AUTH
LogLevel INFO
# Authentication:
LoginGraceTime 120
PermitRootLogin yes
StrictModes yes
RSAAuthentication yes
PubkeyAuthentication yes
#AuthorizedKeysFile %h/.ssh/authorized_keys
# Don't read the user's ~/.rhosts and ~/.shosts files
IgnoreRhosts yes
# For this to work you will also need host keys in /etc/ssh_known_hosts
RhostsRSAAuthentication no
# similar for protocol version 2
HostbasedAuthentication no
# Uncomment if you don't trust ~/.ssh/known_hosts for RhostsRSAAuthentication
#IgnoreUserKnownHosts yes
# To enable empty passwords, change to yes (NOT RECOMMENDED)
PermitEmptyPasswords no
# Change to yes to enable challenge-response passwords (beware issues with
# some PAM modules and threads)
ChallengeResponseAuthentication no
# Change to no to disable tunnelled clear text passwords
#PasswordAuthentication yes
# Kerberos options
#KerberosAuthentication no
#KerberosGetAFSToken no
#KerberosOrLocalPasswd yes
#KerberosTicketCleanup yes
# GSSAPI options
#GSSAPIAuthentication no
#GSSAPICleanupCredentials yes
X11Forwarding yes
X11DisplayOffset 10
PrintMotd no
PrintLastLog yes
TCPKeepAlive yes
#UseLogin no
#MaxStartups 10:30:60
#Banner /etc/issue.net
# Allow client to pass locale environment variables
AcceptEnv LANG LC_*
Subsystem sftp /usr/lib/openssh/sftp-server
UsePAM yes
Restart SSH as root with the command service ssh restart (the service is named ssh on Ubuntu/Debian; on some other distributions it is sshd).
Then do the following on the master and slave machines.
Master:
$ ssh-keygen -t dsa -P "" -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
$ chmod go-w $HOME $HOME/.ssh
$ chmod 600 $HOME/.ssh/authorized_keys
$ chown `whoami` $HOME/.ssh/authorized_keys
$ scp ~/.ssh/id_dsa.pub slave:~/.ssh/master.pub
Slave:
$ cat ~/.ssh/master.pub >> ~/.ssh/authorized_keys
Now, from the master node try to ssh to the slave.
$ ssh slave
If you are still prompted for a password (which is quite likely), it is very often just a simple permission issue. Go back to your slave node and, as the sunayan user, run:
$ chmod go-w $HOME $HOME/.ssh
$ chmod 600 $HOME/.ssh/authorized_keys
$ chown `whoami` $HOME/.ssh/authorized_keys
Try again from your master node.
$ ssh slave
And you should be good to go.
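Since start-all.sh (used in step 6 later) logs in over SSH from the master to every host listed in conf/slaves and conf/masters (with the layout used later in this post, that includes the master itself), it is worth confirming now that password-less login works for both targets. A quick sanity check from the master; you may have to accept a host key the first time:
$ ssh slave hostname
$ ssh localhost hostname
Both commands should print the remote hostname without prompting for a password.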
Configuration in MASTER and SLAVES:
In the Hadoop installation folder on all the machines, modify conf/core-site.xml as follows:
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/sunayan/HadoopTmpDir</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://masterIP:9100</value>
    <description>The name of the default file system. A URI whose
    scheme and authority determine the FileSystem implementation. The
    uri's scheme determines the config property (fs.SCHEME.impl) naming
    the FileSystem implementation class. The uri's authority is used to
    determine the host, port, etc. for a filesystem.</description>
  </property>
</configuration>
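Because hadoop.tmp.dir points to /home/sunayan/HadoopTmpDir, that directory must exist and be writable by the 'sunayan' user on every machine (this is what step 3 is about). A minimal sketch, run on each node (adjust the username and path to yours):
$ mkdir -p /home/sunayan/HadoopTmpDir
$ chmod 755 /home/sunayan/HadoopTmpDir
The dfs/name and dfs/data subdirectories used in hdfs-site.xml below should then be created automatically when the namenode is formatted and the datanode starts.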
Modify conf/hdfs-site.xml as follows:
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/sunayan/HadoopTmpDir/dfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/sunayan/HadoopTmpDir/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication.
    The actual number of replications can be specified when the file is created.
    The default is used if replication is not specified in create time.
    </description>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
Modify conf/mapred-site.xml as follows:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>masterIP:9101</value>
    <description>The host and port that the MapReduce job tracker runs
    at. If "local", then jobs are run in-process as a single map
    and reduce task.
    </description>
  </property>
</configuration>
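Note that masterIP above (and in core-site.xml) is a placeholder: it has to be replaced with the actual address of the master, and it must be the same value on every machine, either the raw IP or the 'master' hostname added to /etc/hosts below. For example, with the sample addresses used later in this post, the two value lines would read:
<value>hdfs://192.168.1.34:9100</value>
<value>192.168.1.34:9101</value>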
Furthermore, edit the conf/hadoop-env.sh file and uncomment the following line so that it points to the JDK installation directory:
export JAVA_HOME=/full/path/to/jdk
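For example, on an Ubuntu machine with a Sun JDK 6 package installed, the line might look like the one below; the exact path is only an assumption, so check where your JDK actually lives (usually somewhere under /usr/lib/jvm/):
export JAVA_HOME=/usr/lib/jvm/java-6-sun
Afterwards, $JAVA_HOME/bin/java -version should report the expected JDK version (see also step 1).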
Now, modify the conf/masters and conf/slaves files on each machine respectively.
masters file - the IP address of the machine that will host the secondary namenode, e.g.
192.168.1.34
slaves file - the IP addresses of the slaves, e.g.
192.168.1.67
After that, append the following to the /etc/hosts file on both the master and the slave machines (note that the IP address comes first, then the hostname):
192.168.1.34 master
192.168.1.67 slave
(replace 192.168.1.34 and 192.168.1.67 with whatever the actual master and slave IPs are)
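A quick way to confirm that the entries took effect (just a sanity check, run on either machine):
$ ping -c 1 master
$ ping -c 1 slave
Both names should resolve to the addresses you put in /etc/hosts.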
Rest of the steps:
5. Format the namenode on the master: HadoopInstallDir/bin/hadoop namenode -format
or, for Cloudera: su -s /bin/bash - hdfs -c 'hadoop namenode -format'
6. Start Hadoop from the master: HadoopInstallDir/bin/start-all.sh
Type jps in the master and slave terminals to verify that the daemons are running properly (see the sample output after these steps).
7. Verify in a browser that http://localhost:50030 and http://localhost:50070 (on the master) show all nodes running,
and also verify that you can browse the HDFS file system.
8. Try the command bin/hadoop fs -copyFromLocal LICENSE.txt output/license.txt
9. Also, try bin/hadoop jar hadoop-*examples*.jar pi 2 1000000 (the examples jar name varies between Hadoop versions) to test the MapReduce framework.
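As a rough guide for the jps check in step 6: with this layout (namenode, secondary namenode and jobtracker on the master; datanode and tasktracker on the slave), the output should look something like the following, though the process IDs will of course differ:
On the master,
$ jps
12005 NameNode
12180 SecondaryNameNode
12263 JobTracker
12490 Jps
On the slave,
$ jps
8761 DataNode
8859 TaskTracker
8990 Jps
If any daemon is missing, check its log file under HadoopInstallDir/logs.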
Reference:
http://allthingshadoop.com/2010/04/20/hadoop-cluster-setup-ssh-key-authentication/
That's it! It may not be perfect though; I'm waiting for all your feedback.
Thank you,
Sunayan Saikia