Steps to Install Hadoop 0.20.2 on a Linux-based OS
-Install JDK 1.6.0_23 into /usr/java: copy jdk-6u23-linux-i586.bin there, then run
chmod +x jdk-6u23-linux-i586.bin
./jdk-6u23-linux-i586.bin
This extracts the jdk1.6.0_23 folder under /usr/java.
-Type vi /etc/profile in the terminal and add the following:
JAVA_HOME="/usr/java/jdk1.6.0_23"
export JAVA_HOME
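The /etc/profile addition can also be written in the common one-line export form; a sketch (adding the JDK's bin directory to PATH is not in the original steps, but is commonly done so that java is found on the command line):

```shell
# Append to /etc/profile; path assumes the JDK was unpacked to
# /usr/java/jdk1.6.0_23 as in the step above
export JAVA_HOME="/usr/java/jdk1.6.0_23"
export PATH="$JAVA_HOME/bin:$PATH"   # optional, puts java on the PATH
```

Re-login (or run source /etc/profile) for the change to take effect.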
-Extract hadoop-0.20.2 to the /home/username/ folder
-Go inside the hadoop-0.20.2 folder and, under conf, modify these files (the <property> elements below go inside each file's top-level <configuration> element):
core-site.xml
<property>
<name>hadoop.tmp.dir</name>
<value>/home/sunayanH/HadoopTmpDir</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://127.0.0.1:9100</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
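As a sketch, a minimal core-site.xml with the two values above could be generated from the shell like this (the paths and port are just the examples used in this post; the properties are assumed to sit inside the standard top-level <configuration> element):

```shell
# Write a minimal core-site.xml into the current directory
# (run from the hadoop-0.20.2/conf folder; values are the examples above)
cat > core-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/sunayanH/HadoopTmpDir</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://127.0.0.1:9100</value>
  </property>
</configuration>
EOF
```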
hdfs-site.xml
<property>
<name>dfs.name.dir</name>
<value>/home/sunayanH/HadoopTmpDir/dfs/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/sunayanH/HadoopTmpDir/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
mapred-site.xml
<property>
<name>mapred.job.tracker</name>
<value>127.0.0.1:9101</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
Make sure the file sshd_config inside /etc/ssh has
PubkeyAuthentication set to yes:
#RSAAuthentication yes
PubkeyAuthentication yes
#AuthorizedKeysFile .ssh/authorized_keys
#AuthorizedKeysCommand none
#AuthorizedKeysCommandRunAs nobody
-In the terminal type ssh-keygen and press Enter at every prompt
-Now type
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
-Type ssh localhost to check the connection to localhost, and answer yes to any confirmation prompt.
Possible error conditions at this stage, if you were not careful in the earlier sections:
-PubkeyAuthentication is not set to yes in /etc/ssh/sshd_config: fix it, then reload the configuration with /etc/init.d/ssh reload
-Connection to localhost refused on port 22:
type /etc/init.d/sshd start at the terminal, check the error, and act accordingly
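The key-setup steps above can be sketched as a single non-interactive script (assumes OpenSSH; the -P "" flag creates a passphrase-less key, matching "press Enter at every prompt", and the chmod lines matter because OpenSSH refuses loosely-permissioned key files):

```shell
# Passwordless-SSH setup sketch for the current user
mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"
# Generate an RSA key pair only if one does not already exist
[ -f "$HOME/.ssh/id_rsa" ] || ssh-keygen -t rsa -P "" -f "$HOME/.ssh/id_rsa"
# Authorize the key for login to localhost
cat "$HOME/.ssh/id_rsa.pub" >> "$HOME/.ssh/authorized_keys"
chmod 600 "$HOME/.ssh/authorized_keys"
```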
-Now it is time to start Hadoop:
- Make a directory in /home/yourUserName/ , e.g. HadoopTmpDir
- Under HadoopTmpDir create another folder dfs. Further create two more
directories data and name under dfs.
- so, now the complete path becomes /home/YourUserName/HadoopTmpDir/dfs/name( and data)
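The directory layout above can be created in one command ($HOME stands in for /home/YourUserName):

```shell
# Create HadoopTmpDir/dfs/name and HadoopTmpDir/dfs/data in one go;
# mkdir -p creates all intermediate directories as needed
mkdir -p "$HOME/HadoopTmpDir/dfs/name" "$HOME/HadoopTmpDir/dfs/data"
```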
-Now go to the /home/user/hadoop-0.20.2 folder and type:
bin/hadoop namenode -format
bin/start-all.sh
-Move to a browser and type http://localhost:50070 (the NameNode web UI; the JobTracker UI is at http://localhost:50030)
On the NameNode page click 'Browse the filesystem'
If you cannot access DFS from the browser, check the logs under the NameNode
Logs page, or look for the datanode log files under the hadoop-0.20.2/logs directory
Common problems and troubleshooting:
java.io.IOException: Incompatible namespaceIDs
Solution:
Go to /home/YourUserName/HadoopTmpDir/dfs/data/current and open the VERSION file in a text
editor.
Open the datanode log from the logs folder and
look for the reported namespaceID, e.g. 232313213.
Copy that number into the namespaceID line of the VERSION file.
Sample VERSION file:
#Fri Feb 25 03:28:22 IST 2011
namespaceID=1767545107
storageID=DS-1606386236-127.0.0.1-50010-1298581538273
cTime=0
storageType=DATA_NODE
layoutVersion=-1
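The manual VERSION edit can be sketched as a script (all values here are samples; read the real namespaceID from the "Incompatible namespaceIDs" line in the datanode log, and the path follows the directory layout used in this post):

```shell
# Overwrite the namespaceID line in the datanode's VERSION file.
# NEW_ID below is the sample value; take the real one from the datanode log.
NEW_ID=1767545107
VERSION_FILE="$HOME/HadoopTmpDir/dfs/data/current/VERSION"
if [ -f "$VERSION_FILE" ]; then
  sed -i "s/^namespaceID=.*/namespaceID=${NEW_ID}/" "$VERSION_FILE"
else
  echo "VERSION file not found at $VERSION_FILE"
fi
```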
-If you hit the above error, redo from bin/hadoop namenode -format