Thursday, 2 June 2011

Apache Hadoop Installation on Linux (non-Cloudera)



Steps to Install Hadoop 0.20.2 on Linux

  -Make the JDK installer executable, run it, and move the extracted
   jdk1.6.0_23 folder to /usr/java:
      chmod +x jdk-6u23-linux-i586.bin
      ./jdk-6u23-linux-i586.bin
 -Open /etc/profile in an editor (vi /etc/profile) and add the following:
                   JAVA_HOME="/usr/java/jdk1.6.0_23"
                   export JAVA_HOME


-Extract hadoop-0.20.2 to the /home/username/ folder
-Go inside the hadoop-0.20.2 folder and then, under conf, modify these files:

core-site.xml
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/sunayanH/HadoopTmpDir</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://127.0.0.1:9100</value>
  <description>The name of the default file system.  A URI whose
    scheme and authority determine the FileSystem implementation.  The
   uri's scheme determines the config property (fs.SCHEME.impl) naming
   the FileSystem implementation class.  The uri's authority is used to
   determine the host, port, etc. for a filesystem.</description>
</property>
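Note that Hadoop reads these <property> blocks only when they sit inside the file's <configuration> root element. For reference, a complete core-site.xml built from the snippet above would look like this (paths and port as used throughout this post):

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/sunayanH/HadoopTmpDir</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://127.0.0.1:9100</value>
  </property>
</configuration>
```

The same wrapper applies to hdfs-site.xml and mapred-site.xml below.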




hdfs-site.xml
<property>
 <name>dfs.name.dir</name>
 <value>/home/sunayanH/HadoopTmpDir/dfs/name</value>
</property>

<property>
 <name>dfs.data.dir</name>
 <value>/home/sunayanH/HadoopTmpDir/dfs/data</value>
</property>

<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
    The actual number of replications can be specified when the file is created.
    The default is used if replication is not specified in create time.
  </description>
</property>

<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>





mapred-site.xml
<property>
  <name>mapred.job.tracker</name>
  <value>127.0.0.1:9101</value>
  <description>The host and port that the MapReduce job tracker runs
     at.  If "local", then jobs are run in-process as a single map
    and reduce task.
  </description>
</property>


Make sure the file sshd_config inside /etc/ssh has
PubkeyAuthentication set to yes:
#RSAAuthentication yes
PubkeyAuthentication yes
#AuthorizedKeysFile         .ssh/authorized_keys
#AuthorizedKeysCommand none
#AuthorizedKeysCommandRunAs nobody




-In the terminal type ssh-keygen and press Enter for every question asked
-Now type cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
-Type ssh localhost to check the connection to localhost; answer yes to any
 confirmation prompt
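The key setup above can also be scripted non-interactively, which is handy if you repeat the install. This is a sketch: the guard skips generation when a key already exists, and -N "" requests an empty passphrase so ssh-keygen never prompts.

```shell
# Create ~/.ssh with the permissions sshd expects, generate an RSA
# keypair only if one is not already there, then authorize it locally.
mkdir -p "$HOME/.ssh"
chmod 700 "$HOME/.ssh"
[ -f "$HOME/.ssh/id_rsa" ] || ssh-keygen -q -t rsa -N "" -f "$HOME/.ssh/id_rsa"
cat "$HOME/.ssh/id_rsa.pub" >> "$HOME/.ssh/authorized_keys"
chmod 600 "$HOME/.ssh/authorized_keys"
```

After this, ssh localhost should log in without asking for a password.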

Possible error conditions at this stage, if you were not careful in the earlier sections:
          -Make sure the file sshd_config inside /etc/ssh has
           PubkeyAuthentication set to yes, then reload the configuration with /etc/init.d/ssh reload
          -Connection to localhost refused at port 22:
           type /etc/init.d/sshd start at the terminal, check the error, and
           act accordingly

  -Now it is time to start Hadoop.
  - Make a directory in /home/yourUserName/, e.g. HadoopTmpDir
  - Under HadoopTmpDir create another folder, dfs. Then create two more
    directories, data and name, under dfs.

  - So the complete paths become /home/yourUserName/HadoopTmpDir/dfs/name
    and /home/yourUserName/HadoopTmpDir/dfs/data
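The whole layout above is a single mkdir -p away (using $HOME in place of /home/yourUserName; substitute your own base directory if you named it differently):

```shell
# Create the local directories that back dfs.name.dir and dfs.data.dir.
# mkdir -p builds the whole chain and is harmless if it already exists.
mkdir -p "$HOME/HadoopTmpDir/dfs/name"
mkdir -p "$HOME/HadoopTmpDir/dfs/data"
```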
 

-Now go to the /home/user/hadoop-0.20.2 folder and type:
    bin/hadoop namenode -format
    bin/start-all.sh
 -Move to a browser and open http://localhost:50070
   On the NameNode status page click 'Browse the filesystem'
   If you cannot access DFS from the browser, check the NameNode logs
   or the datanode log files under the hadoop-0.20.2/logs directory

    Common problems troubleshoot:
         java.io.IOException: Incompatible namespaceIDs
        Sol:
              Go to /home/yourUserName/HadoopTmpDir/dfs/data/current and open the
              VERSION file in a text editor.

              Open the datanode log from the logs folder and
              look for the reported namespaceID, e.g. 232313213.
              Copy that number into the VERSION file's namespaceID line.

              Sample VERSION file:
                #Fri Feb 25 03:28:22 IST 2011
                namespaceID=1767545107
                storageID=DS-1606386236-127.0.0.1-50010-1298581538273
                cTime=0
                storageType=DATA_NODE
                layoutVersion=-1
   
 
-Redo from bin/hadoop namenode -format if you find the above error
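The VERSION edit above can also be done with sed instead of a text editor. This sketch demonstrates it on a sample file with the contents shown in this post; on a real cluster, run the sed line against the VERSION file in dfs/data/current and substitute the namespaceID your datanode log actually reports:

```shell
# Write a sample VERSION file (contents taken from the post above).
cat > VERSION <<'EOF'
#Fri Feb 25 03:28:22 IST 2011
namespaceID=1767545107
storageID=DS-1606386236-127.0.0.1-50010-1298581538273
cTime=0
storageType=DATA_NODE
layoutVersion=-1
EOF

# Replace the namespaceID line in place with the one the log reported
# (232313213 is the example number used earlier in this post).
NEW_ID=232313213
sed -i "s/^namespaceID=.*/namespaceID=$NEW_ID/" VERSION
grep '^namespaceID=' VERSION   # prints: namespaceID=232313213
```

Restart the datanode afterwards; the other VERSION lines must stay untouched.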
 
