Thursday, 2 June 2011

Apache Hadoop Installation on Linux (non-Cloudera)



Steps to Install Hadoop 0.20.2 on Linux

  -Make the JDK installer executable, run it, and move the extracted
   jdk1.6.0_23 folder to /usr/java:
      chmod +x jdk-6u23-linux-i586.bin
      ./jdk-6u23-linux-i586.bin
 -Open /etc/profile in an editor (vi /etc/profile) and add the following:
                   JAVA_HOME="/usr/java/jdk1.6.0_23"
                   export JAVA_HOME


-Extract hadoop-0.20.2 to the /home/username/ folder
-Go inside the hadoop-0.20.2 folder and then, under conf, modify these files:

core-site.xml
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/sunayanH/HadoopTmpDir</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://127.0.0.1:9100</value>
  <description>The name of the default file system.  A URI whose
    scheme and authority determine the FileSystem implementation.  The
   uri's scheme determines the config property (fs.SCHEME.impl) naming
   the FileSystem implementation class.  The uri's authority is used to
   determine the host, port, etc. for a filesystem.</description>
</property>
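Note that Hadoop reads these <property> blocks only when they sit inside the file's <configuration> root element. For reference, a complete core-site.xml built from the snippet above would look like this (paths and port as used throughout this post):

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/sunayanH/HadoopTmpDir</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://127.0.0.1:9100</value>
  </property>
</configuration>
```

The same wrapper applies to hdfs-site.xml and mapred-site.xml below.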




hdfs-site.xml
<property>
 <name>dfs.name.dir</name>
 <value>/home/sunayanH/HadoopTmpDir/dfs/name</value>
</property>

<property>
 <name>dfs.data.dir</name>
 <value>/home/sunayanH/HadoopTmpDir/dfs/data</value>
</property>

<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
    The actual number of replications can be specified when the file is created.
    The default is used if replication is not specified in create time.
  </description>
</property>

<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>





mapred-site.xml
<property>
  <name>mapred.job.tracker</name>
  <value>127.0.0.1:9101</value>
  <description>The host and port that the MapReduce job tracker runs
     at.  If "local", then jobs are run in-process as a single map
    and reduce task.
  </description>
</property>


Make sure the file sshd_config inside /etc/ssh has
PubkeyAuthentication set to yes:
#RSAAuthentication yes
PubkeyAuthentication yes
#AuthorizedKeysFile         .ssh/authorized_keys
#AuthorizedKeysCommand none
#AuthorizedKeysCommandRunAs nobody




-In the terminal type ssh-keygen and press Enter for every question asked
-Now type cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
-Type ssh localhost to check the connection to localhost; answer yes to any
 confirmation prompt
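The key setup above can also be scripted non-interactively, which is handy if you repeat the install. This is a sketch: the guard skips generation when a key already exists, and -N "" requests an empty passphrase so ssh-keygen never prompts.

```shell
# Create ~/.ssh with the permissions sshd expects, generate an RSA
# keypair only if one is not already there, then authorize it locally.
mkdir -p "$HOME/.ssh"
chmod 700 "$HOME/.ssh"
[ -f "$HOME/.ssh/id_rsa" ] || ssh-keygen -q -t rsa -N "" -f "$HOME/.ssh/id_rsa"
cat "$HOME/.ssh/id_rsa.pub" >> "$HOME/.ssh/authorized_keys"
chmod 600 "$HOME/.ssh/authorized_keys"
```

After this, ssh localhost should log in without asking for a password.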

Possible error conditions at this stage, if you were not careful in the earlier sections:
          -Make sure the file sshd_config inside /etc/ssh has
           PubkeyAuthentication set to yes, then reload the configuration with /etc/init.d/ssh reload
          -Connection to localhost refused at port 22:
           type /etc/init.d/sshd start at the terminal, check the error, and
           act accordingly

  -Now it is time to start Hadoop.
  - Make a directory in /home/yourUserName/, e.g. HadoopTmpDir
  - Under HadoopTmpDir create another folder, dfs. Then create two more
    directories, data and name, under dfs.

  - So the complete paths become /home/yourUserName/HadoopTmpDir/dfs/name
    and /home/yourUserName/HadoopTmpDir/dfs/data
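The whole layout above is a single mkdir -p away (using $HOME in place of /home/yourUserName; substitute your own base directory if you named it differently):

```shell
# Create the local directories that back dfs.name.dir and dfs.data.dir.
# mkdir -p builds the whole chain and is harmless if it already exists.
mkdir -p "$HOME/HadoopTmpDir/dfs/name"
mkdir -p "$HOME/HadoopTmpDir/dfs/data"
```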
 

-Now go to the /home/user/hadoop-0.20.2 folder and type:
    bin/hadoop namenode -format
    bin/start-all.sh
 -Move to a browser and open http://localhost:50070
   On the NameNode status page click 'Browse the filesystem'
   If you cannot access DFS from the browser, check the NameNode logs
   or the datanode log files under the hadoop-0.20.2/logs directory

    Common problems troubleshoot:
         java.io.IOException: Incompatible namespaceIDs
        Sol:
              Go to /home/yourUserName/HadoopTmpDir/dfs/data/current and open the
              VERSION file in a text editor.

              Open the datanode log from the logs folder and
              look for the reported namespaceID, e.g. 232313213.
              Copy that number into the VERSION file's namespaceID line.

              Sample VERSION file:
                #Fri Feb 25 03:28:22 IST 2011
                namespaceID=1767545107
                storageID=DS-1606386236-127.0.0.1-50010-1298581538273
                cTime=0
                storageType=DATA_NODE
                layoutVersion=-1
   
 
-Redo from bin/hadoop namenode -format if you find the above error
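The VERSION edit above can also be done with sed instead of a text editor. This sketch demonstrates it on a sample file with the contents shown in this post; on a real cluster, run the sed line against the VERSION file in dfs/data/current and substitute the namespaceID your datanode log actually reports:

```shell
# Write a sample VERSION file (contents taken from the post above).
cat > VERSION <<'EOF'
#Fri Feb 25 03:28:22 IST 2011
namespaceID=1767545107
storageID=DS-1606386236-127.0.0.1-50010-1298581538273
cTime=0
storageType=DATA_NODE
layoutVersion=-1
EOF

# Replace the namespaceID line in place with the one the log reported
# (232313213 is the example number used earlier in this post).
NEW_ID=232313213
sed -i "s/^namespaceID=.*/namespaceID=$NEW_ID/" VERSION
grep '^namespaceID=' VERSION   # prints: namespaceID=232313213
```

Restart the datanode afterwards; the other VERSION lines must stay untouched.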
 
