Thursday, 30 June 2011

Problems related to namenode formatting

I was using Cloudera's Hadoop distribution (version cdh3u0, I believe) on Fedora 14 desktop edition. When I tried to format the namenode with the command I had used earlier, sudo -u hdfs hadoop namenode -format, it did not work. Later I came across an article and found that the command that works is su - hdfs -c "hadoop namenode -format".

Next, you might not get the namenode formatted if the Hadoop versions on the slaves and the master are not the same. That includes the patch version too.

Furthermore, if you have incorrect permissions on the directories used for Hadoop data storage, i.e. the namenode data, datanode data and mapred data directories, you might face problems. In one case, while working with a colleague, I was trying to format the namenode and it failed, saying it could not write to the folder 'current' inside the namenode storage directory. I suspected a permission problem. I checked the namenode folder (/var/lib/hadoop-0.20/cache/hadoop in my case) and found that it was owned by root. I changed the owner to hdfs using chown hdfs:hdfs hadoop. That did the trick! I was able to format the namenode after that.


Wednesday, 22 June 2011

Script to set up password-less ssh to multiple machines/PCs from a single master machine/PC

Before using this script, install expect.
In Fedora it is installed by executing the command yum install expect.
Copy the part below into any text editor, save it as file_name.sh, make it executable with chmod +x file_name.sh and
execute it using ./file_name.sh

#!/bin/bash
#Author Sunayan Saikia. Free to use but must keep the Author's name
#Setting Password-less ssh in a multi node cluster script
#Reference: @wangrui, http://www.linuxforums.org/forum/programming-scripting/24126-ssh-username-password-script.html

stty -echo;
read -p "Input password:" A;
stty echo;
echo;


ssh-keygen -t dsa -P "" -f /root/.ssh/id_dsa
cat /root/.ssh/id_dsa.pub >> /root/.ssh/authorized_keys
chmod go-w /root/ /root/.ssh
chmod 600 /root/.ssh/authorized_keys
chown root /root/.ssh/authorized_keys

for SLAVE in {192.168.1.1,192.168.1.2};do
#modifying /etc/ssh/sshd_config
echo "Connecting to $SLAVE ";
echo "$SLAVE: Part-1 *********";
echo "Modifying the /etc/ssh/sshd_config file...";
expect -c "set timeout -1;\
spawn ssh -o StrictHostKeyChecking=no $SLAVE -l root \"sed -i 's/#PermitRootLogin yes/PermitRootLogin yes/g' /etc/ssh/sshd_config;sed -i 's/#PubkeyAuthentication yes/PubkeyAuthentication yes/g' /etc/ssh/sshd_config;sed -i 's/#PermitEmptyPasswords no/PermitEmptyPasswords yes/g' /etc/ssh/sshd_config;\";\
match_max 100000;\
expect *password:*;\
send -- $A\r;\
interact;";
#copying the master public key to the slave
echo "$SLAVE: Part-2 *********";
echo "Copying the master public key...";
expect -c "set timeout -1;\
spawn scp /root/.ssh/id_dsa.pub root@$SLAVE:/root/.ssh/master.pub;\
match_max 100000;\
expect *password:*;\
send -- $A\r;\
interact;";
#setting required permissions
echo "$SLAVE: Part-3 *********";
echo "Setting required permissions..";
expect -c "set timeout -1;\
spawn ssh $SLAVE -l root \"
ssh-keygen -R '$SLAVE | cut -d \" \" -f 4';cat /root/.ssh/master.pub >> /root/.ssh/authorized_keys;chmod go-w /root/ /root/.ssh;chmod 600 /root/.ssh/authorized_keys;chown root /root/.ssh/authorized_keys\";\
match_max 100000;\
expect *password:*;\
send -- $A\r;\
interact;";
echo "Finished job on $SLAVE ***********";
done

Wednesday, 8 June 2011

How to write a MapReduce job?

Skeleton of a basic MapReduce program:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyClass /*extends Configured implements Tool*/{
/**
 * The map class of  MyClass
 */
public static class MyClassMapper
    extends Mapper<Object, Text, Text, IntWritable> {
      
  
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
             /* your code goes here */
             context.write (your_output_key_type key, your_output_value_type value);
    }
}
/**
 * The reduce class of  MyClass
 */
public static class MyClassReducer
    extends Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
               /* your code goes here */
               context.write (your_output_key_type key, your_output_value_type value);
    }
}
/**
 * The main entry point.
 */
public static void main(String[] args) throws Exception {
    
      Configuration conf = new Configuration();
      Job job = new Job(conf, "skeleton");
      job.setJarByClass(MyClass.class);
      job.setMapperClass(MyClassMapper.class);
      job.setReducerClass(MyClassReducer.class);
      job.setOutputKeyClass(your_output_key_type.class);
      job.setOutputValueClass(your_output_value_type.class);
      FileInputFormat.addInputPath(job, new Path(args[0]));
      FileOutputFormat.setOutputPath(job, new Path(args[1]));
   
      System.exit(job.waitForCompletion(true) ? 0 : 1);
   }
}

Note: your_output_key_type and your_output_value_type can be any of the data type classes provided by Hadoop, such as Text, LongWritable, etc. The Mapper's output (key, value) types must match the Reducer's input (key, value) types. However, the Reducer's output (key, value) types need not be the same as the Mapper's output (key, value) types. If they differ, also declare the Mapper's output types with job.setMapOutputKeyClass and job.setMapOutputValueClass.
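
To make the type rule in the note concrete, here is a small plain-Java sketch of the map, shuffle and reduce steps for a word count. It does not use Hadoop at all; the class name TypeFlowSketch and its methods are made up for illustration. The point is only that whatever (key, value) types the map step emits (String, Integer here) are exactly what the reduce step receives, while the reduce step is free to emit a different value type (Long here).

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical, Hadoop-free illustration of the map -> shuffle -> reduce data flow.
public class TypeFlowSketch {

    // "Map" step: one input line becomes a list of (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            out.add(new AbstractMap.SimpleEntry<>(tokens.nextToken(), 1));
        }
        return out;
    }

    // "Shuffle" step: group all values by key (done by the framework in a real job).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // "Reduce" step: input types match the map output (String, Integer);
    // the output value type (long) is allowed to differ.
    static long reduce(String key, Iterable<Integer> values) {
        long sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs the three steps over a few lines and returns word -> count.
    static Map<String, Long> run(List<String> lines) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String line : lines) {
            mapped.addAll(map(line));
        }
        Map<String, Long> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(mapped).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("a b a", "b c")));  // prints {a=2, b=2, c=1}
    }
}
```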

Friday, 3 June 2011

Exception Handling


Exception:
The term exception is shorthand for the phrase "exceptional event."
An exception is an event that occurs during the execution of a program that disrupts the normal flow of instructions.

Throwing an exception:
When an error occurs inside a method, the method creates an object (called an exception object) and hands it over to the runtime system. The exception object contains information about the error, including its type and the state of the program when the error occurred.


Catching an exception:
When an exception object is thrown, the runtime system searches the call stack for a handler that can handle it.
The exception handler chosen is said to catch the exception. If the runtime system exhaustively searches all the methods on the call stack without finding an appropriate exception handler, the runtime system (and, consequently, the program) terminates.

The Three Kinds of Exceptions:
Checked exception:
These are exceptional conditions that a well-written application should anticipate and recover from.
E.g.: if a user is prompted for a file name and supplies the name of a non-existent file, the java.io.FileReader constructor throws java.io.FileNotFoundException. A well-written program will catch this exception and notify the user.
Error:
These are exceptional conditions that are external to the application, and that the application usually cannot anticipate or recover from.
E.g.: suppose that an application successfully opens a file for input, but is unable to read the file because of a hardware or system malfunction. The unsuccessful read throws java.io.IOError.

Errors are not subject to the Catch or Specify Requirement. Errors are those exceptions indicated by Error and its subclasses.

Runtime exception:
These are exceptional conditions that are internal to the application, and that the application usually cannot anticipate or recover from.
They usually indicate programming bugs, such as logic errors or improper use of an API.
For example, consider the application described previously that passes a file name to the constructor for FileReader. If a logic error causes a null to be passed to the constructor, the constructor will throw NullPointerException.
Runtime exceptions are not subject to the Catch or Specify Requirement. Runtime exceptions are those indicated by RuntimeException and its subclasses.

*Errors and runtime exceptions are collectively known as unchecked exceptions.
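
The three kinds can be told apart purely by where a class sits in the Throwable hierarchy. The following is a small sketch of that rule; the class name ExceptionKinds and the method classify are made up for this post and are not part of any library.

```java
import java.io.FileNotFoundException;
import java.io.IOError;

// Hypothetical helper that applies the hierarchy rules described above.
public class ExceptionKinds {

    static String classify(Throwable t) {
        if (t instanceof Error) {
            return "error (unchecked)";               // Error and its subclasses
        }
        if (t instanceof RuntimeException) {
            return "runtime exception (unchecked)";   // RuntimeException and its subclasses
        }
        return "checked exception";                   // everything else
    }

    public static void main(String[] args) {
        System.out.println(classify(new FileNotFoundException("missing.txt")));
        System.out.println(classify(new IOError(new RuntimeException("disk failure"))));
        System.out.println(classify(new NullPointerException()));
    }
}
```

Only the first of the three must be caught or declared in a throws clause. The compiler does not force a handler for the other two.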

Advantages of Exceptions
Advantage 1: Separating Error-Handling Code from "Regular" Code
Exceptions provide the means to separate the details of what to do when something out of the ordinary happens from the main logic of a program. In traditional programming, error detection, reporting, and handling often lead to confusing spaghetti code. For example, consider the pseudocode method here that reads an entire file into memory.
readFile {
    open the file;
    determine its size;
    allocate that much memory;
    read the file into memory;
    close the file;
}
At first glance, this function seems simple enough, but it ignores all the following potential errors.
· What happens if the file can't be opened?
· What happens if the length of the file can't be determined?
· What happens if enough memory can't be allocated?
· What happens if the read fails?
· What happens if the file can't be closed?
To handle such cases, the readFile function must have more code to do error detection, reporting, and handling. Here is an example of what the function might look like.
errorCodeType readFile {
    initialize errorCode = 0;
   
    open the file;
    if (theFileIsOpen) {
        determine the length of the file;
        if (gotTheFileLength) {
            allocate that much memory;
            if (gotEnoughMemory) {
                read the file into memory;
                if (readFailed) {
                    errorCode = -1;
                }
            } else {
                errorCode = -2;
            }
        } else {
            errorCode = -3;
        }
        close the file;
        if (theFileDidntClose && errorCode == 0) {
            errorCode = -4;
        } else {
            errorCode = errorCode and -4;
        }
    } else {
        errorCode = -5;
    }
    return errorCode;
}
There's so much error detection, reporting, and returning here that the original seven lines of code are lost in the clutter. Worse yet, the logical flow of the code has also been lost, thus making it difficult to tell whether the code is doing the right thing: Is the file really being closed if the function fails to allocate enough memory? It's even more difficult to ensure that the code continues to do the right thing when you modify the method three months after writing it. Many programmers solve this problem by simply ignoring it — errors are reported when their programs crash.
Exceptions enable you to write the main flow of your code and to deal with the exceptional cases elsewhere. If the readFile function used exceptions instead of traditional error-management techniques, it would look more like the following.
readFile {
    try {
        open the file;
        determine its size;
        allocate that much memory;
        read the file into memory;
        close the file;
    } catch (fileOpenFailed) {
       doSomething;
    } catch (sizeDeterminationFailed) {
        doSomething;
    } catch (memoryAllocationFailed) {
        doSomething;
    } catch (readFailed) {
        doSomething;
    } catch (fileCloseFailed) {
        doSomething;
    }
}
Note that exceptions don't spare you the effort of doing the work of detecting, reporting, and handling errors, but they do help you organize the work more effectively.
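
For comparison, here is one way the readFile pseudocode might look in real Java. This is only a sketch under a few assumptions: the class name ReadFileSketch is made up, the method returns a short status string just so there is something to print, and try-with-resources (Java 7 and later) is used to close the file. The main flow stays together at the top and the handlers sit below it.

```java
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;

// Hypothetical Java rendering of the readFile pseudocode.
public class ReadFileSketch {

    static String readFile(String path) {
        File file = new File(path);
        try (FileInputStream in = new FileInputStream(file)) {   // open the file
            long size = file.length();                            // determine its size
            byte[] data = new byte[(int) size];                   // allocate that much memory
            new DataInputStream(in).readFully(data);              // read the file into memory
            // the file is closed automatically when the try block is left
            return "read " + data.length + " bytes";
        } catch (FileNotFoundException e) {                       // the open failed
            return "could not open " + path;
        } catch (IOException e) {                                 // the read or the close failed
            return "could not read or close " + path;
        }
    }

    public static void main(String[] args) {
        System.out.println(readFile(args.length > 0 ? args[0] : "no-such-file.txt"));
    }
}
```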

Advantage 2: Propagating Errors Up the Call Stack

    A second advantage of exceptions is the ability to propagate error reporting up the call stack of methods. Suppose that the readFile method is the fourth method in a series of nested method calls made by the main program: method1 calls method2, which calls method3, which finally calls readFile.

        method1 {
            call method2;
        }

        method2 {
            call method3;
        }

        method3 {
            call readFile;
        }

    Suppose also that method1 is the only method interested in the errors that might occur within readFile. Traditional error-notification techniques force method2 and method3 to propagate the error codes returned by readFile up the call stack until the error codes finally reach method1—the only method that is interested in them.

        method1 {
            errorCodeType error;
            error = call method2;
            if (error)
                doErrorProcessing;
            else
                proceed;
        }

        errorCodeType method2 {
            errorCodeType error;
            error = call method3;
            if (error)
                return error;
            else
                proceed;
        }

        errorCodeType method3 {
            errorCodeType error;
            error = call readFile;
            if (error)
                return error;
            else
                proceed;
        }
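
With exceptions, the intermediate methods no longer have to check and forward error codes. A rough Java sketch of the same four-method chain is shown here; the class name PropagationSketch is made up. method2 and method3 only declare the exception in their throws clause and let it pass through, and method1 is the only one with a handler.

```java
import java.io.FileReader;
import java.io.IOException;

// Hypothetical Java version of method1 -> method2 -> method3 -> readFile.
public class PropagationSketch {

    // The only method interested in the error: it catches it.
    static String method1(String path) {
        try {
            method2(path);
            return "ok";
        } catch (IOException e) {
            return "method1 handled: " + e.getClass().getSimpleName();
        }
    }

    // Not interested in the error: just declares it and lets it propagate.
    static void method2(String path) throws IOException {
        method3(path);
    }

    static void method3(String path) throws IOException {
        readFile(path);
    }

    // Throws FileNotFoundException (an IOException) if the file cannot be opened.
    static void readFile(String path) throws IOException {
        try (FileReader reader = new FileReader(path)) {
            while (reader.read() != -1) {
                // contents are ignored in this sketch
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(method1("/no/such/dir/missing.txt"));
    }
}
```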


Advantage 3: Grouping and Differentiating Error Types

Because all exceptions thrown within a program are objects, the grouping or categorizing of exceptions is a natural outcome of the class hierarchy. An example of a group of related exception classes in the Java platform is java.io.IOException and its descendants. IOException is the most general and represents any type of error that can occur when performing I/O. Its descendants represent more specific errors. For example, FileNotFoundException means that a file could not be located on disk.
A method can write specific handlers that can handle a very specific exception. The FileNotFoundException class has no descendants, so the following handler can handle only one type of exception.
catch (FileNotFoundException e) {
    ...
}
A method can catch an exception based on its group or general type by specifying any of the exception's superclasses in the catch statement. For example, to catch all I/O exceptions, regardless of their specific type, an exception handler specifies an IOException argument.
catch (IOException e) {
    ...
}
This handler will be able to catch all I/O exceptions, including FileNotFoundException, EOFException, and so on. You can find details about what occurred by querying the argument passed to the exception handler. For example, use the following to print the stack trace.
catch (IOException e) {
    e.printStackTrace();  //Output goes to System.err.
    e.printStackTrace(System.out);  //Send trace to stdout.
}
You could even set up an exception handler that handles any Exception with the handler here.
catch (Exception e) {    //A (too) general exception handler
    ...
}



E.g:

public void openFile(){
        // declared outside the try block so that finally can see it
        FileReader reader = null;
        try {
            // constructor may throw FileNotFoundException
            reader = new FileReader("someFile");
            int i;
            // reader.read() may throw IOException; it returns -1 at end of file
            while((i = reader.read()) != -1){
                System.out.println((char) i );
            }
            System.out.println("--- File End ---");
        } catch (FileNotFoundException e) {
            //do something clever with the exception
        } catch (IOException e) {
            //do something clever with the exception
        } catch (Exception e){
            //will catch any other exception
        } finally {
            // always runs, whether or not an exception was thrown
            if (reader != null) {
                try {
                    reader.close();
                } catch (IOException e) {
                    // close() itself may throw IOException
                }
            }
        }
    }



Method throwing exception:

public int divide(int noToDiv, int noToDivBy) throws Exception {
    if (noToDivBy == 0) {
        throw new Exception("cannot divide by 0");
    }
    return (noToDiv / noToDivBy);
}
public void callDivide() {
              try {
                  int result = divide(2,0);
                  System.out.println(result);
              } catch (Exception e) {
                 //do something clever with the exception
                //System.out.println("hmmmm");
                System.out.println(e.getMessage());
              }
              System.out.println("Division attempt done");
          }

What are Chained Exceptions in Java?

When one exception causes another exception, the second one is called a chained exception. Exception chaining (also known as "exception nesting") is a technique in which an application responds to an exception by throwing another exception, with the first exception recorded as the cause of the second. Chained exceptions therefore help the programmer know when one exception has caused another. Java supports this through the Throwable class.
The constructors and methods of the Throwable class that support chained exceptions are:
Throwable initCause(Throwable)
Throwable(Throwable)
Throwable(String, Throwable)
Throwable getCause()

METHOD: DESCRIPTION
toString(): Returns the name of the exception class followed by the message string (if one exists).
getMessage(): Returns the message string of the Throwable object.
printStackTrace(): Prints the exception (as returned by toString()) and its stack trace to the standard error stream, followed by the stack trace of each cause.
getCause(): Returns the exception that caused the current exception, or null if the cause is unknown.
initCause(): Sets the cause of the current exception and returns the current exception. It can be called at most once, and not at all if the cause was already set through a constructor.


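import java.io.*;
import java.util.*;

class MyException extends Exception {
    MyException(String msg) {
        super(msg);
    }
    // keeps the original exception as the cause
    MyException(String msg, Throwable cause) {
        super(msg, cause);
    }
}

public class ChainExcep {
    public static void main(String args[]) throws MyException, IOException {
        try {
            int rs = 10 / 0;   // throws ArithmeticException
        } catch (Exception e) {
            System.err.println(e.getMessage());
            System.err.println(e.getCause());   // null: nothing caused the ArithmeticException
            // pass e as the cause so that the two exceptions are actually chained
            throw new MyException("Chained ArithmeticException", e);
        }
    }
}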
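
As a further sketch, the example below sets the cause in both supported ways, through the Throwable(String, Throwable) style constructor and through initCause, and then walks the chain with getCause(). The class name CauseChainSketch and its methods are made up for illustration.

```java
// Hypothetical demo of setting and reading the cause of an exception.
public class CauseChainSketch {

    static int divide(int a, int b) {
        return a / b;   // throws ArithmeticException when b is 0
    }

    // Variant 1: pass the cause to the constructor.
    static int compute() throws Exception {
        try {
            return divide(10, 0);
        } catch (ArithmeticException e) {
            throw new Exception("computation failed", e);
        }
    }

    // Variant 2: set the cause after construction with initCause.
    static int computeWithInitCause() {
        try {
            return divide(10, 0);
        } catch (ArithmeticException e) {
            IllegalStateException wrapped = new IllegalStateException("computation failed");
            wrapped.initCause(e);
            throw wrapped;
        }
    }

    // Follows getCause() until it returns null and lists the class names on the way.
    static String describeChain(Throwable t) {
        StringBuilder sb = new StringBuilder();
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (sb.length() > 0) {
                sb.append(" <- ");
            }
            sb.append(cur.getClass().getSimpleName());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        try {
            compute();
        } catch (Exception e) {
            System.out.println(describeChain(e));   // prints Exception <- ArithmeticException
        }
        try {
            computeWithInitCause();
        } catch (IllegalStateException e) {
            System.out.println(describeChain(e));   // prints IllegalStateException <- ArithmeticException
        }
    }
}
```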