Step-1: Passwordless SSH.
Try ssh-ing into your own machine:
$ ssh localhost
Without passwordless SSH, you will need to enter your password to log in to your own system through ssh. To set it up, generate a key pair:
$ ssh-keygen
The command will ask for the location of the id_rsa and id_rsa.pub keys. Press Enter to accept the default location.
It will then ask for a passphrase. Just press Enter to leave it empty (after all, we are trying to achieve passwordless SSH).
The command sometimes ends with a weird randomart image of the key, like this:
+--[ RSA 2048]----+
| -.|
| .+o|
| A=o|
| . ..=|
| P .. . O |
| . . .o |
| . F.. .|
| +.o.o |
| o O=+ |
+-----------------+
or just a cryptic fingerprint like this: a2:b1:5e:6f:2a:a2:d7:3f:d1:e5:5a:aa:ab:c5:e8:2a
But yeah, don't get scared; you do not have to remember them :P
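By the way, if you'd rather skip the prompts entirely, ssh-keygen can do the whole thing in one shot; here -N "" sets an empty passphrase and -f points at the default key location (adjust the path if your keys live elsewhere):
$ ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa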
Now go to the .ssh directory under your home directory and copy the .pub file to authorized_keys.
$ cd /Users/rajgopalv
$ cd .ssh
$ cp id_*.pub authorized_keys
$ ssh localhost
Sometimes, when you log in to a host for the first time, you will get a security warning of some kind about the host's authenticity...
Just type "yes" and continue. The system should not ask for a password, and you should be able to log in successfully.
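If ssh still asks for a password even after copying the key, it is usually just sshd being strict about file permissions; tightening them on the .ssh directory and the authorized_keys file tends to fix it:
$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys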
Step-2: Download the CDH4 tarballs.
Hadoop and its family of software are available as tarballs here: https://ccp.cloudera.com/display/SUPPORT/CDH4+Downloadable+Tarballs
Go ahead and download the stuff you want.
But to begin with, let me start by downloading the "hadoop-2.0.0+922" tarball. If you want to run MapReduce version 1 (i.e. not YARN), then download the "hadoop-0.20-mapreduce-0.20.2+1341" tarball too (recommended).
Now, unpack these tarballs and place them wherever you want them installed. I personally prefer a "Softwares" folder in my home directory.
$ pwd
/Users/rajgopalv/Softwares
$ ls -ld hadoop*
drwxr-xr-x@ 14 rajgopalv 1668562246 476 Feb 7 07:00 hadoop-2.0.0-cdh4.1.3
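Purely as an optional convenience (the path below is just my install location from above), you can put the Hadoop bin and sbin directories on your PATH in ~/.bash_profile so you don't have to cd into the tarball every time:
export HADOOP_PREFIX=/Users/rajgopalv/Softwares/hadoop-2.0.0-cdh4.1.3
export PATH=$PATH:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin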
Step-3: Configure
Go to the hadoop-2.0.0-cdh4.1.3 directory and edit core-site.xml (it lives under etc/hadoop/ in the Hadoop 2 tarball) to look like this:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:8020</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/Users/rajgopalv/hadoop/data</value>
<!-- of course, you can use any directory you want. -->
</property>
</configuration>
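You don't strictly have to pre-create the hadoop.tmp.dir directory (formatting the namenode in Step-4 will lay out what it needs), but making it yourself up front avoids permission surprises. The path is just the one I used above:
$ mkdir -p /Users/rajgopalv/hadoop/data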
Then edit hdfs-site.xml in the same directory; since this is a single-node setup, the replication factor should be 1:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
Similarly, configure MapReduce too. Go to the hadoop-2.0.0-mr1-cdh4.1.3/conf/ directory and edit mapred-site.xml to look like this:
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:8021</value>
</property>
</configuration>
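One more thing that often bites on OS X: the start scripts need to know where your JDK is. If they complain about JAVA_HOME, add a line like the one below to conf/hadoop-env.sh (or etc/hadoop/hadoop-env.sh in the Hadoop 2 tarball); /usr/libexec/java_home is the standard OS X helper that prints the JDK path:
export JAVA_HOME=$(/usr/libexec/java_home)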
Step-4: Run!
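Before starting anything, format the namenode; the "Exiting with status 0" message mentioned below comes from this command's output. A sketch, assuming the Hadoop 2 tarball layout (bin/hadoop namenode -format also works, with a deprecation warning):
$ cd /Users/rajgopalv/Softwares/hadoop-2.0.0-cdh4.1.3
$ bin/hdfs namenode -format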
The important thing to notice in the format output is "Exiting with status 0". Status 0 indicates all is well. :)
Now start the DFS.
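A sketch of the commands, assuming the Hadoop 2 tarball layout where the start scripts live under sbin/ (if yours ship them under bin/, use that instead):
$ cd /Users/rajgopalv/Softwares/hadoop-2.0.0-cdh4.1.3
$ sbin/start-dfs.sh
Once it's up, the namenode web UI at http://localhost:50070/ should respond, and jps should list a NameNode and a DataNode.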
To start the MapReduce module, go to the hadoop-2.0.0-mr1-cdh4.1.3 directory in your terminal.
$ cd /Users/rvaithiyanathan/Softwares/hadoop-2.0.0-mr1-cdh4.1.3
$ bin/start-mapred.sh
Now, http://localhost:50030/jobtracker.jsp should show your MapReduce jobs.
Bravo! You are good to go.
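If you want a quick smoke test, the classic grep example works well. The jar name below is an assumption based on the MR1 tarball layout (the examples jar sits in the top-level directory of hadoop-2.0.0-mr1-cdh4.1.3; the wildcard avoids hard-coding the exact version):
$ cd /Users/rvaithiyanathan/Softwares/hadoop-2.0.0-mr1-cdh4.1.3
$ bin/hadoop fs -put conf input
$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
$ bin/hadoop fs -cat 'output/part-*'
If that prints a few lines counting dfs.* occurrences from the config files, everything is wired up.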