Monday, March 11, 2013

Installing Cloudera Hadoop on Mac

Step-1: Passwordless SSH


Try SSH-ing into your own machine:

$ ssh localhost

Without passwordless SSH, you will need to enter your password to log in to your own system through SSH. To set it up, generate a key pair:

$ ssh-keygen

The command will ask for the location of the id_rsa and id_rsa.pub keys. Press Enter to accept the default location.
It will then ask for a passphrase. Just press Enter to skip it (after all, we are trying to achieve passwordless SSH).
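
If you'd rather skip the prompts entirely, the same command can be run non-interactively (-N "" sets an empty passphrase and -f chooses the key file; both are standard ssh-keygen flags):

$ ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa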

The command sometimes ends with a weird randomart image of the key, like this:

+--[ RSA 2048]----+
|               -.|
|              .+o|
|              A=o|
|       .      ..=|
|        P .. . O |
|       . . .o    |
|          . F.. .|
|           +.o.o |
|          o O=+  |
+-----------------+

or just a cryptic fingerprint like this: a2:b1:5e:6f:2a:a2:d7:3f:d1:e5:5a:aa:ab:c5:e8:2a

But yeah, don't get scared; you do not have to remember them. :P
Now go to the .ssh directory in your home directory and copy the public key into authorized_keys:

$ cd /Users/rajgopalv
$ cd .ssh
$ cp id_*.pub authorized_keys
$ ssh localhost
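
If ssh still prompts for a password after this, it is usually a permissions problem: OpenSSH refuses to use key files that are too open. Tightening them is safe in any case:

$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys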


Sometimes, when you log in to a host for the first time, you will be greeted with a security warning of some kind...


The authenticity of host 'localhost (::1)' can't be established.
RSA key fingerprint is [fingerprint].
Are you sure you want to continue connecting (yes/no)?

Just type "yes" and continue. The system should not ask for a password, and you should be able to log in successfully.
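
If you want a non-interactive check, ssh's BatchMode option (a standard OpenSSH flag) makes the login fail with an error instead of falling back to a password prompt:

$ ssh -o BatchMode=yes localhost echo "passwordless ssh works"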


Step-2: Download the CDH4 tarballs.


Hadoop and its family of software are available in tarball format here: https://ccp.cloudera.com/display/SUPPORT/CDH4+Downloadable+Tarballs

Go ahead and download the stuff you want.
But to begin with, let me start by downloading the "hadoop-2.0.0+922" tarball. If you want to run MapReduce version 1 (i.e. not YARN), then download the "hadoop-0.20-mapreduce-0.20.2+1341" tarball too (recommended).

Now, extract these tarballs and place them wherever you want them installed. I personally prefer a "Softwares" folder in my home directory.
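
For example, assuming the tarballs landed in ~/Downloads and are named after the directories they unpack to (check your actual filenames):

$ mkdir -p ~/Softwares
$ tar -xzf ~/Downloads/hadoop-2.0.0-cdh4.1.3.tar.gz -C ~/Softwares
$ tar -xzf ~/Downloads/hadoop-2.0.0-mr1-cdh4.1.3.tar.gz -C ~/Softwares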

$ pwd
/Users/rajgopalv/Softwares
$ ls -ld hadoop*
drwxr-xr-x@ 14 rajgopalv  1668562246  476 Feb  7 07:00 hadoop-2.0.0-cdh4.1.3
drwxr-xr-x@ 29 rajgopalv  1668562246  986 Feb  6 11:20 hadoop-2.0.0-mr1-cdh4.1.3

Step-3: Configure

DFS configuration:
Go to the hadoop-2.0.0-cdh4.1.3/etc/hadoop directory and edit core-site.xml to look like this:

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:8020</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/Users/rajgopalv/hadoop/data</value>
<!-- of course, you can use any directory you want -->
    </property>
</configuration>

and hdfs-site.xml to look like this:

<configuration>

    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

Similarly, configure MapReduce too. Go to the hadoop-2.0.0-mr1-cdh4.1.3/conf directory and edit mapred-site.xml to look like this:

<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>localhost:8021</value>
    </property>
</configuration>

Step-4: Run!

Now it's time to format and start the DFS.
Go to the hadoop-2.0.0-cdh4.1.3 folder in your terminal:

$ cd /Users/rajgopalv/Softwares/hadoop-2.0.0-cdh4.1.3
$ bin/hdfs namenode -format

13/03/12 00:27:06 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = Nucleus/192.168.2.106
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.0.0-cdh4.1.3
STARTUP_MSG:   classpath = /Users/rajgopalv.......... [etc etc..]

************************************************************/

blah blah blah

13/03/12 00:27:07 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
13/03/12 00:27:07 INFO util.ExitUtil: Exiting with status 0
13/03/12 00:27:07 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at Nucleus/192.168.2.106
************************************************************/

The important thing to notice is "Exiting with status 0". Status 0 indicates all is well. :)
Now start the DFS.


$ sbin/start-dfs.sh

Now http://localhost:50070/dfshealth.jsp should display the health of your DFS.
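
You can also check that the daemons actually came up with jps, the JVM process lister that ships with the JDK. After start-dfs.sh you should see something like this (the PIDs here are just illustrative):

$ jps
12345 NameNode
12401 DataNode
12478 SecondaryNameNode
12510 Jps
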
To start the MapReduce module, go to the hadoop-2.0.0-mr1-cdh4.1.3 directory in your terminal:

$ cd /Users/rajgopalv/Softwares/hadoop-2.0.0-mr1-cdh4.1.3
$ bin/start-mapred.sh

Now http://localhost:50030/jobtracker.jsp should show your MapReduce jobs.

Bravo! You are good to go.
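
As a quick smoke test, push a file into HDFS and list it back. I'm using core-site.xml and a /test directory only because they are handy; any file and path will do:

$ cd /Users/rajgopalv/Softwares/hadoop-2.0.0-cdh4.1.3
$ bin/hadoop fs -mkdir /test
$ bin/hadoop fs -put etc/hadoop/core-site.xml /test/
$ bin/hadoop fs -ls /test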


Possible things that could go wrong: 

Set your JAVA_HOME before you start anything:

$ export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Home/
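
That framework path is where Apple's bundled Java lived on older Macs; if it doesn't exist on yours, the stock macOS utility /usr/libexec/java_home resolves the correct location for you:

$ export JAVA_HOME=$(/usr/libexec/java_home)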

Try different port numbers in the configuration.

Although I've shown the DFS configured on port 8020 and MapReduce on 8021, some other software might already be using these ports, so feel free to try different ones.
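
To find out whether something is already listening on a port, lsof (which ships with macOS) works well:

$ lsof -i :8020
$ lsof -i :8021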


Do you have permissions on the hadoop.tmp.dir?

The directory that you specified in hadoop.tmp.dir must be writable by you. This is the reason I've specified a directory under my home directory itself.
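
To create it up front and confirm you own it (the path matches the core-site.xml above):

$ mkdir -p /Users/rajgopalv/hadoop/data
$ ls -ld /Users/rajgopalv/hadoop/data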

When in doubt, check out the *.log files in the hadoop-2.0.0-cdh4.1.3/logs/ and hadoop-2.0.0-mr1-cdh4.1.3/logs/ directories. They can be a little cryptic if you are a beginner, but you will get used to them. :)
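
For example, to watch the namenode log while it starts up (the exact filename embeds your username and hostname, so it will differ on your machine):

$ tail -f logs/hadoop-*-namenode-*.log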

Let me know if you have any doubts!



Comments:

  1. got this:

    smaikap:hadoop-2.0.0-cdh4.3.0 Smaikap$ bin-mapreduce1/start-mapred.sh
    +================================================================+
    |      Error: HADOOP_HOME is not set correctly                   |
    +----------------------------------------------------------------+
    | Please set your HADOOP_HOME variable to the absolute path of   |
    | the directory that contains hadoop-core-VERSION.jar            |
    +================================================================+
    smaikap:hadoop-2.0.0-cdh4.3.0 Smaikap$ env
    TERM_PROGRAM=Apple_Terminal
    SHELL=/bin/bash
    TERM=xterm-256color
    TMPDIR=/var/folders/8j/4lsh8bvs6pjbs4_0nk9xryyc0000gn/T/
    Apple_PubSub_Socket_Render=/tmp/launch-JvtmhX/Render
    TERM_PROGRAM_VERSION=303.2
    OLDPWD=/Users/Smaikap/hadoop/tmpDir
    TERM_SESSION_ID=393EE8AC-0435-44DB-924C-5400D0D1E627
    USER=Smaikap
    COMMAND_MODE=unix2003
    SSH_AUTH_SOCK=/tmp/launch-8vEwVe/Listeners
    Apple_Ubiquity_Message=/tmp/launch-Ny09iQ/Apple_Ubiquity_Message
    __CF_USER_TEXT_ENCODING=0x1F5:0:0
    PATH=/Volumes/Apps&Data/devTools/sbt/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/devtools/ez:/Applications:.:/opt/X11/bin:/usr/X11/bin
    PWD=/Users/Smaikap/hadoop/hadoop-2.0.0-cdh4.3.0
    HOME=/Users/Smaikap
    SHLVL=1
    LOGNAME=Smaikap
    LC_CTYPE=UTF-8
    DISPLAY=/tmp/launch-0avV0c/org.macosforge.xquartz:0
    _=/usr/bin/env
    smaikap:hadoop-2.0.0-cdh4.3.0 Smaikap$ env | grep hadoop
    OLDPWD=/Users/Smaikap/hadoop/tmpDir
    PWD=/Users/Smaikap/hadoop/hadoop-2.0.0-cdh4.3.0
    smaikap:hadoop-2.0.0-cdh4.3.0 Smaikap$

  2. As of CDH4.3, there is no separate tarball for MRv1: http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-Release-Notes/cdh4rn_topic_3_3.html
