QUICK HADOOP SINGLE NODE CLUSTER (pseudo distributed mode) ON AMAZON EC2 INSTANCE

MYGz

This blog is for you if you want to QUICKLY build a Hadoop cluster on an Amazon EC2 free tier t2.micro instance and play around with Hadoop.

WARNING!!! The cluster is only for practice purposes!! It's not highly secure. If you want a highly secure cluster, you have to apply stricter security settings. The security settings are purposely kept low so that the cluster deploys smoothly without errors.

In single-node (pseudo-distributed) mode, each Hadoop daemon (NameNode, SecondaryNameNode, DataNode, JobTracker and TaskTracker) runs in its own JVM on the same machine. The Hadoop version we will use is 1.2.1.

Before starting you MUST download these two tools and the Hadoop configuration Bash script from the URLs given below:

1. PuTTY: http://the.earth.li/~sgtatham/putty/latest/x86/putty.exe

2. PuTTYGEN: http://the.earth.li/~sgtatham/putty/latest/x86/puttygen.exe

3. Bash Script for Single Node (Hadoop 1.2.1): https://drive.google.com/file/d/0B2T8Pye0P7e5Uy0zU29GX05VWG8/view?usp=sharing

Let's start with our single node cluster.

STEP 1: Log in to your Amazon Web Services account

If you do not have an AWS account, create one. You will have to add a credit/debit card and send a fax copy of a government ID proof to Amazon. Activation takes 1 or 2 days.

After logging in to your Amazon Web Services account, you need to do two things:

A. Create a Keypair for logging into your instances.

B. Create a security group.
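For readers who prefer to see the security group as data, here is a minimal sketch of the inbound rules such a practice group might contain. The port list is an assumption pieced together from the rest of this post (22 for the SSH login with PuTTY, 50070 and 50030 for the two web pages in STEP 3); the group shown in the video may be set up differently. The names PRACTICE_PORTS and build_ingress_rules are made up for illustration. The dictionaries follow the IpPermissions shape that boto3's authorize_security_group_ingress accepts, but nothing here calls AWS.

```python
# Ports are assumptions taken from this post, not from the video:
# 22 = SSH login, 50070 = NameNode web page, 50030 = JobTracker web page.
PRACTICE_PORTS = {"ssh": 22, "namenode_web": 50070, "jobtracker_web": 50030}


def build_ingress_rules(ports, cidr="0.0.0.0/0"):
    """Return one TCP inbound rule per port, sorted by port number.

    The default CIDR opens the port to everyone, which matches the
    "practice only, not secure" warning at the top of this post.
    """
    rules = []
    for name, port in sorted(ports.items(), key=lambda item: item[1]):
        rules.append({
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": cidr, "Description": name}],
        })
    return rules


for rule in build_ingress_rules(PRACTICE_PORTS):
    print(rule["FromPort"], rule["IpRanges"][0]["CidrIp"])
```

To tighten things up a little, pass your own address as the cidr argument (for example "198.51.100.7/32", a placeholder address) instead of leaving the ports open to everyone.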

Important!! Name your Amazon private key "Key1" so that everything goes smoothly through the tutorial. You can choose another name if you like, but then carefully edit the Bash script to match the name of your key.

Below is the video on how to create a key pair and a security group:

STEP 2: Creating an EC2 Instance

We will choose the t2.micro instance with an 8 GB EBS (Elastic Block Store) drive and Ubuntu Server 14.04 LTS. The Bash script may not work if you choose any operating system other than Ubuntu. The t2.micro instance, 8 GB of EBS and Ubuntu 14.04 are free under the Amazon Web Services free tier. If you don't know much about the free tier, please take a careful look at everything it covers (Link: http://aws.amazon.com/free/). As far as our tutorial is concerned, a t2.micro instance for up to 750 hrs/month, Ubuntu Server 14.04 LTS, 30 GB of EBS and 5 GB of S3 storage are free (as of January 2015).
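To get a feel for the 750 hrs/month figure, here is a small arithmetic sketch. It assumes the hours are counted across all your t2.micro instances together, so please confirm the current terms on the free tier page before relying on it. The function names are made up for illustration.

```python
FREE_TIER_HOURS_PER_MONTH = 750  # figure quoted in this post (as of January 2015)


def hours_used(instances, days, hours_per_day=24):
    """Instance-hours consumed by `instances` machines running for `days` days."""
    return instances * days * hours_per_day


def within_free_tier(instances, days, hours_per_day=24):
    """True if the usage fits in the monthly allowance (assumed to be pooled)."""
    return hours_used(instances, days, hours_per_day) <= FREE_TIER_HOURS_PER_MONTH


print(hours_used(1, 31), within_free_tier(1, 31))  # one instance, always on
print(hours_used(2, 31), within_free_tier(2, 31))  # two instances, always on
```

One instance left running for a full 31-day month uses 744 hours, which fits in 750. Two instances running all month would use 1488 hours and go over, so stop or terminate instances you are not using.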

Upload the Key1.pem file to the S3 bucket. The video shows how to do it.

Here is the video on how to create the instance:

Important!!! Do not forget to paste the Bash script into the User Data field under Advanced Details while configuring the t2.micro instance.

Wait until the Status Checks column shows 2/2 checks passed. This takes around 2-4 minutes.
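If you would rather script the wait than refresh the console, the idea is a simple polling loop. This is only a sketch: wait_for_checks and get_passed_checks are hypothetical names, and the function that actually counts passed checks is left to you (with boto3 you could derive it from describe_instance_status, which is not shown here).

```python
import time


def wait_for_checks(get_passed_checks, required=2, timeout_s=600,
                    interval_s=15, sleep=time.sleep):
    """Poll until `required` status checks pass.

    get_passed_checks: a callable you supply that returns how many checks
    currently pass (0, 1 or 2). Returns True on success, False on timeout.
    """
    waited = 0
    while True:
        if get_passed_checks() >= required:
            return True
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s


# Dry run with a fake checker that reports 0, then 1, then 2 passed checks.
fake_results = iter([0, 1, 2])
print(wait_for_checks(lambda: next(fake_results), sleep=lambda seconds: None))
```

The sleep function is passed in as a parameter so the loop can be tried without really waiting, as in the dry run above.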

STEP 3: Logging in to your EC2 instance and using Hadoop

The user name is ubuntu and the hostname is nn. Here is the video on how to log in and start using Hadoop:

You can check the cluster summary from your browser at these addresses:

1. NameNode summary at [Namenode Public DNS]:50070

2. JobTracker summary at [Namenode Public DNS]:50030
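As a small convenience, the two addresses above can be built from the public DNS name and checked from your own machine. This is a sketch under assumptions: the DNS name below is a made-up placeholder, the function names are hypothetical, and the port check only succeeds if your security group allows inbound traffic on these ports.

```python
import socket


def cluster_urls(public_dns):
    """Build the two web UI addresses listed in this post."""
    return {
        "namenode": "http://{}:50070".format(public_dns),
        "jobtracker": "http://{}:50030".format(public_dns),
    }


def port_is_open(host, port, timeout_s=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


# Placeholder name; copy the real Public DNS from the EC2 console.
urls = cluster_urls("ec2-203-0-113-10.compute-1.amazonaws.com")
print(urls["namenode"])
print(urls["jobtracker"])
```

If port_is_open returns False for 50070 or 50030 while the instance shows 2/2 checks, the security group rules are the first thing to look at.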

STEP 4: Hadoop Commands

Hadoop Shell commands: https://hadoop.apache.org/docs/r1.2.1/file_system_shell.html#cat
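As a first thing to try after logging in, here is a sketch that strings together four commands from that shell guide (-mkdir, -put, -ls and -cat). The file and directory paths are made-up examples, and the helper names are hypothetical. It is written to work on the older Python 3 that ships with Ubuntu 14.04, and it only runs the commands when you call run_all on a machine where hadoop is on the PATH.

```python
import shutil
import subprocess


def hadoop_fs(*args):
    """Build the argument list for one `hadoop fs` call."""
    return ["hadoop", "fs"] + list(args)


def smoke_test_commands(local_file, hdfs_dir):
    """Commands to make a directory, upload a file, list it and print it."""
    file_name = local_file.rsplit("/", 1)[-1]
    return [
        hadoop_fs("-mkdir", hdfs_dir),
        hadoop_fs("-put", local_file, hdfs_dir),
        hadoop_fs("-ls", hdfs_dir),
        hadoop_fs("-cat", "{}/{}".format(hdfs_dir, file_name)),
    ]


def run_all(commands):
    """Run each command in order; stop at the first one that fails."""
    if shutil.which("hadoop") is None:
        raise RuntimeError("hadoop not found on PATH; run this on the instance")
    for argv in commands:
        subprocess.check_call(argv)


# Example paths are placeholders; use any small text file you have.
for argv in smoke_test_commands("/home/ubuntu/notes.txt", "/user/ubuntu/demo"):
    print(" ".join(argv))
```

The printed lines are exactly what you would type at the prompt, so you can also copy them one by one instead of calling run_all.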

You can search online for more use cases of a Hadoop single node cluster.

Please drop a comment if you get stuck anywhere in the tutorial.

I hope this blog was informative for you. Thank you for reading!

-Mohammad Yusuf Ghazi


This was originally posted here.
