Cloudera Hadoop (Day 2)

So today is day 2 of Cloudera Hadoop.

Today I learned to set the classpath for Hadoop by

export HADOOP_CLASSPATH=<path to JAR / class files>
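For example, if your compiled classes live in a local directory (the path below is just an assumption for illustration, substitute your own):

```shell
# Point HADOOP_CLASSPATH at the directory holding your compiled classes.
# /home/cloudera/wordcount/classes is a hypothetical path.
export HADOOP_CLASSPATH=/home/cloudera/wordcount/classes
echo "$HADOOP_CLASSPATH"   # prints /home/cloudera/wordcount/classes
```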

Learned that the file this gets saved in is called hadoop-env.sh, which on this VM can be found in the directory /usr/lib/hadoop-0.20-mapreduce/conf/.

Learned a basic Linux command, "pwd" (print working directory), which tells you your current path in the directory structure.

To modify the contents of the file:
sudo vi /usr/lib/hadoop-0.20-mapreduce/conf/hadoop-env.sh
(in vi, press G to jump to the end of the file)

To get a root shell (simulate the admin):
sudo su

To kick off a Hadoop job you enter:

hadoop jar wordcount.jar org.myorg.WordCount /user/cloudera/input /user/cloudera/output
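What the job computes can be sketched with plain Unix tools, no cluster required (the /tmp paths here are throwaway):

```shell
# Build a tiny input file, then count words the way WordCount does:
# split into words, group identical words, count each group.
mkdir -p /tmp/wc_demo
printf 'hello world\nhello hadoop\n' > /tmp/wc_demo/file01
tr -s ' ' '\n' < /tmp/wc_demo/file01 | sort | uniq -c | sort -k2
# one line per distinct word: hadoop 1, hello 2, world 1
```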

The input directory must already exist in HDFS:

hadoop dfs -mkdir /user/cloudera/input

and you must copy your test files to that location on the HDFS server:

hadoop dfs -copyFromLocal /home/cloudera/wordcount/input /user/cloudera/input

adjusting the directory / folder names to match your own setup.

and you must first clear the output directory, because Hadoop refuses to run a job whose output directory already exists:

hadoop dfs -rm -r /user/cloudera/output

To remove a directory that is already empty, there is also -rmdir:

hadoop dfs -rmdir /user/cloudera/input
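The ordering matters: clear output, create input, copy files, run. A local-filesystem sketch of that cycle (plain rm/mkdir standing in for the hadoop dfs commands, throwaway /tmp paths, and wc standing in for the job itself):

```shell
# Simulate the rerun cycle with local directories so it runs anywhere.
INPUT=/tmp/hdfs_sim/input
OUTPUT=/tmp/hdfs_sim/output

rm -rf "$OUTPUT"                           # hadoop dfs -rm -r .../output
mkdir -p "$INPUT"                          # hadoop dfs -mkdir .../input
printf 'hello hadoop\n' > "$INPUT/file01"  # hadoop dfs -copyFromLocal ...
mkdir -p "$OUTPUT"                         # the real job creates this itself
wc -w < "$INPUT/file01" > "$OUTPUT/part-00000"  # stand-in for the MapReduce job
cat "$OUTPUT/part-00000"                   # the simulated output now holds the word count
```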

If you log onto the Hadoop web UI on localhost, you will see your input folder.

Here's a posting from day 1: http://www.bloomconsultingbi.com/2013/01/first-try-at-cloudera-hadoop.html

And I got the tutorial example to work on my virtual machine! Yippie!!!!