1/23/2013

My Intro to Pentaho Big Data PDI Kettle

Pentaho's Big Data solution revolves around Kettle, also known as Pentaho Data Integration (PDI).

Here's the webpage I viewed to get started:

http://infocenter.pentaho.com/help/index.jsp?topic=%2Fgetting_started_with_pentaho%2Ftopic_introducing_bi_suite.html

And here's the download page:

http://wiki.pentaho.com/display/BAD/Downloads

I'm working on a Cloudera VM running Ubuntu Linux, so I chose that version to download.

Next step is the install...

Set the permissions on the downloaded .tar.gz file (strictly speaking, the execute bit isn't needed on an archive):

chmod +x ./pdi-ce-4.3.0-stable.tar.gz

To extract the archive to the current folder:

tar -xvf pdi-ce-4.3.0-stable.tar.gz

To copy the contents of the new folder to the /opt/pentaho/server directory:

cp -r ./data-integration/* /opt/pentaho/server/data-integration

To remove a directory and its contents in Linux, use:

rm -rf <directory>

To launch the Pentaho Data Integration utility, run ./spoon.sh from the /opt/pentaho/server/data-integration folder:
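Putting the steps above together, here's a minimal sketch of the whole install as a shell function. The archive name and the /opt/pentaho/server destination come from this post; adjust them for your own layout, and note that writing under /opt will typically need sudo.

```shell
#!/bin/sh
# Sketch of the install steps above, wrapped in a function so the
# archive and destination can be overridden. Paths match this post.
install_pdi() {
    archive=${1:-pdi-ce-4.3.0-stable.tar.gz}
    dest=${2:-/opt/pentaho/server/data-integration}

    # Bail out early if the download isn't there.
    [ -f "$archive" ] || { echo "missing $archive -- download it first" >&2; return 1; }

    tar -xf "$archive"            # unpacks into ./data-integration
    mkdir -p "$dest"              # make sure the target exists
    cp -r ./data-integration/* "$dest"
    echo "PDI installed to $dest -- launch with: cd $dest && ./spoon.sh"
}
```

Calling install_pdi with no arguments reproduces the commands above step for step.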


Open one of the sample files using the File --> Open menu:



 

For a new report, select the repository type:



I clicked the 'NEW' button, which opened a new screen:



http://wiki.pentaho.com/display/BAD/Configure+Pentaho+for+Cloudera+and+Other+Hadoop+Versions

http://diethardsteiner.blogspot.com/2011/05/kettle-sourcing-data-from-hadoop-hive.html

I created my own Hadoop Job Executor; it seemed to run without exceptions, but it didn't produce the expected contents in the output directory.
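When a Hadoop Job Executor run finishes without exceptions but the output directory looks empty, a quick sanity check is to list and peek at the job's output path on HDFS. The helper below is a sketch; it assumes the stock hadoop CLI on the Cloudera VM, and the path in the usage comment is a hypothetical example.

```shell
# Peek at a MapReduce job's output directory on HDFS.
# Assumes the standard `hadoop fs` CLI; the directory argument is
# whatever output path your job was configured to write to.
check_hdfs_output() {
    dir=${1:?usage: check_hdfs_output <hdfs-dir>}
    hadoop fs -ls "$dir" || return 1     # does the directory exist at all?
    hadoop fs -cat "$dir"/part-* | head  # first few output records
}
# Example (hypothetical path): check_hdfs_output /user/cloudera/job-output
```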


It seems there are many options involved with this utility, and I've only seen the tip of the iceberg.
