11/18/2014

#Pentaho #Kettle #PDI CE Offering

Pentaho offers a product called Kettle, also known as Pentaho Data Integration (PDI), for Extract, Transform and Load (ETL / ELT) work. It runs on Linux or Windows.

There's a free Community Edition (CE):
http://community.pentaho.com/

Download the latest version of Kettle Community Edition (CE), 5.2 at the time of writing:
http://sourceforge.net/projects/pentaho/files/Data%20Integration/5.2/

There's also an Enterprise Edition with a 30-day trial:
http://www.pentaho.com/download

I wanted to install it in a Linux environment, so I downloaded the Hortonworks Sandbox, a CentOS-based Hadoop VM:
http://hortonworks.com/products/hortonworks-sandbox/#install

Instructions for installing Kettle:
http://wiki.pentaho.com/display/EAI/01.+Installing+Kettle

I logged on to my Hyper-V Hadoop Sandbox, a single-node cluster, with the credentials root/hadoop.

While logged in as root, I created a new user called pentaho. To switch (su) to the pentaho user:
$ su - pentaho

And to switch back to root:
$ su - root
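The commands for creating the user only appeared in a screenshot, so the following is a minimal sketch of that step, assuming the standard CentOS `useradd` and `passwd` tools and a root shell. The function name `create_user` is hypothetical; it skips creation if the account already exists, so it's safe to re-run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a local account (e.g. "pentaho") if it doesn't exist.
# Assumes the standard CentOS useradd/passwd tools and a root shell.
create_user() {
  local name="$1"
  # id succeeds when the account already exists, so re-running is harmless
  if id "$name" >/dev/null 2>&1; then
    echo "user $name already exists"
    return 0
  fi
  useradd -m "$name" || return 1   # -m also creates /home/$name
  passwd "$name"                   # prompts interactively for the new password
}

# Usage, as root:
#   create_user pentaho
#   su - pentaho   # switch to the new user
#   exit           # leave the pentaho shell and return to root
```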

I created a folder at the root of the file system called /Pentaho.
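A sketch of creating that folder, assuming you also want the pentaho user to own it (that ownership step is an assumption, not something shown in the post). `prepare_install_dir` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create the install folder and optionally hand it
# to the account that will run Kettle (ownership is an assumption).
prepare_install_dir() {
  local dir="$1" owner="${2:-}"
  mkdir -p "$dir" || return 1
  # Only change ownership when an owner is given; assumes a group with the
  # same name exists (CentOS useradd creates one by default); needs root
  if [ -n "$owner" ]; then
    chown "$owner":"$owner" "$dir" || return 1
  fi
  echo "$dir ready"
}

# Usage, as root:
#   prepare_install_dir /Pentaho pentaho
```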

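There's no easy way to copy the zip file from the host computer to the Hyper-V VM, so I downloaded it directly with curl, using -O to save it under its original file name and -L to follow any redirects:

curl -L -O 'http://tcpdiag.dl.sourceforge.net/project/pentaho/Data%20Integration/5.2/pdi-ce-5.2.0.0-209.zip'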

I checked that the zip file was downloaded as an actual archive, not an HTML page.
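One quick way to make that check from the shell, as a sketch: zip archives begin with the two bytes `PK`, while a failed download typically saves an HTML page instead. The helper `is_zip` is hypothetical; running `file pdi-ce-5.2.0.0-209.zip` is another option if the `file` utility is installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: a real zip archive starts with the bytes "PK",
# while a failed download usually saves an HTML page starting with "<".
is_zip() {
  local f="$1"
  [ -f "$f" ] || return 1
  [ "$(head -c 2 "$f")" = "PK" ]
}

# Usage:
#   if is_zip pdi-ce-5.2.0.0-209.zip; then echo "archive OK"; else echo "not a zip, re-download"; fi
```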

To extract the zip file, first install unzip by running this from the command line on the Hadoop VM:
yum install unzip -y


Then key in the command:
unzip pdi-ce-5.2.0.0-209.zip -d /Pentaho


The archive was extracted and the files were loaded into /Pentaho.
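To confirm the extraction from the command line, a sketch like the following checks for the Kettle launcher scripts: spoon.sh (the graphical designer), pan.sh (runs transformations) and kitchen.sh (runs jobs). It assumes the archive unpacks into a `data-integration` folder under /Pentaho; `list_launchers` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which Kettle launcher scripts were extracted.
# Assumes the archive unpacks into a "data-integration" folder.
list_launchers() {
  local home="$1/data-integration"
  local script
  # spoon.sh = GUI designer, pan.sh = transformations, kitchen.sh = jobs
  for script in spoon.sh pan.sh kitchen.sh; do
    if [ -f "$home/$script" ]; then
      echo "found $script"
    else
      echo "missing $script"
    fi
  done
}

# Usage:
#   list_launchers /Pentaho
```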

The next step is to get Kettle PDI running...

In the meantime, I wanted to explore the Windows version, and sure enough, the same download works on both Linux and Windows (in my case, Windows 8.1).


To launch it, double-click the spoon.bat file.

Within a few seconds the app loads, with no installation required.
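Back on the Linux VM, the counterpart of spoon.bat is spoon.sh in the same folder. Spoon is a desktop application, so it needs a graphical display (and a Java runtime); a text-only console session on the sandbox won't be enough. A minimal sketch, with the hypothetical helper `launch_spoon` and the assumption that the files sit in a `data-integration` folder:

```shell
#!/usr/bin/env bash
# Hypothetical helper: on Linux, spoon.sh plays the role of spoon.bat.
# Spoon is a GUI, so it needs a graphical display (DISPLAY set).
launch_spoon() {
  local home="$1/data-integration"
  if [ -z "${DISPLAY:-}" ]; then
    echo "no graphical display; Spoon cannot start here" >&2
    return 1
  fi
  # Run in a subshell so the caller's working directory is unchanged
  ( cd "$home" && ./spoon.sh )
}

# Usage, from a desktop session:
#   launch_spoon /Pentaho
```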

There are existing samples to help you get started.
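The samples ship with the download itself. Assuming they sit under `data-integration/samples`, with transformations saved as `.ktr` files and jobs as `.kjb` files, this hypothetical `count_samples` helper gives a quick inventory:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the sample transformation (.ktr) and job (.kjb)
# files under data-integration/samples (folder layout assumed).
count_samples() {
  local samples="$1/data-integration/samples"
  local ktr kjb
  ktr=$(find "$samples" -type f -name '*.ktr' 2>/dev/null | wc -l)
  kjb=$(find "$samples" -type f -name '*.kjb' 2>/dev/null | wc -l)
  # $((...)) strips any padding spaces that some wc versions emit
  echo "transformations: $((ktr)) jobs: $((kjb))"
}

# Usage:
#   count_samples /Pentaho
```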
 
There are lots of transformations.

Notice in particular the Big Data transformations.
 
There are also several connection types.

To create a database connection to Hadoop, follow the instructions at this URL:
http://wiki.pentaho.com/display/BAD/Configuring+Pentaho+for+your+Hadoop+Distro+and+Version
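As a rough sketch of what that configuration involves: PDI's big data plugin selects a Hadoop "shim" through the `active.hadoop.configuration` property in `plugins/pentaho-big-data-plugin/plugin.properties`. The helper below is hypothetical, and the shim name `hdp21` in the usage line is an assumption; check the linked page and the shim folders shipped with your PDI version for the right value.

```shell
#!/usr/bin/env bash
# Hypothetical helper: point PDI's big data plugin at a Hadoop "shim".
# Assumes plugin.properties already contains a line of the form
# active.hadoop.configuration=<shim folder name>.
set_hadoop_shim() {
  local props="$1" shim="$2"
  grep -q '^active\.hadoop\.configuration=' "$props" || return 1
  # Rewrite the existing line in place, keeping a .bak copy of the original
  sed -i.bak "s/^active\.hadoop\.configuration=.*/active.hadoop.configuration=${shim}/" "$props"
}

# Usage (shim name "hdp21" is an assumption for an HDP 2.1 sandbox):
#   set_hadoop_shim /Pentaho/data-integration/plugins/pentaho-big-data-plugin/plugin.properties hdp21
```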

What I like about the Pentaho Kettle PDI solution is how quickly you can get it installed and get up to speed. Once the application is running, you have an arsenal of Extract, Transform and Load functionality at your disposal. And the best part is there's little to no actual coding: its drag-and-drop WYSIWYG interface allows rapid application development of robust solutions.

This post covered how to get started with the Community Edition of Kettle PDI from Pentaho, along with some of its features.

Happy coding~!