Re-Intro to Pentaho Data Integration Kettle

I used the Pentaho Data Integration product, also known as Kettle, a few years ago.

My Intro to Pentaho Big Data PDI Kettle

#Pentaho #Kettle #PDI CE Offering.

And this post, which is one of my all-time most viewed blog posts at 6,000 reads: Follow Up Post

So tonight I wanted to get re-acquainted with Kettle. You can find the download and info page here, the project Jira here, and the Pentaho Data Integration (Kettle) Tutorial here.

Initiated the download:

Unpacked the Zip file:

Clicked on SpoonConsole.bat:

And the Windows application opens:

The best part about using this free open source software from Pentaho is the Big Data features:

I opened a sample project called "test-job.kjb":

Project loaded in the IDE:

Modified one of the steps to include an Excel file:

Saved, then ran the job:

It produced a message box:

And the job ended:

Opened another sample, "Generate product data.ktr":

Saved, then ran the transformation:

The transformation generated an output file once I had created a directory called "Output" in the Transformation directory:

And looking at the output file contents:

The data was generated randomly using this JavaScript code:


// Random text fields, built with Kettle's StringUtil helper
var description = Packages.org.pentaho.di.core.util.StringUtil.generateRandomString(4, "", "-desc", false);
var code = Packages.org.pentaho.di.core.util.StringUtil.generateRandomString(3, "PRD-", "", true);
var category = Packages.org.pentaho.di.core.util.StringUtil.generateRandomString(1, "", "", true);

// Random price of at least 150.00 and below 1150.00
var price = 150.00 + (Math.random() * 1000);
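
To get a feel for what those calls produce without opening Spoon, here is a rough standalone sketch that runs in plain JavaScript (Node.js, for example). The generateRandomString function below is my own stand-in, not the Kettle implementation. It assumes the arguments are (length, prefix, postfix, uppercase), which is what the sample's calls suggest, and generateProductRow is a hypothetical helper name I made up for illustration.

```javascript
// Stand-in for Kettle's StringUtil.generateRandomString so this runs outside Spoon.
// Assumption: the argument order is (length, prefix, postfix, uppercase).
function generateRandomString(length, prefix, postfix, uppercase) {
  var letters = "";
  for (var i = 0; i < length; i++) {
    // Pick a random lowercase letter from a to z.
    letters += String.fromCharCode(97 + Math.floor(Math.random() * 26));
  }
  var result = prefix + letters + postfix;
  return uppercase ? result.toUpperCase() : result;
}

// Hypothetical helper: builds one product row using the same
// arguments as the sample transformation's script.
function generateProductRow() {
  return {
    description: generateRandomString(4, "", "-desc", false),
    code: generateRandomString(3, "PRD-", "", true),
    category: generateRandomString(1, "", "", true),
    price: 150.00 + (Math.random() * 1000) // at least 150, below 1150
  };
}

console.log(generateProductRow());
```

Under those assumptions, each row gets a four-letter lowercase description ending in "-desc", a code of "PRD-" plus three uppercase letters, a single uppercase letter for the category, and a price somewhere between 150 and 1150.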

And mapping the field:

Pentaho Data Integration Kettle is a nice piece of software that's open source and works with Hadoop. Lots of nice features.