9/07/2015

Re-Intro to Pentaho Data Integration Kettle

I used the Pentaho Data Integration product, also known as Kettle, a few years ago.

My Intro to Pentaho Big Data PDI Kettle


#Pentaho #Kettle #PDI CE Offering.


And this post, which is one of my all-time most viewed blog posts at 6,000 reads: Follow Up Post


So tonight I wanted to get re-acquainted with Kettle.  You can find the download and info page here, the project Jira here, and the Pentaho Data Integration (Kettle) tutorial here.


Initiate the download:




Unpacked the Zip file:




Click on SpoonConsole.bat




And the Windows application opens:




The best part about using this free, open source software from Pentaho is the Big Data features:




I opened a sample project called "test-job.kjb":




Project loaded in the IDE:




Modified one of the steps to include an Excel file:




Save, then Run / Execute the Job:




It produced a message box:





And the job ended:






Opened another sample, "Generate product data.ktr"




Save, Run & Execute:




The project created a directory called "Output" in the Transformation directory and then generated an output file:




And looking at the output file contents:




The data was generated randomly using this JavaScript code:


java;


var description = Packages.org.pentaho.di.core.util.StringUtil.generateRandomString(4, "", "-desc", false);

var code = Packages.org.pentaho.di.core.util.StringUtil.generateRandomString(3, "PRD-", "", true);
var category = Packages.org.pentaho.di.core.util.StringUtil.generateRandomString(1, "", "", true);

var price = 150.00 + ( Math.random() * 1000 );
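To see roughly what that script produces without opening Spoon, here is a minimal sketch in plain JavaScript. The helper below is a hypothetical stand-in for Kettle's StringUtil.generateRandomString, not the real implementation. It assumes the arguments are (length, prefix, postfix, uppercase), which is how the calls above read, and that the random part is made of the letters a to z. The function name generateProductRow is also made up for illustration.

```javascript
// Hypothetical stand-in for StringUtil.generateRandomString so the logic can
// run outside Kettle. Assumed argument order: (length, prefix, postfix, uppercase).
function generateRandomString(length, prefix, postfix, uppercase) {
  var letters = "";
  for (var i = 0; i < length; i++) {
    // Pick one random lowercase letter, a-z (an assumption about the character set).
    letters += String.fromCharCode(97 + Math.floor(Math.random() * 26));
  }
  var result = prefix + letters + postfix;
  // When the flag is set, the whole string is upper-cased, prefix and postfix included.
  return uppercase ? result.toUpperCase() : result;
}

// Builds one row with the same four fields as the script above.
function generateProductRow() {
  return {
    description: generateRandomString(4, "", "-desc", false),
    code: generateRandomString(3, "PRD-", "", true),
    category: generateRandomString(1, "", "", true),
    price: 150.00 + (Math.random() * 1000) // somewhere from 150 up to, but not including, 1150
  };
}

console.log(generateProductRow());
```

Under those assumptions, each row gets a four-letter lowercase description ending in "-desc", a code of "PRD-" plus three uppercase letters, a single uppercase letter for the category, and a price between 150 and 1150.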


And mapping the field:




Pentaho Data Integration Kettle is a nice piece of software that's open source and works with Hadoop.  Lots of nice features.