A few years ago I used the Pentaho Data Integration (PDI) product, also known as Kettle, and wrote about it:
My Intro to Pentaho Big Data PDI Kettle
#Pentaho #Kettle #PDI CE Offering.
And this post, one of my all-time most-viewed blog posts at 6,000 reads: Follow Up Post
So tonight I wanted to get re-acquainted with Kettle. You can find the download and info page here, the project Jira here, and the Pentaho Data Integration (Kettle) Tutorial here.
Initiated the download:
Unpacked the Zip file:
Clicked on SpoonConsole.bat:
And the Windows application opens:
The best part about this free, open source software from Pentaho is the Big Data features:
I opened a sample project called "test-job.kjb":
Project loaded in the IDE:
Modified one of the steps to include an Excel file:
Saved, then ran (executed) the job:
It produced a message box:
And the job ended:
Opened another sample, "Generate product data.ktr":
Saved, then ran (executed) it:
The transformation created a directory called "Output" relative to the transformation's directory and wrote an output file into it:
Here are the contents of the output file:
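The data is generated randomly by this JavaScript code in the transformation:
var description = Packages.org.pentaho.di.core.util.StringUtil.generateRandomString(4, "", "-desc", false); // 4 random characters followed by "-desc"
var code = Packages.org.pentaho.di.core.util.StringUtil.generateRandomString(3, "PRD-", "", true); // "PRD-" followed by 3 random characters, uppercased
var category = Packages.org.pentaho.di.core.util.StringUtil.generateRandomString(1, "", "", true); // a single random uppercase character
var price = 150.00 + (Math.random() * 1000); // random price from 150.00 up to (but not including) 1150.00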
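To see what those four lines produce without opening Spoon, here is a minimal standalone sketch in plain JavaScript (Node.js). It does not call Kettle at all: `randomString` and `generateProductRow` are hypothetical helpers, and `randomString` only assumes that the arguments of `StringUtil.generateRandomString` are (length, prefix, suffix, uppercase), as the calls above suggest, and that the random part is made of the letters a-z. The real Kettle implementation may differ in detail.

```javascript
// Hypothetical stand-in for StringUtil.generateRandomString.
// Assumed argument order: (length, prefix, suffix, uppercase).
// Assumption: the random part uses the lowercase letters a-z.
function randomString(length, prefix, suffix, uppercase) {
  let body = "";
  for (let i = 0; i < length; i++) {
    // Pick a random letter a-z (char codes 97..122).
    body += String.fromCharCode(97 + Math.floor(Math.random() * 26));
  }
  const result = prefix + body + suffix;
  // The uppercase flag applies to the whole assembled string.
  return uppercase ? result.toUpperCase() : result;
}

// Build one row with the same four fields as the sample transformation.
function generateProductRow() {
  return {
    description: randomString(4, "", "-desc", false), // pattern: xxxx-desc
    code: randomString(3, "PRD-", "", true),           // pattern: PRD-XXX
    category: randomString(1, "", "", true),           // one uppercase letter
    price: 150.0 + Math.random() * 1000,               // in [150, 1150)
  };
}

// Print a few rows; the ";" separator is an arbitrary choice for this sketch.
for (let i = 0; i < 3; i++) {
  const row = generateProductRow();
  console.log([row.description, row.code, row.category, row.price.toFixed(2)].join(";"));
}
```

Because the output is random, each run prints different rows; only the shape of each field is predictable.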
And the field mapping:
Pentaho Data Integration (Kettle) is a nice piece of open source software that works with Hadoop, and it has lots of nice features.