1/31/2013

Dabbling in C#

Today I wrote some C# code - an ASPX project.

I inherited the half-baked app and was asked to complete it.

It's basically a 1 page ASPX app with 4 or 5 classes.

It's got about 5 stored procedures.

I added a new page and a new Sproc today.

It reads in the SalesForce Regions and Territories and looks to see if any are missing from the Quota table.

If so, it creates a list on the web page.

Next step will be to write some code so the user can click a button and generate 24 entries into the database.

The user can then go back to the first page to assign quotas to regions/territories, one per month of the year for both New and Renewal business (12 months x 2 types = 24 entries).
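For the curious, the gist of that gap check and the 24-entry generation is pretty simple. Here's a rough sketch of the logic only (the real app is C#/ASPX calling stored procedures; this is written in Java purely for illustration, and names like RegionTerritory are made up):

import java.util.*;

class QuotaGapSketch {

    // A SalesForce Region / Territory pair
    record RegionTerritory(String region, String territory) {}

    // Find pairs present in SalesForce but missing from the Quota table
    static Set<RegionTerritory> findMissing(Set<RegionTerritory> salesForce,
                                            Set<RegionTerritory> quota) {
        Set<RegionTerritory> missing = new HashSet<>(salesForce);
        missing.removeAll(quota);   // exact matches only - spelling counts
        return missing;
    }

    // For one missing pair, generate the 24 placeholder rows:
    // 12 months x (New, Renewal)
    static List<String> generateEntries(RegionTerritory rt, int year) {
        List<String> rows = new ArrayList<>();
        for (int month = 1; month <= 12; month++) {
            for (String type : new String[] {"New", "Renewal"}) {
                rows.add(String.format("%s|%s|%d-%02d|%s|0.00",
                        rt.region(), rt.territory(), year, month, type));
            }
        }
        return rows;   // quota amounts get filled in later on the first page
    }
}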

The problem now is that SalesForce must stay synced with the Quota table; if not, the report is wrong and I get tickets from the European sales people.

I've been massaging this report for 8 months now.  It used to run in 7 hours, I got it down to 16 minutes, then to 4.  It now runs in less than a minute, thanks to temp tables, table variables, and indexes on the temp tables.

So we are getting closer to having an accurate report.

This has been a tough exercise because there are so many points of failure:

incorrect data entry, timing, the SalesForce and Quota tables not matching (spelling counts), new regions/territories, changing personnel.

This report could cause me to drink.

Because every week I can be sure to get a ticket saying the report is incorrect.

And every week I send an email saying the report logic hasn't changed in over a year; it's the data causing the issue.

And today I found out the last report guy quit because of this exact report.

Nice.

1/30/2013

Becoming A Mentor

We've got a junior level web developer.

Who's been tasked with getting a web site up and running.

In Visual Studio .net.

And I heard that he's been struggling.

So I offered to assist.

So I went upstairs and asked what he was doing.

Turns out he was trying to connect to SQL-Server via .net.

And I reviewed his code.

And added sufficient privileges to his SQL-Server user ID.

So now he could see the database from VS.

I modified his connection string.

Ran the web app, and sure enough he was connected.

So I asked IT if they could load SQL-Server locally so he wouldn't have to log into VM to access the DB.

And my boss asked me to mentor the guy.

So I agreed.

Funny thing, he calls me "Sir".

And so it goes!

My Intro to Linux

I grew up on DOS and BASIC in 1982.

So command line is no stranger to me.

I remember when Windows first appeared - or rather it was OS/2 - when my father told me about a new 'window'-like feature for having multiple windows open within a single screen.

I've worked on the Windows platform for the majority of my career, with the exception of 1999-2001, when we developed on Windows and deployed to Unix (Actuate Server).

So this past week I got thrown into the Linux side of the world.

And the command line is your friend.

Except at first it took me forever, searching Google for basic command line functions.

After a few days, I was humming through the directory structure, setting permissions on files, copying folders, installing licenses and such.

And my cube mate brought in his Linux in a Nutshell book for me to reference - he probably got tired of me asking questions.

So after a week's time, I'm enjoying the world of Linux and all it has to offer.

Just imagine, learning a technology 20+ years ago and still being able to use your knowledge without much change.

Seems like an anomaly in the ever changing world of technology.

And so it goes!

1/29/2013

AI and Robotics

From where I sit, we are immersed in new technology at a staggering pace.

So what should we learn?

I stick with my core skills in order to earn a decent living:

1. Data 

  • Reporting
  • Business Intelligence
  • Big Data
  • Databases

2. Programming

  • Microsoft C#
  • ASP.net
  • Web Services
  • jQuery

3. The Cloud (nice to have)

However, what should I be learning?

  • Artificial Intelligence (AI)
  • Robotics

There may come a time in the not so distant future, where we have two groups: the humans and the machines.

The humans could become deprecated as many jobs are automated.  Denying this trend is a form of ignorance.

With increased memory, larger hard drives, and complex reasoning based on massive amounts of data, computers could develop brains based on logic at first.  Then move into the "ability to learn and recognize patterns", as in IBM Watson.  Finally, perhaps, the "ability to perceive and display emotions".

Cutting edge stuff, you bet!  Possibility, sure, why not.  Reality, maybe sooner than later.

http://www.bloomconsultingbi.com/2012/05/artificial-intelligence-is-future.html

Follow Up Post

So yesterday I posted a blog entry where I got Pentaho working for Data Integration.

http://www.bloomconsultingbi.com/2013/01/solved-it.html

By upgrading to the latest version 4.4.

However, that was only a partially correct statement.

I COULD connect from a CentOS VM to the Cloudera Hadoop Cluster VM, that is true.

I could see the file structure, delete folders, etc.
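Under the hood, browsing HDFS like that comes down to the Hadoop FileSystem API. A minimal sketch of the sort of thing Spoon was doing on my behalf (the NameNode host and port are just my VM's values; adjust to taste):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsBrowse {
    public static void main(String[] args) throws Exception {
        // NameNode address - the same value the cluster's fs.default setting points at
        FileSystem fs = FileSystem.get(new URI("hdfs://192.168.186.129:8020"),
                                       new Configuration());

        // List the contents of a folder
        for (FileStatus status : fs.listStatus(new Path("/user/cloudera"))) {
            System.out.println(status.getPath());
        }

        // Delete a folder recursively (e.g. an old output directory)
        fs.delete(new Path("/user/cloudera/output"), true);

        fs.close();
    }
}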

However, I COULD NOT execute a Data Integration Spoon job for some reason.

So today, it was discovered that the /etc/hosts file needs to be updated on both the destination and calling VMs (sudo vi /etc/hosts).

I had to add an entry for the calling server, mapping its IP address to a host name.

I had to modify the hosts.allow file to allow certain IPs.

And I had to modify the Hadoop Cluster VM to accept incoming calls from the calling IP address.

And presto, we got files in the output directory of the wordcount folder on the HDFS cluster:



1/28/2013

Solved It!

Well, I have to say this was a tough one.

And it turns out that I needed to download the most recent version of the software.

Because I originally downloaded the Cloudera CDH4 image and got that working.

Then downloaded Pentaho BI server and got that working.

Then downloaded Pentaho Data Integration server and was not able to connect to Hadoop.

So on today's conference call I mentioned the file I downloaded was the 4.3 and the Pentaho sales person said I should download the 4.4 because that has all the settings pre-configured.

So by doing it the hard way I essentially learned the guts of the infrastructure, similar to removing a car's engine and re-assembling it just to see how it works.

So tonight around 8:45, after working on this all afternoon, the thing connected to the Hadoop server.

So tomorrow I'll try and kick off a Hadoop Map Reduce job.

Success!  The hard way.

http://www.bloomconsultingbi.com/2013/01/download-correct-version-of-software.html

Download Correct Version of Software

So today I had a conversation with the folks at Pentaho.

The conference call was to help me install the Pentaho Data Integration eval software.

I basically downloaded the incorrect version the other day, so I actually need to install the 4.4 version.

Which should include the necessary software components I need to connect to Hadoop.

However, the call went well, and the product provides a graphical, user-friendly canvas for creating connections to and from Hadoop without having to write much, if any, straight Java code.

It looks promising.  So I'll be downloading the software ASAP.

And that's the story!

Right Data, Wrong Analysis

Reading this article validated what I've been saying.

You can have the right data, at the same time, you can have the wrong analysis.

Because people have inherent biases.

They may have personal business experience which can skew the results.

They may lack certain analysis skills which produce faulty results.

I say let the machines analyze the data and produce the results.

Leaving the analysis up to error prone humans can be risky.

And so it goes!

1/27/2013

Find the Root Cause

In today's society, when confronted with an issue we try to solve the problem in the wrong way.

We treat the symptoms and not the actual problem.

For example, if a faucet is running, instead of finding the root cause, which would be to turn the valve to shut it off, we treat the symptom: the excess water.

That's one of the biggest problems I see today.

And many so called experts are treating the symptoms instead of the problem.

And earning lots of money in doing so.

But I'd go a step further and say that not only do they attempt to solve the symptoms instead of the problem, they in fact make the situation worse than if they had done nothing.

You can apply this rationale to programming, IT, the medical industry, veterinary and dental work, plumbing, you name it.

Treat the symptoms, not the problem, charge ridiculous prices and make the problem worse in the end.

The solution: find the root cause and don't take the lazy way out.

1/25/2013

Planes, Trains and Big Data

If you've ever seen the movie with John Candy called Planes, Trains and Automobiles, then maybe you can relate.

You see, I've been producing my own comedy of errors this week.

I've been attempting to get Pentaho working by connecting to a Hadoop cluster on Cloudera.

I ran into every possible road block that you can imagine.

I was able to connect to hadoop via the command line:

hadoop fs -ls hdfs://localhost:8020/

As well as the web page.

However, I could not connect via the Pentaho Spoon application on Linux.

So I customized several settings after researching all the articles on Google with people claiming to know the solution.

Well, as luck would have it, after applying one bug fix, 3 more bugs cropped up.

Think spiraling downward into the infinity of the abyss.

Well, maybe not that dramatic.

What I did do was to learn many aspects of Linux Hadoop settings as well as Pentaho ecosystems.

I'm fairly intimate with these new products, we are on a first name basis.

And I never did Linux prior to this week minus a few ls commands.

So the Linux guy responsible for Big Data initiated a conference call with one of the sales/technical people.

Very nice guy and very smart; however, it's still broken.

And I put in 5 additional hours since the conference call.

I basically started over, a clean slate, by installing a new VM with Cloudera, downloading the Spoon application.

No luck, different errors now.

So now I've tried just about every combination there is to try.

I had to modify XML config files through vi, edit properties files, move JAR files, download and install JAR files, and try "localhost", "localhost.localdomain", "127.0.0.1", and "192.168.186.129".

I've read hundreds of potential bug fixes on Google.

I've probably tried 1000 different combinations in the past few days.

And you know what, I'm a heck of a lot smarter now than before I started.

Sure the project isn't working 100% yet, but I did get about 75% there.

And maybe the problem is some undocumented fix or typo or something trivial.

I'll try again next week.

Regardless, I feel good about this week's accomplishments.

Although I may watch the movie Planes, Trains and Automobiles this weekend just to relive my work week.

However, I do see Big Data on the horizon and all the research I'm doing now should pay dividends down the road!

1/24/2013

Summary of Hadoop (1st week)

Working with Hadoop the past week, really getting some good exposure to Linux.

It's basically a first for me, so every step that should take 10 seconds takes me 10 minutes or even hours.

Researching commands on Google.

However I'm getting the hang of some of it and can really fly through the directories.

Reminds me of the good old days in DOS in the early 1980's.

So to summarize, I got the Hadoop cluster up and running on the Cloudera VM.

Got a sample WordCount class turned into a Jar file and initiated a Map Reduce job in Hadoop file system producing output files.
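For reference, the WordCount class is only a few dozen lines of Java. This is a minimal sketch along the lines of the standard tutorial (not my exact class, and it assumes the newer org.apache.hadoop.mapreduce API):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map: emit (word, 1) for every token in the line
    public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (token.isEmpty()) continue;
                word.set(token);
                context.write(word, ONE);
            }
        }
    }

    // Reduce: sum the counts for each word
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "wordcount");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));     // e.g. /user/cloudera/input
        FileOutputFormat.setOutputPath(job, new Path(args[1]));   // must not exist yet
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Package it into a JAR and kick it off with the hadoop jar command, same idea as the tutorial's org.myorg.WordCount.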

Then played with Hive and Pig.

Then worked with Pentaho dashboards and reports.

And then Pentaho Spoon, which is a graphical tool for connecting to a variety of data sources including Hadoop, running jobs, and exporting to an output data source.

I tried to get my first example to run in this environment and ran into several stumbling blocks.

Turns out the out-of-the-box configuration needs to be tweaked, and I think I over-tweaked it.

However, I got it back to its original state by setting the fs.default setting back to 0.0.0.0:8020.

In order to get the Pentaho job to run, though, I think it will need to be changed to perhaps localhost or 127.0.0.1 - time will tell.

It's been quite a learning experience.

Similar to being dropped off from a helicopter in foreign territory and having to find your way back.

It can be done, but it's slow going.  I've run into just about every obstacle so far except I just keep pushing forward in the hopes that everything will work.

And so it goes!

1/23/2013

My Intro to Pentaho Big Data PDI Kettle

Pentaho's Big Data solution revolves around Kettle PDI.

Here's the webpage I viewed to get started:

http://infocenter.pentaho.com/help/index.jsp?topic=%2Fgetting_started_with_pentaho%2Ftopic_introducing_bi_suite.html

And here's the download page:

http://wiki.pentaho.com/display/BAD/Downloads

I'm working on a Cloudera VM with Linux Ubuntu so I chose that version to download:

Next step is the install...

Set execute permissions on the downloaded file:

$ chmod +x ./pdi-ce-4.3.0-stable.tar.gz

To extract the archive to the current folder:

tar -xvf archive_name.tar.gz

To copy the contents of the new folder to the /opt/pentaho/server directory:

cp -r ./data-integration/* /opt/pentaho/server/data-integration

To remove a directory in Linux, use the command:

rm -rf <directory>

To launch the Pentaho Data Integration utility, type ./spoon.sh from the /opt/pentaho/server/data-integration folder:


Open one of the sample files using the File --> Open drop-down:



 

For a new report, select the repository Type:



I clicked the 'NEW' button which spawned a new screen:



http://wiki.pentaho.com/display/BAD/Configure+Pentaho+for+Cloudera+and+Other+Hadoop+Versions

http://diethardsteiner.blogspot.com/2011/05/kettle-sourcing-data-from-hadoop-hive.html

I created my own Hadoop Job Executor job; it seemed to run without exceptions, however, it didn't produce the expected output directory contents.
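One way to sanity-check a run from outside the Spoon GUI is to look for the _SUCCESS marker the job leaves behind and read back a part file. A rough sketch, again assuming my VM's NameNode address:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CheckOutput {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new URI("hdfs://192.168.186.129:8020"),
                                       new Configuration());
        Path output = new Path("/user/cloudera/output");

        // A completed MapReduce job drops a _SUCCESS marker in its output directory
        if (!fs.exists(new Path(output, "_SUCCESS"))) {
            System.out.println("No _SUCCESS marker - the job didn't finish cleanly");
            return;
        }

        // Read back the first reducer's output: word <tab> count per line
        // (part-r-00000 with the new API, part-00000 with the old one)
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(fs.open(new Path(output, "part-r-00000"))))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
        fs.close();
    }
}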


It seems there are many options involved with this utility, and I've only touched the tip of the iceberg.

1/22/2013

Learning Growth Spurts

Working in IT we learn the skills needed to do our jobs.

And then we taper off.

I learned Visual Basic, Oracle and Crystal Reports in 1995/6.

Learned Microsoft ASP in 1999.  Vantive and Actuate too.

Picked up .net early 2000s.  SQL-Server & DTS packages too.

Learned Java, web services, project management in 2006.

And soon after learned Microsoft BI (SSIS, SSRS and some SSAS).

So since 1996, I've only had a few growth spurts.

Except in the past month, I've picked up Cloud, Azure, OData, some jQuery, and LINQ, because I installed Visual Studio 2010 and got some of the examples working, which I blogged about.

And this past week, I've dabbled in Cloudera Big Data, Map Reduce, Hive, Pig and today got some Pentaho as well.

This year is rocking and rolling with new technology.

It's good to have that feeling of adventure and curiosity burning again.

I hope the trend continues!

Getting Started with Pentaho Big Data Analytics

First thing to get started with Pentaho is download their product:

http://www.pentaho.com/download/

I chose the Big Data Analytics option:



Set the permissions on the downloaded .bin file:

$ chmod +x ./pentaho-business-analytics-4.8.0-GA-x64.bin

Run file:

$ ./pentaho-business-analytics-4.8.0-GA-x64.bin









However, the BI server did not load, so I decided to uninstall everything.  Done.

Then re-installed: invalid or missing product keys, uh oh.  So I called the Pentaho sales folks and got Todd, who immediately pulled me up in SalesForce.com on their side and knew all my contact info.  He sent over the 7 key files and I downloaded them.

However, the next task was to upload them to the VM, for which I used a product called WinSCP:



And then copy the files to the correct folder: cp -a /source/ /destination/

And then had to re-install the license keys - found this article and followed the steps...


 
you must be using the postgres user, then start the server...
 
To start the Enterprise Web Service:  ./start-pec.sh

Once installation completes, log onto the web URL: LocalHost:8089
UserName: admin
Password: #######


Pentaho Data Integration Server URL: LocalHost:9080




Creating something new: 








So far so good.  Except now I have to figure out the Big Data features and how it can generate Java JAR files using just a visual point-and-click interface (I hope!).



1/21/2013

Cloud-Sourcing is the Future

With all the talk lately about the Cloud finally taking root, I think the next big thing will be 'Cloud-Source'.

What that means is using the cloud to leverage resources, assets, time, and money by moving work to the cloud.

Because by outsourcing your infrastructure, you pay as you go, based on usage.

And you free yourself from the need to buy servers, maintain servers, patch servers, disaster-recover servers, and back up servers.

And you can now hire developers from anywhere who do work anytime per the project or long term.

So you can outsource to different states, countries, you name it.

And you can host your database in the cloud; aside from legal issues in other countries and the possible chance that your vendor goes out of business, your data is safe and secure in the cloud.

You can have your Business Intelligence in the cloud as well, generating reports up in the sky, against databases there as well.

You can have your big data initiatives in the cloud.  And hire people with the skills to work on it, they can be located anywhere on the planet.

And of course the web sites can be hosted there too.  Not only that, with virtual machines, you can basically run anything on just about any platform with any language and any web server.

And it's all elastic, meaning you can start off small and grow as big as you want.  You can rent a server for an hour or 30 servers for an hour.

You only get charged for what you use.

Lastly, I believe there's a market for middlemen who know how best to navigate the cloud.  They can sell their services to small to mid-size companies, even large ones down the road, where these service providers migrate apps to the cloud, explain pricing models and estimates, and have their teams of programmers take care of the leg work up in the cloud.

That's the way I see it.  It's basically come down to this.

There are really no more barriers to moving to the cloud so what are you waiting for?

Put your infrastructure in the cloud, and forgetta about it.

Make it so!

My Intro to Cloudera Pig (Day 3)

To install Pig visit this URL:

https://ccp.cloudera.com/display/CDHDOC/Pig+Installation

 Typing in the commands we get...


Set the classpath to include the Pig/lib directory:

sudo vi /usr/lib/hadoop-0.20-mapreduce/conf/hadoop-env.sh
Shift+G to jump to the end of the file
append :/usr/lib/pig/lib to the HADOOP_CLASSPATH line
:wq to save and quit

And here's the documentation for Pig:

https://ccp.cloudera.com/display/CDHDOC/Pig+Installation

*(Here's my Day 3 Post) http://www.bloomconsultingbi.com/2013/01/my-intro-to-cloudera-hive-day-3.html
*(Here's my Day 2 Post) http://www.bloomconsultingbi.com/2013/01/cloudera-hadoop-day-2.html
*(Here's my Day 1 Post) http://www.bloomconsultingbi.com/2013/01/first-try-at-cloudera-hadoop.html
*(Here's my Day 0 Post) http://www.bloomconsultingbi.com/2013/01/getting-started-with-cloudera-hadoop.html

My Intro to Cloudera Hive (day 3)

Today I installed the Hive component of the Cloudera Hadoop ecosystem.

Hive Documentation:

http://archive.cloudera.com/cdh/3/hive/

Hive Installation:

https://ccp.cloudera.com/display/CDHDOC/Hive+Installation


Install the necessary files to the directory: /usr/lib/hive/lib

 
Add /usr/lib/hive/lib/* to the HADOOP_CLASSPATH so it can reference the necessary JAR files to run Hive...
 
Running Hive:
 
 
$ hive


Run the script:


and it gives the results:


And now you can download sample data files from this URL:

https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-ExampleQueries

My first impression is that Hive is a pseudo-SQL-like language.

You 'create' the table, load the data, do 'selects' against the data and then 'drop' the table.

It still has a 'structured' feel to it.
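Since Hive speaks SQL-ish, you can also drive that whole create / load / select / drop cycle from Java over JDBC. A minimal sketch, assuming HiveServer2 is up on the default port 10000 (the pokes table and kv1.txt file come from the Hive GettingStarted page linked above; the /tmp path and the cloudera user are just my setup):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveRoundTrip {
    public static void main(String[] args) throws Exception {
        // HiveServer2 JDBC driver
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:hive2://localhost:10000/default", "cloudera", "");
             Statement stmt = conn.createStatement()) {

            // 'create' the table
            stmt.execute("CREATE TABLE IF NOT EXISTS pokes (foo INT, bar STRING)");

            // load the data
            stmt.execute("LOAD DATA LOCAL INPATH '/tmp/kv1.txt' OVERWRITE INTO TABLE pokes");

            // do 'selects' against the data
            try (ResultSet rs = stmt.executeQuery("SELECT foo, bar FROM pokes LIMIT 10")) {
                while (rs.next()) {
                    System.out.println(rs.getInt(1) + "\t" + rs.getString(2));
                }
            }

            // and then 'drop' the table
            stmt.execute("DROP TABLE pokes");
        }
    }
}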

So much to learn.

Having fun!

*(Here's my Day 2 Post) http://www.bloomconsultingbi.com/2013/01/cloudera-hadoop-day-2.html
*(Here's my Day 1 Post) http://www.bloomconsultingbi.com/2013/01/first-try-at-cloudera-hadoop.html
*(Here's my Day 0 Post) http://www.bloomconsultingbi.com/2013/01/getting-started-with-cloudera-hadoop.html

1/18/2013

Is Business Intelligence Mission Critical?

Today I was asked if my software was mission critical.

Can the business continue if we don't have access to Business Intelligence?

Um, to me it is.

In an ideal world, the data is driving the business.

Except in reality the business can continue even if the BI system goes down for a day or week.

Maybe one day this won't be the case but for now it's true.

Sure, the internal customers won't be happy if they can't skim the data.

But truth be told, business can continue, although briefly, without reports.

And there you have it!

Cloudera Hadoop (Day 2)

So today is day 2 of Cloudera Hadoop.

Today I learned to set the classpath with:

export HADOOP_CLASSPATH=<path of JAR / class files>

Learned that the file this gets saved to is called hadoop-env.sh and can be found in the directory:



/usr/lib/hadoop-0.20-mapreduce/conf

Learned a basic Linux command: "pwd" tells you your current path in the directory structure.

To modify the contents of the file:
sudo vi /usr/lib/hadoop-0.20-mapreduce/conf/hadoop-env.sh
Shift+G to jump to the end of the file
:wq to save and quit

To become the admin (root) user, and exit when done:
sudo su
exit

To kick off a Hadoop job you enter:

hadoop jar wordcount.jar org.myorg.WordCount /user/cloudera/input /user/cloudera/output

the input directory must already be created in HDFS

hadoop dfs -mkdir /user/cloudera/input

and you must copy your test files to that location on the HDFS server:

hadoop dfs -copyFromLocal /home/cloudera/wordcount/input /user/cloudera/input

assuming you change your directory / folder name accordingly.

and you must first clear the contents of the output directory:

hadoop dfs -rm -r /user/cloudera/output
hadoop dfs -rmdir /user/cloudera/input

If you log onto the Hadoop web UI (localhost), you will see your input folder as follows:


Here's a posting from day 1 http://www.bloomconsultingbi.com/2013/01/first-try-at-cloudera-hadoop.html

And I got the tutorial example to work on my virtual machine! Yippie!!!!

1/17/2013

First Try at Cloudera Hadoop

Today I entered the world of Big Data.

By installing Cloudera VM to my work PC.

And posted a blog about that:  http://www.bloomconsultingbi.com/2013/01/getting-started-with-cloudera-hadoop.html

This is research and development for my job as I'll be working with Hadoop and possibly Splunk in my new role.

In order to mash weblog data in the form of Big Data with our current Enterprise Business Intelligence data.

So I started on the initial project to get acclimated with the environment.

They provide the sample code, however, that doesn't necessarily mean 'easy'.

First thing I did was to create a text file, add some Java code into a class and save the file.

Next step was to compile the java class into a Jar file and associated java files.

Easier said than done.  Because getting familiar with the Linux file structure takes some time.  As well as the commands and classpaths and what have you.

I eventually compiled everything which led to my next road block.

It seems the Hadoop system was not reading my new JAR file, probably because the path was not included in the Hadoop classpath - which it wasn't.

I found out we have a resource in house that is familiar with this and I pinged her.

She said to create an environment variable and append my new path to the current Hadoop Classpath.

The key thing is to use backticks, in the form of:

export MY_CLASSPATH=`hadoop classpath`:/home/cloudera/wordcount_classes/\*

or something to that effect, which you can then reference as $MY_CLASSPATH.

So that worked and I'm on to my next roadblock.

Which is my understanding of how the sample is supposed to work.

However that is enough for today.

The cool thing is this project is sponsored internally so I'm authorized to learn this stuff.

I'd have to say my 4 years of Java coding is paying off a bit, however, it would probably go smoother if I had more experience on Linux.

I'm real excited to get into this space; I've been reading and blogging about it for a long time.

And there you have it!

Getting Started with Cloudera Hadoop

Getting started with Cloudera Hadoop.

First go to this site:

https://ccp.cloudera.com/display/SUPPORT/Cloudera's+Hadoop+Demo+VM+for+CDH4

I chose to download the VM:

UnRar the package:


Creates the following Directory / files:


Download the VM Player:

https://my.vmware.com/web/vmware/free#desktop_end_user_computing/vmware_player/5_0








Running the VM:





Hadoop Tutorial:

https://ccp.cloudera.com/display/DOC/Hadoop+Tutorial

Now we're ready to begin (the fun)!

Enjoy!

Get Sh#t Done!