6/29/2014

I Just Want Some Coffee

Customer: I'd like a coffee please.

Barista: Regular or decaf?

Customer: Regular please.

Barista: What size?

Customer: Small.

Barista: We don't sell small.  Our small is called medium.

Customer: Medium Regular coffee please.

Barista: Room for cream?

Customer: Yes, Medium, regular coffee with room for cream.

Barista: Would you like mild, medium or dark roast?

Customer: Dark roast, medium size, caffeinated coffee with room for cream.

Barista: Would you like an extra shot?

Customer: Yes please, dark roast, medium size, caffeinated coffee with room for cream with an extra shot.

Barista: Any flavors?

Customer: Yes please, dark roast, medium size, caffeinated coffee with room for cream with an extra shot, with hazelnut flavor.

Barista: How many pumps?

Customer: Two pumps of hazelnut, dark roast, medium size, caffeinated coffee with room for cream with an extra shot.

Barista: How about something to eat?

Customer: I'd like a blueberry muffin.

Barista: How about a breakfast sandwich to go with that?

Customer: Ok, I'll have an egg sandwich, along with a blueberry muffin, with dark roast, medium size, caffeinated coffee with room for cream with an extra shot and two pumps of hazelnut.

Barista: Would you like that heated?

Customer: Yes please.

Barista: The egg sandwich and the blueberry muffin, or just the egg sandwich?

Customer: Both please.  Along with dark roast, medium size, caffeinated coffee with room for cream with an extra shot and two pumps of hazelnut.

Barista: How about buying some coffee to go?

Customer: Uh, sure.  What kind do you offer?

Barista: Would you like instant, K-Cups or our traditional grinds?

Customer: I'll take a box of K-Cups, along with the egg sandwich and the blueberry muffin, both heated, along with dark roast, medium size, caffeinated coffee with room for cream with an extra shot and two pumps of hazelnut.

Barista: Would you like to purchase a gift card?

Customer: Sure, how about $10.

Barista: Which card would you like?  There are 35 of them to choose from.

Customer: How about this green one here.

Barista: What is your name so I can write it on the cup?

Customer: My name is Barney.  Barney J. Rubble.

Barista: Ok, thank you Barney, that will be $74.54.  Would you like to use your Bux card for this purchase?

Customer: Sure, here you go.

Barista: I'm sorry sir, you only have $24 on this card.  Would you like to add more money to the card?

Customer: Listen, all I really wanted was a freakin' cup of coffee.  You've asked me question after question, up-selling me on every item, and now the bill is approaching $75.  You've got to be out of your mind.  I'll just go to the gas station and purchase a $1 cup of black coffee.

Barista: What was his problem?  Next customer please.

6/27/2014

Sometimes the Data is Incorrect

I bought a house in 1998.  This house cost $79k.  At the time my salary wasn't super high, so I had to cut back on expenses.  However, I wanted a cat.  So I went to the SPCA to look for a cat.  There, I saw this wonderful, big, fat, black cat.  The sign said this cat was spayed, in good health and ready to go.  So I paid the $35 and off we went.  The cat was so big that it couldn't go #2 in the litter box.  She would go right outside the box.

One day I threw a party at the new house.  All my friends showed up, brought housewarming gifts, and we drank pretty good.  One of my buddies picked up the cat, held it belly up and said, this cat has some lumps under its belly.  That didn't sound good.  So the next day I brought the cat to the vet.  I said, this cat has some lumps under her belly, are they tumors?  Then I headed to work.

Later that day, I got a call from the vet.  They said, Mr. Bloom, we have some good news and some bad news.  First of all, the cat does not have tumors.  Secondly, the cat is a boy.  Those are its genitals.  Because the cat is so fat and its fur is all black, they're difficult to see.  Ay caramba.  Wasn't expecting that.  So I picked up the cat, whose name was AJay, luckily the name could go either way, and we went home.  There I went back to look for the paperwork from the SPCA, and sure enough the cat had a girl's name, the female box was checked and it indicated she was spayed, not neutered.

So what's the moral of the story?  Sometimes the data is bad from the beginning.  And assumptions are made based on that information.  And sometimes the action resulting from bad data is totally incorrect.  Sometimes we discover the root cause, other times not.  So beware.

6/25/2014

Gutting Hadoop with YARN

There's a building near my home.  The building is owned by a company.  The building was probably built in the 1960s or earlier.  Lately, they've been remodeling the building to give it a fresh new feel.  The thing is, they did not bulldoze the existing building.  They simply kept the structure in place and built around it.

Now take Hadoop.  Its core was based upon two things: the HDFS file system and the MapReduce framework.  Along comes YARN, a layer between HDFS (now HDFS2) and MapReduce.  Although MapReduce is still bundled into the product, it's basically one of many applications which sit atop YARN.  So it's no longer the sole way of getting at the data / file system.
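
To make "the MapReduce framework" concrete, here's a toy word count written in that shape.  It runs locally in plain Python (no cluster, no YARN); it's just a sketch of the map / shuffle / reduce model a real Hadoop job follows:

```python
#!/usr/bin/env python
# Toy word count in the MapReduce style: map, shuffle (group by key), reduce.
# Purely a local sketch of the programming model, not a real Hadoop job.
from collections import defaultdict

def mapper(line):
    for word in line.split():
        yield word.lower(), 1            # emit (key, value) pairs

def reducer(word, counts):
    return word, sum(counts)             # aggregate all values for one key

lines = ["the quick brown fox", "the lazy dog", "the end"]

# "Shuffle" phase: group every emitted value by its key
groups = defaultdict(list)
for line in lines:
    for word, count in mapper(line):
        groups[word].append(count)

# Reduce phase
for word in sorted(groups):
    key, total = reducer(word, groups[word])
    print(key, total)
```

On a real cluster the framework handles the shuffle, the spills to disk and the distribution of map and reduce tasks; YARN is now the layer that schedules those tasks alongside every other kind of application.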

In fact, with Tez, the code actually gets compiled down to a lower level than MapReduce, which explains the speedup.  And yes, MapReduce still runs its legacy applications.  However, they will be compiled down to a lower level to take advantage of YARN (not sure if this is available now).

The way I heard it is that MapReduce has to spawn off multiple jobs to handle complex SQL operations like Join or Group By.  Those extra jobs have to be monitored, they have to write to disk, get shuffled and aggregated, then brought back together, slowing everything down.  Engines running on YARN, like Tez, don't need as many jobs to get the same results, and that's one way they're able to return queries faster.
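
To see why, here's a minimal sketch of a reduce-side join written for Hadoop Streaming in Python.  The file names and column layouts are made up for illustration; the point is that this one join already costs a full map / shuffle / reduce pass with its own writes to disk, and stacking a Group By on top of it would mean a second complete job, which is exactly the overhead a DAG engine like Tez avoids.

```python
#!/usr/bin/env python
# join_mapper.py -- tag each record with its source so the reducer can tell
# customer rows and order rows apart after the shuffle.
# Hypothetical layouts: customers.csv = customer_id,customer_name
#                       orders.csv    = customer_id,order_total
import os
import sys

# Hadoop Streaming exposes the current input file via an environment variable
# (mapreduce_map_input_file on Hadoop 2, map_input_file on Hadoop 1).
source = os.environ.get("mapreduce_map_input_file") or os.environ.get("map_input_file", "")

for line in sys.stdin:
    key, value = line.strip().split(",", 1)
    tag = "C" if "customers" in source else "O"
    print(f"{key}\t{tag}\t{value}")      # key first, so Hadoop groups by customer_id
```

```python
#!/usr/bin/env python
# join_reducer.py -- rows arrive sorted by customer_id; pair each customer's
# name with each of their orders (the classic reduce-side join).
import sys
from itertools import groupby

def parse(line):
    key, tag, value = line.rstrip("\n").split("\t", 2)
    return key, tag, value

for key, rows in groupby(map(parse, sys.stdin), key=lambda r: r[0]):
    name, totals = None, []
    for _, tag, value in rows:
        if tag == "C":
            name = value
        else:
            totals.append(value)
    for total in totals:
        print(f"{key}\t{name}\t{total}")
```

You'd launch the pair with the hadoop-streaming jar (-mapper, -reducer, -input, -output); the exact jar path depends on the distribution.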

So what they've done, is the same thing as that building down the street from me.  They've built around it using the existing framework, added a basement, new walls and made it two stories.

Very impressive.

Likewise, Microsoft has found an alternate method of storing the data instead of the HDFS file system.  They use Blob Storage, which allows easy scale-out on the web.

So are things changing in the world of Hadoop?  Yes siree Bob.  Change is a good thing.

And so it goes~!

Just Give Me the Insights Please

Everyone is jockeying for position in the Big Data space, as there's a lot of loot to be made.  Developers are scrambling to learn as fast as possible.  Businesses know they need large data but aren't sure why.

So let's cut to the chase.  People want insights.  Getting from raw data to insight takes resources and technology.  Yet if people could go straight to insights, they would.  Bypass the middle man.

So let's say you need insights on a particular subject.  Wouldn't it be a lot easier to go out and purchase those insights?  "Insights as a Service".

Your CEO probably isn't too technical.  Once you start blabbing about the intricacies, the costs and the risks involved with a big data project, they don't want that.  They want insights.  Results.  Which translates to: how can this big data give us more sales, lower costs and streamlined processes?

In a sense: how can it bring us more money, in order to increase the stock price, so I can get a seven-figure bonus this year?

However, getting to insight will require a person to collect the data, mash it, cleanse it and work with it to bake the insights.

And that's where the real value is at this point in time.  Personally, I plan to be one of those people who create insights.  There's a lot of value to be added and you can earn a nice living doing so.

6/21/2014

No Computer Science Degree?

My degree is in Anthropology.  Thing is, I probably couldn't do that for a living.  First off, I only learned the basics, didn't go for a Master's degree, and I'm not sure what jobs are out there in that industry.  Perhaps work in a museum, or work for an archaeology company doing digs, leading projects or doing administration.

Still, I have a 4 year degree from a major university (University of Florida, Gainesville, FL).

Even though I attended a few computer courses, my learning has been mostly self-taught.  My skill set has grown over the years, revolving mostly around data, programming and reporting.  My current job requires me to build data warehouses through the full life cycle.  And soon I hope to be programming in Hadoop, just a matter of time.

So what do I think about the IT industry?  It changes fast.  Too fast.  The speed at which it evolves is surpassed by the complexity with which technology unfolds.  So not only is there more to learn, the pace at which one must learn is increasing, as is the complexity, because no longer can you do one technology and survive; each technology is tightly integrated with other technology.  So if you're into data, for example, you have to know about relational databases, NoSQL databases and Hadoop.  Plus SQL, all the reporting tools out there, the dashboards and visualizations, and the ETL tools.  Plus the web, security, authority, project management, agile methodology, sales, presentation skills, interacting with management and clients; the list goes on and on.

And you must keep pace with learning in addition to doing your full time job, plus have a life outside of work.

So it is a difficult career indeed.  You can never stop learning, or you'll become unmarketable.

Now, I'm here to say that if you are looking for the best programmer in the world, one who writes flawless code, who knows everything about everything, they may be out there.  I've seen a lot of talented people over the years and there are some brilliant people in the workforce.  Just looking at my Twitter connections, it blows your mind to see how smart some people really are.

My coding style is based on maintainability, so the next person can pick it up easily and figure it out.  I use the coding techniques I know; when running into difficulty, I first scan the internet to look for existing solutions.  When none can be found, I can roll up my sleeves and troubleshoot with the best of them.  My level of knowledge is okay, I'm not really an expert in anything at a deep level, so no superstar status for me.  But at the end of the day, I do quality work and the clients are generally satisfied with the results.

Had my degree been in computer science, there probably wouldn't be much change to my coding style, problem solving ability or work ethic.  And besides, I graduated in 1991, things were a bit different back then. 

So at the end of the day, nobody knows everything in the world of IT.  I make an effort to keep current, my skills are decent and I don't think having a degree in Anthropology has limited me in any capacity.

Successful Data Warehouse Projects

Working as a consultant who builds data warehouses, you get to see a lot of different organizations and how they operate.

You go in, assess the project, estimate a scope and begin work.  From my perspective one of the biggest challenges seems to be the business rules.

More often than not, the business rules are not documented.  They are embedded in people's heads or buried deep within the code.  Deciphering the business logic is the toughest part of the project, in my opinion.

At the end of the day the numbers have to match theirs.  Identifying and locating the data sources is sometimes difficult.  Translating the business logic out of Access, Excel or the programmer's noggin is quite difficult.  And the exceptions.  Those are the ones that get you.
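
To illustrate with a completely hypothetical example, this is the kind of rule you end up fishing out of an old Access query or someone's head.  The warehouse numbers won't match the business's numbers until every one of these undocumented exceptions is found:

```python
# Hypothetical business rule recovered from a legacy report.
# None of these exceptions were documented anywhere; each one was discovered
# only because the warehouse totals didn't match the old report.
def net_revenue(order):
    amount = order["amount"]
    # Exception 1: international orders are stored gross; back out a 7% duty.
    if order["region"] == "INTL":
        amount *= 0.93
    # Exception 2: orders loaded before the 2009 migration are stored in cents.
    if order["order_year"] < 2009:
        amount /= 100
    # Exception 3: house accounts don't count as revenue at all.
    if order["customer_id"] in ("HOUSE", "INTERNAL"):
        return 0.0
    return round(amount, 2)

print(net_revenue({"amount": 1999, "region": "INTL",
                   "order_year": 2008, "customer_id": "C042"}))
# -> 18.59  (duty backed out, then cents converted to dollars)
```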

What makes sense is to grab the data from the source systems.  Many times, though, we are given pre-loaded views or tables instead.  Then there are business rules hidden from view, or the views are outdated, or there's missing or incorrect data, or the timing of the loads is not in sync.

There are many reasons why it makes sense to get the data from the source data repository.

Finally, to create a data warehouse you need access to the correct data, you need to understand the business rules as well as business processes and you need someone from the organization to assist with questions as they arise.

Otherwise you're just asking for trouble.  Enough said.

6/18/2014

Generalize or Specialize

The question is, do you generalize or specialize?

Do you learn surface-level stuff about everything?  Or do you go deep on specific technologies and become an expert?

In today's day and age, things are changing so fast that nobody can learn everything there is to know about everything.  So, you can take the approach where you learn enough to talk about any subject.  Or you pick a technology that interests you and learn everything there is to know.

So with Hadoop, there's a lot to learn.  Hadoop version 1 is already legacy, now there's version 2.  And there must be 20 individual projects associated with it now.  And there's the installation, administration, support, loading data, getting the data back out, security, graph databases, machine learning, SQL-like actions, ETL languages, data ingestion tools, streaming data, and so on.

Plus there are dozens of third-party tools which integrate with Hadoop.


How can one become an expert on everything and stay current?  So do you generalize or specialize?  That is the question.

Performance Point Dashboards

Lately I've been working on Performance Point dashboards.  It's a simple, easy-to-use development tool which allows drag and drop of dimension fields and measures onto a web part widget, which goes into a Dashboard or Scorecard.

The tough part is extracting the data from the source systems through ETL (Extract, Transform and Load) into a Data Warehouse, and from the EDW into the Cube.  The Cube exposes the data, which allows Performance Point to report on it.  The Dashboard gets deployed to the SharePoint web in real time upon saving.  It also allows for KPIs.
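
As a toy sketch of those E-T-L steps (nothing vendor-specific, not SSIS, and every database, table and column name here is made up), the pattern is: pull from the source, reshape it, and land it in a warehouse fact table that the cube would later be built on:

```python
import sqlite3

# Toy ETL sketch: extract from a source system, transform, load into a
# warehouse fact table.  sqlite3 stands in for both systems; all names
# are hypothetical.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE sales (sale_date TEXT, region TEXT, amount_cents INTEGER)")
src.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                [("2014-06-01", " se ", 1999), ("2014-06-02", "NE", 24500)])

dw = sqlite3.connect("warehouse.db")
dw.execute("""CREATE TABLE IF NOT EXISTS fact_sales
              (sale_date TEXT, region TEXT, amount REAL)""")

# Extract
rows = src.execute("SELECT sale_date, region, amount_cents FROM sales").fetchall()

# Transform: normalize region codes and convert cents to dollars
cleaned = [(d, r.strip().upper(), cents / 100.0) for d, r, cents in rows]

# Load
dw.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", cleaned)
dw.commit()
```

The real-world version adds incremental loads, slowly changing dimensions and error handling, which is where most of the project time goes.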

I believe the Performance Point utility was acquired and integrated into SharePoint.  It's a bit quirky and has some limitations, especially around sizing and custom modifications.  The upside is the ease with which you can get up to speed and start creating dashboards.

I think Performance Point is a good product, although who knows what the plans are going forward, as there's not much talk about it from what I see.

6/12/2014

3 Fears Blocking Your Move to the #Cloud

I say the Cloud is taking off.

Yet I hear many people shy away.

Hiding behind "data security", "data breaches", "HIPAA", "PCI", etc.

Fact is, if you store your data with a top vendor, it's probably more secure than your on-premises data.

What's the deeper concern with the Cloud? 
  1. Change (a 4 letter word)
  2. Fear of losing their job (stagnant cushy jobs, no learning involved = lazy)
  3. It's not how we've always done it (see #1)
There are a dozen reasons why you should be thinking about the Cloud now.

Once you get past the 3 fears.

6/11/2014

Attended Hadoop Class in Orlando

I attended the Orlando Data Science Meetup Group today 6/11/2014.

Great turnout.  I was working in Lakeland about an hour away, so the drive to Orlando wasn't too bad, even with the rain and rush hour downtown traffic.

Started with introductions, then dove into a clean install of Hadoop on a Linux box.

Note: one difference between Hadoop v1 and v2 is that they've reduced the number of folders which comprise Hadoop.

For me Linux isn't an everyday environment, so I just watched the install on the projector.

Meanwhile, I already had Hortonworks VMs on Hyper-V running a 1.3 Linux install and a 2.0 Linux install, as well as a Windows 2.0 version of Hadoop, but I don't have a plain Linux operating system to do the install on.



After a short time the environment was setup and Hadoop was up and running.

The presenter who did the Hadoop installation really knew his stuff and managed to do the install from scratch seamlessly.  I've poked around in the folders and config files, and there's really a lot to know, so I understand the level of knowledge and detail involved.

Understanding Linux is required for the install, including commands, editors, RSA keys, passwords, users, folder structures and a ton more.

They recommend using a bunch of Mac minis to set up a small cluster, cool huh?

For larger clusters, they recommend renting time on someone else's cluster.

There was mention of creating a cluster on VMs, but overall there are no cheap solutions for creating, running and administering a large cluster.

Some attendees I spoke with recommended downloading a Linux ISO and installing it from scratch on Hyper-V.

Here's a link: http://blogs.msdn.com/b/virtual_pc_guy/archive/2010/10/21/installing-ubuntu-server-10-10-on-hyper-v.aspx

This was the first of three sessions; next up is YARN, then MapReduce.

Looking forward to it~!

New SlideShare Intro to Hadoop

I just posted a new SlideShare on "Intro to Hadoop"

http://www.slideshare.net/JonathanBloom1/intro-to-hadoop-35714467

Hopefully I'll get to present it soon~!

6/07/2014

Tampa Bay Technology Meetup Groups

There's a lot of Meetup groups forming in the Tampa Bay area for advanced technology.

I'm currently a member of:

Tampa Bay BI User Group (just renamed to Tampa Bay Analytics User Group) (member)
http://tampabaybi.sqlpass.org/

Tampa Bay SQL Server User Group (Pinellas + Tampa) (member)
http://www.tampasql.com/

Tampa Analytics Professionals (co-organizer)
http://www.meetup.com/Analytics-Professionals-of-Tampa/ 

Tampa Bay Cloud User Group (member)
http://www.meetup.com/TampaCloud/

Tampa Bay Hadoop User Group (asked to speak at this event)
http://www.meetup.com/Tampa-Hadoop-Meetup-Group/


Tampa Bay MongoDB User Group (member)
http://www.meetup.com/Tampa-Bay-MongoDB-User-Group/

Orlando Data Science (member)
http://www.meetup.com/orlandodata/

Tampa Tableau User Group
http://community.tableausoftware.com/groups/tampa-bay

The Tampa Bay area is definitely experiencing an uptick in advanced data technology.

Stay tuned for more updates~!

6/06/2014

My Personal Computers Over Time

What was the first interactive computer game I played?  Pong.

My next interactive electronic game?  Atari.

Our friend down the street had Intellivision, which somehow connected to the main server.

What was the first personal computer I worked on?  An original IBM PC.

In the 8th or 9th grade, not sure which, we used the TRS-80.

In college, I used the VAX.

My first laptop: an IBM (suitcase-sized, orange screen).

My first computer of my own: an IBM PS/2 (connected to AOL).

My next computer: an HP.

Then a Dell.

And finally a Dell laptop with a 17-inch monitor.

And a Samsung tablet.

Then an iPhone 4.

And finally a Windows Phone.

And there you have it, my personal computer choices over a lifetime.

A blast from the past.


Mountain Living