To Data, a Verb

Google is a thing, it's a place, it's a company.

And it's a Verb.

I'm going to Google this.

How about data?

Data is a thing, it's a place and it takes up space, has definition and structure.

And I say it's a Verb as well.

How is that?  Any statement can be backed up with facts.

And facts are found in the data.

Let's "data" that.

What that means is: let's back that statement up with some data.

Verb: To Data

I data
you data (informal)
he/she datas
we data
they data

Used in a sentence, "Personally I think he's full of crap, but I data'd it and he's correct."

So going forward, please feel free to use data as a verb.

Just don't forget to give me some credit for the idea.

Data Warehousing 101

To work with a Data Warehouse there are many aspects to consider.

First off, where is the source data from?  What database and schema, what tables are there, and what information are you trying to bring in?

And what other data sources are available?

How can all the data sets be combined to form a logical set of data?

Once your data sources are defined, how can the data be modeled?  Are there people, things, events, times/dates?  What can be measured, and how would the users like to slice it?

And you begin to build your fact tables, with your sums, counts, averages, mins and maxes, often referred to as the "verbs" or actions.

And you design your dimension tables, typically your "nouns".

And you will need to connect your Fact tables to your Dim tables by using keys: surrogate keys (SK), which are typically unique integer identifiers, and your AK (alternate key) fields, which point back to the unique identifier of the original data.
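To make the SK/AK relationship concrete, here is a minimal sketch in Python, assuming a hypothetical `Dimension` class that stands in for a dimension table. The first time an AK arrives from the source it gets the next integer SK; later arrivals of the same AK reuse that SK. In SQL Server this is often handled with an IDENTITY column plus a lookup on the AK column; the class name and key values below are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class Dimension:
    """Hypothetical in-memory stand-in for a dimension table."""
    name: str
    next_sk: int = 1
    sk_by_ak: dict = field(default_factory=dict)  # AK -> SK lookup
    rows: dict = field(default_factory=dict)      # SK -> row attributes

    def get_or_create_sk(self, ak, attributes):
        """Return the SK for this AK, creating a new dimension row if needed."""
        if ak not in self.sk_by_ak:
            sk = self.next_sk
            self.next_sk += 1
            self.sk_by_ak[ak] = sk
            self.rows[sk] = {"ak": ak, **attributes}
        return self.sk_by_ak[ak]


customers = Dimension("DimCustomer")
sk_a = customers.get_or_create_sk("CUST-0042", {"name": "Acme"})
sk_b = customers.get_or_create_sk("CUST-0099", {"name": "Globex"})
sk_again = customers.get_or_create_sk("CUST-0042", {"name": "Acme"})
print(sk_a, sk_b, sk_again)  # 1 2 1
```

Fact rows then store the small integer SK rather than the source system's key, which keeps the fact table narrow and insulates it from changes in the source.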

In the data warehouses I've been involved with, which had around 10 to 20 tables, the tables form a "star schema": the fact tables sit in the center and the dimension tables surround them, joined by SK fields.
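A star schema query boils down to joining each fact row to its dimensions by SK, grouping by a dimension attribute, and aggregating the measures (the sums, counts, averages, mins and maxes). The sketch below does that in plain Python; the table names `dim_product`, `dim_date` and `fact_sales` and their contents are made up for illustration. In the warehouse itself this would be a SQL `GROUP BY` over the joined tables.

```python
from collections import defaultdict
from statistics import mean

# Dimension tables keyed by surrogate key (SK).
dim_product = {1: {"category": "Bikes"}, 2: {"category": "Helmets"}}
dim_date = {20140101: {"year": 2014}, 20150101: {"year": 2015}}

# Fact rows reference dimensions only by SK and carry the measures.
fact_sales = [
    {"product_sk": 1, "date_sk": 20140101, "amount": 500.0},
    {"product_sk": 1, "date_sk": 20150101, "amount": 700.0},
    {"product_sk": 2, "date_sk": 20140101, "amount": 40.0},
    {"product_sk": 2, "date_sk": 20150101, "amount": 60.0},
]


def slice_by(attribute_of):
    """Join each fact to a dimension attribute, group by it, aggregate."""
    groups = defaultdict(list)
    for row in fact_sales:
        groups[attribute_of(row)].append(row["amount"])
    return {
        key: {"sum": sum(v), "count": len(v), "avg": mean(v),
              "min": min(v), "max": max(v)}
        for key, v in groups.items()
    }


by_category = slice_by(lambda r: dim_product[r["product_sk"]]["category"])
by_year = slice_by(lambda r: dim_date[r["date_sk"]]["year"])
print(by_category["Bikes"])  # {'sum': 1200.0, 'count': 2, 'avg': 600.0, 'min': 500.0, 'max': 700.0}
print(by_year[2015]["sum"])  # 760.0
```

Slicing by a different dimension only changes which attribute the lambda picks; the fact table stays the same, which is what makes the star layout convenient for users.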

However, there's an interim step: the source data must be brought into the ecosystem by way of Staging tables.  These tables don't have SK fields; they usually have a unique identifier known as the AK field.  The raw data is brought in through ETL (Extract, Transform and Load).

From stage, the ETL process determines whether the record already exists in the data warehouse.  If so, it performs an Update; if not, it does an Insert.  Sometimes this is done with the MERGE statement, depending on data volume, as in my experience MERGE doesn't scale well past about 10,000 records in SQL Server.
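The update-or-insert decision can be sketched in a few lines of Python. This is a minimal sketch, assuming the warehouse table is represented by a dict keyed on the AK; the function name `upsert` and the sample rows are hypothetical. In SQL Server, a MERGE with `WHEN MATCHED THEN UPDATE` and `WHEN NOT MATCHED THEN INSERT` expresses the same logic in a single statement.

```python
def upsert(warehouse, staged_rows):
    """Update rows whose AK already exists in the warehouse, insert the rest.

    warehouse: dict mapping AK -> row (stands in for the target table)
    staged_rows: iterable of dicts, each with an "ak" field
    Returns (inserted, updated) counts.
    """
    inserted = updated = 0
    for row in staged_rows:
        ak = row["ak"]
        if ak in warehouse:
            if warehouse[ak] != row:  # only touch rows that actually changed
                warehouse[ak] = dict(row)
                updated += 1
        else:
            warehouse[ak] = dict(row)
            inserted += 1
    return inserted, updated


target = {"P1": {"ak": "P1", "price": 10}}
stage = [{"ak": "P1", "price": 12}, {"ak": "P2", "price": 5}]
result = upsert(target, stage)
print(result)  # (1, 1)
```

Skipping unchanged rows matters at volume: re-running the same stage load is then a no-op instead of rewriting every record.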

So the flow is an ETL process which pulls the data from the source(s), stages it in interim tables where business rules can be applied, and then brings it into the actual data warehouse, where the Fact tables join to the Dim tables using Surrogate keys.

From there, the data can be loaded into OLAP cubes for reporting and dashboards; the cubes are queried with a language called MDX.

I've been doing data warehousing for a few years now, and every day I learn something new.  There's definitely a pattern to it, which is repeatable, however, it seems to me every data warehouse is unique.

Lots of orgs could benefit from building a data warehouse, because when you have access to your data, you can discover patterns, look for trends, spot anomalies, drill down to the details, and have Key Performance Indicators, Charts, Graphs, etc.

Please contact me if you're interested in more information!


Latest Project

I've been enjoying my latest project.

Building reports off a data warehouse using the Microsoft BI stack.

So there's searching for business rules, looking for data sources, writing SQL in SQL-Server and Oracle, building tables in Stage and Data Warehouse, ETL in SSIS, SSAS cubes and finally writing SSRS reports in MDX and Pivot Tables in Excel.

I asked everyone I could find where the source of some data was.  Everybody said the data didn't exist, except the head honcho, who said the data was in some of the existing reports.

So what I concluded is the data doesn't exist in electronic format, and what we were building would eventually become the source of that data.

So we built some Models, Entities and Attributes in MDS (Master Data Services), and that is now the official source of this data.  MDS stores the source data, is updatable from an Excel plug-in, and populates the Staging data, which eventually flows through the data warehouse into the Cubes and ends up in the Dashboards.

Overall we got a lot of work done in a short amount of time.  Some long hours, good teamwork and camaraderie.

I hope after this project concludes on Tuesday, they have more work lined up for us.

And so it goes!


MDX Query Builder in SSRS

This weekend I worked on some Data Warehousing and Reports.

The SSRS reports pointed to an SSAS Cube.

So instead of writing Transact SQL, it required code in MDX.

However, there is a Query Builder which simplifies things.

It's sort of like drag and drop from the list of Dimensions, Facts and Measures, and you can also add Parameters.

It builds the parameterized MDX query for you.  The query text is hidden by default, however you can show the query and modify it.

And then set the dataset as "Shared" so all reports can use the same query.
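For a sense of what the Query Builder generates, here is a small Python helper that assembles an MDX query of roughly the same shape: measures on columns, a dimension level on rows, and a sub-select that filters the cube by a report parameter through `STRTOSET(@Param, CONSTRAINED)`. The cube, measure, level and parameter names are hypothetical, and the designer's real output carries extra clauses (such as `DIMENSION PROPERTIES`) that are left out here.

```python
def build_mdx(cube, measure, rows_level, param_name):
    """Return a parameterized MDX query string (illustrative shape only)."""
    return (
        f"SELECT NON EMPTY {{ [Measures].[{measure}] }} ON COLUMNS,\n"
        f"  NON EMPTY {{ {rows_level}.ALLMEMBERS }} ON ROWS\n"
        # The sub-select filters the cube by whatever the user picks in the
        # report parameter; STRTOSET turns the parameter string into a set.
        f"FROM ( SELECT ( STRTOSET(@{param_name}, CONSTRAINED) ) ON COLUMNS\n"
        f"       FROM [{cube}] )"
    )


query = build_mdx("Adventure Works", "Sales Amount",
                  "[Product].[Category].[Category]", "DateCalendarYear")
print(query)
```

The `@DateCalendarYear` placeholder is what SSRS binds to the report parameter at run time, which is why the same shared query can serve several reports.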

The reports each had two datasets pointing to the same cube, one query for the Chart and one for the tabular grid.

It's amazing how many levels deep the chart configuration settings go; there's a customizable setting for everything in SSRS.

At the end of the day, 6 reports were created in SSRS pointing to the SSAS cube.

Apparently in MDX a problem can be solved many ways; luckily, the MDX Query Builder does most of the heavy lifting for you.

On to the next assignment!


Misconception - Lack of Qualified Data Scientists

People say there's a lack of qualified Data Scientists to meet the current demand.

I would disagree with this statement.

I've met several intelligent, qualified MIS students, some in graduate programs and some in bachelor's programs, who are about to graduate and talk about the difficulty of finding a job.

They have programming knowledge, reporting knowledge as well as Data Science statistical modeling knowledge.

They claim that the number of entry-level jobs is diminishing; every company is looking for well-seasoned employees.

It's the classic question: how do I get experience if nobody will hire me?

So with all these qualified candidates lined up, ready to work, who can't find jobs, something doesn't add up when people say there's a lack of supply.

It's a lack of willingness of employers to hire college grads with no experience.

And then what exactly is the point of getting a degree, even an advanced degree, if no one is willing to hire you?

Just saying!


Root Cause Analysis

Root cause analysis.

I read this in a tweet this morning:

"You can state a problem a thousand ways, but you will solve nothing until you come to understand the root cause."

We've had plumbing issues since we moved into our home in January 2008; I posted about it here:


And so, our plumbing problems still exist.  My current plumber, who is awesome, decided to call his buddy to bring in another camera to send down the drain.

So he identified the "root cause".  Apparently this house has an interesting design that doesn't allow for much room, which means they used 3" pipes instead of 4" pipes.  Issue #1.  Next, while watching the video from the camera on the snake, he noticed there's a T-Square connector about 10 feet down.  Everything gets caught on it, and when it does, the 2 sinks, the shower and the toilet all back up.  Issue #2.

Luckily, a plunger will clear the backup rather quickly.

The downside: in order to fix the T-Square, they'll need to open up the garage ceiling, find the pipe, and replace it with a bunch of Y pipes.

So I asked the plumber for 2 estimates, one high end and the other low end.  If it costs too much, we can hold off for a while; at least we know what's causing the issue.

And after 6 years of trial and error, mostly error, we've identified the "root cause".  Not sure how the pipes got past inspection many years ago.  And I wonder how the previous owners of this house handled the issue, because it's been there since 1989, classic case of pass-the-buck I suppose.

So never give up on solving a problem; the answer is simply waiting there to be discovered!


Top 8 Reasons #BigData Is Not Implemented at Your Org

Big Data is great.

It's captured the hearts and minds of the greatest data people today.

I've done my best to learn the concepts, the technology and the lingo.

Except I've yet to use Hadoop on the job for a real client.

And I bet there are lots of similar stories in this space.

1. Change is difficult
2. Unknown costs, both short term and long term
3. Difficult to find knowledgeable resources
4. Unproven ROI
5. Current system works just fine
6. Lack of a business case
7. Too much fluctuation / too many players in this space
8. Top management won't buy in

From my consulting view point, I still see Data Warehousing going strong.

Many, if not most, orgs don't even have a good mechanism for viewing production data, let alone a need for Big Data.

Do you realize how difficult it is to concoct accurate, reliable data in a viewer-friendly display in a timely manner?  And then mash up data between datasets.  Many companies are starving for traditional BI.

They can get by with a midsize luxury sedan; they don't need the Ferrari to go to the grocery store, pick up the kids for carpool and attend Church every Sunday with the entire family.  The Ferrari would be great, maybe as a second car, but for everyday living, it's just not practical for some orgs.

Would I like to program Big Data eventually?  Definitely!  Will it become mainstream?  Definitely.  Can orgs gain competitive advantage, reduce costs and find insight?  Definitely.

Perhaps just a matter of time.


Goodbye Maddie

This week, our oldest dog Maddie died.  She had stage 5 cancer, leukemia.  She was on chemotherapy but got sick after a few weeks and could no longer walk.

Maddie was a great friend.  A super companion.  Had a heart of gold.

She will be missed.


Corporate Structure Similar to Chess Board

If you think about it, the corporate structure in any organization is similar to a chess board.

You have the King CEO at the top.  Very well protected.
The Queen, versatile yet powerful, your SVPs.
The Knights can jump and wander all around the board, your middle managers.
Your Rooks, straightforward, deliberate with much strength, your supervisors.
And your Bishops, move diagonally, can zip across and capture like nobody's business.

And finally, the front line of pawns, your help desk support, your programmers, your sales & marketing and accounting.

The Pawns do most of the grunt work, are somewhat expendable, and get little attention, since they are limited in their movement and must cross the entire board to be promoted.  Some pawns can apply checkmate, so they can play a pivotal role in your org.

One team, working together with a common goal, perhaps.

So where do you fit in the chessboard of your org?


Who's Listening? #BigData

The thing is, had the powers that be followed the proper procedures and asked for permission in a public forum from those authorized to grant it, it would never have happened.

That's why the "Data Tapping" occurred in secrecy without formal permission.

You see, it's easier to ask for forgiveness than permission.

And if you cloak it in "National Security", who's going to question it?

And now that the cat's out of the bag, who can stop it?

It's all happening behind the curtains.

So public figures are telling people that the "Data Tapping" is simply looking for patterns of communication.

And perhaps it crosses the lines of surveillance by reading people's private emails, texts, etc.

Big Data, if used for dark purposes, is a weapon.  Unfortunately there's a good chance this is the case.  Even more unfortunate, they're using it against the people they are supposed to be protecting.


Self Service Quality Assurance Team

Self Service is the newest thing since sliced bread.

You can slice it, dice it, you can puree it, boil it, toast and even grill the data you need to identify insights to help run your business.

Uh, one question when you have a minute.

When you were mashing up your data sources, joining disparate data sets and creating pretty dashboards, did you happen to have the Self Service Quality Assurance team look over your numbers?

Because I last heard that your cool Visualization was just presented to the CEO and Board of Directors, and it seems they have some concern about the numbers.

You see, we have an entire Business Intelligence department who went to school for years to train, get a degree, attend seminars on weekends, read blogs and are very analytical, and they have the official numbers for the organization.

Your Visualization created in this fancy expensive software, although pretty and colorful, is void of accuracy and contains flagrant errors.

Although you created it in between meetings, running the business and between sales calls, your effort is commendable and highly appreciated.

Just one thing though, don't ever publish your reports / dashboards to the corporate intranet site without permission.

Have a great day!

What if the Internet Went Down?

Here's something.

What if the entire internet went down?

For an entire day.

Boy, that sure would cause havoc and mayhem.

It's just a hypothetical question.  I don't see how it possibly could.

But what if it did?  We have adapted to the internet like a frog slowly boiled in water.  It got hot when we weren't looking.

And now we can't live without it.  Sure, there are some people who stay away from computers, and they'll never adapt.  Over time this number will dwindle.

Our entire lives are dependent on this free network full of massive information connecting people globally.

Could you possibly imagine the internet going down for an entire day across the entire globe?

Data is Alive

Data is alive.  It's a living thing.  It grows.  It multiplies.  It has depth.  Structure.  Free form.  Unlimited capacity.

People are hung up on the term Big Data.  If you think about it, it's actually "Macro Data".  And its opposite is "Micro Data".  Similar to the structure of the universe.

You add up all the Micro to derive the Macro.

You can view data through a microscope or a telescope.

That would describe the "Volume" of data.

However, there's also "Velocity".  How fast is the data growing?  What is the influx rate at which the data increases in size?

Lastly there's "Variety".  And to me this seems to be the best attribute of data.  What we are really after is "Insight".  In order to derive the most insight, it helps to have lots of data or speed of new data, but the key factor in generating insight is Variety.

What we are really after is the "Holistic" view of the data.  We don't just want to know the frequency with which you called the Support Center, and the reason, how long it took to solve the problem, by whom, etc.

We want to know what products you already own, so we can tailor your customer experience to supply additional products.  We want to know your bank account info to know how much you can afford.  We want to know your purchasing habits to see how frequently you buy.  We want to know the demographics of your company, your location, your partners, your investors, your employee representation, your political beliefs, your religious beliefs, how many kids you have, did you go to college, what kind of car do you drive, do you drink milk, wine or whiskey, where did you attend high school, who were your teachers, what were your grades.

In the world of data, the more data sets available, the better.  And integrating those data sets is the key.

Because every database has its own unique identifiers, mashing data together is perhaps the most logistically challenging aspect.
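One common way to bridge different identifiers is a crosswalk (mapping) table that records which ID in one system corresponds to which ID in another. The sketch below assumes two hypothetical sources, a CRM and a billing system, plus a hand-built `crosswalk` dict; real crosswalks are usually built through matching rules or master data management, which this example does not attempt.

```python
# Two sources describe the same customers under different identifiers.
crm = {"C-1": {"name": "Acme Corp", "segment": "Enterprise"},
       "C-2": {"name": "Globex", "segment": "SMB"}}
billing = {"9001": {"balance": 1200.0}, "9002": {"balance": 0.0}}

# Hypothetical crosswalk table mapping CRM ids to billing ids.
crosswalk = {"C-1": "9001", "C-2": "9002"}


def mash(crm, billing, crosswalk):
    """Combine both sources into one record per CRM customer."""
    combined = {}
    for crm_id, profile in crm.items():
        billing_id = crosswalk.get(crm_id)  # None if no mapping exists
        account = billing.get(billing_id, {}) if billing_id else {}
        combined[crm_id] = {**profile, **account, "billing_id": billing_id}
    return combined


combined = mash(crm, billing, crosswalk)
print(combined["C-1"])
# {'name': 'Acme Corp', 'segment': 'Enterprise', 'balance': 1200.0, 'billing_id': '9001'}
```

Customers without a crosswalk entry still come through, just with an empty billing side, so gaps in the mapping stay visible instead of silently dropping rows.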

I'm a firm believer that if you want to get good at something, you should watch how the experts do it.  And who are the data experts, in my opinion?  The NSA.

They have access to all data.  They can mash it up.  They can keep mega warehouses of everything about you, in real time.  Their goal is National Security (perhaps).

Except they are using data as it was meant to be used.  To unite the Micro Data to form Macro Data to create Holistic Data, that which comprises All Data.

To create a blueprint, a hierarchy, a network, a risk factor.  A neural network of everything, connecting all the dots, dotting every I and crossing every T.

To simulate the Universal Mind.  http://www.mind-your-reality.com/universal_mind.html

Like any tool, it can be used to unite, heal, and grow.  Or it can be used for dark purposes.

Either way, the goal is the same.  Holistic knowledge based on All Data available at any given time.

The insight part is more difficult, because bias, experience and motives get in the way.

Like an Artist, there is room for interpretation.  Like a Scientist, there is a logical, practical approach to solving problems.

And the blend is done through the role known as the Chief Data Officer.  The CDO is the go-between for IT and the Business, reports directly to the CEO, steers the organization's Business Intelligence program, can leverage internal employees or contract out, is responsible for data quality and data governance, stays current with Data trends, can hire Data Scientists, knows the business and includes resident Business Process experts, and will be the most important position in the org.

We are witnessing the Data Revolution, where every action will be recorded electronically, for mining purposes.

There is no more land to conquer and plunder (on this planet), and we've used up most of the natural resources; the next logical step is to create a new resource to mine, and that is Data.

And the Data is Alive.


Azure SQL Reporting Services In the Cloud Going Away

Recently I saw this article in the social stratosphere.

Windows Azure SQL Reporting Services will be going away.


However, Microsoft has provided an alternative, running SSRS Reporting Services on a VM in the Cloud.

I feel that approach gives the user more flexibility and more control.  Spin up the server when you need it, shut it down when you don't.

And there's less of a learning curve, because most developers already know SSRS and how to maintain the server.

Except what message does that send to the community about how important Reporting Services is going forward?  What will take its place, or will there be much development on SSRS going forward?

Perhaps the new PowerBI integrated into SharePoint in Office 365 may be a new direction.

At this point, current Azure SQL Reporting Service users have some time to migrate off the cloud.

There's always change, right?


Successful Project

For our latest project, we were tasked with fixing performance issues with the Fuzzy Logic components in SSIS.

Apparently, Fuzzy Grouping and Fuzzy Lookup tax the server heavily.

Both components do a great job on small data sets, but not when the number of rows increases to, let's say, 13,000,000.
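To see why fuzzy matching gets expensive, here is a deliberately naive fuzzy grouping sketch in Python. It is a stand-in, not how SSIS implements Fuzzy Grouping: it uses `difflib.SequenceMatcher` similarity and compares every pair of values, merging matches with a union-find. The threshold of 0.85 and the sample names are assumptions for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Return a 0..1 similarity ratio between two strings (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def naive_fuzzy_groups(values, threshold=0.85):
    """Group near-duplicate strings by comparing every pair.

    Union-find keeps transitive matches together: if A~B and B~C,
    all three end up in one group. Comparing every pair is O(n^2);
    13,000,000 rows would mean roughly 8.45e13 comparisons.
    """
    parent = list(range(len(values)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(values)), 2):
        if similarity(values[i], values[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i, v in enumerate(values):
        groups.setdefault(find(i), []).append(v)
    return list(groups.values())


groups = naive_fuzzy_groups(["Jon Smith", "John Smith", "Jane Doe", "Jane Do"])
print(groups)  # [['Jon Smith', 'John Smith'], ['Jane Doe', 'Jane Do']]
```

Production tools avoid the all-pairs comparison with indexing tricks, but the cost still grows quickly with row count, which is consistent with the server strain described here.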

What they do is store everything in TempDB temp tables, using cursors (ugh!), and the package we were supposed to troubleshoot looped 16 times x 4 for 48 passes over the 13 million rows.

Before we got there, it took 5 days to run, for both Fuzzy Grouping and Fuzzy Lookup.

Our plan of attack: investigate everything.

Hardware, software, SQL Server, Memory usage, Server configuration, background processes, VM settings, Host settings, Indexes, Queries, Locks, Blocks, Latches, Memory allocation, Partitioned tables, tweaks to SSIS components, Parallelization, etc., etc.

The first week we investigated, baselined and documented.  The second week, too.  Then we provided a document to the Client with our findings, recommendations and suggestions.

The third week we tried various combinations of the suggestions.  The fourth week, we were still getting errors while running the package.  We ran Perfmon during each run to gather metrics.

At the end of the fourth week of the contract, we finally had some success.  A combination of all the recommendations reduced the package runtime to just over 11 hours, down from 5 days.

Our job was to get it under 24 hours so they could run it twice on the weekend if necessary.

So we succeeded with the contract in budget and on time.

I still think that Fuzzy logic is not the most efficient tool in the shed, however, it serves a purpose.

I enjoyed the contract.  And I believe next week I'll start with a new client, building a Data Warehouse from scratch with some Crystal Reports.  Can't wait!

Presenting to Audiences

I presented two sessions this past weekend at SQL Saturday Tampa BI / Big Data edition.  One on Big Data and one on Office365 PowerBI.

A few weeks prior I presented on Intro to SQL and Big Data at the local IT Pro Camp.

So I was thinking, am I an expert in each of these topics?

What is an expert? 

Because someone who claims to be one usually isn't.  Same with self-proclaimed visionaries.

Big Data is hot right now and the industry is constantly changing.  Surely there are people who know the ins and outs and minute details at a deeper level.

PowerBI is brand new so the knowledge I do have from experimentation is okay to present to those who haven't been exposed to it.

And my Intro to SQL was very basic.  When someone in the audience asked whether it gets more complicated, I responded that these are the basics, like notes on a scale: you can play Twinkle Twinkle or you can play Mozart, depending on your skill level.

I wouldn't consider myself an expert at anything really, I know what I know, and there is so much to learn about everything.

However, I feel I know enough about a variety of topics to speak intelligently to an audience for a period of time.

I went looking for the guy who knew everything about everything and I still haven't found him yet.  In the meantime, I'll gladly discuss the knowledge I do have and give back to the community.


#SQLSat248 Tampa BI - Big Data Edition Recap

SQL Saturday #248 was yesterday.  It was a great event.  I was a volunteer and presenter and I attended 2 of the pre-conferences.

On Thursday, Bill Pearson showed us Power Pivot.  I have to say Bill is a great presenter and knows his material well.  Articulate and a great story teller.

On Friday, I attended Tim Mitchell's SSIS presentation, a definite SSIS wizard.

Saturday the day started at 5am, picking up the Krispy Kreme donuts and bringing some of the supplies to the USF College of Business building, including 12 cases of water, coffee cups, 10 dozen cookies, 4 cases of chips, etc.  The volunteers pitched in getting everything set up and checking people in, and did a great job.

I presented on Big Data at the 10am hour.  I had two Virtual Machines running in Hyper-V.  Hortonworks Sandbox 2.0 and Hortonworks HDP 1.3 for Windows.  I went through some slides and then a demo on Hive, Hue, moving data from Hadoop using ODBC through SSIS and then through Excel Power Query.  Overall I was happy with the presentation.

Lunch was great, tacos from Tia's Tex Mex.

Then I was a bit nervous about the 4pm presentation I was giving on PowerBI.  First off, half the session was the demo, which connected to Office365 in the Cloud.  To prepare, I charged my Verizon MiFi in order to have a good connection.  However, during the demo the connection dropped and there was about 5 minutes of "dead air".  The audience was patient with me until the connection was restored.  I showed off Power Map, Q&A, Power Query to Hadoop HDFS, and creating a Gateway and Data Source to an On-Premise SQL Server 2012 database.  Granted, some of PowerBI is still in "Preview" mode.  The audience asked some questions and seemed interested in the topic; next time I'll have a backup plan in case the internet connection goes down again.

Then it was raffle time, with many Vendors giving away some great stuff.  There was good energy throughout the day.  I manned the Agile Bay booth on and off, so I got to speak to many of the attendees.  And I chatted with the volunteers while getting ready to present.  And then it was cleanup time, which went quickly and easily.

I thought the venue was terrific; many USF students attended, and the presentation by a USF professor on Big Data was worthwhile.  I had a chance to speak with him afterwards, and we discussed Big Data present and future; it seems they do a lot of work with businesses to analyze data and write papers.

And a big "Shout Out" to Jose Chinchilla for another exceptional SQL Saturday Tampa Bay BI / Big Data edition!

Everyone get ready for the next SQL Saturday in Tampa February 2014:



Top 10 Work Related Observations

  1. Workers on the lower rungs carry most of the heavy lifting for a fraction of the pay
  2. Many employers fill full-time roles with outside hires instead of training from within
  3. In some companies the only real way to get a big increase in salary is to leave and come back
  4. When climbing the ladder, it really is who you know
  5. Most workers get labeled early in their career and it's tough to shed the stereotype
  6. Workers who excel in the technical space have difficulty moving up the ladder
  7. If you do great work, you'll end up doing most of the work
  8. It's best not to accept a counter offer when resigning from a position
  9. There may be more pressure at the top, but that's where the bonuses and stock options are
  10. Everyone is replaceable


Store All Data in #Hadoop

If you've been keeping up with things lately, you'll notice that Hadoop is taking off like wildfire.

And why is that?  Because it can handle huge volumes of data, a variety of different kinds of data, and data acquired at rapid rates.

And how does it process data?  Batch oriented parallel processing across commodity hardware.
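The MapReduce model behind that batch processing can be shown with the classic word count. This is a minimal single-process sketch in Python, not Hadoop code: the mapper emits `(word, 1)` pairs, the shuffle groups values by key, and the reducer sums each group. On a real cluster, Hadoop runs many mappers and reducers in parallel across the commodity nodes and moves the grouped data between them.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle/sort: group every emitted value under its key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)


lines = ["big data big ideas", "data is alive"]
# Each line is mapped independently; on a cluster these would run in parallel.
mapped = chain.from_iterable(map_phase(line) for line in lines)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)  # {'big': 2, 'data': 2, 'ideas': 1, 'is': 1, 'alive': 1}
```

Because mappers never talk to each other and reducers only see their own keys, adding more machines adds more throughput, which is the property that lets Hadoop scale on cheap hardware.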

SQL on Hadoop
So the past year, all the excitement has surrounded SQL on Hadoop.  There's one version called Impala, which is fast because it bypasses Map-Reduce altogether.

And there's another version called Stinger, which speeds up the queries by re-architecting some of the back end, preparing processes in advance so Map-Reduce doesn't have to spin up processes every time as well as leveraging memory.

So you can apply a metadata layer on top of HDFS; the metadata resides in the Hive metastore and is exposed through HCatalog.  The metadata is a framework for getting at the underlying data, and a table can be either Managed (the data resides in the Hive warehouse directory) or External (the table is a pointer to the actual data).

What's Next
So what's next for Hadoop?  My guess is it's gaining ground on traditional Databases.  Perhaps Hadoop in the future will contain 'all data', including transactional.

The main barrier so far in my opinion is you can only insert data, you really can't do Updates or Deletes to the raw data in the files, you can only add to it.

There is work going on to make HiveQL ANSI-compliant, which means it will be very similar to the SQL in traditional databases.  This will reduce the need for complex Map-Reduce jobs, help more developers get up to speed quickly, and leverage all the decades of experience writing SQL.

One Location
Think about the scenarios, if you already have the transactional data within Hadoop, there's less need for importing and exporting huge volumes of data, which will speed up development time.  And you won't have to structure the data on 'write', you can structure the data on 'read', if ever.
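Structuring data on 'read' means the raw file is stored exactly as it arrived, and a schema is applied only when a query reads it, much as a Hive External table lays column definitions over files already sitting in HDFS. The Python sketch below illustrates the idea with an in-memory CSV; the column names and the `read_with_schema` helper are hypothetical.

```python
import csv
import io

# Raw data lands as-is; nothing checks its structure when it is written.
raw_file = io.StringIO(
    "2014-01-05,store-7,19.99\n"
    "2014-01-06,store-3,5.00\n"
)

# The schema lives apart from the data and is applied only at read time.
schema = [("sale_date", str), ("store", str), ("amount", float)]


def read_with_schema(f, schema):
    """Parse each raw CSV line into typed fields defined by the schema."""
    for record in csv.reader(f):
        yield {name: cast(value) for (name, cast), value in zip(schema, record)}


rows = list(read_with_schema(raw_file, schema))
total = sum(r["amount"] for r in rows)
print(round(total, 2))  # 24.99
```

Swapping in a different schema changes how the same bytes are interpreted without rewriting the file, which is the flexibility the schema-on-read approach trades against up-front validation.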

Hadoop and Business Intelligence
If you think about it, the Hadoop ecosystem contains just about every facet of Business Intelligence today.  You can store data, cleanse data, ETL data, report on data, create dashboards on data, you can mine it, use it for predicting and clustering and you can machine learn with it.

The underlying processes for both traditional databases and Hadoop are similar.  The main difference, traditional databases max out at some point because of volume and processing power, and that's where Hadoop gets started.  So if Hadoop can handle lower volume transactional data, it can really do both functions, thus, less of a need for traditional database.  Perhaps it wouldn't extinguish them, just offer more functionality in a single ecosystem.

And we still use the mainframe today, just as we still use data warehousing and traditional databases.  In the world of IT, nothing really goes away.  However, Hadoop offers a lot of the things we need to work with data, and it's gaining traction every day.

All Data
So the hype is actually turning into everyday processes and real people are getting up to speed quickly.  Time will tell how things pan out, as anything could happen.  Just saying that one day in the not too far future, Developers may be using Hadoop a lot more than they expected.


What Have I been Learning Lately?

In the past few weeks, I've been learning lots of cool things.

I learned how to create Hyper-V VMs.  In doing so, I learned how to create an Active Directory host which authenticates users to other VMs running on the laptop.

In addition, I set up another Hyper-V VM for SharePoint 2010 with PerformancePoint.  In doing so, I loaded the AdventureWorks2012 and AdventureWorksDW databases in SQL Server.  Then I downloaded a sample project from the web to push the data to SSAS cubes, which are then exposed for querying in MDX, PerformancePoint or Power View.  Or even Excel.

Next, I downloaded the Hortonworks Hadoop Sandbox 1.3 for Hyper-V.  I got that working with some slight tweaks and could connect from the host machine.

Next I downloaded and installed Hortonworks HDP 1.3 for Windows.  That took some time to install, as there are a lot of steps.  However, that's now working.  In addition, I downloaded the Hortonworks ODBC driver, connected Microsoft SSIS to HDFS and moved data to a SQL Server table.

After that, I began work with Microsoft Power Pivot, Power View and Power Map for Excel 2013.  After requesting and getting authorized, I signed up for PowerBI, which includes free downloadable Office 2013, SharePoint in the Cloud, PowerBI, Exchange email and a lot more.  I created a Gateway to my local PC, created a Data Source pointing to SQL Server and was able to query that data from the Cloud, in Excel Power Query, after signing into the account.  Not too shabby.  From there, I can load it into Power Pivot, Power View and Power Map.

I was also able to query HDFS from Excel Power Query.  I actually did a live demo this past Saturday with mixed results, my Laptop wasn't cooperating and Power Query froze after setting first column as headers.  I continued with the demo and showed HIVE and PIG (Grunt) from the Command Line.  And then showed off querying the web from Power Query.

Next I signed up for a free 30-day Windows Azure account.  It has SQL Servers, VMs, Mobile, Networks as well as Storage Accounts and HDInsight.  I haven't created anything yet; I'm not sure how the actual billing works, even on free accounts.

Lastly, I created a Yammer account for my side business Bloom Consulting.  It's cool, has lots of features, just want to get familiar with it and learn the basics.

And tonight I downloaded Hortonworks Sandbox 2.0 Hyper-V and was querying the HIVE tables within minutes of starting the VM.

There's still so much to learn.  I want to go deeper into Hadoop with Pig, Hive, Sqoop, Oozie, Flume, etc.  And I still want to learn PowerShell.  And lastly I'd like to kick off a C# Map-Reduce job against Hortonworks Hadoop HDP 1.3 for Windows for my demo in two weeks.

So never a dull moment.  And so much to learn!

The IT Spats Continue

Since I got into IT, there have been raging debates over just about everything.

First there was VB6 vs .NET.  Object Oriented purists wanted more than Visual Basic 4, 5 and 6 could offer.  Microsoft ventured forward with .NET without much concern for the VB developers: move forward or get left behind.  However, VB6 and ASP still exist in many IT shops today.

Then C# vs Java.  Microsoft got in trouble with their J++, so they conjured up a new (almost identical) language called C#.  It competes head on with Java.  What I like is that MS offers a single IDE with multiple languages (VB.NET, F#, C#; they even have COBOL for .NET).  Java has many IDEs and many flavors of Java.  I find it's easier to stay current with MS.  Also, Java now belongs to Oracle, which acquired Sun, and feels somewhat proprietary.

Then we have the battle between SQL Server and Oracle: two giants going head to head, buying up every company they can find to integrate into a mammoth arsenal of offerings. I worked with Oracle for the first 10 years of my career. The front end seemed nonexistent apart from command-line editing in SQL*Plus, unless you purchased an expensive third-party tool. SQL Server has grown from a fun app into an enterprise app since 2000. Personally, I like SQL Server.

Traditional SSAS vs. the Tabular Model was more recent. The cube purists, who made good money over the past 10 years, were upset that their bread and butter was being challenged by a more user-friendly version called Tabular. You could get up to speed in hours, not months. It ran in memory and was super fast. Bottom line: SSAS cubes aren't going away anytime soon. Learn both.

And now we are at Relational DB vs. Hadoop: vendor-proprietary databases versus open-source, virtually unlimited storage on a distributed architecture across commodity servers. Hadoop picks up where relational stops, with huge volumes and complex, unstructured data. For now they both serve a purpose. If you want transactional data with instant writes, choose OLTP. If you want cubes, choose OLAP. And if you want to crunch huge volumes of data, use Hadoop.
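Hadoop crunches data with MapReduce. A map step runs over each chunk of input in parallel, the framework groups the map output by key (the "shuffle"), and a reduce step aggregates each group. The sketch below is a minimal single-machine word count that only illustrates this data flow. It is not Hadoop's API, and the function names are hypothetical. On a real cluster, Hadoop distributes the map and reduce steps across the commodity servers and does the shuffle over the network.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Map: emit a (word, 1) pair for every word in one line of input
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle: group all emitted values by key, as Hadoop does between map and reduce
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: collapse every value for one key into a single result
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["big data big", "data warehouse"]))
```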

And so, like sands through an hourglass, so are the days of our IT lives.

To the #Microsoft #Cloud

Within the past 2 days, I've signed up for Microsoft Cloud offerings:
  • Office365
  • Windows Azure
  • Yammer
  • PowerBI
Why so much activity lately? While attending the SQL PASS Summit and speaking with some of the Microsoft employees, it became clear to me that all new offerings will appear in the Cloud first.

It's the easiest deployment model for getting the most software to people as quickly as possible.

Once an offering is vetted in the Cloud based on user feedback, they will most likely create an On-Premise version next, after they determine a workable pricing methodology. That gets complex rather quickly.

What counts as enterprise, small business, educational, non-profit, personal use vs. business use, etc.?

With that said, I'm jumping onboard the Cloud wagon with no plans of turning back.

Microsoft has provided me with a solid career since 1996, and their software offerings cover just about every facet of technology out there.

I'm just saying that soon everything will be in the Cloud, once people's fears about privacy, data exploits and pricing are addressed.

The Cloud offers reliability, ease of use, availability, redundancy, failover, you name it.

To the Cloud!

Intro to #Microsoft #Azure

Today I created a Microsoft Azure account.

It was really easy.

Just go to the Windows Azure website.

I clicked on the Free Trial.

It knew who I was and pre-populated some fields.

You do have to enter your real phone number and have it send you a text message.

Enter the code from the text message, and you are prompted to enter your real credit card info.

I did. Then accept the terms and conditions, and your account is created instantly.

I'm eager to get started, perhaps create a Database online, connect to it with SQL Reporting Services and maybe even spin up an HDInsight cluster.  Or maybe create a sample web site, create an online VM, who knows.  The possibilities are endless.
How simple was that?

Hedging Bets on #Data Space

With technology changing so rapidly, where are you putting your effort?

For me, working in the Data space, I'm hedging my bets.

I think Self Service BI is taking off.

Microsoft is introducing PowerBI, which puts the data in the Cloud for collaboration, ease of use, user security and data refreshes that point to On-Premise or other Cloud data sources.

Plus you have all the add-ons for Excel, like Power Map, Power View and Power Pivot. That's a complete arsenal that lets business users handle the entire BI stack. It gives end users the ability to pull in their own data, mash it up, apply business rules, create reports and dashboards, and push them to the Cloud for collaboration with user-assigned roles.

Alongside Self Service BI, there's the big elephant in the room: Hadoop. I've been learning it for some time now and attended a two-week Cloudera course about 6 months ago. Then I dabbled in HDInsight and finally Hortonworks. What I like best about Hortonworks is that they have ported the entire ecosystem to Windows. Not only do the processes run as Windows Services, but we don't need to learn Linux at this point in our careers. Win-win. Also, you can now write .NET code for MapReduce jobs; Java or Python is not required.

Big Data has been a buzzword, but I think it's gaining traction. Every CIO knew they should be doing Big Data to gain insight, except there was no roadmap and only a handful of experts. The Hadoop offerings available today are plentiful, and more and more people are learning its intricacies.

Lastly, there's the bread and butter: traditional BI, meaning reporting, ETL, cubes and dashboards. These are the things that pay the bills, because tons of companies are swimming in this code and need experts to maintain, support, streamline and enhance it.

So you could say I've got some skin in the game in all three scenarios.

How about you? How are you ramping up your skills for the next wave of demand in IT?


Up and Running #Microsoft #PowerBI #MSBI

Here's a great tutorial on how to get started with Microsoft #PowerBI:

A few days after requesting access to PowerBI, I was granted permissions.

So I logged onto their site and began the registration process:


  • Service Center
  • Quick Start
  • Service Health
  • Message Center
  • Setup Licenses
  • Subscription Details (two screens)
  • Installation Complete
  • SharePoint in the Cloud
  • Exchange Admin Center

And finally, PowerBI in the Cloud!!!

From here, you can create a Gateway to establish a link between PowerBI in the Cloud and your On-Premise data source.

Next, Install & Register.

Choose the download you want.

Thank you for downloading...
Finished Installing...
Register Gateway...
Enter Unique Code so they know it's you...
Specify Endpoint...
Congrats, it's installed...
Welcome, let's get started...
New Gateway (your connection to the host computer)...
Data Source Usage...
Create your "data source"...
Run the wizard...
Test Connection was Successful...
Add Users/Groups who can access this new data...
View new Data Source...
Download Power Query...
Expose the tables in SQL Server...
Be sure you have the correct version of Power Query (there are two versions of Power Query)...
Now sign in using your OnMicrosoft account, in lower case...

Enter the OData Feed URL... you can find it under Admin...Data Source...Edit...OData Feed URL...
Now sign in using your Online Microsoft ID...
Power Query found the tables from SQL Server located on the hard drive of the laptop at my house...
Load Table/Data into Excel...

Go to Insert --> Power View...

Quite impressive!
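The OData Feed step above is ordinary HTTP underneath. An OData feed exposes each table as a URL that accepts standard query options such as `$select` and `$top` and returns rows as JSON. The sketch below builds such a query and reads rows out of a response. The feed URL is a hypothetical placeholder, and the real one comes from Admin...Data Source...Edit. The sign-in the gateway requires is not shown, so a real request would also need authentication. I've assumed the reply is either newer OData JSON (rows under `"value"`) or the older verbose format (rows under `"d"`/`"results"`). I haven't confirmed which one the PowerBI gateway returns.

```python
import json
from urllib.parse import urlencode


def odata_query_url(feed_url, select=None, top=None):
    # Append the standard OData $select and $top system query options
    params = {}
    if select:
        params["$select"] = ",".join(select)
    if top is not None:
        params["$top"] = top
    if not params:
        return feed_url
    # Leave "$" and "," unescaped so the query stays readable
    return feed_url + "?" + urlencode(params, safe="$,")


def rows_from_response(payload):
    # Newer OData JSON: {"value": [...]}; older verbose JSON: {"d": {"results": [...]}} or {"d": [...]}
    data = json.loads(payload)
    if "value" in data:
        return data["value"]
    d = data.get("d", {})
    if isinstance(d, dict):
        return d.get("results", [])
    return d


if __name__ == "__main__":
    # Hypothetical feed URL standing in for the one shown on the PowerBI admin page
    url = odata_query_url("https://example.com/odata/Customers", select=["Id", "Name"], top=5)
    print(url)
    print(rows_from_response('{"value": [{"Id": 1, "Name": "Contoso"}]}'))
```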