To Data, a Verb

Google is a thing, it's a place, it's a company.

And it's a Verb.

I'm going to Google this.

How about data.

Data is a thing, it's a place and it takes up space, has definition and structure.

And I say it's a Verb as well.

How is that.  Any statement can be backed up with facts.

And facts are found in the data.

Let's "data" that.

What that means is, Let's back that statement up with some data.

Verb: To Data

I data
you data (informal)
he/she data's
we data
they data

Used in a sentence, "Personally I think he's full of crap, but I data'd it and he's correct."

So going forward, please feel free to use data as a verb.

Just don't forget to give me some credit for the idea.

Data Warehousing 101

To work with a Data Warehouse there are many aspects to consider.

First off, where is the source data from, what database, schema, what tables are there, and what information are you trying to bring in.

And what other data sources are available.

How can all the data sets be combined to form a logical set of data.

Once your data sources are defined, how can the data be modeled.  Are there people, things, events, time/dates, what can be measured, and how would the users like to slice it.

And you begin to build your fact tables, your sums, counts, averages, mins and maxes, often referred to as "verbs" or action.

And you design your dimension tables, typically your "nouns".

And you will need to connect your Fact tables to your Dim tables, by using keys, surrogate SK which are unique identifier integers typically and your AK fields which point back to the unique identifier of the original data.

And the data warehouses I've been involved with, around 10 - 20 tables, they form a "star schema" in that the fact tables remain in the center and the dimension tables surround it, joined by SK fields.

However there's an interim step, in that the source data must be brought into the ecosystem by use of Staging tables.  These tables don't have SK fields, they usually have a unique identifier knows as the AK field.  The raw data is brought in through ETL, Extract Transform and Loan.

From stage, the ETL process determines if the record already exists in the data warehouse, if so, it performs an Update, if not it does an Insert, sometimes using the MERGE statement, depending on data volume, as it doesn't scale well past 10,000 records in SQL Server.

So the flow is an ETL process which pulls the data from the source(s), staged in interim tables, where business rules can be applied, and then brought into the actual data warehouse where the Fact tables join to the Dim tables using Surrogate keys.

From there, they can be ported into OLAP cubes, for reporting and dashboards, written in a language called MDX.

I've been doing data warehousing for a few years now, and every day I learn something new.  There's definitely a pattern to it, which is repeatable, however, it seems to me every data warehouse is unique.

Lots of orgs could benefit by building a data warehouse, because when you have access to your data, you can discover patterns, look for trends, spot anomaly's, drill down to the details, have Key Performance Metrics, Charts, Graphs, etc.

Please contact if interested in more information!


Latest Project

I've been enjoying my latest project.

Building reports off a data warehouse using the Microsoft BI stack.

So there's searching for business rules, looking for data sources, writing SQL in SQL-Server and Oracle, building tables in Stage and Data Warehouse, ETL in SSIS, SSAS cubes and finally writing SSRS reports in MDX and Pivot Tables in Excel.

I was asking all the people I could find, where is the source of some data.  Everybody said the data didn't exist.  Except the head honcho said the data is in some of the existing reports.

So what I concluded is the data doesn't exist in electronic format, and what we were building would eventually become the source of that data.

So we built some Models, Entity's and Attributes in MDS and that is now the official source of this data.  Master Data Services, which stores the source data, is updatable from an Excel plug-in, and populates the Staging data which eventually flow through the data warehouse into the Cubes, which end up in the Dashboards.

Overall we got a lot of work done in a short amount of time.  Some long hours, good teamwork and camaraderie.

I hope after this project concludes on Tuesday, they have more work lined up for us.

And so it goes!


MDX Query Builder in SSRS

This weekend I worked on some Data Warehousing and Reports.

The SSRS reports pointed to an SSAS Cube.

So instead of writing Transact SQL, it required code in MDX.

However, there is a Query Builder which simplifies things.

It's sort of like drag and drop, from the list of Dimensions, Facts and Measures, but you can also add Parameters.

It builds the MDX parameter query for you, which is hidden from the results sets, however you can show the query and modify it.

And then set the Query as "Shared" so all reports can view the same query.

The reports each had two datasets pointing to the same cube, one query for the Chart and one for the tabular grid.

It's amazing how many levels deep the chart configuration settings go, they have a customizable setting for everything in SSRS.

At the end of the day, 6 reports were created in SSRS pointing to SSAS Cube.

Apparently in MDX a problem can be solved many ways, luckily, the MDX Query Builder does most of the heavy lifting for you.

On to the next assignment!


Misconception - Lack of Qualified Data Scientists

People say there's a lack of qualified Data Scientists to meet the current demand.

I would disagree with this statement.

I've met several intelligent qualified MIS students, some Graduate, some Bachelors, who are about to graduate, and talk of the difficulty finding a job.

They have programming knowledge, reporting knowledge as well as Data Science statistical modeling knowledge.

They claim that the number of entry level jobs are diminishing, every company is looking for well seasoned employees.

The classic, how do I get experience if nobody will hire me.

So with all these qualified candidates lined up ready to work who can't find jobs, something is not correct when people say there's a lack of supply.

It's a lack of willingness of employers to hire college grads with no experience.

And then what exactly is the point of getting a degree, even an advanced degree, if no one is willing to hire.

Just saying!


Root Cause Analysis

Root cause analysis.

I read this on a tweet this morning:

"You can state a problem a thousand ways, but you will solve nothing until you come to understand the root cause."

We've had plumbing issues since we moved into our home Jan 2008, I posted about it here:


And so, our plumbing problems still exist.  And my current plumber, who is awesome, decided to call his buddy to bring in another camera, down the drain.

So he identified the "root cause".  Apparently this house has interesting design, which doesn't allow for much room, which means they used 3" pipes instead of 4" pipes.  Issue #1.  Next, while viewing the video camera of the snake, he noticed there's a T-Square connector about 10 feet down.  Which means everything is getting caught on that, so when it does, the 2 sinks, the shower and the toilet all back up.  Issue #2.

Luckily a quick plunger will alleviate the back up rather quickly.

The down side, in order to fix the T-Square, they'll need to open up the wall in the garage ceiling and find the pipe, and then replace it with a bunch of Y pipes.

So I asked the plumber for 2 estimates, one high end the other low end.  If it costs too much, we can hold off for a while, at least we know what's causing the issue.

And after 6 years of trial and error, mostly error, we've identified the "root cause".  Not sure how the pipes got past inspection many years ago.  And I wonder how the previous owners of this house handled the issue, because it's been there since 1989, classic case of pass-the-buck I suppose.

So never give up on solving a problem, the answer is simply waiting there to be discovered.!


Top 8 Reason #BigData not Implemented at Your Org

Big Data is great.

It's captured the hearts and minds of the greatest data people today.

I've done my best to learn the concepts, the technology and the lingo.

Except I've yet to use Hadoop on the job for a real client.

And I bet there's lots of similar stories in this space.

1. Change is difficult.
2. Unknown costs both short term and long term
3. Difficult to find knowledgeable resources
4. Unproven ROI
5. Current system works just fine
6. Lack business case
7. Too much fluxuation / players in this space
8. Top management won't buy in

From my consulting view point, I still see Data Warehousing going strong.

Many / most org's don't even have a good mechanism for viewing production data, let alone need for Big Data.

Do you realize how difficult it is to concoct accurate, reliable data, in viewer friendly display, in a timely manor?  And then mash data between datasets.  Many companies are starving for traditional BI. 

They can get by with a mid size luxury sedan, they don't need the Ferrari, in order to go to the grocery store, pick up the kids for carpool and attend Church every Sunday with the entire family.  The Ferrari would be great, maybe as a second car, but for everyday living, it's just not practical for some orgs.

Would I like to program Big Data eventually?  Definitely!  Will it become mainstream?  Definitely.  Can orgs gain competitive advantage, reduce costs and find insight?  Definitely.

Perhaps just a matter of time.


Goodbye Maddie

This week, our oldest dog Maddie died.  She had stage 5 cancer, leukemia.  She was on Chemotherapy but got sick after a few weeks and could no longer walk.

Maddie was a great friend.  A super companion.  Had a heart of gold.

She will be missed.