Data Solutions Depend on the Problem You are Solving

Data is the raw resource.

Along the factory line, it gets mashed, converted, manipulated, updated, aggregated and finally exported into a final product.

A production line is similar to a manufacturing plant: it takes raw materials and, through a series of repeatable steps, turns them into a finely finished product.

This method gets applied to the reporting life cycle.

Whether it's traditional reporting, Enterprise Data Warehousing or Big Data.

Raw data comes in, it's massaged, perhaps aggregated and finished into an end product.

Typically the end product is a report, an Excel Spreadsheet, a PDF, a CSV File, a Dashboard, Graph, Chart or raw data dump.

So to say that one methodology is better than another is not quite accurate.

Doesn't it really depend on your business case, on what questions you are trying to answer?  And wouldn't that determine which Reporting methodology you choose?

If Business Users want to discover data trends, wouldn't Self Service be optimal?

If CEOs wanted Graphs and Charts, with drill down capabilities, wouldn't Traditional Reporting work?

If you have huge volumes of data, wouldn't Big Data options suffice?

So as you can see, all methods work, depending on your scenario.  Is one better than another?

Is a fork better than a spoon?  Not for eating soup.  Is a knife better than a spoon?  Not for cutting meat.  Each has value depending on your needs.

When the spoon was created, did people say, "This triggers the end of the fork."?  Nonsense.

Thus Big Data will solve problems which Relational Databases could not.  And NoSQL databases will solve other problems.  And Relational Database will solve others.

I don't see the death of any data platforms at the moment.  In fact I see the number of options exponentially increasing.

We're all in this together, to form Order out of Chaos, in the world of Data.  Data Professionals have good job security at the moment, and justly so, as we are witnessing the Data Revolution ~!

Descriptive Data Forms Relationships with Other Data

We as humans have 5 senses, not including Intuition.  They are as follows (from http://en.wikipedia.org/wiki/Sense):

Sight
Hearing
Taste
Smell
Touch

Data is a thing.  It can't be heard, tasted, smelled or touched.  It can only be seen.  That's very limiting, given that it's responsible for so much.  Our brains have a limited ability to grasp data because it can only be viewed.  Therefore our understanding may be limited.

However, I believe data is similar to Energy.  Like energy, it can be stored, it can remain at rest, or it can be transferred.  So Databases are simply storehouses for energy units called data.

And because the data is at rest, it's potential energy, waiting to be released / consumed.  You exert a pull on that energy by issuing a SQL Query command, and a copy of the energy is sent to the requester, thus releasing it into the wild.

The speed at which that energy / data is transferred depends on the specifications of the host server, the amount of RAM, as well as the network connecting the two entities.  The size of the data also makes a difference.  And in its simplest form, data is a series of bits and bytes.  There is work today to place this onto a single atom.

Which is good, because the volume of data is accumulating at a phenomenal pace.  We're going to need a way to store this energy somewhere, so it can be accessed, interrogated and consumed.

Data can be stored in a variety of formats.  There's no universal standard dictating how it should be stored, and that poses problems downstream.  It would be nice to have a standard for storing Public data.  In doing so, there would need to be a universal storage definition language adhered to by Data Professionals.  However, the rise of UnStructured Data complicates matters.  How do you store pictures: in what standard format, at what pixel level, and so on?  Obviously this would be a difficult task, but a standard could be introduced and implemented at a specific point in time, after which people must comply.  Because right now it's a free for all, and having data mix with foreign data sets is extremely difficult.

Data shouldn't be monogamous.  It should have multiple partners.  It should be able to see and communicate with all sorts of other data.  To form relationships.  Establish patterns.  Based on Fuzzy logic.  And that data should form a network.  Data should have built in descriptive information about what that piece of data consists of: where it originated, by whom, when, how and why.  Other characteristics should be embedded within it as well, so data can communicate with other data: "hey, I'm a piece of data with these characteristics; you there, do you have similar characteristics?  If so, let's create a bond between us, so other people can associate a link between us for later use."  It's that relationship between the data that has the real value.  Graph technology could form networks of similar data points to find patterns not visible to the naked eye.
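As a thought experiment, that bonding could be sketched in a few lines of Python.  The traits and the bonding threshold below are made up purely for illustration:

```python
# Hypothetical sketch: data points carry descriptive traits and form a bond
# when they share enough characteristics, building a small network of links.
def shared_traits(a, b):
    """Count the characteristics two data points have in common."""
    return len(set(a["traits"]) & set(b["traits"]))

def build_network(points, threshold=2):
    """Link every pair of points sharing at least `threshold` traits."""
    links = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if shared_traits(points[i], points[j]) >= threshold:
                links.append((points[i]["id"], points[j]["id"]))
    return links

points = [
    {"id": "A", "traits": ["origin:sensor", "year:2014", "type:numeric"]},
    {"id": "B", "traits": ["origin:sensor", "year:2014", "type:text"]},
    {"id": "C", "traits": ["origin:survey", "year:2013", "type:text"]},
]
print(build_network(points))  # → [('A', 'B')] — only A and B share two traits
```

A real implementation would use a graph database and fuzzy matching rather than exact trait overlap, but the idea is the same: the links, not the points, carry the value.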

Going forward, Models could interrogate those relationships and build a storehouse of information, thus training the Model, for true Artificial Intelligence.  Because once the Model is trained, it can be used for a variety of purposes.  It grows in intelligence, can determine choices based on probabilities and past experience, and can learn new things which get added to its memory banks.

And one day, that Model could exhibit human like qualities, of thoughts and feelings, to eventually become a conscious life form.  The birth of a new being, smart, clever, with a good memory, ability to learn, to evolve and grow.

But in order for this to happen, data needs to have standards, data needs to be self describing, data needs to form links to other data based on similarities, and those common links can be used to build Models, which can be trained, to perform true Artificial Intelligence.  I think it's possible.

Data Scientists are this generation's Astronauts

Where are we heading?

Into the world of data.

We've been extracting data for years.  Into fancy reports and dashboards.  A bit of self service.  Now big volumes of data with parallel processing queries.

That's provided a lot of careers for a lot of people.

And everyone is hopping on the 'big data' band wagon.

Everyone is now a 'data scientist'.

Reporting, BI and Data Warehouse will continue to exist, transform and grow.

Yet the real power is in combining Math and Statistics with Data and Business Domain Knowledge expertise.  That's an all-star combination.  And that's called a 'data scientist'.  Someone who really understands the math behind the algorithms.  It does require advanced math and statistics.  There's no shortcut around this.

And what do these people do all day?  They solve problems.  They theorize.  They write code to sample data.  They analyze and interpret the results.  They set hypothesis, research and form conclusions.  It really is a 'science'.  And not everybody is qualified.

I'll be the first to admit, I'm not a data scientist.  There's a good chance that you aren't either.  But the professionals who possess the necessary skills are held in high regard; they will form the future of our society by building neural networks, writing algorithms and getting another step closer to true Artificial Intelligence.  AI as in creating a 'consciousness', not regurgitating huge volumes of data and winning chess / Jeopardy matches.

I believe that 'Data Scientists' are this generation's Astronauts.  For they are exploring uncharted territory, possess great skills and high intelligence, produce results, and pave the foundation for the next generations.


Self Service BI vs Traditional BI

Self Service BI is cool.  No question about it.

Microsoft's Power BI is a fun way for business users to download, manipulate and analyze data.

And typically the person doing the work is also the person viewing the results.  And analyzing the data.

There is very little "coding" involved.  Mostly point and clicking.

And this model serves a purpose.  It allows Business Users, Data Analysts, Power Users to be a self contained Business Intelligence army.

It also allows for quick turnaround and user-friendly, easy-to-use techniques, and the end product can be uploaded for collaboration, data refreshes and dissemination of reports and visualizations.

However, it bypasses IT: there's no indication of data governance or coding standards, no guarantee of best practices, and no evidence the user understands the data.

The entire Power Package is a great bundle of Self Service BI options. 

However, it's my impression that enterprise level Business Intelligence solutions require a Data Professional who understands the flow of data, cleansing, best practices, governance and manipulation of data, and who uses the tools designed for such purposes.

I don't think the two offerings are competing products, Self Service vs. Traditional BI.  In fact the opposite: they each have a specific user base, and they overlap in functionality in many areas.

Each set of products solves the problem of extracting data to find insight to manage the business.  One is designed for Power users who have little coding experience and the other for Enterprise data solutions used by Data Professionals.

Public Data Identified by Namespaces to Create Neural Network #AI

What does the future of data look like?

Well for one thing, the past is not indicative of future behavior.

We've had data for a while now.  Flat files.  Relational Data.  Three Dimensional Cube data.  Large sets of data.

We have grown the size of the data.  And the complexity of the data.  And the format of the data.

However, if you add it all up, what's the one thing that's still missing?

Connected Data.  As in a Neural Network.  As in a million and a half "Data Silos".  Data not talking to each other.

Partly due to legacy infrastructures.  Vendors not communicating with other systems.  Proprietary data not viewable outside Organizations.  Inability to merge complex data sets between internal systems.  Many reasons.

What we need is a way to have data communicate with other data.  And how do we do that?

Expose the data to outside calls, similar to Web Services.  You give your data a uniquely qualified namespace.  You expose the Data Web Service to a Public list, so people can find your data, and you specify a protocol so they can communicate and pull data on the fly.

And you modify the way in which you store the data.  Instead of having a field called Firstname (a Varchar String), you qualify it as "UnitedStates.Florida.SafetyHarbor.BloomConsultingBI.FirstName".  In addition, you add TimeStamps for Date Created, Date Modified and Date Deleted.  The database would need to expand to allow an XML-like field to describe the data.  That way, anyone querying the data set can derive the necessary information.  That info would be contained in the exposed Public WSDL, similar to Web Service standards: http://www.w3.org/TR/wsdl.
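A minimal sketch of such a self-describing field, assuming a simple XML-like wrapper (the function name and tag names below are my own invention, not any standard):

```python
from datetime import datetime, timezone

# Hypothetical sketch: a field carries its fully qualified namespace plus
# audit timestamps, serialized as an XML-like descriptor anyone can read.
def describe_field(namespace, name, value, created=None):
    created = created or datetime.now(timezone.utc).isoformat()
    qualified = namespace + "." + name
    return ("<field name='%s'>"
            "<value>%s</value>"
            "<dateCreated>%s</dateCreated>"
            "<dateModified>%s</dateModified>"
            "</field>" % (qualified, value, created, created))

descriptor = describe_field(
    "UnitedStates.Florida.SafetyHarbor.BloomConsultingBI", "FirstName", "Jon")
print(descriptor)
```

The point isn't the exact tags; it's that the metadata travels with the value, so a stranger querying the data set can derive its origin without asking anyone.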

So let's say I'm a Data Professional working at a job, and I need to query data.  I search the Public Data Records, enter the type of data I'm looking for, and it returns the top 100 sources.  I connect to the appropriate one, using some type of SOAP XML SQL standard, connect my internal data with the Public data (Cloud), and join the necessary tables in the SQL Statement.  Instead of using the 4 part naming convention, Server.Database.Owner.Table, you would qualify it further as Namespace.Server.Database.Owner.Table and run the query on demand, once getting past the authentication of the host Cloud Data set.
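The extended naming convention might look something like this rough Python sketch (the server, database and table names are hypothetical):

```python
# Hedged sketch: SQL Server's four part name Server.Database.Owner.Table,
# qualified further with a leading namespace (all names below are made up).
def five_part_name(namespace, server, database, owner, table):
    return ".".join([namespace, server, database, owner, table])

remote = five_part_name(
    "UnitedStates.Florida.SafetyHarbor.BloomConsultingBI",
    "CloudSrv01", "SalesDB", "dbo", "Customers")

query = ("SELECT c.FirstName, o.Total\n"
         "FROM LocalServer.LocalDB.dbo.Orders o\n"
         "JOIN " + remote + " c ON c.CustomerId = o.CustomerId")
print(query)
```

No database engine resolves five part names today, which is exactly the gap this post is describing: the namespace prefix would be the hook that lets a query planner reach out to a Public data set.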

If that were to happen, you would have real time data talking to data in a Public Data Network.

And after that, the Neural Networks would already have mapped all the Public data sets and have access to them, and while crunching algorithms in Machine Learning, they would be smart enough to query those data sets on the fly without the need for human intervention.

And that would help create a true Artificial Intelligence system running 24/7, learning unassisted, gathering up all the available data to mine and predict from: a supercomputer storehouse of information.

I believe IBM is moving in this direction already with its renowned "Watson" computer.

I blogged about this concept of Namespaces back in November of 2012: Namespaces for Public Community Data

and here: Public #BigData Services Exposed on the Web

This is a possible scenario in the not so distant future.  Public Data available for on the fly querying, enabling Super Computers to practice true Artificial Intelligence in a giant Neural Network.

If you add RFID to the mix you have a true Internet of Everything.

Make it so, Number One ~!


Follow the Data Trail Through Excel and Access

Sometimes it's helpful to be curious.  Curious as in being a detective.  Being a detective as in following the intricate pattern a report uses to gather data.

I'm talking about Excel reports.  An Excel report which gathers data from multiple tabs.  Each tab gathers data from connections.  Those connections could be an Access Database.  Which calls another Access Database.  Which has a 15 step process to build temporary tables.  Which in fact derives its data from the Operational Data Warehouse.

Of course, there's hard coded values in the Excel spreadsheet.  Which get updated by some user somewhere based on some unknown frequency.  And how or why these values get updated is anybody's guess.

So you have a spreadsheet with 20 tabs, pulling data from everywhere imaginable, and your task is to recreate the business logic to grab data directly from the ODS in order to bypass the Excel / Access.

Sounds simple.  Well, that's a project I've been working on.  And tracking down the business rules is so much fun.  You really get to know VLookup quite well.
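For anyone re-creating that logic outside Excel, VLookup boils down to a keyed lookup.  A plain Python sketch of the idea (the sample table is made up):

```python
# VLOOKUP(key, table, col) is essentially: find the row whose first column
# matches `key`, return the value in column `col`.
def vlookup(key, table, col):
    for row in table:
        if row[0] == key:
            return row[col]
    return None  # Excel would show #N/A here

rates = [
    ("FL", 0.06, "Florida"),
    ("GA", 0.04, "Georgia"),
]
print(vlookup("FL", rates, 1))  # → 0.06
print(vlookup("TX", rates, 2))  # → None (no match)
```

Seeing it this way makes the conversion easier: every VLookup in the workbook becomes a join against a lookup table in the warehouse.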

Suffice to say, it's a challenge indeed.  Except at the end of the day, when the business logic is converted, the data is stored in an Enterprise Data Warehouse which gets refreshed nightly, the client can find a better use for the developer's time than taking 3 hours to refresh the reports daily, and users can access the data via Excel Pivot Tables connected to an SSAS Cube — it's all worth it.


TDWI Keynote Speech

After watching the Keynote speaker for the TDWI conference, something stood out which many people take for granted.

TDWI Keynote Link

It's the data that's important.

We've had data going back many decades now, once stored in VSAM files on the mainframe.  Then Relational Databases.  Then large DBs like Teradata.  Then Hadoop (Big Data).

And many of the conversations surrounding Big Data are about what platform you are using.  Shove in a ton of data from a variety of sources, process it and find nuggets of gold called insights.

And a key point from the YouTube video is this: nobody in the Big Data space is talking about the complexity of wrangling the data.  Merging complex data.  From disparate sources.  With no common linkages.  Data cleansing.  Data Governance.  Applying business rules.

These are some very core functions required in data manipulation.  Which the Enterprise Data Warehouse programmers have been wrestling with for years.  And they've found good solutions, frameworks, processes, standards as well as repeatable implementations.

And the keynote speaker goes on to say that the Big Data crowds have yet to tackle these multi headed monsters.  And that is why going forward, many Data Warehouse people can help pave the way for introducing known concepts into the new world of Big Data.

And that's a big concept.  Because the train's already left the station.  Big Data is chugging down the track at full speed.  And at some point, the train will need to cross the bridge of all the topics listed above.  Sooner or later.

Because those of us in the Data Warehouse community know the challenges of working with data.  And another key point from the lecture: Data provides value to the company by reducing costs, increasing sales or streamlining processes.  And the DW community does not sell their value enough.  They should be touting cost savings and increased sales like an evangelist would.  Because it's an intrinsic part of the job description, which gets severely underreported.

I thought the presentation was great, I learned a lot and enjoyed it.


Intro to EDW

I'll be presenting at the Tampa Bay BI User Group next week.  In preparation, the following slide deck was created.  The topic will be "Intro to EDW".





SUM Month To Date Values in MDX

Today's task was to SUM the Month To Date values for a particular Measure in MDX.

To do that we used the AGGREGATE function in MDX:


            Aggregate(
                PeriodsToDate( [FiscalDate].[Fiscal Calendar Drilldown].[Fiscal Month],
                    [FiscalDate].[Fiscal Calendar Drilldown].CurrentMember ),
                ( [Measures].[Fuel Gallons Upload] )
            )


You give it the time period you want to summarize, in this case the Fiscal Month level from the Fiscal Calendar Drilldown hierarchy.

From there, you tell it the CurrentMember for the exact location within the Drilldown.

Finally, you supply the Measure to Add, "Fuel Gallons Upload".

This query goes to the beginning of the Month and adds up all the values, up to and including the Current Member of the Date Hierarchy.

Fun stuff!


Presented on SSRS for DBA's

I presented at the SQL Saturday in Tampa at USF.

And if I recall, I attended classes on the same campus the summer of 1990.

It was a 7 week Archaeology series of classes, where we did archaeological digs out in the field some days and were in class the others.  I received 12 upper division credits that summer, which allowed me to graduate from the University of Florida on time with a Bachelor's Degree in Liberal Arts.

Fast forward to today: I spoke about SSRS for DBA's.  I submitted the session as "Beginner", so some of the more advanced programmers may have shied away.  Here's the link to the session description: http://www.sqlsaturday.com/viewsession.aspx?sat=273&sessionid=20560

However, the room was packed, nobody walked out, and only one guy was sleeping.  My slide deck can be found here: http://www.slideshare.net/JonathanBloom1/ssrs-for-dbas

To be honest, this was my best presentation to date; the audience seemed engaged and interested.  Although my slide deck is rather bland, as I went through it, I got the feeling that the material was too dry, so I ad-libbed a bunch.  And 20 minutes into the talk, I realized I was halfway through the slide deck, so I varied the topic away from SSRS into Self Service BI and Hadoop, explained what it is and who the major players are, Data Scientists and their new role, as well as InfoNomics and how people are now associating dollar amounts with databases and using that asset as collateral on loans.

About 50 minutes into the talk I asked if people had questions, and there were about 4 or 5, with one question I could not answer: does SSRS allow querying off Web Services?  My guess is no, but I do not know for sure.

I managed to do a few demos, showed an SSRS report connected to an Analysis Services Cube with an MDX query, all done without writing a line of code.  And I connected to a Hortonworks Hyper-V Hadoop cluster using HiveODBC.

And since the talk was at 9am, I got that out of the way, to allow time to network and view other sessions.  Overall, the slide deck is a bit uneventful, but I spiced it up by discussing off topic subjects related to BI, Big Data and the role of being a report writer.

Not bad for an Anthropology major ~!


Data no Longer Considered Garbage

What's the life span of data?  It used to be that data was captured, held for a while, then archived on tape, stored offline, and never seen again.  If for some reason the data needed to be viewed, some person would get the microfiche or tape backup for one-time viewing or print it off to paper.

So the data lost most of its value at that point.  Now data can be stored and kept rather inexpensively, so its shelf life has been extended for longer periods of time.

That data can now be referenced, viewed, analyzed and mashed together using big data.  We can access it through MapReduce or, for those less technical, SQL.

Data can be sent to Dashboards for summary level visualization, with drill down to raw details. 

No longer do we discard data for lack of value.  Data has tremendous value for organizations, if you know how to harness and leverage it for maximum potential.  So we can keep data forever if need be.  To data mine, predict the future, look for patterns and build neural networks for artificial intelligence by training models.

Once considered garbage and thrown out to the tape dump, data has found new life and risen to the top of the heap. Justly so. 


How do we Leverage #Hadoop #BigData

We've been storing data for decades.

Hadoop allows us to store vast amounts of data, cheaper than ever before.

Which can help reduce costs.

But so what.

How do we get to the next level?

Form a business case.

How do I increase sales?  How do I streamline processes?  How do I monitor workflows to determine benchmarks?  How do I monitor the actions and behaviors of online traffic?  How can I predict weather patterns?  How do I analyze traffic patterns?  How do I monitor human behavior patterns?  How do I track down criminals?

A few examples.

Once you have a problem to solve, the next step is to form a plan.

I will gather data sets from a variety of sources, both internal and public.  I will ingest them into Hadoop and query the data.  Look for patterns.  Associate events, people and occurrences that would not otherwise have been associated.  Mine the data history to help predict future behavior.  Link data that would not be linkable using traditional data stores.  Mash data from new sources to gain new insights.
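The ingest-and-query step can be pictured as a toy MapReduce, sketched here in plain Python with made-up event data:

```python
from collections import Counter

# Toy MapReduce sketch: map each record from merged sources to (event, 1)
# pairs, then reduce by summing counts to surface the dominant patterns.
internal = ["login", "purchase", "login"]          # made-up internal events
public = ["purchase", "browse", "purchase"]        # made-up public events

def map_phase(records):
    return [(r, 1) for r in records]

def reduce_phase(pairs):
    counts = Counter()
    for event, n in pairs:
        counts[event] += n
    return counts

pairs = map_phase(internal) + map_phase(public)
print(reduce_phase(pairs))  # 'purchase' surfaces as the most common event
```

A real Hadoop job distributes the map and reduce phases across the cluster, but the shape of the computation is exactly this.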

Gather your findings.  Analyze the data.  Form conclusions.  Validate results through collaboration.  Set new policies, curb expenditures, streamline processes, increase sales, lower costs, SOLVE PROBLEMS.

Create a repeatable set of tasks, to be re-run over time, to re-evaluate and fine-tune your conclusions.

Rinse and repeat.

And that's how you leverage your data in the new Information Revolution Age to gain competitive advantage and utilize new technology.

Make it so!

Become a Coding Warrior

You can read technical books.  Or attend training classes.

But they don't teach you everything.

When you have a production server down, losing valuable time, customers and money, you really don't have time to go through your notes.  You have to rely on your gut wisdom to solve the problem, get the server back online.

I've always loved the thrill of that.  Being the go-to guy when nobody else can solve the problem.

Because that's how you grow.  You gain wisdom by tackling the tough problems with no easy solution.  You have to get in the weeds, jump into the deep end of the pool, with no safety net, and swim for your life.

And over time, those experiences add up.  Because in life you are thrown curve balls from time to time.  And sometimes you have no resources to help.  And you are going to have to solve the problem unassisted.

And that's where you build inner strength.  Inner confidence.

So become a Rambo, a coding warrior.  Take on the tough projects that nobody else wants.

In the end, those are the most satisfying experiences, and they help you grow the fastest.  No matter how conservative your approach, you will encounter a problem which you don't know how to solve.  And those are the true tests in life.


Full Life Cycle

I've learned a tremendous amount of skills at my current job.

Full life cycle Data Warehousing for starters.

From pulling data from source systems, to modeling tables, to staging the data, to applying business rules and ETL to the Data Warehouse.

And working with Cubes and MDX and Calculated Measures and KPIs.

I've done much of this work on prior jobs, but now I get to do everything combined, just about every day.  There's talk of the death of the Enterprise Data Warehouse.  I probably contributed to that mantra when I worked with the Tabular Model and Power Pivot a few years ago.

But just about every client I'm working with is using an EDW.  And some use SharePoint with Power View, some use Performance Point, and most use SSRS Reporting Services.  And many use MDS Master Data Services as well.  Not many are using DQS Data Quality Services.

I feel comfortable in SSAS now.  And that's saying a lot.  It's been a long time in coming.


Blogging Don't Require No Certifications

There's a lot of smart people out there.  With advanced degrees.  Good positions with good companies.  Plenty of recognition throughout the industry.

I've got an Anthropology degree.  Worked in banking for 4 years.  And have close to 20 years in IT.  How am I authorized to speak about subjects like Big Data?

Well, I have taken a less traditional approach to my career.  I haven't worked for a major "software" company.  I've worked for companies that have an IT department with a need for reports and maintenance of custom applications.  Hard to get recognized when swimming in the smaller lakes off in the distance.

However, I've had a steady uphill climb, with little to no support along the way.  I'm basically self taught.  And I've changed my career path many times so I haven't had the luxury of becoming known for a specific language.  And I moved around every few years going after more skills and more salary, because when you start your career making $15,000 a year, you have a bigger hill to climb.

Yet I have good problem solving skills, I take on new challenges, I learn the latest code, I'm curious, I have good mental stamina, I love to solve problems that nobody else can solve, I see the big picture and watch the trends, I have good intuition and can recognize talent from fraudsters.

I'm surely not the greatest programmer to ever exist, but I can stick with a problem until it's solved.  I'm a bit slow when learning the business rules and will ask questions over and over until I understand, and then I know it rock solid.  And I can finish a project and show results.

I grew up in a computer family.  I was online in the mid 1980's on a 1200 baud modem, calling local BBSes to download stuff.  I learned DOS in the early to mid 1980's and took Fortran in college.  And I solved the Rubik's Cube in the 6th grade.

Overall I feel qualified to write about new topics, as I do have experience working with Hadoop for a few years now, and my unique perspective could possibly add value to the articles found online.  Sure, I don't have a PhD.  I've never written articles for Forbes or the New York Times.  But I'm here day in and day out, letting the world know my thoughts about IT, Programming, Big Data and the life of Jon Bloom.  Maybe one day someone will stop at this site, read deeper into the vault and say: this guy is a bit out there, but some of his stuff really makes sense.  How come nobody's ever heard of him?

And as always, thanks for reading ~!

#BigData Knows You Better than your Mom

If they want to know who you associate with, they look at your cell phone history.

If they want to know your purchasing habits, they check your credit card transactions.

If they want to know your wealth, they check your bank account.

If they want to know your daily happenings, they check your online profile data.

Big Data knows you better than anyone, including your mother.

They know your preferences, your habits, your social life, your financial life, your tastes in politics, music, entertainment, your personal beliefs, when and where you travel.

Everything you do leaves a digital audit trail.  And those who have access to that data, know everything about you.  Credit Reports, tapping phone wires, hiring detectives are all so passé, now a social profile report reveals just about everything.

Welcome to the world of glass houses, surveillance and lack of privacy.

The Data Revolution

Data is becoming fragmented.  Internally, from multiple Vendor systems.  Externally, from newly created data sets.  One problem is merging them into a cohesive data set.  A second problem is storing the data.  The third problem is unstructured data.  Disparate data.  Volumes of data.  And data formatting.

Back in the day, the mainframe captured all the data in VSAM flat files.  We then moved to Relational Databases.  Then Data Warehouses and Data Marts and Enterprise Data Warehouses.  Then Self Service Business Intelligence.  Then Big Data.  And what we have now is a mish mash of options, overwhelming amounts of data that doesn't fit together nicely, and insufficient resources to process this data into information.

The data is fragmented.  The data storage devices are fragmented by multiple options.  And the number of Vendors arriving in this space is adding complexity.  What we need is a data ecosystem or data hub or data platform to digest all forms of data, distributed across multiple commodity hardware servers, accessible in real time by average users, with built in security and low latency, available in Cloud, On-Premise or Hybrid options, with accurate data consumable on demand without requiring a PhD or Data Scientist.  I think Cloudera's Enterprise Data Hub is a move in that direction; you can watch a video from the Strata conference here: http://www.youtube.com/watch?v=MCDN3DxYwEE&feature=youtu.be&utm_content=buffer512f4&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer

The video is Vendor specific; however, it applies to solving the current problem of tackling data.  We are witnessing the "Data Revolution", similar to the Industrial Revolution, where a new industry is being launched, creating jobs and revolutionizing the world as we know it.  We are early in the cycle and things are shifting every day.  Technology is surging forward with new alternatives.  People are waking up to the power of information.  Business Models are being created and altered.  A huge influx of capital and resources is chasing the new Information Gold Rush.

What are you doing today to contribute to the Data Revolution?

Best Practices Never Practiced

In the world of programming, we're taught to use "best practices" when building applications.

As in life, we're also taught to do the same.  For example, it would be considered best practice to drive the speed limit when transporting oneself in a vehicle.  And if you drive the speed limit on any road anywhere, chances are you'll be pushed off the road by speeding, aggressive drivers.

Same with diets: it's considered best practice to eat 3 balanced meals per day, containing the 4 basic food groups.  And at any fast food joint anywhere, people are lined up for miles, eager to eat some kind of processed meat, congealed wax known as French fries and some carbonated beverage that will strip the paint off your car.

How about budgets: it's considered best practice to have a budget in place to account for your household spending, so you don't spend more than you earn.  And if you go to any Mall anywhere, you'll see people buying the latest fad name brand items, financed on a credit card at 18%, always paying the minimum payment each month and juggling several credit cards.

And sleep: it's recommended that people sleep an average of 8 to 9 hours per night.  Yet people are on the computer, watching television or out at bars until the wee hours of the night, and survive on 4 to 5 hours on average.

So as you can see, best practices are recommended in just about every facet of human life.  And some people choose to follow them; however, the majority throw caution to the wind and completely disregard them.

So now that you've seen human behavior, don't be surprised when, in the next application you inherit, the person did not follow "best practices".

And there you have it!


Building a Data Warehouse from Excel Reports

On the latest project, we were supplied 10 Excel spreadsheets.

The first report has 20 tabs.  The first tab pulls data from all the other tabs.  And those tabs have data connections to other tabs, connections to one or more Access databases, or back to the SQL Server source.

So tracking down the data is quite fun.

The goal: build a Data Warehouse with Dim tables and Fact tables.  And the ETL to bypass the Excel docs and grab data directly from SQL Server.  Build the ETL to stage.  And from stage to the Data Warehouse tables.

And then push the data to SQL Server Analysis Services (SSAS), build Measures and Calculated Measures with MDX.  And then pull the data back into Excel.
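The stage-to-warehouse step above can be sketched in T-SQL.  This is a minimal illustration, not the actual project code; the table and column names (stage.Customer, dbo.DimCustomer, and their columns) are hypothetical:

```sql
-- Upsert a Dim table from a staging table (hypothetical names).
-- New business keys are inserted; changed attributes are updated in place,
-- a Type 1 slowly changing dimension.
merge dbo.DimCustomer as tgt
using stage.Customer as src
    on tgt.CustomerKey = src.CustomerKey
when matched and (tgt.CustomerName <> src.CustomerName
               or tgt.Region       <> src.Region) then
    update set tgt.CustomerName = src.CustomerName,
               tgt.Region       = src.Region
when not matched by target then
    insert (CustomerKey, CustomerName, Region)
    values (src.CustomerKey, src.CustomerName, src.Region);
```

The Fact table load then joins the staged transactions to the Dim tables to look up surrogate keys, which is where the numbers either match across the board or don't.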

And tonight our data matched across the board.  We got that working.  For the first report.  Of 10.

On to the next reports.  Yippie ~!


Because Hadoop is so New

In the world of technology, things move pretty fast.

And no matter which direction you aim, there's always somebody who's already mastered that and has been doing it forever.

Perhaps they already wrote a book or two, maybe they helped write the code years ago for the actual product or maybe there's already an established hierarchy in place.

What I like about Hadoop is the level playing field.  Everybody is new to it, except maybe those who worked with it at Yahoo back in the day.  So whoever has the most desire to take the plunge and learn it has the opportunity.  There are no barriers.

With Hadoop, we're all newbies.  There are so many pieces to the puzzle.  And it's changing rather fast.  And there are already several flavors.  And that opens the doors to allow anybody a chance to become an expert.  A bit of freedom if you ask me.

Microsoft Partnership with Hortonworks

Microsoft has teamed up with Hortonworks to create multiple options for Hadoop.

First there's HDInsight.  Big Data in the Cloud.  It's designed to work with data born in the Cloud, though you can upload data as well.  Once there, you can spin up your Nodes, run some jobs, then spin down the clusters, and you only get charged for the uptime.  Good for proofs of concept, then bring the logic on-premise.

Then there's HDP 2.0 for Windows.  A fully functioning version of Hadoop running on Windows Server.  You can connect via Hive ODBC, Power BI (Power Pivot and Power Query) in Excel, and as linked servers.  You pay for the commodity hardware, administer the app from System Center and bring in data from a variety of sources.

You can also connect via Visual Studio.  .NET has libraries to connect to Hadoop, as well as LINQ connectors.

Hadoop is becoming tightly integrated into the Microsoft Data platform, now called the Hybrid Enterprise Data Warehouse.  Hadoop blends into the current infrastructure, becoming transparent to the end user.  The benefit is adding Semi-Structured and Unstructured data to the mix.  As well as distributed parallel processing.

And Hadoop 2.0 is distancing itself from the dependency on Map Reduce Java programming.  You can now program using C# in Visual Studio, or bypass it completely using Hive.  You can ingest data, process queries, aggregate and export back to SQL Server, upload to your Analysis server and disseminate information from a variety of sources including SharePoint, Excel, OData, Power Query and Power Pivot.
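To show what bypassing Map Reduce with Hive looks like, here's a minimal HiveQL sketch.  The table name, columns and delimiter are hypothetical; the point is that Hive compiles SQL-style syntax into the underlying distributed jobs for you:

```sql
-- Define a Hive table over raw tab-delimited files (hypothetical schema).
CREATE TABLE IF NOT EXISTS web_logs (
    log_date   STRING,
    page       STRING,
    visitor_id STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

-- Aggregate with plain SQL-style syntax; no Java Map Reduce code required.
-- The result set is ready to export back to SQL Server or pull into Excel.
SELECT log_date, COUNT(*) AS page_views
FROM web_logs
GROUP BY log_date;
```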

This article focuses on the Hortonworks implementation, simply because it's been ported to Windows and Azure.  There are other Hadoop Vendors which are just as outstanding and provide excellent support, service and training.  I actually attended Cloudera's training course and learned most of my Hadoop knowledge there.  And IBM has a tremendous offering as well, although I'm not as familiar with it.  And there are others.

I will say that Hortonworks' partnership with Microsoft is strengthening both vendors and leveraging an existing developer base, me included.

Best Part About #Hadoop, it's Free

When a new technology comes out, technical people want to see it.

Not only see it, but touch it.  Take it for a spin around the block.

There are some great Data Visualization programs out there.

So I downloaded a copy and installed it.  And soon got a phone call from the home office.

Asking me to purchase it, yet they wouldn't disclose the price.  And every few weeks I would get another call, because I attended a webinar or downloaded a white paper.

And after 14 days, my trial evaluation expired.  No more playing with new technology I suppose.  On the next call I was granted another week to evaluate.  Once that expired I was done with it.  And haven't touched it in years.

Along comes Hadoop.  Free to download.  Free to install.  In a tight and easy to use Virtual Machine.  Or for the more courageous, install from scratch.  A fully functioning product ready for production.  No phone calls from pushy sales people.  No expiration date.  Plenty of online documentation.

Win.  Win.  Win.

And that is a major reason why the technology is taking off.  Sure there are Vendors out there who service the product, provide training, maintenance and updates, for a fee.  But at least you know what you're getting into.  At least you can download and use the technology without being rushed.  I'm all for Hadoop.  With each version, it's getting crisper, easier to use, with better integration and better knowledge bases, becoming an ecosystem in itself as well as a Data platform.

You can keep the pushy sales reps, I'll take a free Hadoop any day.

Data Becoming an Asset #Infonomics

You know how much you paid for the chair you're sitting in.  Probably the desk too.  I'm sure it's documented somewhere.

How about the building, you know how much you pay in rent per month.

What about the data that sits in your database?  What's that worth?

Simple question, not so simple answer.

Data is becoming an asset.  With an assigned value.  Does it depreciate over time, or appreciate in value?

Can you pledge Data as an asset for collateral on a loan?

I introduced a new concept back in August of 2013, the Certified Data Appraiser.

You can find the blog post here:


Imagine, an entirely new concept, occupation and financial instrument in the making.

This concept is currently known as infonomics, you can read about it here:


All in all, each day, as people surf your websites, you are steadily increasing your data loads, and could in fact be building a valuable asset for future use, as collateral on a loan or as an asset when you sell the company.

What are your thoughts on this matter?

What Should I Learn Today?

What should I learn today?

Go deeper on Data Warehousing?  How to streamline SQL queries?  How about more MDX?  Or design patterns for ETL?

Wait, maybe Self Service BI.  Data Visualizations?  Mobile BI?  Embedded BI?  Geo-spatial?  Or maybe BI in the Cloud?

Then again, Big Data is all the rage, so should I open up my Hadoop VMs?  Or how about download the latest Hortonworks which runs on Windows Server?  Then I can dabble in Hybrid BI / Hadoop.

But you know, if I'm ever going to be a Data Scientist, I should be taking an online course.  Perhaps R.  And I have to know Predictive Analytics.  And don't forget Machine Learning.  And Log File analysis.  And Streaming.  And Graph.

Oh yeah, I also have to produce on my full time consulting job on a daily basis.

My head is starting to hurt.  Think I'll run downstairs and get a coffee.

And think about thinking about learning.  Remember, those who stand still will be caught flat-footed and outdated in the blink of an eye.

Get busy!


Interesting Week

This was a good week.  I was able to participate in a Hortonworks + Microsoft webinar.  The topic was the Hybrid Modern Data Architecture (MDA).

It's a pretty awesome concept.  Basically it expands the current Enterprise Data Warehouse with the addition of Hadoop.  Because Hortonworks has a version which runs on Windows, it now integrates nicely with a variety of technologies.  Here's a post I tweeted about which got some good retweets:

And then I attended an online Tweet chat which also talked about the Hybrid Modern Data Architecture (MDA), with a lot of attendees from IBM.

And finally I attended another one where I tweeted about the definition of Data Scientist:

I've been heads down building Data Warehouses the past 5 or 6 months so I definitely see the EDW around for a long time.  And Self Service is starting to pick up steam.  As is the Cloud and Mobile BI.

In the world of data, things are really starting to heat up.  Glad to be a part of it.


#BigData #Hadoop not Mandatory for #DataScience

Data Scientists can work with Big Data and Hadoop, but it's not mandatory.

I spoke with an old boss the other day, who had a job posting for Data Scientist.

So I asked what technologies will be used in that position.

R, SQL Server and Cognos.

I asked, "Not using Hadoop?"  They replied, "No, not necessary."

Interesting, because most people think the Data Scientist position grew out of Hadoop.

So it is possible to be a Data Scientist and never touch Hadoop.

Apparently the person they are looking for is more of a Data Miner or Statistician or Actuary with knowledge of data and technology.

Personally, I thought Data Scientists were Domain Knowledge Experts, technically savvy and good with Math.

Notice there's no Big Data and / or Hadoop in there.  That's the definition I'm going with.

For me, I loved Statistics class back in college.  And I know technology pretty well.  And I've got some industry knowledge in several sectors, I've attended the Cloudera training course and VM, dabbled in the Hortonworks VM and installed it on Windows.  But am I a Data Scientist?

Perhaps we'll find a clear definition at some point.

And so it goes!


100k Page Views

It's been a long and winding journey.

When starting this blog I intended to post some ramblings on Business Intelligence.

Which grew into Big Data and my perspective on the business world in general.

Along with some personal articles about how I got here and what I'm doing today.

And today marks a new milestone.

100,000 page views.

And a ranking of top page views of all time:

I thank everyone who spent the time and energy to read my blog articles.

Cheers ~!


More Information Producing Worse Results

More data does not automatically translate to better or faster decisions.

In fact I would say it muddies the water.

Facts become less clear.  Doubt enters the equation.  Too many pieces of the puzzle for the basic mind to comprehend.

Things used to be simple.  You have a problem, you speak with an expert, they solve the problem, you go on your way.

Now the line of clarity is blurred, too many factors, too many people involved, too much information to process.

I see the quality of service getting worse, despite the rapid rise in technology.

Things are just too damn complicated.  We have unlimited information at our disposal.  Yet problems compound in shorter increments.  Simple problems become complex the more people you add to it.

This impacts quality.

Example: What is the answer to a particular problem?  Yes.  No.  Maybe.  I don't know.  Wrong answer.  Incomplete answer.  Conflicting answer.  Answer which leaves out critical information.  All of the above.  None of the above.  Are you going to hold me responsible for this answer?  Let me patch you through to voice mail.  You'll need to speak with Bob, he's on a 3 year sabbatical, would you like to leave a message?

You get the point.  In the world of data, more information could potentially lead you to faster or greater insight.  Yet in the real world, more information doesn't seem to be solving many problems faster.

As always, when you include humans into any equation, you must factor in emotions, hence, a biased or incorrect answer may result.

Many New Leaders will be Female

Being a leader does not mean you will be liked.  Because a leader must face tough decisions.  And sometimes people don't like those decisions.  So they attack and blame and bully with their point of view.  And that causes many sleepless nights for the leader.  Because they are responsible for the final outcome.  To ensure the choices made are for the betterment of the situation.

Sure it would be a lot simpler to go with the status quo, so people will like you.  Except that's not the role of a leader, to be liked.  Their role is to improve situations, see things through to the end, to roll up their sleeves when nobody else will, and to make tough decisions.

Decisions which have consequences.

Not everyone is cut out to be a leader.  Many people prefer to be led.  To have someone hold their hand.  Except where are these people when times are tough, when help is really needed?  They are nowhere to be found, that's where.  They show up after all the hard work was done, take credit, and then attempt to take control.  Nice try.  Doesn't work.  These people are not leaders, they are opportunists.

I've seen some people, quiet and timid, who get bullied by the intense, aggressive tactics of bullies.  Once you see a bully for who they really are, their power is diminished.  And it's the quiet person who's really been the leader all along.  And when it's their time to shine, they sparkle like a diamond.

Some could say our male-dominated society is like this.  Males have typically been stronger.  They got their way for centuries.  Now it's becoming apparent that females, who have been leading from behind the scenes anyway, are making their way as public leaders, and doing a great job.

There's definitely a turning point going on in today's society.  Many of the new leaders will in fact be female, using a blend of innate intuition with logic.  And the male approach of brute force and aggressiveness will fall to the wayside.  And we will soon become a matriarchal society.  In the world of horoscopes, we are entering the age of Aquarius, moving away from the rigid society into a more fluid one.

As a mystic once said, "These times they are a changin' ".


Search for Keyword Query in SQL Server Database

I was looking for a solution to search a SQL Server database for every column to find a keyword.

Which brought me to this link.


And from here, I modified the code to work in SQL Server (it was written for Sybase):

It's not stealing if I give proper credit, right?
set nocount on
IF OBJECT_ID(N'tempdb..#SearchString') IS NOT NULL
     drop table #SearchString
create table #SearchString (SearchString varchar(100))
declare @search_string  varchar(100)
set @search_string = 'Zero'
insert into #SearchString (SearchString)
    values (@search_string)
IF OBJECT_ID(N'tempdb..#TabCol') IS NOT NULL
     drop table #TabCol
-- gather every character-type column wide enough to hold the search string
select object_name(o.id) as TableName, c.name as ColumnName
into #TabCol
    from sysobjects o, syscolumns c
where o.type = 'U' -- ONLY USER TABLES
      and type_name(c.xusertype) in ('char', 'varchar', 'nchar', 'nvarchar', 'text', 'ntext') -- ONLY LOOK FOR CHAR, VARCHAR, ETC.
      and c.id = o.id
      and c.name is not null
      and c.length >= datalength(@search_string)
select count(*) as RelevantColumns from #TabCol
declare @table_name     SYSNAME,
        @column_name    SYSNAME,
        @sql_string     varchar(2000)
select @search_string = SearchString from #SearchString
declare cur cursor local fast_forward for
    select TableName, ColumnName from #TabCol order by TableName, ColumnName
open cur
fetch next from cur into @table_name, @column_name
while @@FETCH_STATUS = 0
begin
    -- print the table / column when at least one row contains the search string
    set @sql_string = 'if exists (select * from [' + @table_name + '] where [' + @column_name + '] like ''%' + @search_string + '%'') print ''' + @table_name + ', ' + @column_name + ''''
    exec (@sql_string)
    fetch next from cur into @table_name, @column_name
end
close cur
deallocate cur
drop table #SearchString
drop table #TabCol