Working as a consultant who builds data warehouses, you get to see a lot of different organizations and how they operate.
You go in, assess the project, estimate a scope and begin work. From my perspective one of the biggest challenges seems to be the business rules.
More times than not, the business rules are not documented. They are embedded in people's heads or buried deep within the code. Deciphering the business logic is the toughest part of the project, in my opinion.
At the end of the day the numbers have to match theirs. Identifying and locating the data sources is sometimes difficult. Translating the business logic from Access or Excel or in the programmers noggin are quite difficult. And the exceptions. That's the one that gets you.
What makes sense is to grab the data from the source systems. Many times we are given views or tables pre loaded. There are business rules hidden from view, or the views are outdated or there's missing or incorrect data, or the timing of the loads are not in synch.
There are many reasons why it makes sense to get the data from the source data repository.
Finally, to create a data warehouse you need access to the correct data, you need to understand the business rules as well as business processes and you need someone from the organization to assist with questions as they arise.
Otherwise you're just asking for trouble. Enough said.
I signed up for the Hortonworks Certified Associate exam last Thursday. Figured if I sign up, I'd have to take the test. And if I tak...
This blog post is in no way an attempt to steal other people's work. It's basically an conglomeration of notes from research I did...
Saw a post today on Twitter, " Microsoft releases CNTK, its open source deep learning toolkit, on GitHub " This is big news. Be...