Data Warehousing has been around for many years. The processes are well defined. Many books explain the architecture and framework. There are many experts in this field today.
The thing is, the industry is moving forward. Sure, the data warehouse will still be around. But what else is out there in the data space?
Well, self service Business Intelligence. It allows power users to connect to data sources, mash up and massage the data, build models, create complex interactive dashboards, publish to the web or public cloud, refresh and collaborate.
Wow, pretty cool.
There are big data solutions, like Hadoop. It's been around for ten years now. Lots of new faces in this field. Some bypassed the data warehouse revolution completely. Enormous data sets, stored in HDFS, maybe as flat files or CSV or what have you.
There's machine learning. Algorithms to do all sorts of stuff. Predict, classify, cluster, detect anomalies and a bunch more. Many new faces here also. Many skipped the data warehouse revolution as well. Some are called Data Scientists.
There are distributed analytic frameworks like Spark and Flink. Built for streaming data, and Spark now has Spark SQL, SparkR and MLlib for machine learning. These have spiked recently, with lots of users. They work mainly on Linux and integrate nicely with Hadoop, though Hadoop is not a requirement.
So it seems there are alternate careers available in the data space. Every org has a certain budget to spend on data projects. I wonder how many are still funding enterprise data warehouses vs self service vs machine learning vs analytic frameworks specifically.
Or perhaps a blending of this and that. It's probably a good question to ask, because that will guide developers into career paths. And if people are moving away from data warehouse over time, that's a leading indicator on where to steer a career.
The one thing that could stop the data warehouse person is this. If self service allows users to do their own data manipulation and reporting and dashboards, how can one compete with that?
Next, data warehouse people know relational databases, SQL queries, and defined frameworks. Once they enter the world of Hadoop or Machine Learning, they need programming languages, not just SQL. That's a big leap, as many SQL coders are so used to click, click, wizard that they may not feel comfortable having to write Python or R or Scala or Java. Seriously, two distinct worlds.
Next, SQL has been around for a while, and many developers are quite talented at it. But that does not lead to statistical thinking as in machine learning. Machine Learning requires statistical understanding and perhaps some mathematics, although many algorithms are pre-built and you just need to know what each one does and when to use it. Many data warehouse developers steer clear of statistics and math.
They may not come out and say it, but the world of data is changing and requires new skill sets. After twenty years of expert level skills, they now have to become entry level and grow their skill sets either on their own, through online courses, or at their current jobs.
Lastly, many shops are leveraging the cloud. For obvious reasons. Learning the cloud takes time. Things are similar but different. Many factors need to be considered: security, moving data, authorization of users, data refreshes, etc. Lots of new skills to learn. Add that to the list.
There's also Internet of Things. As in hardware and software and data and security and network traffic and protocols and a bunch more. Add that to the list too.
Suffice it to say, traditional report writers and data warehouse developers may need to expand their skill sets so as not to get left in the dust. Saddle up, it's going to be a bumpy ride. The gravy train is moving out.