So you have to specialize to some degree. If you look at the data technologies out there, there's a lot to know.
We have traditional databases from many longtime vendors, and they have standardized on the SQL language. There hasn't been too much disruption in this space, aside from in-memory databases, newer ways to compress data, additions like XML and JSON support, and some other features. For the most part, this space hasn't changed much.
Within the database world, we have seen new technologies emerge. NoSQL, Hadoop, and graph databases have sprung up to handle different business cases for storing and retrieving data.
This has opened up the ETL (Extract, Transform, and Load) space, as we have to account for the new types of data and storage capacities.
And further downstream, we have to report on this data, so that has evolved as well. New tools visualize data embedded within applications, via the web, and on mobile devices.
So too has the volume of data grown. The amount of data stored now is far beyond what it was twenty years ago. With the addition of sensors and the Internet of Things, we have to account for huge quantities of data stored indefinitely, as well as reading the data as it flows into the ecosystem.
And there's the data mining side. Machine learning is all the rage, as we have new technologies in the hands of everyday programmers to predict, look for outliers and make recommendations.
Throw in Artificial Intelligence programs that can learn over time, and the world of data is just exploding.
So there is an opportunity cost: if I decide to focus on a specific piece of the data space, I will not have time to learn another piece. So where do I focus my time and effort?
Well, it seems to me the role of the traditional report writer is losing its luster. Why? Because report writers were slow and users didn't trust the data. Self-service tools have sprung forth, putting the ability to work with data in the hands of everyone.
If vendors offer a product that can pull data from almost any source, store that data using data compression algorithms, allow users to mash and integrate data without having to know specific SQL syntax, then report on this data in charts and visualizations, and refresh the data automatically, why do companies need full-time report writers? Why would I try to compete with a free product, available on premises or in the cloud? What service could I provide that isn't already available for free? I could train the users, except the product is so dang easy to use and so well documented, I don't see much opportunity there.
We still have complex data warehousing, which hasn't been automated yet, so there is still some steam left in that space. We have newer technologies which haven't been totally explored, like NoSQL, although the use cases and ROI haven't lived up to the hype. There's the still-new field of machine learning, which hasn't been fully tapped yet, due to a lack of qualified developers and a poorly defined data scientist role that tries to lump too much into a single position.
Where are the sore points in the data space? Data integration is still a big bowl of spaghetti. Then there are the advanced topics that haven't been automated yet, which require a higher level of thinking, as well as industry-specific niches where you know everything about a specific industry as well as the technology; medical data would be an example. And I believe technology training will be in huge demand for the next x years.
- Data integration
- Advanced technologies
- Domain Knowledge + Technology