So we have these databases. Some here. One over there. Oh, and these Excel files maintained by accounting. And the Sales team has these files to track targets. And our financial database over here.
We typically have separate reports for each system. Or we consolidate into a Data Warehouse for a single source of the truth. Takes a while, lots of business rules, sometimes the data isn't 100% accurate. Then there are always changes. Adding new data sources, changing business rules, cubes don't run every night. Reporting is tough work.
Then we look at the reports. What do all these numbers mean? These reports are so static. I'd like the ability to drill down and see what the underlying data looks like. Well, we have tools for that now, called Data Visualization tools, like Power BI, Tableau and QlikView. They're really taking off right now, because they are interactive, very intuitive and easy to get started with. And they scale.
Then we talk of deep learning: having really smart people build models based on data sets, by creating denormalized samples of data and then applying statistical algorithms to find patterns, meaning and insight. And that gets into the realm of Data Science.
So we started with static reports, then a single source of the truth via Data Warehousing, then self-exploration using Data Visualization tools, and then Data Science.
But there are still more levels to go. The main one is called "Unsupervised Learning". This is one of the most difficult processes to perform. You give the machine a set of data with no labels or predefined answers, and let it find the meanings, patterns and insights on its own. Sure takes a lot of the grunt work out of the equation. But that's the future, in my opinion. Let the machine teach itself. Let it master one domain. Then network with other machines that are experts in other domains. In real time. And let it continually learn over time as things change. Find the anomalies and the exceptions, as well as the patterns, groupings, fluctuations and standard deviations.
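To make "let the machine find the groupings and anomalies" a bit more concrete, here is a minimal sketch in plain Python. It assumes a simple 1-D k-means for the groupings and a standard-deviation (z-score) check for the anomalies. The function names, the threshold and the sample numbers are all made up for illustration; real unsupervised learning systems are far more involved.

```python
import statistics

def kmeans_1d(values, k=2, iterations=20):
    """Group unlabeled numbers into k clusters (plain 1-D k-means)."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centers across the sorted data.
    centers = [ordered[(i * (len(ordered) - 1)) // max(k - 1, 1)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            # Assign each value to its nearest center.
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its members; keep it if the cluster is empty.
        centers = [statistics.mean(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

def find_anomalies(values, threshold=2.0):
    """Return values more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Hypothetical data: call lengths in minutes, and calls received per day.
call_minutes = [2, 3, 3, 4, 2, 21, 22, 20, 23, 3]
daily_calls = [100, 102, 98, 101, 99, 100, 250]

centers, clusters = kmeans_1d(call_minutes, k=2)
print("cluster centers:", centers)
print("unusual days:", find_anomalies(daily_calls))
```

Nobody told the code which calls were "short" or "long", or which day was unusual. It only looked at the numbers, which is the basic idea, scaled way down.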
Sure, we still need static reports to send to the state every quarter. And call centers need to know how many calls were received, answered and dropped yesterday, who took the calls and what the average call time was. But the future of data is in unsupervised learning.
It works by creating multiple layers, as in a neural network. Each layer contains specific algorithms, which are connected to other layers. The neurons of each layer weigh the data and trigger downstream activation of other neurons, creating a grid of connected synapses. The more layers, the more processing and granularity. And running these processes in parallel helps to speed up the jobs.
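Here is a rough sketch of that layering idea in plain Python: each neuron takes a weighted sum of its inputs, applies an activation function, and passes the result downstream to the next layer. The sigmoid activation, the function names and every weight below are my own assumptions for illustration, and the sketch only shows the forward pass, not how the weights get learned.

```python
import math

def sigmoid(x):
    """Squash any number into the range 0..1 (the neuron's activation)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights, biases):
    """One layer: every neuron weighs all inputs, adds a bias, then activates."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]

def network_forward(inputs, layers):
    """Feed the outputs of each layer into the next one."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations

# Hypothetical network: 2 inputs -> 3 hidden neurons -> 1 output neuron.
layers = [
    ([[0.5, -0.2], [0.1, 0.8], [-0.7, 0.3]], [0.0, 0.1, -0.1]),
    ([[0.6, -0.4, 0.9]], [0.05]),
]

print(network_forward([1.0, 2.0], layers))
```

Adding another entry to `layers` adds another level of processing. The neurons within a layer don't depend on each other, which is why this kind of work runs well in parallel.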
And the next level after that is to run these processes on Quantum Computers, because they run on a different foundation. They don't use the binary model of bits that are either zero or one. They use Qubits, which can be in a combination (a superposition) of zero and one at the same time until they are measured. For certain kinds of problems, this approach could increase the processing power tremendously. That's the direction of things at the moment.
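A tiny sketch of what "a combination of zero and one" means, in plain Python on an ordinary computer. It assumes the standard textbook picture where a qubit is described by two amplitudes, and the chance of measuring 0 or 1 is the squared size of each amplitude. The function names are made up for illustration, and this is not how you would program a real quantum computer.

```python
import math

def measurement_probabilities(alpha, beta):
    """Chance of reading 0 or 1 from a qubit with amplitudes alpha and beta."""
    norm = abs(alpha) ** 2 + abs(beta) ** 2
    if norm == 0:
        raise ValueError("at least one amplitude must be non-zero")
    return abs(alpha) ** 2 / norm, abs(beta) ** 2 / norm

def amplitudes_needed(qubits):
    """How many amplitudes it takes to describe a general state of n qubits."""
    return 2 ** qubits

# A classical bit set to 0: always reads 0.
print(measurement_probabilities(1, 0))

# An equal superposition: reads 0 or 1 with the same chance.
print(measurement_probabilities(1 / math.sqrt(2), 1 / math.sqrt(2)))

# The description grows exponentially with the number of qubits.
print(amplitudes_needed(10), amplitudes_needed(50))
```

That exponential growth is a big part of why people expect quantum machines to help with some problems that swamp traditional hardware.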
Add to that the Internet of Connected Things, Robotics and the accumulation of massive Big Data. Someone or something needs to process all this data and synthesize it into information that we can understand and act on in real time. There's just too much heavy lifting for traditional data professionals. We need to hire the computers to do what they do best, which is to crunch massive amounts of data. I think we are on our way to that vision.