Data is alive. It's a living thing. It grows. It multiplies. It has depth. Structure. Free form. Unlimited capacity.
People are hung up on the term Big Data. If you think about it, it's actually "Macro Data". And its opposite is "Micro Data". Similar to the structure of the universe.
You add up all the Micro to derive the Macro.
You can view data through a microscope or a telescope.
That would describe the "Volume" of data.
However, there's also "Velocity". How fast is the data growing? What is the rate at which new data flows in?
Lastly there's "Variety". And to me this seems to be the best attribute of data. What we are really after is "Insight". In order to derive the most insight, it helps to have lots of data or speed of new data, but the key factor in generating insight is Variety.
What we are really after is the "Holistic" view of the data. We don't just want to know how often you called the Support Center, the reason, how long it took to solve the problem, by whom, etc.
We want to know what products you already own, so we can tailor your customer experience to supply additional products. We want to know your bank account info to know how much you can afford. We want to know your purchasing habits to see how frequently you buy. We want to know the demographics of your company, your location, your partners, your investors, your employee representation, your political beliefs, your religious beliefs, how many kids you have, did you go to college, what kind of car do you drive, do you drink milk, wine or whiskey, where did you attend high school, who were your teachers, what were your grades.
In the world of data, the more data sets available, the better. And integrating those data sets is the key.
Because every database has its own unique identifiers, mashing data together is perhaps the most logistically challenging aspect.
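To make the identifier problem concrete, here's a rough sketch in Python. Everything in it is made up for illustration: the two data sets, the field names (`ticket_customer`, `account_no`), the `id_map` crosswalk and the `build_holistic_view` function are all assumptions, not anything from a real system. The idea is that a support system and a sales system each know the same customer under a different ID, and you need a mapping between the two before you can mash them together.

```python
# Hypothetical data: the support system and the sales system
# identify the same customers with different keys.
support_calls = [
    {"ticket_customer": "C-001", "reason": "login issue", "minutes": 12},
    {"ticket_customer": "C-002", "reason": "billing", "minutes": 30},
    {"ticket_customer": "C-001", "reason": "billing", "minutes": 8},
]
purchases = [
    {"account_no": 9001, "product": "router"},
    {"account_no": 9002, "product": "modem"},
]

# Assumed crosswalk: support ID -> sales account number.
# In practice, building this mapping is usually the hard part.
id_map = {"C-001": 9001, "C-002": 9002}


def build_holistic_view(calls, purchases, id_map):
    """Combine both data sets into one record per sales account."""
    view = {}
    for purchase in purchases:
        record = view.setdefault(purchase["account_no"], {"products": [], "calls": []})
        record["products"].append(purchase["product"])
    for call in calls:
        account = id_map.get(call["ticket_customer"])
        if account is None:
            continue  # no known match, so this call cannot be linked
        record = view.setdefault(account, {"products": [], "calls": []})
        record["calls"].append(call["reason"])
    return view


holistic = build_holistic_view(support_calls, purchases, id_map)
```

Here `holistic[9001]` ends up holding both the product the customer owns and the reasons they called support. Calls with no entry in the crosswalk are simply skipped, which is one possible choice; a real integration would probably want to log or review them.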
I'm a firm believer that if you want to get good at something, watch how the experts do it. And who are the data experts, in my opinion? The NSA.
They have access to all data. They can mash it up. They can keep mega warehouses of everything about you, in real time. Their goal is National Security (perhaps).
Whatever the goal, they are using data as it was meant to be used. To unite the Micro Data to form Macro Data to create Holistic Data, that which comprises All Data.
To create a blueprint, a hierarchy, a network, a risk factor. A neural network of everything, connecting all the dots, dotting every I and crossing every T.
To simulate the Universal Mind. http://www.mind-your-reality.com/universal_mind.html
Like any tool, it can be used to unite, heal, and grow. Or it can be used for dark purposes.
Either way, the goal is the same. Holistic knowledge based on All Data available at any given time.
The insight part is more difficult, because bias, experience, and motives get in the way.
Like an Artist, there is room for interpretation. Like a Scientist, there is a logical, practical approach to solving problems.
And the blend is done through the office of the Chief Data Officer: the go-between for IT and the Business, who reports directly to the CEO, who steers the organization's Business Intelligence program, who can leverage internal employees or contract out, who is responsible for data quality and data governance, who stays current with Data trends, who can hire Data Scientists, who knows the business and brings in Resident Business Process experts, and who will hold the most important position in the org.
We are witnessing the Data Revolution, where every action will be recorded electronically, for mining purposes.
There is no more land to conquer and plunder (on this planet). We've used up most of the natural resources. The next logical step is to create a new resource to mine, and that is Data.
And the Data is Alive.