Hadoop allows unstructured data to be ingested into the HDFS file system. Once it's there, it's up to you to parse the file into a structured format so it can be exposed to Hive as a SQL-style table that you can run queries against.
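To make the "parse into a structured format" step concrete, here's a minimal sketch in Python. The log-line format and field names are assumptions for illustration; the point is just that free-form text becomes delimited rows that a Hive external table could sit on top of.

```python
import csv
import io
import re

# Hypothetical example: raw web-server-style lines sitting in HDFS.
RAW_LINES = [
    "2024-01-15 10:32:01 GET /index.html 200",
    "2024-01-15 10:32:05 POST /login 401",
]

LINE_RE = re.compile(
    r"(?P<date>\S+) (?P<time>\S+) (?P<method>\S+) (?P<path>\S+) (?P<status>\d+)"
)

def to_structured_rows(lines):
    """Parse free-form text lines into (date, time, method, path, status) tuples."""
    rows = []
    for line in lines:
        m = LINE_RE.match(line)
        if m:  # silently skip lines that don't fit the expected shape
            rows.append((m["date"], m["time"], m["method"], m["path"], int(m["status"])))
    return rows

# Write a delimited file that a Hive external table could point at.
buf = io.StringIO()
csv.writer(buf).writerows(to_structured_rows(RAW_LINES))
print(buf.getvalue())
```

On the Hive side you would then declare an external table over the output directory with `CREATE EXTERNAL TABLE ... ROW FORMAT DELIMITED FIELDS TERMINATED BY ','`, and the rows become queryable with SQL.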
So you have a set of data. Great. What if you want to mash that data up with another data source or table? And what if there's no common field to join the data sets on? As we all know, to join two tables you need a common key to link on. With no key, your data set is an island, unconnected to other data.
I don't think people are aware of this, or it's not getting much publicity. What do you do with a standalone data set? Sure, you can still derive value from it, but if you can't join it to other tables, then you're not leveraging the true power of SQL.
What do you think? Am I missing some concept here? How do people get around this dilemma in the real world?
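One workaround people reach for in practice is to manufacture a derived key and do a fuzzy join: normalize a near-common field (names, addresses, timestamps) on both sides, then cross join and keep pairs that are similar enough. Here's a minimal stdlib sketch; the datasets, column meanings, and threshold are all made up for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical datasets with no shared key: the same customers,
# spelled differently in a CRM export and a billing export.
crm = [("Acme Corp", "enterprise"), ("Globex Inc", "smb")]
billing = [("ACME Corporation", 12000), ("Globex, Inc.", 3400)]

def normalize(name):
    """Crude derived key: lowercase, drop punctuation and common suffixes."""
    cleaned = "".join(c for c in name.lower() if c.isalnum() or c == " ")
    for suffix in ("corporation", "corp", "inc"):
        cleaned = cleaned.replace(suffix, "")
    return cleaned.strip()

def fuzzy_join(left, right, threshold=0.8):
    """Cross join plus a similarity filter on the derived keys."""
    matches = []
    for lname, segment in left:
        for rname, revenue in right:
            score = SequenceMatcher(None, normalize(lname), normalize(rname)).ratio()
            if score >= threshold:
                matches.append((lname, rname, segment, revenue))
    return matches

for row in fuzzy_join(crm, billing):
    print(row)
```

The cross join makes this quadratic, so at Hadoop scale you'd typically block on a cheap key first (say, the first letter of the normalized name) before comparing within blocks. But the core idea, deriving a key where none exists, is the same.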
Another thing: we all hear about ingesting images into HDFS to derive value. Exactly how is this done? Let's say I upload all my vacation pictures from Disney. How does Hadoop leverage the details contained in those images? I'm sure the pics have a timestamp, and perhaps longitude and latitude from the location, but how can I find the sentiment of a picture of me on a roller coaster, or of me standing in an hour-long line waiting for my 3-minute ride? How can I extract the features contained in the image? Face recognition, perhaps, but what else?
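The usual pattern is that Hadoop itself doesn't "understand" the pixels; a processing job runs over each image and emits one structured row per file, and those rows are what you query. Real pipelines plug in an EXIF parser for the timestamp and GPS coordinates and computer-vision models for faces or scene labels; the sketch below is stdlib-only and only extracts what the filesystem can see, purely to show the shape of that image-to-row step.

```python
import hashlib
import os
import tempfile

def image_to_record(path):
    """Turn one opaque image file into a structured row.

    A real pipeline would add EXIF parsing (capture timestamp, GPS) and
    computer-vision models (face detection, scene/sentiment classifiers)
    at this point; here we extract only stdlib-visible fields.
    """
    st = os.stat(path)
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return {
        "file_name": os.path.basename(path),
        "size_bytes": st.st_size,
        "modified_ts": int(st.st_mtime),  # stand-in for the EXIF timestamp
        "sha256": digest,                 # dedupe key across repeated uploads
    }

# Demo on a throwaway file standing in for a vacation photo.
with tempfile.NamedTemporaryFile(suffix=".jpg", delete=False) as tmp:
    tmp.write(b"\xff\xd8\xff\xe0 not a real photo")
    tmp_path = tmp.name

print(image_to_record(tmp_path))
os.unlink(tmp_path)
```

Once every image is reduced to a row like this, the table of rows joins and aggregates like any other structured data set, which is what the "derive value from images" pitch is really describing.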
I really don't know the answer to this question. Perhaps someone can shed light on the subject, because if you listen to the noise on social media, everyone says it's possible. I'm wondering how.