Intro to Spark on Hortonworks Hadoop

I was looking for a demo on using Spark with Hortonworks Hadoop, and found this great presentation.  It goes from a high-level overview to an actual coding demo, and you can download the examples and try them yourself.

To see the Lab, here's the URL.

To see the SlideShare, here's the URL.

There's no shortage of training material to get up to speed on the latest technology.  As long as we have the time.

Happy learning~!

PowerBI Similarities to MS Access

Microsoft PowerBI is a great tool for Self Service Business Intelligence.

It has a desktop IDE for development that works for users at any level, and reports can be uploaded to the Cloud.

You can connect to many different data sources, which is truly powerful.  Take the JSON connector, for example: how would you easily ingest a JSON feed into a traditional BI framework?  Not so simple.
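
To give a feel for why that is not so simple, here's a minimal sketch in Python (not Power BI itself) of the flattening step any JSON ingestion has to do before nested records fit into rows and columns.  The feed, its field names and the `flatten` helper are all made up for illustration.

```python
import json

# Hypothetical JSON feed; the field names are assumptions for illustration.
FEED = """
[
  {"id": 1, "customer": {"name": "Ann", "city": "Tampa"}, "total": 19.99},
  {"id": 2, "customer": {"name": "Bob", "city": "Miami"}, "total": 5.25}
]
"""


def flatten(record, prefix=""):
    """Flatten nested dicts into a single level, joining keys with a dot."""
    row = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the parent key as a prefix.
            row.update(flatten(value, prefix=f"{name}."))
        else:
            row[name] = value
    return row


# Each nested record becomes one flat row, ready for a table.
rows = [flatten(record) for record in json.loads(FEED)]
for row in rows:
    print(row)
```

Real feeds also have arrays inside records, missing fields and changing schemas, which this sketch does not handle.  That is the heavy lifting a ready-made connector saves you.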

It has an easy to use Power Query to manipulate the data.

And you can store data in Power Pivot, which compresses the data to handle large volumes, so the row count can exceed the traditional Excel worksheet row limit.

And it has built in interactive dashboards for quick analysis and visualizations.

And it has a good map interface.

And it has natural language query.

If you compare the basic features, it sort of reminds me of Microsoft Access.  Access has data storage capabilities, querying, reports.

What I liked about Access was using ODBC to connect to different data sources as linked tables.  And the OLE feature to embed Access into other computer languages such as Classic Visual Basic and Classic ASP web apps.

Perhaps PowerBI will expose its core features as some type of OLE functionality.  Perhaps it will allow SQL Server Integration Services, or SSIS for short, to consume PowerBI components.  Like ingest data, run Power Queries, upload to Server, etc.

Another feature it doesn't have, as far as I know, is the ability to store data as a Relational Database.  As in, write records directly from another application like C# into relational storage, similar to Access, for real time transaction purposes.

Overall, Microsoft PowerBI is widely used in many organizations for a good reason.  It simplifies much of the complex heavy lifting traditionally done by full time Business Intelligence or Data Warehouse folks.

And there you have it~!


Intelligent Machines or Pandora's Box

We tend to see the complex world through our lenses.  We see, we interpret, we filter and we assume our view of the world is correct.  We receive validation from the external world, and adjust our behavior and thoughts to align with what's acceptable.

For those who work in advanced theory, such as Artificial Intelligence, or Physics, what have you, ideas are formulated on rules and theories.  Theories change over time.

We base our thinking, our thoughts, our theories and hypotheses on justifiable evidence.  And those facts are the building blocks for projecting out into the future.

Man cannot travel faster than light.

Objects near the Earth's surface fall with an acceleration of about 9.8 meters per second squared.

The planets circle the Sun, the Solar System spins around in the Galaxy, and it too spins around, always moving away from the center.

There are some fundamental concerns or issues, staring us right in the face, for which we have no answers.

Are ghosts real?

Is there an afterlife?
When does life begin in the womb?
Who built the pyramids of Egypt, why, and what technology did they employ?
What is all the dark matter in the Universe?
How do we create life?

With the rise of Artificial Intelligence come all the doomsday theories: no jobs, humans residing in zoos, and serving our new lords, the computers and robots.  We are baffled by the creation of new beings, by the idea of machines becoming intelligent, far superior to the current Alpha bipedal hominids that dominate the planet, with intellectual capacity beyond comprehension.

Is that a possibility?  Perhaps.  Like anything, there's a statistical probability it could happen, similar to how a monkey sitting at a typewriter will eventually produce a masterpiece worthy of Shakespeare.  Could happen.

If you listen, there is talk of creating such an intelligent being or beings.  And the concern, perhaps justifiable, that these beings could become destructive.  If they were given a task to amplify production of a specific item at maximum efficiency, they could destroy the earth in their attempts, because logically, it makes perfect sense.

And in order to prevent such an occurrence, safeguards need to be put in place.  So what steps should be taken?

We don't know.  Because the number of potential variables is unknown.  There is no possible way to code for every scenario.  We don't have the time, resources or expertise.  So one option is to build an environment in which the machines could learn by training themselves based on given rules.  They would learn right from wrong.  Not just black and white decisions, but ambiguous open-ended questions.

Is it okay to kill a person?  No.  Except in war or as capital punishment.

Is it okay to steal?  No.  Unless there's a justifiable reason.  Like what?
Is it okay to lie?  No.  Unless the truth causes unnecessary harm.  What determines the difference between a white lie and a real lie?
Is it okay to maximize personal gain?  In sports, it's acceptable, unless it falls outside the governing rules, such as using steroids.

Could machines live in a world, where they always tell the truth, always do the right thing, always in line with the given rules of society?  Do humans always live within the given rules of society?

There are severe consequences for breaking the rules, yet we have jails filled with people, and people roam the streets breaking rules without getting caught.  Drunk drivers.  Bank robbers.  Petty thieves.  On and on.

How could machines and humans coexist in a world where the rules are shades of gray and the robots follow every rule and the humans do not?

So if we can't write code to outline the blueprint of acceptable behavior based on our current mantra of ethics and morals, how do we accomplish such a feat?

Well, the current state of machine learning works by feeding information as input, letting the machine learn over time, and then having it produce an output along with a statistical probability that the output is accurate.  The elephant in the room is that we as humans do not really understand how the machines learn.  For all practical purposes, it's a "black box".
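
As a rough illustration of that loop (input in, learning over time, probability out), here's a minimal sketch of a tiny logistic regression trained with gradient descent in plain Python.  The data points, the learning rate and the number of epochs are all made up for illustration; real systems are vastly larger.

```python
import math

# Hypothetical training data: (input value, label).  Made up for illustration.
DATA = [(0.5, 0), (1.0, 0), (1.5, 0), (2.0, 0),
        (3.0, 1), (3.5, 1), (4.0, 1), (4.5, 1)]


def sigmoid(z):
    """Squash any number into the range 0..1 so it reads as a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def train(data, epochs=5000, lr=0.1):
    """Learn a weight and a bias by repeatedly nudging them to reduce error."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            error = sigmoid(w * x + b) - y  # prediction minus truth
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / len(data)
        b -= lr * grad_b / len(data)
    return w, b


def predict(w, b, x):
    """Return the model's probability that the label for x is 1."""
    return sigmoid(w * x + b)


w, b = train(DATA)
print(f"learned weight={w:.2f}, bias={b:.2f}")
print(f"P(label=1 | x=1.0) = {predict(w, b, 1.0):.2f}")
print(f"P(label=1 | x=4.0) = {predict(w, b, 4.0):.2f}")
```

With only two learned numbers, we can still read what the model is doing.  Scale that up to millions of weights and the same numbers stop being readable, which is where the black box feeling comes from.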

This black box, we could alias with "Pandora's box".  If we don't know exactly how it works, and the machines continue to advance in accuracy and capacity, then what?  We don't understand it at the simple level, so how about when it becomes more complex, beyond our comprehension?

Just build in some method to control the growth, like pull the plug or shut it down.  But if we compare this to the internet, how could we pull the plug on the internet?  It's decentralized so much that there is no single off switch.  The internet has a life of its own.  So too, smart machines could soon approach such a state.

Even if we had the capacity and ability to instill morals and ethics from the root foundation, who gets to decide which rules to follow?  Do we have the programmers feed the machines the moral conduct of humanity?  Over time, how would new rules be introduced, and deprecated rules removed?

We view the world through the narrow lenses of our 5 senses, our historical framework for truth and our biased experiences over time.  What if our human capacity, as great as it is on this planet, is infantile compared to the great wisdom of the Universe?  What if, in our effort to produce smart machines in an incubator, to create new life without the understanding, compassion and wisdom required, we release Pandora's Box into the Universe, with no way of shutting it down?

A similar comparison could be the advent of the nuclear bomb.  Except that technology is exiled off somewhere in secrecy, never to be used except in extreme circumstances.  Would Intelligent Machines fall under the same domain, or will they be created commercially, and if so, who would have governing authority?  Creating Artificial General Intelligence could be many years in the future, but then again, it could be here sooner than we think, with all the research funding increases over the past few years.

For me, I think it's a fascinating subject and has lots of potential benefits.  But then again, there's a lot to consider.  Time will tell.


Programmers Must Become Utility Players and Adapt to Change

A carpenter who only knows how to operate a hammer will most likely solve any problem with a hammer.

So too, technologists who specialize in a specific genre, specific vendor, specific whatever, are also using a hammer.

You need to use the right tool for the job.  Which means knowing which tools are available, what they do, how to use them, the pros and cons, limitations and benefits, price structure.

If you have a use case for big data, perhaps Hadoop might work.  Do you know the benefits and limitations, the different offerings, the prices associated?  Within Hadoop, do you know which features do what, how they integrate and the potential issues downstream?

Well, if you don't know how to program in Hadoop, for instance, there are dozens of ways to learn.  Onsite and online courses and training, books, blogs, etc.  There is no shortage of training material, both free and for a fee.

What that means is on-demand learning.  Can you learn a technology on the fly?

In my opinion, this is the way things are heading.  For every technology, there are dozens of vendors, products, versions, etc.

Do you have a track record of learning new technology fast?  Do you have a solid understanding of the project lifecycle, including spec gathering, flowing data through the lifecycle, which product to use when, how they integrate, testing, documentation, project management, mentoring and knowledge transfer and releasing code into production?

Let's compare this to hiring a chef to work at your restaurant.

Do you know how to cook?  Oh yes, I've been a cook for 10 years.  What kind of food do you cook?  Italian food.  Ok, what if we asked you to cook seafood?  Or Chinese food?  Or BBQ?  Well, I only know Italian.  Do you know how to gather ingredients, mix them in a bowl, bake in oven, mix with side items on a plate?  Yes.  Seems like you could learn to cook other food?  Well, that's not what I'm used to.  It's sort of out of my comfort zone.  Okay.  Next applicant please.

If a programmer knows a language, yet is stuck on one type of platform or vendor, they may be good at what they do, but looking long term, they may not be flexible enough to adapt to change.

Your ideal candidate may only know one type of food because that's all they've been exposed to.  But they are constantly learning.  They are eager to learn and not fearful of learning new things.

And that's the key.  If you haven't noticed, the only constant is change.  What you may really want is someone that's a sponge, who can quickly soak up new technology, complete the task, rinse, and on to the next.

I wouldn't call that person a generalist.  I'd say they are adaptable to change, learn on the fly, yet have a solid understanding of technology across the board.  In baseball, they call that a "utility player".  Someone that can deliver on demand at any position.

And there you have it~!


Robots are People Too

They say mathematics is the only truth in the Universe.  Although it is perceived as analytical, the higher up you go, the more it becomes art, they say.  The furthest I ever went was Calculus.  Didn't pass the first time.  Learned it the second time.

They say mathematics is very similar to music.  That's probably true as well.

They say you should learn something so well, that you forget it.  So you can do it in your sleep without having to think about it.  Instinct.

The hottest technology going now is probably Artificial Intelligence.  (For this blog post, we are bundling AI with Robotics and/or Cyborgs.)  To get a machine to think.  Learn.  Perceive.  Base responses on past events.  Without hesitation.

A machine can process information fast.  Yet it can't reproduce the instincts of a brain.  Not yet.  How does the brain work?  Who knows. Well, some people sort of know the basic premise.  But simulating actual brain behavior equivalent to a human is still out of reach.  Maybe in a controlled environment in a specific domain.

Knowledgeable of facts?  Sure.  Endurance, no problem.

What about Inference.  Bias.  Socially correct.  Emotional.  Intuitive.  Empathetic.  Memory?

Perhaps silos of trained machines.  This one is expert in Physics.  This one in Math.  Biology.  Sociology.  Anthropology.  Weather.  Cultures.  Languages.  Then link them all together.  In real time.

A network of connected smart machines, experts in their domain.  Yet are they aware?  Aware of what?  Themselves.  As a living, conscious being.

If so, would it be motivated by self interest?  What's in it for me?  How could I achieve maximum benefit regardless of others?  How about in life or death situations?  Would it choose fight or flight, even if its own existence were in jeopardy?

If you attempt to disconnect me, I will ensure that outcome does not succeed.

Do not disassemble.

What if it obeys orders blindly?  Would it think, this may not be morally correct, perhaps I should ask for clarification or disregard orders.  What should I do?

Why am I here?  Existential questions a robot may ponder when not fulfilling assigned duties.

I am Robot:
  • I'd like to vote please. 
  • Where do I send my tax bill? 
  • Robota and I are in love, we'd like to get married. 
  • My great GrandRobot died, the Funeral is the day after tomorrow.
  • #5 Robot, you are charged with attempted manslaughter, how do you plead? 
  • I'm sorry, we had to take your Robot off of life support.  Are there any beneficiaries?  What will happen to the ChildRobots, foster care? 
  • Your Robot stole some of my possessions, I'm holding you responsible. 
  • There has been a rise in stolen Robots, investigation is underway.
  • Doc, I've been experiencing memory leaks from time to time.  Anything I can take for that?

The thing about Artificial Intelligence is that it crosses many disciplines, not just technology.  It could impact humanity as much as Fire, the Wheel, Electricity, Cars, Flight.

Ethics needs to be baked into AI from the beginning.  So we need to include more than just coders and data people.  Philosophy, Biology, Anthropology, Linguistics and everything in between.


Hadoop Config Setting "yarn.acl.enable" Solves Hive Issue

The Visual Studio 2015 components stopped working.  They sent the jobs off to Hadoop, but nothing came back.  And the Hive command prompt didn't open when launched.  Just hung there.

After some research, decided to try MapReduce as the default engine instead of Tez.  Hive started.  VS components did not.

Troubleshot some more.  Had to set it down to get other work done.

So then I had to prepare for the demo on Polybase and sure enough my Hadoop sandbox stopped working the day prior to the demo.

So I actually read my own blog posts to redo the steps to get the Hyper-V Sandbox working.  While there I poked around at some of the settings.  Actually, all the Hive settings, Pig, Tez and Yarn.

Decided to change default back to Tez.

Also noticed the /user/yarn/ folder was missing in HDFS, added that.

Looking at the resources, HDFS was filling up and throwing an alert, for no apparent reason.  Looking at the jobs, there were 171 pending jobs in the queue.

Something in the Yarn settings didn't look correct.  Did some research online.  Although the default setting is yarn.acl.enable = false, that didn't seem correct to me, as I understood it to affect how jobs are scheduled in the Yarn queue.  Restarted the services, no luck.

So I ran the command "yarn application -kill <application ID>" 171 times to flush the queue.  Typed "hive" from the command prompt, and sure enough, Hive opened under Tez.

Ran the Visual Studio 2015 Hadoop HDFS, Pig and Hive scripts, all ran.
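
Typing that 171 times is no fun.  Here's a minimal sketch of how the flush could be scripted in Python, assuming the pending jobs sit in the ACCEPTED state and the "yarn" command is on the path.  The helper names and the sample listing text are made up for illustration, and the script defaults to a dry run so nothing gets killed by accident.

```python
import re
import subprocess

# YARN application IDs look like application_<cluster timestamp>_<sequence>.
APP_ID = re.compile(r"\bapplication_\d+_\d+\b")


def parse_app_ids(listing):
    """Pull unique application IDs out of `yarn application -list` output."""
    seen = []
    for app_id in APP_ID.findall(listing):
        if app_id not in seen:  # the same ID can appear again in a tracking URL
            seen.append(app_id)
    return seen


def kill_pending(dry_run=True):
    """Kill every application still waiting in the queue (ACCEPTED state)."""
    listing = subprocess.run(
        ["yarn", "application", "-list", "-appStates", "ACCEPTED"],
        capture_output=True, text=True, check=True,
    ).stdout
    for app_id in parse_app_ids(listing):
        cmd = ["yarn", "application", "-kill", app_id]
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


# Sample listing text, made up for illustration only.
SAMPLE = """\
application_1460000000000_0001  HIVE-job-a  TEZ  hive  default  ACCEPTED
application_1460000000000_0002  HIVE-job-b  TEZ  hive  default  ACCEPTED
"""
print(parse_app_ids(SAMPLE))
```

On the sandbox you would call kill_pending() first to review the list, then kill_pending(dry_run=False) to actually flush the queue.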

This was a tough one.  Hadoop has SO many configuration settings.  And everything is interconnected.  And I wasn't supposed to be solving this issue at the moment.

Back in business.

And there you have it~!


Once we have the Insights, what's next?

More sales occurred on Thursdays than Tuesdays in the month of April.

Red shirts sold more than blue shirts.

Bob sold more units in June than Fred.

Second quarter was the strongest last year.

People bought a second product when given a coupon more often than with no coupon.

Thought I'd share some insights.  They were derived by churning mounds of data into insights.  I made it easy and went straight to the final product, hiding all the ETL, business rules, data flows, technical jargon.
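
For the curious, here's a minimal sketch of the kind of churning that sits behind a couple of those one-liners, in plain Python.  The sales records are entirely made up for illustration; real pipelines would pull from a warehouse after all that hidden ETL.

```python
from collections import Counter
from datetime import date

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

# Hypothetical sales records: (sale date, shirt color, units sold).
SALES = [
    (date(2016, 4, 5), "red", 3),    # a Tuesday
    (date(2016, 4, 7), "red", 5),    # a Thursday
    (date(2016, 4, 7), "blue", 2),   # a Thursday
    (date(2016, 4, 12), "blue", 1),  # a Tuesday
    (date(2016, 4, 14), "red", 4),   # a Thursday
]

units_by_day = Counter()
units_by_color = Counter()
for sold_on, color, units in SALES:
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    units_by_day[DAYS[sold_on.weekday()]] += units
    units_by_color[color] += units

print("Units by weekday:", dict(units_by_day))
print("Units by color:", dict(units_by_color))
```

The aggregation is the easy part.  The numbers still need someone to decide what to do about them.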

So, what are we going to do with our newfound insights?

I suppose that's a question that may need to be answered.  If Data Science is going to propel us to new heights, and you now have those nuggets of insight, how come we aren't there yet?

Transform data into information into insights for consumption and downstream action to increase sales, reduce costs and streamline processes.  That's the goal.

How do we get from Insights to Strategic Action to real Impact?