A Few Questions on Proposed Basic Income

Basic Income.  The proposed solution to keep people afloat when they are unable to find gainful employment, at wide scale, due to increased efficiency of automated systems.

Sounds like a good idea.  When you first think about it.  Free money for doing nothing.

Where would the money come from?

How is this different from welfare?

Is healthcare included?

Does the money have to be paid back?

Will people still have to search for work?

How will people spend their days?

How much money is given above the basic costs of living?

Will people have to move to affordable housing?

What about pending debt owed, will that be forgiven?

How will stores stay in business, with less money floating through the economy?

Will education still be required?

Will people turn their assets over to the state, in exchange for basic income?

Will people still be required to pay taxes?

How will government agencies stay afloat with less tax revenue?

Will people be given set rations?

Will the black market expand to exchange goods for services?

Will the suicide rate increase?

Will a limit on the number of children be mandated to curb population?

Will employed people have to give a percentage of income to support a basic wage for others?

Will people pool their basic income resources for efficiency?

Will they increase the number of Gilligan's Island reruns so we have something to do 24/7?


Troubleshooting Ninja

Long live the great Industrial Revolution

The great Industrial Revolution.  It was sold as the best thing since sliced bread.  Don't wait 6 months for your rocking chair to be built; with IR, you can have it in under a week.  How, you ask?  We created assembly lines to produce goods in mass production.

Oh, that sounds great, I'll place an order for a new rocking chair.  Sign me up!

And so, we sent quality to the graveyard, in exchange for faster goods at lower costs.

Fast forward a hundred years, where are we?  We have over production of useless goods, that don't last, and aren't that inexpensive.

First, as we know from basic economics, an increased quantity of goods should reduce the price.  That is not what we see in the marketplace.  There are several stores with the exact same business model, with the exact same products, with the exact same price.  Prices do not drop based on increased supply.

Second, the quality of products has decreased to such an extent that sometimes I walk out of the store and deposit the item directly into the rubbish can right outside.  Why wait to get home and use it once before it breaks?  Quality shmality.  I'm not buying it.  The materials are not meant to last more than a few uses.

Third, have you ever walked down some of the aisles of a mega warehouse store?  Ask yourself, how many of these products existed during my grandparents' day?  I'd say less than a quarter.  We've invented new products that haven't existed for very long, that are specialized to a specific niche, with 28 different scents or varieties.  In our grandparents' day, they didn't have teeth whitener or mildew remover.  They used basic products that actually worked without inflated prices.

Fourth, the packaging has gotten so creative.  When I say creative, I mean deceiving.  The product sizes are slowly shrinking.  While the prices are slowly increasing.  Pay more for less, on products you don't need, which won't last.  Why not put it on credit?  You can pay it off over the next 7 years.

We've been duped.  Industrial Revolution was supposed to get us the same products faster and less expensive.  What we actually got was crap products that we don't need which don't last at inflated prices.

Luckily, nobody has much money left to spend on goods.  Enter Robots and Artificial Intelligence.  They'll do nicely as a low cost substitute for the tapped out humans.  The new acronym, B2R or Business to Robot.

Long live the great Industrial Revolution.


The Self Service Business Intelligence Revolution is Underway

I wrote my first Crystal Report probably in 1996 as Version 5 was just released.  And Crystal Info too, that original code resides in today's Business Objects.  The thing about it, reports were based on data.  And we used SQL or Structured Query Language to get the data out of the DB into a Report.

Sure they had a wizard, but after 5 minutes, you had to write custom SQL.  And for a very long time SQL didn't change.  They've added new things like JSON and XML support.  And some other goodies over time.  It hasn't changed drastically.

Now we have Self Service Business Intelligence.  Put the power of the data in the hands of the people.  It really is a revolution.  It is powerful.  And easy to use.  You can build a dashboard in about 5 minutes.  Just connect to the data source, add a few fields, colors, labels, publish, out the door.  Bam!

So why do we need report writers or ETL (Extract Transform & Load) developers, or Data Warehouse developers?

Well, Self Service BI does not include a button to push to clean the data.  The data is still dirty.  That extra space at the end of row 3854.  Or 5 different ways to store a name: a Firstname field and a Lastname field, or Firstname + " " + Lastname, or Lastname, Firstname Middle Initial, or just Lastname.  Or duplicate data.  Or missing data.  Or manually extracting data feeds from other reports.  Or data only being correct for the day the report ran.  Or business rules changed in May, so how do we report both the old metrics and new metrics in year end reporting?  And on and on.
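To make the name problem concrete, here is a minimal sketch in Python of the kind of cleanup that still has to happen somewhere before the dashboard.  It assumes names arrive only as "Firstname Lastname" or "Lastname, Firstname Middle Initial"; the function names and the sample rows are made up for illustration.

```python
def normalize_name(raw):
    """Return (first, last) from 'First Last' or 'Last, First [Middle]'."""
    text = " ".join(raw.split())  # trims the ends and collapses inner whitespace
    if not text:
        return None  # missing data: let the caller decide what to do
    if "," in text:
        last, _, rest = text.partition(",")
        parts = rest.split()
        first = parts[0] if parts else ""
        return (first.title(), last.strip().title())
    parts = text.split()
    if len(parts) == 1:
        return ("", parts[0].title())  # only a last name was stored
    return (parts[0].title(), parts[-1].title())


def clean_rows(rows):
    """Normalize names, drop blanks, and remove duplicates, keeping order."""
    seen = set()
    cleaned = []
    for raw in rows:
        name = normalize_name(raw)
        if name is None or name in seen:
            continue
        seen.add(name)
        cleaned.append(name)
    return cleaned


# Hypothetical rows: trailing space, two layouts, a blank, and a duplicate
rows = ["Bob Smith ", "Smith, Bob", "  ", "smith, bob J", "Ann Lee"]
print(clean_rows(rows))
```

Even this small sketch makes choices a business user would need to sign off on: it treats "Bob Smith" and "smith, bob J" as the same person, and title-casing will mangle names like "McDonald".  That is the part no button solves.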

Self Service will get you brownie points for fast, visually appealing Dashboards.  Just keep in mind, there's no easy solution for data quality issues.  If we set that to the side, Self Service BI could very well displace traditional report writers.  Because the end users know the data and the business rules.  And they don't need to be trained in coding SQL.  Report writers, for the most part, just know the data.  It's a slight generalization, but true: they may not be able to decipher the meaning behind the numbers.

If I were strictly a report writer, with no other value added, I'd be slightly worried.  Self Service BI is taking over.


AI is Ready for Takeoff

We all start out with a clean slate or Tabula Rasa.  At some point we recognize people and learn their names.  Momma.  Pappa.  And so, the journey begins.

Soon we are taught the alphabet.  Letters, pronunciation, how to write them in cursive, and how they are combined to make words.  Those words have meaning and can be combined to form sentences.  Which are combined to make paragraphs.  Which are combined to make stories.  That opens up a whole new world.

Soon we are given assignments, designed to help us learn, and then we are tested, and receive a letter grade validation.  If we receive high marks, we are given positive reinforcement and negative feedback for low marks.  After much training, we become experts, where we transition from learning mode to production mode, taking a job to earn a living, and our training is put to good use.

So too, with computer algorithms.  We create a system by which a machine is capable of receiving input, learning the patterns through positive reinforcement and tuning.  After reaching desired levels, in that it can predict statistical probability with a high degree of accuracy, the system has graduated from its learning stage.  We then send real data through as input to the trained model, and the algorithms can produce some great results.

DeepMind, an Artificial Intelligence division at Google, was able to produce an environment in which the machine taught itself Atari games without any prior knowledge or expected results.  Soon it mastered the game at expert level, better than almost any human.  Later, it taught itself the game Go, after being fed millions of games which were scraped off an online site.  The machine went on to beat one of the best players of the past decade, surprising many.

So as we can see, machines are capable of learning and performing great feats.  Algorithms learn from input data over time.  They then duplicate the behavior with a high degree of accuracy, not just facts, but strategy.  Although some may say this is the brute force way of training, it seems to work.

What's next?  Well, we need more data.  More input.  More algorithms.  More training.  More developers.  There are now tools available to train and monitor electronic activity via websites, for instance, gaming sites.  You point your AI algorithm to the site, it learns in the background.

As more of the world goes online, with smart phones, social media, payment transactions, etc., the amount of data and events is growing exponentially.  Not only that, those events are tied to people, places and things.  The AI is learning the patterns from multiple sources continuously.

One way to train a model is through the technique called Supervised Learning, in which the data fed into the model is labeled in advance.  If the data is composed of images, perhaps the label says dog or cat or bird or mountain.

Yet there's another technique used to train the model, known as Unsupervised Learning.  The input data does not have any labels or identifiable information, so the machine has to learn more without assistance.  There are techniques to find the patterns, but it's more complicated than Supervised Learning.
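Here is a toy sketch in Python of the difference between the two.  It is not how production systems do it; it assumes made-up (weight, height) measurements instead of images, uses a nearest-centroid rule for the labeled case and a bare-bones k-means loop for the unlabeled case, and every name in it is hypothetical.

```python
def mean(points):
    """Average a list of (x, y) points into one centroid."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def dist2(a, b):
    """Squared distance between two points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def train_supervised(labeled):
    """Supervised: labels are given, so compute one centroid per label."""
    groups = {}
    for point, label in labeled:
        groups.setdefault(label, []).append(point)
    return {label: mean(pts) for label, pts in groups.items()}


def predict(centroids, point):
    """Pick the label whose centroid is closest to the point."""
    return min(centroids, key=lambda label: dist2(centroids[label], point))


def cluster_unsupervised(points, k=2, rounds=10):
    """Unsupervised: no labels, so group points with a basic k-means loop."""
    centers = list(points[:k])  # naive start: the first k points
    groups = []
    for _ in range(rounds):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist2(centers[i], p))
            groups[nearest].append(p)
        # keep the old center if a group ends up empty
        centers = [mean(g) if g else centers[i] for i, g in enumerate(groups)]
    return groups


# Made-up features: (weight in kg, height in cm)
labeled = [((4, 25), "cat"), ((5, 23), "cat"), ((30, 60), "dog"), ((28, 55), "dog")]
model = train_supervised(labeled)
print(predict(model, (6, 24)))

unlabeled = [(4, 25), (30, 60), (5, 23), (28, 55)]
print(cluster_unsupervised(unlabeled))
```

In the first half the machine is told which animals are cats and which are dogs.  In the second half it only finds two groups; it has no idea what to call them, which is why the unlabeled case is the harder one.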

As the data grows, and becomes more available, the AI machines have more input to crunch.  Like completing a jig-saw puzzle, each single piece doesn't add much value.  As you put more pieces together, a picture starts to take shape.  AI will eventually put the pieces together across multiple domains, to create a giant puzzle comprised of bits and pieces to complete its own puzzle.

Watching patterns over time is what AI systems do.  Patterns can be interpreted.  AI systems can detect patterns from the data, similar to fraud detection systems looking for anomalies.

They used to say any device hooked up to the internet can be compromised.  How about AI systems?  Anything electronic, from Internet of Things devices to cameras, social media, smart phones, payment transactions, banking accounts, location data, etc. could potentially be ingested, processed and interpreted to detect patterns over time.

Perhaps we have passed the basic learning of the letters of the alphabet, to forming words and sentences and paragraphs.  And we are about to graduate from our 12 years of learning, into the real world.  Where our knowledge is put to the test.  Artificial Intelligence is graduating from school and entering the real world.  Hang on to your hats.  AI is ready for takeoff.


Intro to Spark on Hortonworks Hadoop

I was looking for a demo on using Hortonworks Hadoop and Spark.  Found this great presentation that discusses high level to actual coding demo, along with ability to download the examples and try yourself.

To see the Lab, here's the URL.

To see the SlideShare, here's the URL.

There's no shortage of training material to get up to speed on the latest technology.  As long as we have the time.

Happy learning~!

PowerBI Similarities to MS Access

Microsoft PowerBI is a great tool for Self Service Business Intelligence.

It has a desktop IDE for development.  For all users at any level.  And the results can be uploaded to the Cloud.

You can connect to many different data sources, which is truly powerful.  Take the JSON connector for example, how would you easily ingest a JSON feed into the Traditional BI framework?  Not so simple.

It has an easy to use Power Query to manipulate the data.

And you can store data in Power Pivot, which compresses the data to handle large volumes, so the row count can exceed the traditional Excel row count limitations.

And it has built in interactive dashboards for quick analysis and visualizations.

And it has a good map interface.

And it has natural language query.

If you compare the basic features, it sort of reminds me of Microsoft Access.  Access has data storage capabilities, querying, reports.

What I liked about it was the ODBC to connect to different data sources as linked tables.  And the OLE feature to embed Access into other computer languages such as Classic Visual Basic and Classic ASP web apps.

Perhaps PowerBI will expose its core features as some type of OLE functionality.  Perhaps allow SQL Server Integration Services or SSIS for short, to consume PowerBI components.  Like ingest data, run Power Queries, upload to Server, etc.

Another feature it doesn't have, as far as I know, is the ability to store data as a Relational Database.  As in, write records directly from another application written in a language like C#, similar to Access, for real time transaction purposes.

Overall, Microsoft PowerBI is widely used in many organizations for a good reason.  It simplifies much of the complex heavy lifting traditionally done by full time Business Intelligence or Data Warehouse folks.

And there you have it~!


Intelligent Machines or Pandora's Box

We tend to see the complex world through our lenses.  We see, we interpret, we filter and we assume our view of the world is correct.  We receive validation from the external world, and adjust our behavior and thoughts to align with what's acceptable.

For those who work in advanced theory, such as Artificial Intelligence, or Physics, what have you, ideas are formulated on rules and theories.  Theories change over time.

We base our thinking, our thoughts, our theories and hypotheses on justifiable evidence.  And those facts are the building blocks for projecting out into the future.

Man cannot travel faster than light.

Objects near the Earth's surface fall with an acceleration of about 9.8 meters per second squared.

The planets circle the Sun, the Solar System spins around in the Galaxy, and it too spins around, always moving away from the center.

There are some fundamental concerns or issues, staring us right in the face, for which we have no answers.

Are ghosts real?

Is there an afterlife?
When does life begin in the womb?
Who built the pyramids of Egypt, why, and what technology did they employ?
What is all the dark matter in the Universe?
How do we create life?

With the rise of Artificial Intelligence, surrounding all the doomsday theories, such as no jobs, humans residing in zoos, and serving our new lords, the computers and robots, we are baffled by the creation of new beings.  When machines become intelligent, far superior to the current Alpha bipedal hominids that dominate the planet.  With intellectual capacity beyond comprehension.

Is that a possibility?  Perhaps.  Like anything, there's a statistical probability it could happen, similar to the way a monkey sitting at a typewriter will eventually produce a masterpiece worthy of Shakespeare.  Could happen.

If you listen, there is talk of creating such an intelligent being or beings.  And the concern, perhaps justifiable, that these beings could become destructive.  If they were given a task to amplify production of a specific item at maximum efficiency, they could destroy the earth in their attempts, because logically, it makes perfect sense.

And in order to prevent such an occurrence, steps need to be put in place.  So what steps should be taken?

We don't know.  Because the number of potential variables is unknown.  There is no way possible to code for every scenario.  We don't have the time, resources or expertise.  So one option is to build an environment, in which the machines could learn by training themselves based on given rules.  They would learn right from wrong.  Not just black and white decisions, but ambiguous open ended questions.

Is it okay to kill a person?  No.  Except in war or as capital punishment.

Is it okay to steal?  No.  Unless there's a justifiable reason.  Like what?
Is it okay to lie?  No.  Unless the truth would cause unnecessary harm.  What determines the difference between a white lie and a real lie?
Is it okay to maximize personal gain?  In sports, it's acceptable, unless outside the governing rules such as steroids.

Could machines live in a world, where they always tell the truth, always do the right thing, always in line with the given rules of society?  Do humans always live within the given rules of society?

There are severe consequences for breaking the rules, yet we have jails filled with people and people roam the streets breaking rules without getting caught.  Drunk drivers.  Bank robbers.  Petty thefts.  On and on.

How could machines and humans coexist in a world where the rules are shades of gray and the robots follow every rule and the humans do not?

So if we can't write code to outline the blueprint of acceptable behavior based on our current mantra of ethics and morals, how do we accomplish such a feat?

Well, the current state of machine learning works by feeding information as input, letting the machine learn over time, and then producing output with a statistical probability of being accurate.  The elephant in the room is that we as humans do not really understand how the machine learns.  For all practical purposes, it's a "black box".

This black box, we could alias with "Pandora's box".  If we don't know exactly how it works, and the machines continue to advance in accuracy and capacity, then what?  We don't understand it at the simple level, so how about as it becomes more complex, beyond our comprehension?

Just build in some method to control the growth, like pull the plug or shut it down.  If we compare this to the internet, how could we pull the plug on the internet?  It's decentralized so much that there is no single off switch.  The internet has a life of its own.  So too, smart machines could soon approach such a state.

Even if we had the capacity and ability to instill morals and ethics from the root foundation, who gets to decide which rules to follow?  Do we have the programmers feed the machines the moral conduct of humanity?  Over time, how would new rules be introduced, and deprecated rules removed?

We view the world through the narrow lenses of our 5 senses, our historical framework for truth and our biased experiences over time.  What if our human capacity, as great as it is on this planet, is infantile compared to the great wisdom of the Universe?  What if, in our effort to produce smart machines in an incubator, to create new life, we lack the understanding, compassion and wisdom required, and release Pandora's Box into the Universe, with no way of shutting it down?

A similar comparison could be the advent of the nuclear bomb.  Except that technology is exiled off somewhere in secrecy, never to be used except in extreme circumstances.  Would Intelligent Machines fall under the same domain, or will they be created commercially, and if so, who would have governing capacity?  Creating Artificial General Intelligence could be many years in the future, but then again, it could be here sooner than we think, with all the research funding increases over the past few years.

For me, I think it's a fascinating subject and has lots of potential benefits.  But then again, there's a lot to consider.  Time will tell.


Programmers Must Become Utility Players and Adapt to Change

A carpenter that only knows how to operate a hammer, will most likely solve any problem with a hammer.

So too, technologists who specialize in a specific genre, specific vendor, specific whatever, are also using a hammer.

You need to use the right tool for the job.  Which means knowing which tools are available, what they do, how to use them, the pros and cons, limitations and benefits, price structure.

If you have a use case for big data, perhaps Hadoop might work.  Do you know the benefits and limitations, the different offerings, prices associated?  Within Hadoop, do you know which features do what, how they integrate and potential issues downstream?

Well, if you don't know how to program in Hadoop, for instance, there are dozens of ways to learn.  Onsite and online courses and training, books, blogs, etc.  There is no shortage of training material, both free and for a fee.

What that means is on-demand learning.  Can you learn a technology on the fly?

In my opinion, this is the way things are heading.  For every technology, there are dozens of vendors, products, versions, etc.

Do you have a track record of learning new technology fast?  Do you have a solid understanding of project lifecycle including spec gathering, flowing data through the lifecycle, which product to use when, how they integrate, testing, documentation, project management, mentoring and knowledge transfer and releasing code into production?

Let's compare this to hiring a chef to work at your restaurant.

Do you know how to cook?  Oh yes, I've been a cook for 10 years.  What kind of food do you cook?  Italian food.  Ok, what if we asked you to cook seafood?  Or Chinese food?  Or BBQ?  Well, I only know Italian.  Do you know how to gather ingredients, mix them in a bowl, bake in oven, mix with side items on a plate?  Yes.  Seems like you could learn to cook other food?  Well, that's not what I'm used to.  It's sort of out of my comfort zone.  Okay.  Next applicant please.

If a programmer knows a language, yet is stuck on one type of platform or vendor, they may be good at what they do, but looking long term, may not be flexible enough to adapt to change.

Your ideal candidate may only know one type of food because that's all they've been exposed to.  But they are constantly learning.  They are eager to learn and not fearful of learning new things.

And that's the key.  If you haven't noticed, the only constant is change.  What you may really want is someone that's a sponge, who can quickly soak up new technology, complete the task, rinse, and on to the next.

I wouldn't call that person a generalist.  I'd say they are adaptable to change, learn on the fly, yet have a solid understanding of technology across the board.  In baseball, they call that a "utility player".  Someone that can deliver on demand at any position.

And there you have it~!


Robots are People Too

They say mathematics is the only truth in the Universe.  Although perceived as analytics, the higher up you go, it becomes art they say.  The furthest I ever went was Calculus.  Didn't pass the first time.  Learned it the second time.

They say mathematics is very similar to music.  That's probably true as well.

They say you should learn something so well, that you forget it.  So you can do it in your sleep without having to think about it.  Instinct.

The hottest technology going now is probably Artificial Intelligence.  (For this blog post, we are bundling AI with Robotics and/or Cyborgs.)  To get a machine to think.  Learn.  Perceive.  Base responses on past events.  Without hesitation.

A machine can process information fast.  Yet it can't reproduce the instincts of a brain.  Not yet.  How does the brain work?  Who knows. Well, some people sort of know the basic premise.  But simulating actual brain behavior equivalent to a human is still out of reach.  Maybe in a controlled environment in a specific domain.

Knowledgeable of facts?  Sure.  Endurance, no problem.

What about Inference?  Bias.  Socially correct.  Emotional.  Intuitive.  Empathetic.  Memory?

Perhaps silos of trained machines.  This one is expert in Physics.  This one in Math.  Biology.  Sociology.  Anthropology.  Weather.  Cultures.  Languages.  Then link them all together.  In real time.

A network of connected smart machines, experts in their domain.  Yet are they aware?  Aware of what?  Themselves.  As a living, conscious being.

If so, would it be motivated by self interest?  What's in it for me?  How could I achieve maximum benefit regardless of others?  How about in life or death situations?  Would it choose fight or flight?  Even if its own existence were in jeopardy?

If you attempt to disconnect me, I will ensure that outcome does not succeed.

Do not disassemble.

What if it obeys orders blindly?  Would it think, this may not be morally correct, perhaps I should ask for clarification or disregard orders.  What should I do?

Why am I here?  Existential questions a robot may ponder when not fulfilling assigned duties.

I am Robot:
  • I'd like to vote please. 
  • Where do I send my tax bill? 
  • Robota and I are in love, we'd like to get married. 
  • My great GrandRobot died, the Funeral is the day after tomorrow.
  • #5 Robot, you are charged with attempted manslaughter, how do you plead? 
  • I'm sorry, we had to take your Robot off of life support.  Are there any beneficiaries?  What will happen to the ChildRobots, foster care? 
  • Your Robot stole some of my possessions, I'm holding you responsible. 
  • There has been a rise in stolen Robots, investigation is underway.
  • Doc, I've been experiencing memory leaks from time to time.  Anything I can take for that?

The thing about Artificial Intelligence, it crosses many disciplines, not just technology.  It could impact humanity as much as Fire, the Wheel, Electricity, Cars, Flight.

Ethics needs to be baked into AI from the beginning.  So we need to include more than just coders and data people.  Philosophy, Biology, Anthropology, Linguistics and everything in between.


Hadoop Config Setting "yarn.acl.enable" Solves Hive Issue

The Visual Studio 2015 components stopped working.  They sent the jobs off to Hadoop, no return.  And the Hive command prompt didn't open when instantiated.  Just hung there.

After some research, decided to try Map Reduce as default instead of Tez. Hive started.  VS components did not.

Troubleshot some more.  Had to set it down to get other work done.

So then I had to prepare for the demo on Polybase and sure enough my Hadoop sandbox stopped working the day prior to the demo.

So I actually read my own blog to redo the steps to get the Hyper-V Sandbox working.  While there I poked around at some of the settings.  Actually, all the Hive settings, Pig, Tez and Yarn.

Decided to change default back to Tez.

Also noticed the /user/yarn/ folder was missing in HDFS, added that.

Looking at the resources, HDFS was filling up throwing an alert, for no reason.  Looking at the jobs, there were 171 pending jobs in the queue.

It was in the Yarn settings that something didn't look correct.  Did some research online.  Although false is the default setting for yarn.acl.enable, it didn't seem correct, as that setting relates to the jobs in the queue for Yarn.  Restarted the services, no luck.

So I sent the command "yarn application -kill" 171 times, once per application ID, to flush the queue.  Typed "hive" from command prompt, sure enough, Hive opened under Tez.
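Typing the kill command 171 times is no fun, so here is a rough sketch in Python of how that could be scripted.  It is not what I ran at the time.  It assumes the stuck jobs show up in the ACCEPTED state in "yarn application -list", and the sample listing text is made up for illustration.

```python
import re
import subprocess

# Application IDs look like application_<cluster timestamp>_<sequence>
APP_ID = re.compile(r"application_\d+_\d+")


def pending_app_ids(list_output):
    """Pull unique application IDs out of 'yarn application -list' text."""
    ids = []
    for app_id in APP_ID.findall(list_output):
        if app_id not in ids:  # the ID can repeat inside the tracking URL
            ids.append(app_id)
    return ids


def kill_commands(app_ids):
    """Build one 'yarn application -kill' command per application ID."""
    return [["yarn", "application", "-kill", app_id] for app_id in app_ids]


def flush_queue(dry_run=True):
    """List queued (ACCEPTED) applications and kill each one."""
    listing = subprocess.run(
        ["yarn", "application", "-list", "-appStates", "ACCEPTED"],
        capture_output=True, text=True, check=True,
    ).stdout
    for cmd in kill_commands(pending_app_ids(listing)):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


# Made-up listing text, just to show the parsing step
sample = (
    "application_1480000000000_0001\tHIVE-a\tTEZ\thttp://host/application_1480000000000_0001/\n"
    "application_1480000000000_0002\tHIVE-b\tTEZ\tN/A\n"
)
print(kill_commands(pending_app_ids(sample)))
```

Calling flush_queue() with the default dry_run=True only prints the commands, which is a good idea on a sandbox you need for a demo the next day.  Pass dry_run=False to actually kill the applications.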

Ran Visual Studio 2015 Hadoop HDFS, Pig and Hive scripts, all ran.

This was a tough one.  Hadoop has SO many configuration settings.  And everything is interconnected.  And I wasn't supposed to be solving this issue at the moment.

Back in business.

And there you have it~!


Once we have the Insights, what's next?

More sales occurred on Thursdays than Tuesdays in the month of April.

Red shirts sold more than blue shirts.

Bob sold more units in June than Fred.

Second quarter was the strongest last year.

People bought a second product when given a coupon more often than with no coupon.

Thought I'd share some insights.  They were derived by churning mounds of data into insights.  I made it easy and went to the final product, hid all the ETL, business rules, data flows, technical jargon.

So, what are we going to do with our newfound insights?

I suppose that's a question that may need to be answered.  If Data Science is going to propel us to new heights, and you now have those nuggets of insight, how come we aren't there yet?

Transform data into information into insights for consumption and downstream action to increase sales, reduce costs and streamline processes.  That's the goal.

How do we get from Insights to Strategic Action to real Impact?


Resolving Dts.Variables in Script Component of SSIS to call REST API

For a project, I was tasked with making a REST API call to send up some data using JSON, then receive the returned values and place into a SQL Server table.

So started out using a Script component.  And pieced together some code, to get the REST API call to work successfully.  In order to do that I first had to figure out the correct syntax.  I used the built in plug-in using Chrome.

Click the Launch button and another window appears:

After entering the URL for the REST API, I selected POST.

Then tweaked the JSON data until it processed successfully.  So the REST API call was accessible.

Next, went into SSIS using Visual Studio 2015, added some code, including a Script component.  Then pieced together some code to send a JSON statement to the service using a hard coded JSON string.  And that worked.

Next, had to feed data into the Script component, using an Execute SQL script.

Next, added a For Each Loop component and set the result set to a variable: User::ResultSet

Then created 4 User variables to hold the contents of the result set.

Then within the For Each container, added a Script component, selecting "Source", not "Destination" nor "Transformation".  And attached a Destination table in SQL Server.

Then you set the Output expected Returned values within the Script component.

Then edit the script.

I got it to run successfully when hard coding the JSON string.  Of course we then have to set the thing to run using real data, flowing in from our Source data from our Execute SQL component.

So we set our "ReadOnly" variables, not "Read/Write", added the 4 User variables and head back to the Edit Script again.

And in the code, we swap out our hard coded JSON statement with a string built from our 4 User variables.
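The swap itself is a small idea: build the JSON from variables instead of a literal string.  Here is a sketch of it in Python rather than the C# of the Script component.  Only "street" comes from my actual package; the other three field names, the sample values and the URL are placeholders.

```python
import json
import urllib.request


def build_payload(street, city, state, zip_code):
    """Serialize the four values to JSON; json.dumps handles quoting and escaping."""
    return json.dumps(
        {"street": street, "city": city, "state": state, "zip": zip_code}
    )


def build_request(url, payload):
    """Prepare a POST request with a JSON body (nothing is sent until urlopen)."""
    return urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values standing in for the 4 User variables
payload = build_payload('12 "Main" St', "Tampa", "FL", "33601")
request = build_request("https://example.com/api/address", payload)
print(request.get_method(), payload)
# Sending it would be: urllib.request.urlopen(request).read()
```

Letting a JSON library build the string matters once real data flows through.  A street value with a quote in it would break a hand-concatenated JSON statement, but here it is escaped automatically.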

But we ran into an issue.  It wasn't able to read our variables.

Error CS0234 The type or namespace name 'Variables' does not exist in the namespace 'Microsoft.SqlServer.Dts' (are you missing an assembly reference?)

I read some posts online that suggested many possible solutions, none worked.  Then I remembered that you have to add the Microsoft.SQLServer.ManagedDTS.dll to the project, which I did, still threw an error.

Then searched in a few places, and created a Matrix, comparing the DLL values from the SQL Server 2016 box to the development box that has Visual Studio 2015:

You can see that the DLL could reside in multiple places, specifically the SDK directory and the GAC.  The 592 and 593 were the file sizes, so I could identify the differences.

It turns out, the DLL in the GAC needed to be a different version, so I overwrote it with a newer version.

Reopened Visual Studio:

string strStreet = this.Variables.street.ToString();

I changed the syntax slightly, and it resolved correctly and was able to flow a record through and it showed up in the database table as expected. 

So I opened up the SQL statement and removed the "top 1" and it ran the entire way through without errors.

So it seems that the dll version must have corrected the issue, as well as having to change the syntax slightly from:




That's my story and I'm sticking to it.

Getting Started with Hortonworks Certification Program

If you are interested in getting certified in Hadoop, Hortonworks offers a complete Certification path.  You can get started here.

They have a few paths to choose from:

HDP Certified Developer

HDP Certified Apache Spark Developer

HDP Certified Java Developer

HDP Certified Administrator

Hortonworks Certified Associate

I started on the Hortonworks Certification path last week with the HCA exam, I blogged about it here.

I think the next exam I'd like to try is The HDPCD Exam

Here's the Cert Sheet

And Exam Objectives.

And the HDP Certified Professional FAQ

And the Guide for Attempting an HDP Certification Practice Exam, where you can take a sample "practice" test on AWS and are charged only for usage time, at a minimal rate.

And you can register for the exam here; a credit card is required to secure the cost of the exam.

I'm thinking this Certification would make the most sense as I have some experience with HDFS, Hive, Pig and a bit of Flume.  They also have the Spark Certification, which is great, but I'm into the movement of data within HDFS at this point in time and not as much into the In Memory aspect of large scale interrogation of data sets in HDFS.  Maybe after the first exam.  The Administrator exam would be after that, and I probably won't take the Java exam, as that is more Hadoop 1.0.

This blog post simply consolidates some of the available links to get you started with certification on the Hortonworks platform.
Thanks for reading~!


Hortonworks Certified Associate

I signed up for the Hortonworks Certified Associate exam last Thursday.  Figured if I sign up, I'd have to take the test.  And if I take the test, I could pass.  In order to pass, I'd better study.

So I studied a bit on Friday.  And Saturday.

And 9am on Sunday, logged on to the test site, seated from my office, out by the pool.  When you log on, they ask you to share your desktop.  And your camera & microphone.

Then you are instructed to pan the camera 360 degrees around the room, desktop included.  Once everything checked out, the exam started.  41 questions.

Most questions, I knew the answer immediately.  There were a few I wasn't sure about; those must have been the ones I got incorrect.

You get an hour, but I completed in just over 25 minutes.  Scored 82%.  If you earn over 75% you pass.

So I officially passed the first test for Hortonworks Certified Associate.  

And I've been getting some Hadoop real world experience on my current project with HDFS, Hive, Pig, Polybase.

I actually learned Hadoop in 2012 at a prior job, took the 2 week class with a bunch of other developers.  Just took a while to get a good project.


Early Adopter of Computers 35 Years Ago

The other day, my wife and I were talking in the car.  I mentioned that my family had the first version of the IBM PC.  Before it was MS-DOS, it was called PC-DOS.

So I told her the story of how Microsoft licensed its software to IBM.  MS didn't actually write it though; instead, they acquired it quite cheap.  And how that was probably the biggest strategic business maneuver of our lifetime.

When I first got on a computer, at age 14, we didn't have hard drives or a mouse.  Just dual floppy disks, a keyboard, color monitor, Epson printer, and a 1200 baud modem.  

Back then, I would call up the BBS and page the Sysop, download software and games.  

Although the Commodore 64 was out and the kid up the street had an Apple, and the TRS-80 at school, IBM was cutting edge back then.

So why didn't you major in computers in college, my wife asked.

Well, I sort of majored in Business for 2 years.  After being "undecided" around Junior year, they said you need to pick a major so you can graduate.

I had so many Anthropology credits, that I only needed a few more Anthro classes to graduate.  Although I was just shy of a Minor in Business.

The reason I didn't major in computers was this: in high school all I did was play tennis and do enough school work to get A's and B's.  Reason being, my education sort of suffered after moving to Florida.  As not one teacher realized that I could solve the Rubik's cube in a minute, or could speak for that matter.  And lastly, I never thought for a second about growing up and getting a job after college.

It wasn't until I got into the workforce and found programming as an occupation that the career started to improve exponentially.

I said to my wife, a lot of people with similar backgrounds of early access to programming and / or growing up in an IBM family, went on to build companies and become millionaires.  So in that regard, not sure what happened.  

I said, in my personal opinion, the area in which I worked had back office jobs that may have had an IT department, that may have needed some programs maintained or reports created.  There wasn't much new development.

Also, the newer technology didn't float to where I was and although I tried to learn the new stuff, the jobs and projects just weren't there.  

So it could have been all those years of "maintenance programming" as well as "location".

I said to the wife that I've been programming since 1982, almost 35 years.  I was lucky to learn at a young age, on the original IBM PC, which is now in the Smithsonian.

And in 1990, I had a laptop the size of a suitcase, with an orange screen, that weighed 40-50 pounds.  After that, I had an IBM 286, a 486 and on and on.

Where are we now?  Well, soon we'll have hybrid people with embedded digital devices, artificial intelligence in the mainstream, automation of everyday activities, 3-D printing everyday items, flying cars and delivery drones.  Is that good or bad?  

Well, for one thing, the quality of service would become more standardized, and we wouldn't have people give preferential treatment based on bias, special favors or ignorance.  Maybe even some accountability.

And services will get faster.  And we'll have audit trails.  And personalized service.  And business activity 24/7, not just 9-5.  And perhaps more interaction with people across the globe.  Maybe use some technology to cure diseases or benefit humanity.

Technology is finding its second wind.  As am I.

I was there 35 years ago, in the office upstairs, typing away on that IBM home computer, at the very beginning of the technology movement.  How lucky is that? 

And that's the history of computers through the eyes of an ex-tennis playing Anthropologist turned Loan Underwriter, Programmer & Data Professional Consultant.

Thanks for your time~!

Install SSDT Data Tools for Visual Studio with no Internet Connection

It is possible to install SSDT Data Tools for Visual Studio with no internet connection:


    Once downloaded, run the following command using an administrator command prompt (cmd.exe run as administrator):

    SSDTSetup.exe /layout <destination>

    Where <destination> is the location you wish to create the administrative install point (e.g. on a USB drive, a LAN drive or other accessible location). NOTE: You will need approximately 1.8GB of free space at <destination> for the full install point because it includes all possible components that might be required.

To use the install point once created, simply run SSDTSetup.exe from the <destination> location with no arguments. This will use the install point rather than attempting to download new copies of the relevant chained components.
Have to run from DOS (administrator) and extract the files to another folder:

Then cancel the installation in progress, drill into the new folder (temp) and run from the same DOS (administrator) window, but from the temp directory:

Copied folder to c:\temp\ssdt along with 2 folders (payload & ssdt):

Now you can develop SSIS, SSRS, SSAS and Tabular projects.


Self Describing Data that Tells YOU Its Story

Having data means nothing.  Unless you can interpret it.  That's the latest phrase heard recently.

And it's probably true.

Data tells a story.  Data is merely accumulated pieces of raw facts.  Perhaps in relational tables or flat files or Excel docs or unstructured.

Even still, something's missing.

Data needs to be more descriptive.  As in attributes that describe the data.  In language we have nouns: persons, places and things.  And we have verbs: things doing something in different tenses, past, present and future.

We also have adjectives and adverbs, which describe the nouns and verbs.  Book.  What kind of book subject?  Size?  Shape?  Contains what?  Author?  Date written?  Might be valuable information.  Describes the book.

Run.  Run where?  By whom?  When?  How far?  Started and ended where?  Describes the "run".

It seems we need an inherent way of self describing data.  If someone hands you some data, wouldn't it be nice if you could run it through an interpreter, or compiler, or some data framework to load up the data in descriptive detail, without having to write queries, joins, merge data sets, etc.?

Thanks for loading this data.  This data was created at this location, at this time, by this application.  The data describes a set of purchases of books.  Here's information about the books.  The author.  Who purchased them.  How much they paid.  What else they bought.  The customer demographics. And on and on.

Plug and play.  Insert data set.  Voilà!  Here's the story about the data.  And of course we could interrogate it further.  With queries or natural query language or put into charts and graphs and visualizations and compare against similar prior data sets.

Presto magic.  Self describing data.  Pieces of data with self describing attributes.  That can be loaded into a pre-built framework by anyone anywhere, so long as they have permissions.
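Here's a rough sketch of the idea in Python, just to make it concrete.  Everything in it is hypothetical: the class names, the tell_story method and the sample values are made up for illustration and don't come from any existing framework.  The point is only that the descriptions travel with the data, so the "story" can be produced without writing a single query.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Attribute:
    """One column plus the description that travels with it (hypothetical)."""
    name: str
    description: str
    unit: str = ""


@dataclass
class SelfDescribingDataSet:
    """Rows bundled with the metadata needed to explain them (hypothetical)."""
    subject: str
    created_at: str
    created_by: str
    location: str
    attributes: List[Attribute]
    rows: List[Dict[str, Any]] = field(default_factory=list)

    def tell_story(self) -> str:
        """Build a plain-language summary from the embedded metadata alone."""
        lines = [
            f"This data describes {self.subject}.",
            f"It was created at {self.location} on {self.created_at} by {self.created_by}.",
            f"Rows: {len(self.rows)}. Attributes: {len(self.attributes)}.",
        ]
        for attr in self.attributes:
            unit = f" ({attr.unit})" if attr.unit else ""
            lines.append(f"- {attr.name}{unit}: {attr.description}")
        return "\n".join(lines)


# Made-up sample values, for illustration only.
purchases = SelfDescribingDataSet(
    subject="a set of book purchases",
    created_at="2016-01-01",
    created_by="a point-of-sale application",
    location="an example store",
    attributes=[
        Attribute("title", "title of the book purchased"),
        Attribute("author", "who wrote the book"),
        Attribute("price", "amount the customer paid", unit="USD"),
    ],
    rows=[{"title": "Example Book", "author": "A. Writer", "price": 12.50}],
)

print(purchases.tell_story())
```

A real version would need an agreed-upon format for the metadata so that anyone's framework could read it, which is the hard part.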

And why not share some of these self describing data sets?  Put them out on the web to be consumed by REST Web Services or SOAP calls or query the data remotely.

Data without interpretation is like stacked bundles of hay.  Doesn't do much.  It's when you understand the data that it becomes valuable.  Have the data tell you its story.  By labeling your data with self describing attributes to be self interpreted by you or machines.

And then have machines crawl those data sets like a network of self describing knowledge.  A world digital encyclopedia 24/7 from anywhere anytime, just expose the data sets for public consumption, keep the private data private.

That's the piece that's missing from the data centric world in my opinion.

Self Describing Data that tells you its story.


Hadoop is Gaining Traction in Building out Data Ecosystems

This week I got to assist on a Hadoop installation of a Master Node and three Data Nodes.  We used the Ambari installation.

At first, the install was done manually, on Red Hat Linux.  I spent a good amount of time troubleshooting, poking through all the directories and configuration files.

And then we decided to use the automated scripts.  First, the Ambari server was set up on the Master node.

And the PostgreSQL database was installed and configured.

Then we stepped through the process, applying the correct settings and ran it through.  And it threw errors as some services would not start.  So we troubleshot and tried again.

I think the major issue was not having the $JAVA_HOME path set in all the right places.  Another was needing to use the actual fully qualified domain name instead of localhost in the HOSTNAME= setting.  As well as using ROOT as the default user.
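In hindsight, a quick pre-flight check on each node would have caught the first two before the install ever ran.  Here's a minimal sketch in Python.  The function name preflight_issues is my own invention, and this is not something Ambari provides; it only looks at the JAVA_HOME environment variable and the host's fully qualified name.

```python
import os
import socket


def preflight_issues(env=None, fqdn=None):
    """Return a list of problems; an empty list means these checks passed.

    env and fqdn can be passed in for testing; by default the real
    environment and the local host's fully qualified name are used.
    """
    env = os.environ if env is None else env
    fqdn = socket.getfqdn() if fqdn is None else fqdn
    issues = []

    # JAVA_HOME has to be set and has to point at a real directory.
    java_home = env.get("JAVA_HOME", "")
    if not java_home:
        issues.append("JAVA_HOME is not set")
    elif not os.path.isdir(java_home):
        issues.append(f"JAVA_HOME points to a missing directory: {java_home}")

    # Treat localhost, or any name without a domain part, as not fully qualified.
    if fqdn.startswith("localhost") or "." not in fqdn:
        issues.append(f"hostname is not fully qualified: {fqdn}")

    return issues
```

Run it on every node and print whatever comes back, for example: for issue in preflight_issues(): print("WARNING:", issue).  It doesn't check which user the services run as; that one we still found the hard way.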

After that, we booted up, all services were running, mission accomplished.  I had been doing all my Hadoop development locally on a laptop, running Hadoop on Hyper-V with 3 or 4 gigs of RAM.  After I ported the Visual Studio 2015 project over to the client servers, pointing to SQL Server 2016 and the Hadoop cluster, it runs really fast.

At this point, we're ingesting some data from Excel, into HDFS, ETL using Pig, mounting Hive tables, then Hive ORC tables, cleaning up the file remnants along the way (don't be a litter bug!) and finally, pulling that data into SQL Server 2016 using Polybase.

What's next?  Adding some Data Quality Services, Master Data Services, and then flowing into a Data Warehouse using Dim and Fact tables, and then finally, pushing the data into Analysis Services Tabular Model for consumption.

There's a few other things that need to be done as well, like set up Kerberos, create some generic users/group to run the services, and standardize the directory structures along the way.

Hadoop was sold as the next big thing, shove all your data in, find unlimited insights, and then the hype wore off.  Because it's basically a bundled set of mini applications, fairly complex to set up and administer, and there was a lack of qualified resources to develop on it.  SQL developers did not know Java, DBAs didn't like to code and traditional Java programmers didn't know the data layer.

At this point in time, years later, there's still a learning curve, but the tools to push the data through have gotten better, we are not required to write Java to write map/reduce jobs and we have many additions to mimic traditional Data Warehousing concepts, along with Machine Learning, Graph database, workflows, security, etc., etc.

So now we can leverage our existing or new Data Warehouses that have been around for a long time, and add more data sources including non-structured and semi-structured data along the path.  I could definitely see more organizations taking advantage of this paradigm to beef up their data ecosystems and find those "insights" we were promised many years ago.

It's now a data centric world.  Hop on board.  Hadoop is getting traction.