11/28/2017

What is Programming

Some would say that programming computers is like putting things together that don't come with explicit instructions.  You sort of have to figure it out as you go.  Sure, past experience is critical in arranging code in a specific way so it functions logically, but more often than not, you have some level of flexibility and creativity to write the code as you see fit.

If you read other people's code, you wonder what they were trying to do, whether they used the most practical approach, whether they simply got it to work without accounting for the unexpected, and whether they documented their code with some explanation of what was occurring, so the next person has some indication when attempting to troubleshoot.

Coding is based on logic, yet flexible enough to add a personal touch.  So perhaps it's a blend of science and art, of cognition and creativity, of repetition and free form.

That ain't too bad of a career choice.  Where else are you going to find such ambiguity, without necessarily having to deal with people?

11/23/2017

Two robots debate the future of humanity

Interesting video on Robots.  Ben Goertzel has been researching Artificial Intelligence for 30 years, well before it went mainstream.  He focuses primarily on Artificial General Intelligence, rather than Narrow AI, and I've seen many of his videos over the years.  He's got a project, OpenCog, found here: https://opencog.org

Enjoy~!


11/05/2017

Humans Apply Bias to Everything Producing Less than Optimal Results

We're the best.  Best?  You're the only one. Yes, but who's counting.

We see what we want to see.  Wear blinders to confirm our bias.  What is bias?  It's our pre-assumed notions of life.

Hey Jon, you wear blue shirts.  I do?  Yes, 10 years ago, you wore blue shirts.  Well, that was 10 years ago.  Now I wear green shirts.  That doesn't align with my bias.  I will continue to think that you wear blue shirts.  I'd rather not adjust to new information, sure it's lazy, but I've already identified this and prefer not to change.

Bias is part of every equation.  You order a cup of coffee.  Some days they are nice.  Other days not.  Why's that?  Perhaps you look like their ex, so we'll give you poor service.  Nobody's tracking service levels on a per-customer basis, they'll never know.  I'll give you decaf instead of caffeinated.  In my little world, you will suffer.  Now drink your decaf, nobody's the wiser.

Oh, next customer is an old friend, we'll give them extra special service.

Human bias exists everywhere.  Medical offices, schools, you name it.  Special treatment for different people, based on hidden or blatant bias.  That's probably one reason why artificial intelligence is a difficult nut to crack.  You have to account for the "insanity" of humans, where they override logic based on emotions or what have you.

You can program a computer to use 100% logic.  Humans are anything but logical.  Give humans a choice, it will most likely not be based on logic.  At any point along the thought trail, there is human bias, for better or worse.

If you want to simulate human actions tied to thought process, simply identify the best possible solution using logic, and take the worst possible outcome at least 50% of the time.

You do know it's probably better to pay your mortgage than go to the casino and gamble away your paycheck.  Of course, anyone knows that.  See you, I'm off to the gambling.

Humans use logic, and apply hidden bias to every equation, producing results that make no sense.  Hence, artificial intelligence embedded into robots is a rather complex problem to solve.
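
Taken literally, that tongue-in-cheek recipe can be sketched in a few lines of Python.  This is only an illustration, and everything in it is an assumption for the example: the function name biased_choice, the bias_rate parameter and the made-up payoff numbers.

```python
import random

def biased_choice(options, score, rng, bias_rate=0.5):
    """Pick the best-scoring option, except that with probability
    bias_rate the worst-scoring option is picked instead."""
    best = max(options, key=score)
    worst = min(options, key=score)
    # rng.random() returns a float in [0.0, 1.0)
    return worst if rng.random() < bias_rate else best

# Hypothetical payoff table for the mortgage-versus-casino decision.
payoffs = {"pay mortgage": 10, "go to casino": -10}

rng = random.Random(2017)  # fixed seed so repeated runs behave the same
picks = [biased_choice(list(payoffs), payoffs.get, rng) for _ in range(1000)]
print(picks.count("go to casino"), "casino trips out of", len(picks))
```

Setting bias_rate to 0.0 gives the purely logical machine, and 1.0 gives someone who always heads to the casino.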

And there you have it~!

10/29/2017

Securing the Data is Top Priority

The recent data breach at a major information storehouse company should raise red flags at every major business today.

If they can hack an almost impenetrable system, chances are, yours could be next.  I once heard that any device connected to the internet is hackable.  And it's playing out in real life.

If I were responsible for securing data, I would go out and hire the best security team available and set up a fortress, the best that money could buy.  It's easier to defend before an event happens than to clean up the mess downstream.

Security experts should be well employed from here until eternity.  Yes, having great data converted to insights to increase sales, reduce costs and streamline processes is great, but if hackers can access your data dump on the internet and your business shuts down, that's not too good.

Data breaches are real.  They happen every day.  To some of the biggest orgs out there, especially the ones that have valuable data for exploitation.

Security first!  That's the way I see it.

And so it goes~!

10/14/2017

With All our Technology Why haven't We Improved These

We talk about technology helping mankind in amazing ways.

Why haven't we found an easier way to remove garbage and waste from society?  We sure do produce lots of plastics and empty packaging and waste.  In a Consumer society, at some point, everything becomes waste.  I imagine we won't have museums in the future, as none of this stuff will last that long.

Why haven't we found a better way to feed society?  Agriculture got us out of the savanna and on to farms, yet how many people do you know that are full time farmers?  In that regard, why haven't we created synthesized foods that bypass the need to have cattle and dairy and fruits and vegetables?  How about a pill washed down with a glass of water?

Why haven't we created a better mechanism for the standard toilet?  Seems like people have been using bathrooms for a very long time.  This is the best we can do?

These three are very basic needs.

How about others?

Why haven't we centralized health care records?  This one seems like a no-brainer.  We certainly have storage, software, networks, security (maybe), and some really smart people.  Imagine how much savings could be had.  And perhaps better care, faster, more efficient.

Why aren't children considered adults sooner?  Why must children wait until 18 years of age?  Seems like children could be fast tracked and become adults much younger?  Once they reach puberty, they could become parents.  Why do we have 12 grades of school, locking kids into the system until adulthood?  Why not become 'legal' adults sooner?  Does this rule seem a bit outdated?  Like when we needed a dozen kids to work the farm to get the crops in.  I'd say, in the interim, if the parents sign a waiver, the child becomes an adult when they say, otherwise, wait until 18.  If a parent signs a waiver to drink alcohol, join the military, clergy, etc., make it happen.

Why don't we have an additional 'legal' status, before becoming married.  Like 'sort of married'.  Can have a ceremony, gifts, cake, dancing, except the two are not united in holy matrimony, until death do they part.  They're just trying it out, see if it's a fit.  Sure would cut down on the divorce rate.

Why do people get driver's licenses at 16 years old, and never get tested again?  Seems like a test now and then wouldn't hurt.  Just to verify the eyes and ears and reflexes are still performing.  And new rules appear all the time on the road, and people don't keep up with them.  Why not have classes and tests periodically to keep people up to standards?  Maybe reduce a few wrecks here and there.

And so it goes~!

Some Questions about Robots and their Role in Society

Assuming Robots enter society, what is our approach towards their mental health?

Will Robots get depressed?  Will they feel fatigue?  Will they feel euphoria?

Assuming Robots had emotions, could they love?  Could they hate?  Would they become spiteful, jealous, envious, vengeful?

I could see a Robot in conflict, two prime directives, unsure which takes priority.  Sort of like having someone dipped in freezing water below the waist and excessive heat above the waist.  The body doesn't know how to respond with so many stimuli, not sure which to fix first.  Do they attempt to get out of the freezing water first, then take care of the heat?  What sort of mental dilemma would the Robot experience?

Robot, go to the store, pick me up a box of pop tarts.  Robot gets there, they are out of pop tarts.  Should they get an alternative instead?  Should they go to another store?  Should they return, order on the web?  How would a Robot decide If-Then-Else when the variables are greater?

Would Robots have preferences?  Robot 1 prefers the color Green; Robot 2 likes Blue.  If the Robots shared a room, what color would they paint it, Green or Blue?  Perhaps Robot 1 stabs Robot 2 in his/her sleep.

How would that be handled?  Would Robot 1 go to court, hire an attorney, be locked up in prison?  What if Robot 1 was instructed by a human to stab Robot 2, or perhaps their Spouse or Partner?  Robot 1 performed the deed, could the person dictating the order be held responsible?

What if we had a Robot army?  Would Robots feel remorse or Post Traumatic Stress Disorder?  Would a Robot be paid for services?  Would a Robot seek Mental Health counseling, assuming it had Insurance?

Could humans purchase Life Insurance on a Robot?  Will Robots have warranty periods and expiration dates?  What if a Robot receives a faulty upgrade patch, goes off the deep end, who gets held responsible?

Would a Robot have thoughts?  About life and death?  Existentialism?  Would it feel remorse?  Or happiness?  Could it plan future events to steer outcomes such that it would benefit over another Robot or Human?

Would a Robot sleep?  Or would it work 24/7?  Would it get time off to pursue interests?  Would a Robot get married?  Divorced?  Polygamy or Monogamy?  Could it adopt little Robots or Human children?  Could it have pets?  Could it earn a salary?  Have a bank account?  Could a Robot run for elected Office?  Could a Robot have a funeral?

Keep in mind, when we discuss Robots, we are encapsulating Artificial General Intelligence beings that are self-aware, autonomous and have free will, to a degree.

So we have legal questions, property questions, health questions, family questions, liability questions, just about every question imaginable.

Robots are on the horizon.  Are we ready?

9/30/2017

Current State of Technology

What is the current state of technology?

Get a deck of cards.  Open the box.  Throw the cards up into the air.  Watch how each card lands.

Next get a fan, plug it in, aim the fan at the cards, turn on high.  Watch the cards fly through the room, under chairs, tables, behind curtains.

Next, shut off all the lights in the room, so you have absolutely no idea where anything is.  Notice how you can't even see the cards any more, scattered, under the chairs and tables.

Now, walk out of the room, down the hall, get in your car, and drive away, fast.

That basically sums it up.

Any questions?

9/28/2017

Technology Advances like a Drum with Block Chain Ripe for Prime Time

Data becomes information.  Information adds value if used properly to align business practices, streamline processes with net result of increased sales & profit and reduced costs.

Traditional Reporting
Reports have existed for a very long time.  They report on the past.  Businesses were typically managed by looking in the rear view mirror.

Red shirts seem to be selling in Region B, while White shirts are selling in Region C, while Yellow shirts are not selling anywhere.  Let's drop Yellow from the product line, increase Red & White shirt production.  Seems Fred had the highest sales, Bob had the lowest.  Lots of useful information about the past.


OLAP Cubes
Next, we accumulated that data into Cubes for faster slice and dice.  Lots of data transformation, business rules and data ingestion.  Costs a lot, tough to maintain, can't store all data.


Big Data
Next, Big Data provides a data lake to store all data.  Still need to cleanse, transform and prepare data for consumption.  Handles a lot of prior issues, and adds some problems as well.

Machine Learning
Machine learning has been around for a while also.  People have been creating "models" to predict future probability based on past data.  I know the banks used this in the mid 1990s to score credit applications.  Advances in technology, lower costs and software availability have brought it into mass consumption.  Schools and universities provide courses so people can get up to speed quickly.

Artificial Intelligence
Artificial Intelligence is a higher level of all this.  Massive data, crunching numbers to provide easy access to important information at the snap of a finger.  It's widely available on smart phones.

Internet of Things
Internet of Things allows remote sensors to capture and relay information back to a central hub for storage, processing and analysis in real time.  Our planet is in the process of becoming interconnected.  Besides lacking standard protocols, guaranteed security and a widely available programmer base, there are other things to consider, like how long the battery life of a device is and how we apply software patches and upgrade Hardware & Software over time.

Quantum Computing
Quantum Computing is a hot topic.  Scientists attempt to leverage the peculiar behavior of quantum physics and apply it to technology.  Our entire computing system is based on binary numbers of zeros and ones; when combined, they form a language which computers can interpret.  Quantum bits allow greater flexibility, where a state may be 0, 1 or a superposition of both.  This added feature can potentially reduce computation time for certain difficult problems.  Theoretically, a large enough quantum computer could crack widely used public-key encryption such as RSA, which poses security risks.  The machines are very expensive, require extremely cold environments that are difficult to maintain, and the knowledge base is concentrated among advanced mathematicians, physicists and scientists.  The quantum bit has bizarre properties, such as entanglement, where paired bits stay correlated across great distances, although this cannot be used to send information faster than light.  Another behavior is not knowing the state of a particular bit until we measure it.  This tech has great potential.

Block Chain
With all that said, the newest feature that has a lot of attention right now is the Block Chain.  Why is it hot?  Because it lays foundational plumbing across a variety of sectors.  It has potential to radically alter existing processes such as banking, healthcare, election systems, stock market transactions, clearing house middlemen, just about anything.  What is unique about Block Chain?

It's a distributed ledger.  Transactions are written, never removed.  Each transaction carries a Digital Signature from its sender, and each block of transactions stores a hash of the prior block along with its own hash, which chains the records together.  As the transaction is sent out, the Digital Signature is validated and the record is written across a distributed data store.  The record is assumed to be valid because it passes the rules engine and the group agrees the transaction is valid.  Block Chain can handle any "asset".  Assets can be people, things, events, you name it.
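
To make the "hash of the prior record" idea concrete, here is a minimal single-machine sketch in Python.  It is not how any particular Block Chain product is implemented: there is no network, no consensus among participants and no Digital Signature, just an append-only list where every record stores the hash of the one before it.  The helper names (block_hash, append_block, is_valid) and the sample parties are made up for the illustration.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block

def block_hash(index, prev_hash, payload):
    """SHA-256 over the block's contents, including the prior block's hash."""
    body = json.dumps({"index": index, "prev": prev_hash, "payload": payload},
                      sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

def append_block(chain, payload):
    """Append a record that is linked to the current tail of the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV
    index = len(chain)
    chain.append({"index": index, "prev": prev_hash, "payload": payload,
                  "hash": block_hash(index, prev_hash, payload)})

def is_valid(chain):
    """Recompute every hash and check every link back to the start."""
    prev_hash = GENESIS_PREV
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(i, block["prev"], block["payload"]):
            return False
        prev_hash = block["hash"]
    return True

ledger = []
append_block(ledger, {"from": "ABC", "to": "DEF", "asset": "Y Stock", "units": 10})
append_block(ledger, {"from": "DEF", "to": "GHI", "asset": "Y Stock", "units": 4})
print(is_valid(ledger))

ledger[0]["payload"]["units"] = 1000  # quietly rewrite history
print(is_valid(ledger))
```

Changing an old record changes its hash, so the stored hashes stop matching and the check fails.  In a real distributed ledger every participant can run that same check on their own copy.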

The transaction could be anything.  We agree to give x person 10 units of y.  Transaction sent.  Authenticated & distributed & recorded.  Audit trail agrees.  Done deal.

It could also be stock purchases.  ABC agrees to sell 10 shares of Y Stock to DEF at $10 per share on 10/01/2017.  Transaction sent, authenticated, distributed, recorded.  Done deal.  No batch processing.  No hidden fees.  No hold up in time.  No fraud.  No corruption.  

Speaking of fraud, imagine an Election process that wrote votes to the Block Chain.  Secure, recordable, and transparent if need be, with a documented audit trail.

This Block Chain technology could not only disrupt just about every business model today, it could also replace our paper money.  Imagine that.  Without paper money, what's the need for traditional banks?  Just send money across the Block Chain, pay the nominal fees, transaction is secure, done deal.  Imagine all the real estate potentially available from empty banks.

In my estimation, sure reporting is great.  Insights are wonderful.  AI to ease the flow of life, that's tremendous, although Artificial General Intelligence may not happen for a while, if at all; you can read my post on that topic here.  IoT can link the worlds together digitally.  But Block Chain has the most potential to radically alter life here on planet Earth.

What's the holdup from widespread adoption?  My estimation is getting the Block Chain to scale up.  As in millions of transactions a second across a distributed network across the planet.  Maybe that new undersea cable between North America and Europe could help facilitate that.  Once Block Chain scales, you could see widespread adoption across a variety of sectors, including all the big boys in every sector.

Technology geeks will eventually run the World's digital plumbing via Block Chain and it will have the biggest impact in the near future.

And so it goes~!

Can Mankind Create Artificial General Intelligence

What do you want to do when you grow up?  Some of us still haven't decided.  After close to 50 years.

Chances are, if you chose anything related to technology, you have a chance at keeping your job until you retire.  Although most jobs are now becoming technology related.

They say Artificial Intelligence will soon be here.  First we'd have to define what AI is.  And there are two camps, one is narrow artificial intelligence and the other is general artificial intelligence.

We've already got narrow AI built into many household devices especially smart phones along with some sites on the web.  These assistants make our lives easier by studying patterns, analyzing tons of data to facilitate easier flow in our everyday lives.

As far as general artificial intelligence, from what I've seen, we are years away, if ever, from solving this riddle.  What we are talking about is creating a life form out of thin air.  That sort of dabbles with the hidden laws of nature and physics, which are probably hidden for a reason.

Some of our brightest minds have attempted to create AGI for the past 60+ years, with minimal success.  If the goal is to reproduce the activity of the human mind, we have a ways to go.  We know the basic architecture, except for the black box that ties everything together.  Call it what you like, the soul, the being, the unique aura of an individual, we don't know much about it.

If the goal is to reproduce the behaviors and characteristics of mankind, we better understand what that implies.  This AGI must understand human emotions and behaviors, like love and hate, lust, greed, envy, rage, revenge, wars, homicide, genocide, blind obedience, hierarchical structures that benefit only the few, for the AGI beings to understand just how we think and behave.

What we've got here is a reputable version of clinical settings for creating a Frankenstein.  It's alive!  Oh no, what have we created?

I would think that many smart people in the AGI field have already concluded that creating life out of thin air is no easy task.  Is it possible at all?  Time will tell.  If we do in fact let the genie out of the bottle, there would be no turning back.

And why is that?  Primarily because we are rushing into this while the planning, architecture, strategy, and rules and laws of ethics and behavior are just not at the point they need to be.  Our rush to market, for profit and control of the market, is simply bypassing these required tasks.

Another reason for possible delay, AGI is not simply a technology issue.  It's social, financial, legal, medical, biological; just about every industry needs to be a part of the process.  Technology is just one aspect.  That requires an arsenal of staff; granted, some heavy hitters are putting up a lot of money so they probably have these items covered.

Technology is racing forward.  We've removed the guard rails and safety nets.  Full steam ahead.  If we do crack the code of creating life, I sure hope society can withstand its creation, and if it survives, our lives could potentially be altered forever.



You can read some more blogs on the subject:

Intelligent Machines or Pandora's Box

And so it goes~!

9/26/2017

Data Is Not the New Oil

They say data is the new oil.  Oil is a black, tar-like substance, extracted from the ground, formed from ancient fossils.

Data is a bunch of zeros and ones combined to form objects in the form of strings, numbers, blobs, etc.

There's only so much oil in the Earth.  Data is an infinite resource.  That's a clear distinction.

Oil can be extracted from the ground, assuming you have the rights and machinery to do so, as well as a way to offload your cargo to a refinery and then to market for sale.

Data can be home grown, purchased or found on the web.

Only oil specialists can work with oil, from extraction to shipping to processing to your gas pump at the station.

Anyone who has access to a computer and knows Excel or SQL can work with data.

There are some very clear distinctions as to why Data is nothing like Oil.

What do they have in common?  Ability to make money~!

And there you have it~!

9/22/2017

Digital Currency uses Block Chain Technology

Digital Currency is taking off.  A new currency that bypasses traditional banks and markets.  Many currencies are worth billions.

It runs on block chain.  A database that writes transactions across a distributed network.  The digital ledger is sequentially written, transparent and uses hashes to ensure validity.  If someone tampers with the chain, the hashes of later blocks will no longer match, for all to see.

Who invented this technology?  Someone using the name Satoshi Nakamoto, real identity unknown.  He wrote a white paper, helped out a bit, then disappeared.  Seems odd.  Bitcoin entered the scene Jan 2009, after the financial meltdown.

Block chain is highly valuable due to the fact that it handles secured transactions.  The chain doesn't have to be related to money, so the open source code can be used across a variety of sectors.  Think healthcare, retail, stock market.  Transactions settle without the traditional wait time like overnight batch processing.

It's primed to replace paper money in the near future.  Perhaps another financial collapse, exit paper money, enter digital currency stage left.  Electronic currency can be tightly controlled, making fraud and money laundering almost impossible.

Block chain can track assets.  Any asset.  With the rise of the Internet of Things or IoT, products can be tracked using current methodologies, and soon will be tracked using block chain.

Let's say you purchase an item on the internet, using a digital currency like Bitcoin.  The product is shipped, perhaps tracked in the delivery truck, and it arrives at your house.  The embedded sensors in the IoT device send signals back to the home office and write a record to the block chain: AssetOwnership="John Q. Owner", DateOfOwnership="2017-09-22", PricePaid="49.99", TaxPaid="3.49".  As you can see, the asset is tracked in an IoT Block Chain database, over time, with audit trail.

If the asset changes hands, perhaps stolen or bartered, and whether proper taxes were paid, IoT-Block Chain knows.
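
As a rough sketch of what that audit trail could look like, here is a small Python example.  The record fields mirror the sample record above; the asset id, the second owner and the helper functions (history, current_owner, untaxed_transfers) are hypothetical, and a plain list stands in for the block chain itself.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OwnershipRecord:
    asset_id: str
    owner: str
    date_of_ownership: str  # ISO date, so string order equals date order
    price_paid: float
    tax_paid: float

def history(records, asset_id):
    """All ownership records for one asset, oldest first."""
    return sorted((r for r in records if r.asset_id == asset_id),
                  key=lambda r: r.date_of_ownership)

def current_owner(records, asset_id):
    """Owner on the most recent record, or None if the asset is unknown."""
    trail = history(records, asset_id)
    return trail[-1].owner if trail else None

def untaxed_transfers(records, asset_id):
    """Transfers where no tax was recorded."""
    return [r for r in history(records, asset_id) if r.tax_paid <= 0]

# "SENSOR-001" and the second owner are invented for the example.
records = [
    OwnershipRecord("SENSOR-001", "John Q. Owner", "2017-09-22", 49.99, 3.49),
    OwnershipRecord("SENSOR-001", "Jane R. Buyer", "2017-11-02", 30.00, 0.0),
]

print(current_owner(records, "SENSOR-001"))
for r in untaxed_transfers(records, "SENSOR-001"):
    print("no tax recorded:", r.owner, r.date_of_ownership)
```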

Once digital currency takes off and potentially replaces paper money, the national deficit could be wiped clean overnight.  Since the dollar is no longer tied to the gold standard, and potentially no longer secured with paper money, not only can every coin or token be tracked in real time, we could also trace each coin back in time to see how it changed hands, when, by whom, you name it.

Imagine how many companies are tied to paper money.  Banks may not be necessary in the future.  Also something to note, the entire currency could be centrally located, perhaps across the globe, with a single currency.  Asset tracking could become a reality.  Transparency.  Audit trail.  And Centrally controlled.  The ramifications of this technology could alter the planet, a real eye opener for sure.  Don't take my word for it.  Have a watch:




The times, they are a changin'.

9/18/2017

Is Ride Share Business Model a Fad or Here to Stay

Driving people from place to place, using your own vehicle has many positive aspects.

  • You become self employed.  Tax write offs.
  • You set your own hours.  Work any time you feel like it.  Day or night.
  • You get to meet interesting people.
  • You get to see the world, not behind a cubicle, but right in front of your eyes.
  • You can work part time or full time, you choose.
  • You get tips plus paycheck.
  • Revenue stream

There's a lot of freedom.

On the downside, your car takes a real beating.

  • Rack up miles faster than one can imagine.
  • What does that do to the resale value.
  • What about frequent car repair.
  • Increases chance of auto wrecks, possible insurance increases.
  • No guarantee of riders.
  • What if your passenger takes you an hour away with no return fares.
  • Increased competition as more drivers are ready to pick up the next ride.
  • Not sure if they get benefits, like days off, sick days, insurance, but I doubt it, freelancers.

I wonder what happens when the drivers require a new vehicle at the end of the year.  Wouldn't the cost of a new vehicle dip into the profit stream they worked so hard to get during the year?

Is ride sharing a good business model?  Definitely for the owners of the company.  Not much overhead, as they don't own the vehicles, the workers are not employees and the routes are selected via a derivative of machine learning and artificial intelligence.

Good model for the workers?  Perhaps.  Quick and easy revenue stream.  Just keep in mind, there are downstream hidden costs.

Will this model trickle into other segments of society?  Streamlined business model, keep the profits, place the burden of costs on the freelance workers.  If the goal is increased profits and decreased costs, seems like a no-brainer.

And there you have it~!

9/15/2017

Universal Global Unique Identifier or U-GUID

The Universe is based on energy.  That energy aligns and clusters and disperses.  Similar to a swarm of bees.

We align and cluster ourselves.  Pick a favorite sports team, we align, unite, form a bond, have an agreed upon adversary.  Pick a Country.  State.  City.  Political Party.

We cluster and form an energy unit.  Aligned on beliefs and values.

Yet some of our cluster mates may actually be antagonists in another cluster.  We may both align with USA, but we like the Yankees and you like the Mets.  Another cluster may like the Mets, yet they are Republicans and Pro-Choice while you are Democrat and Pro-Union.


We partake in different overlapping clusters and form alliances accordingly so long as the group does not further cluster within the outer cluster.

Try for a minute, to remove all your Clusters.  No favorite team, political party, country, religion.  What have you.  

That's your true self, without the layer upon layer of aligned clusters.  Now that you've found the core, you have past memories and experiences that make you unique.

Now remove those.  At this point, how much different are you than anybody else?


Do you have a unique signature that defines you specifically, a Universal Global Unique Identifier or U-GUID?  If so, does that stay with you after you depart this planet?  Has your U-GUID been to other realities in other parts of the Universe?  Has it been to this planet prior and will it return?  Where does it go when it leaves?  Are there different levels of U-GUIDs as far as Spiritual Growth?

Is the U-GUID your "soul"?  An energy field that never dies?  Simply transforms dimensions and realities and learns with each new life.  If so, how important is that new big screen plasma television or Ferrari, material pleasures or distractions from our true calling?

That's the question on the table.  Were we placed here on Earth at this point in time to further our evolutionary growth of Spirit, to experience physical reality in order to learn and grow?

And so it goes~!

9/10/2017

Mapping Attributes Across the Galaxy using Big Data

Let's say you had an IoT device.  It would contain a variety of information including latitude and longitude as two attributes that change over time.

Let's say you could record every location along with time, to give a complete record.

Let's take this a step further, and somehow embed an IoT sensor in every atom in our Galaxy.  We'd store the Id, Lat, Lon along with a time stamp for every atom at every instance of time down to the microsecond.

Then we could use mapping technology to replay specific intervals in time.  That would pick up a handful of attributes, but what if we could store others?  And record them in a big data repository.  Perhaps a Graph database to see how different atoms aligned with or repelled other atoms, and discover the secret to 'dark matter' in the Universe.

If anyone has free time and would like some extra credit, please feel free to begin this project.  It would be much appreciated.

And so it goes~!

6/06/2017

Troubleshooting Hadoop After Configuration Changes

Here's a blog post from a project in Q4 last year working in Hadoop.  Troubleshooting took some effort, so I documented the process.

Have a read: https://www.linkedin.com/pulse/troubleshooting-hadoop-after-configuration-changes-jon-bloom

5/10/2017

Format Currency, Percentage, Date, and Mask SSN in Hive SQL

In Hive SQL, to format Currency, Percentage and Date, use the following SQL:


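SELECT
    -- Thousands separator and 2 decimal places, prefixed with a dollar sign
    CONCAT('$', format_number(COALESCE(12345.6789, 0), 2))   AS `Sample Currency Format`,
    -- 2 decimal places, suffixed with a percent sign
    CONCAT(format_number(COALESCE(98.6543, 0), 2), '%')      AS `Sample Percentage Format`,
    -- Today's date as MM-dd-yyyy
    from_unixtime(unix_timestamp(TO_DATE(CURRENT_DATE), 'yyyy-MM-dd'), 'MM-dd-yyyy') AS `Sample Date Format`,
    -- Mask all but the last 3 digits of the SSN
    CONCAT('****-**', SUBSTR('123456789', LENGTH('123456789') - 2, 3)) AS MaskedSSN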

You could also apply the Round function if needed.
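
If you want a quick way to sanity check what the currency, percentage and masking expressions should return, a rough Python equivalent can help.  This is only a sketch for comparison, not Hive itself: the function names are made up here, and rounding in edge cases may differ between Python and Hive's format_number.

```python
def format_currency(value):
    """Mirror of CONCAT('$', format_number(COALESCE(value, 0), 2))."""
    value = 0 if value is None else value
    return "${:,.2f}".format(value)

def format_percentage(value):
    """Mirror of CONCAT(format_number(COALESCE(value, 0), 2), '%')."""
    value = 0 if value is None else value
    return "{:,.2f}%".format(value)

def mask_ssn(ssn):
    """Keep only the last 3 digits, as the SUBSTR/LENGTH expression does."""
    return "****-**" + ssn[-3:]

print(format_currency(12345.6789))
print(format_percentage(98.6543))
print(mask_ssn("123456789"))
```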


Happy coding~!

4/21/2017

Compare Option in DBVisualizer

I've been working with a tool called DBVisualizer.  This tool allows development against a variety of databases.  I happen to be working against AWS Data Lake Hive tables.  After you configure the Connection, you can go to the tables tree view, expand, highlight all objects, right click, and script the objects to a file or window.  I selected the "Create" button and ran it.  With 544 objects to create, it runs for a while.  So we have a file containing all table objects.


From there, we connect to another environment.  Assuming I had access to the Production Environment, I would perform same steps, generate a second file.


From there, you can click the Tools dropdown, Compare option, select your 2 new files, and see the differences. 




Now to view the differences, green indicates new:




This is a handy feature when developing, as sometimes the objects do differ between environments, and it's no fun to deploy a report that's been fully validated, only to have it fail in production, with different objects.  Missing Views or Tables, fields renamed or missing.
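
If you don't have the tool handy, a similar check can be approximated with Python's standard difflib module once you have the two script files.  This is a swapped-in technique, not what DBVisualizer does internally, and the table definitions and file labels below are made up for the example.

```python
import difflib

def compare_scripts(dev_sql, prod_sql):
    """Return unified-diff lines between two CREATE scripts.
    Lines starting with '-' exist only in dev, '+' only in prod."""
    return list(difflib.unified_diff(
        dev_sql.splitlines(), prod_sql.splitlines(),
        fromfile="dev_objects.sql", tofile="prod_objects.sql", lineterm=""))

# Hypothetical table definitions scripted out of two environments.
dev = "CREATE TABLE claims (\n  claim_id INT,\n  loss_dt STRING\n);"
prod = "CREATE TABLE claims (\n  claim_id INT\n);"

for line in compare_scripts(dev, prod):
    print(line)
```

An empty result means the two scripts match line for line.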


And there you have it~!

4/20/2017

Hive SQL Date Functions Cheat Sheet


Since I've been working with Hive SQL lately, against AWS Data Lake, I assembled a quick list of key Date functions to speed up development:





SELECT
    from_unixtime(unix_timestamp(TO_DATE(CURRENT_DATE) , 'yyyy-MM-dd'), 'MM-dd-yyyy') AS TodaysDate
    ,
    from_unixtime(unix_timestamp(DATE_ADD(CURRENT_DATE,-(DAY(CURRENT_DATE)-1)), 'yyyy-MM-dd'),
    'MM-dd-yyyy') AS FirstDayThisMonth ,
    from_unixtime(unix_timestamp(LAST_DAY(DATE_ADD(CURRENT_DATE,-(DAY(CURRENT_DATE)-1))),
    'yyyy-MM-dd'), 'MM-dd-yyyy') AS LastDayThisMonth,
    from_unixtime(unix_timestamp((DATE_ADD(CURRENT_DATE, -1-DAY (CURRENT_DATE))) , 'yyyy-MM-dd'),
    'MM-01-yyyy') AS FirstDayPriorMonth,
    from_unixtime(unix_timestamp(DATE_ADD(CURRENT_DATE,-(DAY(CURRENT_DATE)+1)) , 'yyyy-MM-dd'),
    'MM-dd-yyyy')                                                              AS LastDayPriorMonth,
    from_unixtime(unix_timestamp(TO_DATE(CURRENT_DATE) , 'yyyy-MM-dd'), '01-01-yyyy') AS
    FirstDayThisYear,
    from_unixtime(unix_timestamp(TO_DATE(CURRENT_DATE) , 'yyyy-MM-dd'), '12-31-yyyy') AS
    LastDayThisYear,
    from_unixtime(unix_timestamp(date_sub(concat(from_unixtime(unix_timestamp(), YEAR(CURRENT_DATE)
    -1), '-01-01'), 0), 'yyyy-MM-dd'), 'MM-dd-yyyy') AS FirstDayPriorYear,
    from_unixtime(unix_timestamp(date_sub(concat(from_unixtime(unix_timestamp(), YEAR(CURRENT_DATE)
    -1), '-12-31'), 0), 'yyyy-MM-dd'), 'MM-dd-yyyy') AS LastDayPriorYear,
    DATE_ADD(CURRENT_DATE, -90)                         TodayMinus90Days,
    from_unixtime(unix_timestamp((DATE_ADD(CURRENT_DATE, -1-DAY(CURRENT_DATE))) , 'yyyy-MM-dd'),
    'MMM')                                                                      AS PriorMonth3Char,
    DATEDIFF(TO_DATE(FROM_UNIXTIME(UNIX_TIMESTAMP())), TO_DATE(CSI.LOSSDT)) < 90 AS
    CheckForXDaysAgoTrueFalse
FROM
    eis_app.CLAIMSSUMMARYINFO CSI limit 1
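
When validating date logic like this, it can help to compute the expected values outside of Hive and compare.  Here is a small Python sketch of the month boundaries using the standard datetime module.  The function name date_boundaries and the sample date are my own choices for illustration, and the output uses the same MM-dd-yyyy layout as the query.

```python
from datetime import date, timedelta


def date_boundaries(today: date) -> dict[str, str]:
    """Compute month boundaries around `today`, formatted as MM-dd-yyyy."""
    first_this_month = today.replace(day=1)
    # Stepping back one day from the 1st lands on the prior month's last day.
    last_prior_month = first_this_month - timedelta(days=1)
    first_prior_month = last_prior_month.replace(day=1)
    # Day 28 plus 4 days always falls in the next month, whatever its length.
    next_month = today.replace(day=28) + timedelta(days=4)
    last_this_month = next_month.replace(day=1) - timedelta(days=1)

    fmt = "%m-%d-%Y"
    return {
        "FirstDayThisMonth": first_this_month.strftime(fmt),
        "LastDayThisMonth": last_this_month.strftime(fmt),
        "FirstDayPriorMonth": first_prior_month.strftime(fmt),
        "LastDayPriorMonth": last_prior_month.strftime(fmt),
    }


# Sample date chosen for illustration only.
for name, value in date_boundaries(date(2017, 4, 20)).items():
    print(name, value)
```

Running the query and the script for the same day and comparing the columns side by side is a quick way to catch an off-by-one in the day arithmetic.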

















3/19/2017

The Data is Priceless

We hear the drum that data is the new oil.

IBM owns The Weather Company, the data business behind the Weather Channel.  Surely those weather points are valuable.

Microsoft owns LinkedIn.  Surely that data is valuable.  Just about every employed person is on LinkedIn, with their complete work history, timelines, places and job descriptions.  How much could the data alone be worth?  Priceless.

What are some other data points that could be purchased?  That's what investors should focus on.  That data is worth more than diamonds, oil, land.

In my humble opinion.

3/14/2017

AWS Data Lake Hadoop Hive with DBVisualizer Project

I'm about midway through the 2nd week of an 8 week project, working for a large insurance company located in downtown Boston.  What technologies am I working on for this project?  I work on Operational Reports for the Actuarial department.  They have a source database and a team that gets the data into AWS Data Lake, Hadoop Hive tables.  We connect using an IDE called DBVisualizer and write custom SQL statements.  There's also some Power BI and Tableau development.

I spent some time researching Hive optimization techniques.  There's partitioning, bucketing, indexing and writing better SQL code, but there are other options as well.  The recommendations include using Sort By rather than Order By (Sort By only orders rows within each reducer, so it fits when a total ordering isn't needed), specifying the order of your Group By fields, avoiding nested Sub-Queries, and using Between rather than <= and >=.

Found a few good links I read:

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.0.2/ds_Hive/optimize-joins.html

http://stackoverflow.com/questions/32370033/hive-join-optimization

https://www.justanalytics.com/blog/hive-tez-query-optimization

https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-optimize-hive-query

https://hortonworks.com/blog/5-ways-make-hive-queries-run-faster/

https://www.justanalytics.com/blog/hive-tez-sql-query-optimization-best-practices

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.6/bk_performance_tuning/content/ch_query_optimization_hive.html

Basically it's full life cycle report development: gather specs, map the fields, write the queries, validate the data with the Business, deploy to production, document, maintain and enhance.  I've worked for an Insurance company before, so I understand the basic concepts such as Inforce, Written Premium, Earned Premium, Claim Payments, etc.

I do enjoy working in different regions with different clients, people, projects, challenges, scenery and weather.  I guess that's one good thing about consulting, never the same day twice.

And there you have it~!

3/01/2017

Getting Started with Docker

Microsoft now offers SQL Server on Linux.  Now that's big news.  Here's a blog post from the team:  https://blogs.microsoft.com/blog/2016/03/07/announcing-sql-server-on-linux/#sm.00016g1jw81e4bdoku6pmahks7tll

I read this link that has a download available for Public Preview:

https://www.microsoft.com/en-us/sql-server/sql-server-vnext-including-Linux



The first step is to install Docker for Windows 10 using this URL:  https://docs.docker.com/docker-for-windows/install/


I clicked the Stable channel, downloaded the file and ran the install.


Install complete!



Docker has started...


In the taskbar there is a whale icon; right click it to see the version and settings:


There are several settings on this page, which is easy to use and similar to the Hyper-V Settings I've used in the past.

From the Advanced tab, I set the Memory to 2816 and clicked apply, and Docker reset.  As a note, I originally selected 4096 and it threw an insufficient memory error.


It sets a default sub-net address, sub-net mask and you can modify the DNS server if needed:


Following the steps from this post:  https://docs.microsoft.com/en-us/sql/linux/sql-server-linux-setup-docker we open our trusty Command Prompt and check the Docker version to verify it installed correctly (you can also use PowerShell):


Still within Command Prompt, we initiate the pull of the image:


Downloading bits:


Extracting:

 
 
Once it completed, I typed in: docker info
 


Per the instructions on the website, type in:

docker run -e 'ACCEPT_EULA=Y' -e 'SA_PASSWORD=' -p 1433:1433 -d microsoft/mssql-server-linux

It creates a VHDX file which can be opened in Hyper-V on Windows 10:



Looking at Hyper-V, it loaded the new server as MobyLinuxVM:


From within Hyper-V, click Connect:


The VM did not load, so I uninstalled Docker (stable) and downloaded the beta version.  Then I initiated another pull, this time using PowerShell:


I poked around on some of the Docker blog posts and learned quite a bit.  I will use PowerShell to work on Docker going forward.

In time, I'll go back and get SQL Server working on a Docker Hyper-V VM.  Seems like a cool way to download pre-built containers, distribute and maintain images.

Thanks for reading~!

2/28/2017

What is your Most Valued Asset

What is your greatest asset?  Most financial advisors will tell you your home is your most valued asset.  Or your cars.  Or airplanes.  Or portfolio.

I'd say those are indeed great assets.  However, incorrect.

Your most important asset is your mind. 

It's completely expandable and elastic.  Our brains have no limits.  We use a fraction of our brain capacity for the duration of our lives.

If we could visualize a person's brain, some might be 500 pounds overweight.  Because they are fed garbage or never used.  Similar to eating junk food, we feed our brains hours of television, never read books or attempt to learn new skills.

Yet others may be similar to a high powered racecar.  Because they are exercised and finely tuned.  Like the award winning, body sculpted Arnold Schwarzenegger.

The key takeaway, each of us has a brain.  It's our choice how we develop it.  We can exercise our minds by learning a musical instrument or foreign language.

The task of memorizing facts is admirable, yet not required for many occupations.  I'd suggest most jobs could be learned within a few weeks and vary slightly over time.  People get into the groove and become deathly afraid of change.  How beneficial is that for the mind?

Once you determine that your brain is your best asset, and you take steps to develop, grow and maintain, you'll soon realize that the restrictions you face in life are self imposed.

If you depend on teachers, family or life circumstances for your outcome in life, you become dependent on the system.  And blame everyone within distance for your troubles.

Sorry to break the bad news.  You determine the outcome in life.  And it starts with training the mind.  Because the mind is your most valued asset.  Learning is the key to training the mind.  And there are no limits on learning.  The more you learn, the more in demand you become.  The more in demand, the more opportunities.  The more opportunities, the more freedom.  It also becomes apparent that the more in demand you become, the more salary you earn.

With more money, sure you can purchase a home, cars or airplane. Yet it's the mind that builds the foundation.

So are you going to feed your brain junk food?  Or train it to change your life?

Your move.

In Hot Pursuit of Artificial General Intelligence

Artificial Intelligence is making strides in the world today.  AI is baked into everyday web sites to predict, classify, cluster and churn through data.  The thing to remember is this.  Today's AI is considered "weak".

Computers are not self aware, they are not living beings, they do not have personalities.  AI that did have those traits would be considered "general", or AGI.

The experts know the truth of the matter and that is, AGI is a long way off, if not impossible.

The reasons are many.  We are attempting to mimic the human brain.  Nobody really understands the true inner workings of the human brain.  And by the way, do we mimic the male or female brain?  Nobody seems to know.

Is personality based on DNA and genes handed down by generation, or is personality derived from culture?  No concrete answers.

Believe it or not, Humans may not be the most intelligent species in the Universe.  Attempting to mimic the Human thought process, may not be a high enough achievement, sorry to break the news.

If AGI beings were developed and were to interact with Humans, they will need to understand our dynamics.  As in Humans tend to not base lives on logic.  Humans are capable of various behaviors such as envy, greed, revenge, favoritism.  It may be difficult for AGI beings to understand how we tick, and may find our behavior quite bizarre.

Your behavior does not compute.  Input parameters do not align with expected output.  The Human processors must have a bug or two and are a few versions past due on their service packs.

Corporations are manmade inventions that possess certain characteristics of Humans.  So too could AGI beings.  Autonomous creations that mimic humans, yet lack accountability.  Do we have guardrails in place to handle downstream anomalies that may arise?

The wheel was a great invention.  As was electricity.  AI is a technology ready for mainstream and has been in the works for 50+ years.  Like any tool, it can benefit or hinder mankind.

We've already stated that Humans tend to behave in patterns that defy logic.  And perhaps the main concern is who controls the tools and what are the intentions.

Either way you slice it, the pursuit of AGI will continue until solved.

And so it goes~!

2/18/2017

Tableau Parameter Modification Using Calculated Fields and Filters with Conditions

Working on a Tableau project recently, we discovered a bug regarding Parameters.  The base Dashboard had a Parameter for Year and Quarter.  Each populated based on a list:

Year:
2016
2017
2018
2019
2020

Quarter:
Q1
Q2
Q3
Q4

Looking through the code, the Year filter was pointing to the incorrect database field from Salesforce.  Thus, when the Parameters changed, the reports displayed incorrect data.

A Calculated Field set the Year field to the passed in Parameter:



In order to change the Parameters, I located the correct field to filter on.  Then created a Calculated Field to obtain the Year as follows:




Then added the new Calculated Field to the Filters, opened and under "Condition" added the following logic to filter the data set where the Year part = the passed in Parameter "Year":




For the Quarter field, very similar.  Created a new Calculated Field for Quarter, to obtain the Quarter fragment of the date field:






Next, add new Calculated Field to the Filters pane, open to "Condition" and add the following logic:




Essentially, it strips out the "Q" character from Q1, Q2, Q3, Q4 passed in value from Parameter [Quarter 1] which the user selects, converts to an Integer, and filters the Quarter date field where the Quarter is less than or equal to the Parameter value minus the "Q".

So if user selects Q3, strip out the "Q" resulting in 3 as Int, and filter date Quarter field for 1, 2 or 3 since that is equal to or less than 3.  It excludes Quarter 4 because 4 is higher than 3.
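
The same filter logic can be written out in a few lines of Python to make the comparison explicit.  This is not Tableau syntax, just a sketch of the condition; the function name quarter_passes is made up for illustration.

```python
def quarter_passes(date_quarter: int, quarter_param: str) -> bool:
    """Return True when a row's quarter is at or before the selected quarter.

    quarter_param is the value the user picks, such as "Q3".
    """
    # Strip the "Q" and convert what remains to an integer.
    selected = int(quarter_param.replace("Q", ""))
    return date_quarter <= selected


# With Q3 selected, quarters 1, 2 and 3 are kept and 4 is filtered out.
kept = [q for q in (1, 2, 3, 4) if quarter_passes(q, "Q3")]
print(kept)
```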

And these code modifications resulted in accurate results in the Tableau Dashboard when the user changes either the [Year] or [Quarter 1] parameters.

Hope that helps in your Tableau development.  Thanks for reading~!

2/15/2017

Intro to Statistical Learning Notes from Online Course

When thinking about machine learning there's a lot going on.
 
Inference attempts to understand the relationship between the predictors and the results.  If we send in a value of 10 for parameter 1, the result is something.  If we send in a value of 20, the result is something else.
 
Prediction attempts to fit the model such that the relationship between the predictors and the results can identify future accurate results.
 
Both of these are included in Supervised Learning, which typically has a Predictor and a Response.  Linear Regression and Logistic Regression are the classic versions.  Newer techniques include GAM, Bootstrapping and Support Vector Machines.
 
The alternate approach is known as Unsupervised Learning.  UL also has Predictors but no Response.  Basically it attempts to organize the data into buckets to understand relationships and patterns, which is known as Clustering or Cluster Analysis.
 
There is actually another approach known as Semi-Supervised Learning which combines the two.
 
Another point of interest is the differentiation between Flexibility and Interpretability.
 
Some methods are restrictive and inflexible, yet they are easy to interpret, such as Least Squares or Lasso.  These are typically associated with Inference and Linear Models.
 
Opposite methods are flexible, like thin plate splines, yet more difficult to interpret.  Flexible models are associated with Splines and Boosting methods, and seeing the relationship between predictor and results is rather difficult.
 
Parametric Methods have a two step approach: 1. assume the relationship of data points takes a particular form, such as linear 2. apply a procedure to fit or train the model using training data.  One possible effect is overfitting, where the model follows the training data so closely that it accounts for the noise or errors too.
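
To make the two steps concrete, here is a small Python sketch I put together, not something from the course.  Step 1 assumes the relationship is a straight line, and step 2 fits the slope and intercept by least squares.  The function name and the sample numbers are made up for illustration.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up training data that follows y = 1 + 2x exactly.
intercept, slope = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(intercept, slope)
```

Once the two numbers are estimated, the whole model is known, which is what makes the parametric approach simple to fit and to interpret.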
 
Non-Parametric Methods attempt to fit the data points as closely as possible and typically perform better with more data.  Thin Plate Spline is one method for fitting the data.  It too can be overfit.
 
Another topic is Quantitative and Qualitative.
 
Quantitative involves numerical values as it has the word "quantit" to help remember.  These are Regression problems such as Least Squares Linear Regression.
 
Qualitative has Classes or Categories.  These classes are sometimes binary as in True/False, Yes/No, Male/Female, or Group1, Group2 or Group3.  These are Classification problems such as Logistic Regression.
 
The main takeaway is there is no one silver bullet to apply to every data set.  It's the responsibility of the analyst to decide which approach works best for a particular situation as results can vary.
 
For Regression problems the Mean Squared Error or MSE can determine the quality of the results.  It's useful for testing data rather than the training data.  The lower the MSE the better as fewer errors translates to more accuracy.
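
As a quick illustration, the MSE is the average of the squared differences between the actual and predicted values.  Here is a minimal Python sketch; the function name and the numbers are made up and are not from the course.

```python
def mean_squared_error(actual: list[float], predicted: list[float]) -> float:
    """Average of the squared differences between actual and predicted."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sum(squared_errors) / len(squared_errors)


# Made-up test data: two predictions are exact and one is off by 2.
mse = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(mse)
```

Computing this on held-out test data rather than on the training data is what gives an honest read on how the model will do on new observations.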
 
There are Reducible Errors, which come from the functions and parameters part of the equation, and Irreducible Errors, which are the downstream errors known as Epsilon.  Reducible Errors can be tweaked, Irreducible Errors cannot.
 
One way to offset the Reducible Errors is to balance Bias and Variance.  Flexible models tend to have higher variance and lower bias, while inflexible models tend to have lower variance and higher bias.  A regression model that leaves no error at all on the training data has most likely been overfit.
 
The Bayes Classifier falls under the Classification spectrum and is based on Conditional Probability.  The segment of the chart where the probability is exactly 50% is known as the Bayes decision boundary, and the lowest possible error rate is termed the Bayes error rate, similar to the irreducible error.  Since the classifier is based on classes, it always chooses the class with the largest probability.  Although this method is highly accurate, it's difficult to apply in real life scenarios because the true conditional probabilities are not known.
 
K-Nearest Neighbors attempts to estimate the conditional distribution and then classifies each observation to the class with the highest estimated probability.  Although a simpler method, it can be fairly accurate compared to the Bayes Classifier.
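
Here is a bare-bones Python sketch of the idea, written by me for illustration and not taken from the course.  It finds the K closest training points, treats the share of each class among them as the estimated probability, and picks the class with the largest share.  The function name and the sample points are made up.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (point, label) pairs, where each point is a tuple.
    """
    # Sort training points by Euclidean distance to the query and keep k.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # The class share among the neighbors is the estimated probability;
    # the class with the largest share wins.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Made-up training data: class "A" near the origin, class "B" far away.
train = [
    ((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
    ((8, 8), "B"), ((9, 8), "B"), ((8, 9), "B"),
]
print(knn_classify(train, (2, 2)))
print(knn_classify(train, (8.5, 8.5)))
```

The choice of K matters: a small K gives a very flexible boundary, while a large K smooths it out.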
 
This blog attempts to summarize the Statistical Learning course from Stanford that I'm attending online.  I'm paraphrasing and identifying the key concepts in an effort to organize and remember.  I use this technique to learn and self teach, and in no way are these my original thoughts.  I'm reading from the assigned book for the course, "An Introduction to Statistical Learning" from the Springer Texts in Statistics series, found here, and the authors deserve all the credit!  The course, which I highly recommend, can be found here.

Stay tuned for more blog posts and thanks for reading~!