6/29/2017

Get Sh#t Done!

I probably accomplish more before 9am than most people do in a week.  When you go to bed early, wake up early, and start working as soon as your feet touch the floor, it becomes ingrained in your life patterns.  It's not a habit; it's part of life.

Most people have lawns.  In those lawns grow weeds.  Weeds are persistent and you have to tend to them on a constant basis.  If not, they grow out of control and become a problem.  An overwhelming problem, such that most people don't know where to begin because the project has become too big.  Where do I start?  How do I handle the unknowns?  Too many variables.

Many projects never begin because of these limiting factors.  When you tend to the weeds in your life, you take an active role in the outcome and reduce stress on a daily basis.  It's hard work, though.

Some people tend to this like they do diets or exercise: moments of enthusiasm that level off into abandonment.  Lack of motivation stems from a variety of factors: laziness, lack of immediate results, boredom, competing tasks, etc.  In the game of life, you have to decide what your priorities are.

Do I tend to the boring, monotonous tasks, or do I eat my cake now for immediate gratification?  From experience, those boring tasks are the substance of life, and if you ignore them, they compound over time, like compounding interest.

How do you get motivated on a daily basis?  You have to enjoy your tasks.  If you don't, they won't continue.  What's a task?  Bringing down the garbage pails.  Mowing the lawn.  Trimming the bushes.  Food shopping.  Errands.  Writing reports.  Timesheets.  Expense statements.  You name it.

Whistle while you work.  It's all work.  And when you are living in the moment of play, nothing is work.

Another factor: people think you have to look busy at all times or you aren't productive.  Inspiration and motivation are like storms; they come and go.  When they arrive, you can harness that energy.  When it dissipates, you relax and regain your energy.  While the energy is flowing at 100%, you can really accomplish a lot.  And not just one task, but multiple tasks simultaneously.  This pattern happens throughout the day, every day, during the week and the weekends.

Life is to be lived, not parked in front of the television or fueling vices.  Not that vices are bad, we all have them, but moderation is the key.

Go forth, at your own pace, and get Sh#t done.

Thanks for reading~!

5/24/2017

Bloom Consulting Since Year 2000

My company is called Bloom Consulting.  Although it's a sole proprietorship, it was formed all the way back in August 2000.  Quite a long time to be a consultant.

I've had a few clients over the years, although I don't actively pursue side work.  I tend to work full time roles with benefits.
I've had classic Visual Basic gigs, classic Microsoft ASP projects, Visual Source Safe integration, SSIS, SSAS, Data Warehousing as well as Crystal Reports projects.

I've had small project offers that I passed along to other developers.

I've worked on projects to gain new experience as well as extra cash flow to throw at the mortgage.  Some projects were over a year in duration; one lasted just 16 hours, figuring out why some Crystal Reports were slow.

How long have I been consulting?  Over 17 years.

How long have I been programming?  35 years. 3/4 of my life.

5/10/2017

Format Currency, Percentage, Date, and Mask SSN in Hive SQL

In Hive SQL, to format Currency, Percentage, and Date, and to mask an SSN, use the following SQL:


SELECT
    CONCAT('$', format_number(COALESCE(12345.6789, 0), 2))                           AS `Sample Currency Format`,
    CONCAT(format_number(COALESCE(98.6543, 0), 2), '%')                              AS `Sample Percentage Format`,
    from_unixtime(unix_timestamp(TO_DATE(CURRENT_DATE), 'yyyy-MM-dd'), 'MM-dd-yyyy') AS `Sample Date Format`,
    CONCAT('***-**-', SUBSTR('123456789', LENGTH('123456789') - 3, 4))               AS MaskedSSN  -- shows only the last 4 digits

You could also apply the Round function if needed.
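For example, a quick sketch of rounding to whole dollars before formatting (sample value only):

SELECT CONCAT('$', format_number(ROUND(12345.6789, 0), 0)) AS `Rounded Currency`  -- returns $12,346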


Happy coding~!

4/21/2017

Compare Option in DBVisualizer

I've been working with a tool called DBVisualizer, which allows development against a variety of databases.  I happen to be working against AWS Data Lake Hive tables.  After you configure the Connection, you can go to the tables tree view, expand it, highlight all objects, right-click, and script the objects to a file or window.  I selected the "Create" option and ran it: 544 objects to script, which runs for a while.  So we have a file containing all table objects.


From there, we connect to another environment.  Assuming I had access to the Production Environment, I would perform the same steps and generate a second file.


From there, you can click the Tools dropdown, Compare option, select your 2 new files, and see the differences.

Now to view the differences, green indicates new:

This is a handy feature when developing, as sometimes the objects do differ between environments, and it's no fun to deploy a report that's been fully validated, only to have it fail in production, with different objects.  Missing Views or Tables, fields renamed or missing.


And there you have it~!

4/20/2017

Hive SQL Date Functions Cheat Sheet


Since I've been working with Hive SQL lately against AWS Data Lake, I assembled a quick list of key Date functions to speed up development:

SELECT
    from_unixtime(unix_timestamp(TO_DATE(CURRENT_DATE), 'yyyy-MM-dd'), 'MM-dd-yyyy')                                    AS TodaysDate,
    from_unixtime(unix_timestamp(DATE_ADD(CURRENT_DATE, -(DAY(CURRENT_DATE) - 1)), 'yyyy-MM-dd'), 'MM-dd-yyyy')         AS FirstDayThisMonth,
    from_unixtime(unix_timestamp(LAST_DAY(DATE_ADD(CURRENT_DATE, -(DAY(CURRENT_DATE) - 1))), 'yyyy-MM-dd'), 'MM-dd-yyyy') AS LastDayThisMonth,
    from_unixtime(unix_timestamp(DATE_ADD(CURRENT_DATE, -1 - DAY(CURRENT_DATE)), 'yyyy-MM-dd'), 'MM-01-yyyy')           AS FirstDayPriorMonth,
    from_unixtime(unix_timestamp(DATE_ADD(CURRENT_DATE, -DAY(CURRENT_DATE)), 'yyyy-MM-dd'), 'MM-dd-yyyy')               AS LastDayPriorMonth,  -- subtracting DAY() lands on the prior month's last day
    from_unixtime(unix_timestamp(TO_DATE(CURRENT_DATE), 'yyyy-MM-dd'), '01-01-yyyy')                                    AS FirstDayThisYear,
    from_unixtime(unix_timestamp(TO_DATE(CURRENT_DATE), 'yyyy-MM-dd'), '12-31-yyyy')                                    AS LastDayThisYear,
    from_unixtime(unix_timestamp(date_sub(concat(from_unixtime(unix_timestamp(), YEAR(CURRENT_DATE) - 1), '-01-01'), 0), 'yyyy-MM-dd'), 'MM-dd-yyyy') AS FirstDayPriorYear,
    from_unixtime(unix_timestamp(date_sub(concat(from_unixtime(unix_timestamp(), YEAR(CURRENT_DATE) - 1), '-12-31'), 0), 'yyyy-MM-dd'), 'MM-dd-yyyy') AS LastDayPriorYear,
    DATE_ADD(CURRENT_DATE, -90)                                                                                         AS TodayMinus90Days,
    from_unixtime(unix_timestamp(DATE_ADD(CURRENT_DATE, -1 - DAY(CURRENT_DATE)), 'yyyy-MM-dd'), 'MMM')                  AS PriorMonth3Char,
    DATEDIFF(TO_DATE(FROM_UNIXTIME(UNIX_TIMESTAMP())), TO_DATE(CSI.LOSSDT)) < 90                                        AS CheckForXDaysAgoTrueFalse
FROM
    eis_app.CLAIMSSUMMARYINFO CSI
LIMIT 1

4/12/2017

Thanks for Reading all these Years

Here's a blog post to mark 300,000 page views for this site.  It says the first post was back in 2010, which could be accurate, as I've done a few social media purges over the years and got rid of a bunch of stuff.  It says there are 1,208 blog posts on the site currently.

I originally didn't want ads on the site.  Added them as it was the thing to do.  You hear of people making $5k a month on ad revenue.  Remove four zeros, more like it.

I grew up behind a keyboard.  Not sure of the exact year, but it was an original IBM PC: no hard drive, just 2 floppies, a monochrome monitor, a dot matrix Epson printer, no mouse, and PC DOS.  Graphics weren't really a thing; we had 1200-baud modems connecting to local BBSes.  That was way before the internet revolution, a PC in every home, a smartphone on every belt clip.

I didn't major in computers, just a few courses here and there.  Got into IT in 1996 or so.  Report writer, SQL, client-server programmer.

Then web.  Then dot net.  Then Java.  Some project management.  Then Supervisor.  Then BI guy.  Then consultant.


We all have different motivators.  Some go for money.  Some go for fame.  I like to solve the problems that others can't.  Money is okay as a driver, up to a point.  If you don't enjoy the daily grind, you can't succeed.

Programming is a continuous battle of micro-problems to be solved every single day, and like gravity constantly pulling us back to earth and inertia slowing us down, we get ambushed by the flood of new technology.  And more often than not, you don't get to choose the technology of your liking, at least in my career anyway.  I've had a tough time breaking into cutting-edge technology, not sure why.

As far as blogging, I still enjoy it.  It's no longer fresh, where I can pump out 8 to 10 posts in an hour.  When you write, it just flows.  You don't know what you're going to write about until it's on the screen and you go back and read it for the first time.  I feel people get offended by some posts, and I've had to filter back a lot of content, which reduces the quality, no longer writing out of spontaneity.

When it's raw, that's the best stuff.  Political correctness has removed our sense of being human, and quality has gone with it, as we inch ever closer to our robot future selves, devoid of emotions, talking in monotone.  If you look to television for content, all the good shows happened in the past.  Same with movies, and music also.  The quality was just better then.  I don't watch TV anymore, nor do I go to the movies.  And I don't read the papers or listen to the radio.

There seems to be a complete drought of fresh content that stimulates the mind with new thoughts and ideas.  Society is stale, like a fishbowl that hasn't been cleaned.  Some days I feel like stirring it up.  Other days I seek a new bowl.  Either way, I continue writing my stories, for nickels in revenue per month, without the slightest indication that anyone reads them.  As far as the 300k blog reads, perhaps.  I do know that every Saturday night, around 2am, somebody in Russia scrapes the site and reads 200-plus pages in under an hour, which inflates the numbers.  No way to block it, so who cares.

To summarize, for those who do read the blog, I appreciate it.  And for my 12th grade English teacher who thought I wasn't paying attention in the back of the class: I make a deliberate effort to write "a lot" with a space, not "alot".  I was listening the entire time, so thanks!


~JB

4/02/2017

Hadoop Project

Much to my surprise, the guy slotted for the Hadoop project left the company.  And so did the next in line.  I spoke with my wife: I wanted to work on Hadoop.  Sure, I had learned it 4 years prior but never got a shot at it.  I emailed the boss and, sure enough, I got the project.
The first week, the client offered to stand up the Hadoop cluster.  Except we learned Hadoop is no longer supported on the Windows operating system.  I was on vacation the 2nd week of the project.  I decided to take the laptop on vacation to the cabin and got Hadoop working on Microsoft Windows 10 Hyper-V, no easy feat.  I configured it to allow remote connections from the laptop.  After doing some research, it turned out Visual Studio 2015 had 3 new components to work with Hadoop.  I played around with it and got flat files to flow to HDFS, sent Pig scripts which worked and pushed data to Hive tables.  That was all I needed to return for Week 3 of the project.

Upon return, I was brought in to assist in the troubleshooting, and we reinstalled Hadoop one or two more times, finally getting it stood up on Linux, a Master Node with three Data Nodes.  I ported my Visual Studio 2015 source code to the server, connected with Git, and the project was humming along nicely.  Data flowed from Excel, to text files, pushed from a Shared Folder to HDFS, through a Pig script to massage the data, into Hive tables, then Hive ORC tables.  Then I figured out how to install Polybase on SQL Server 2016, allowing seamless flow of data from Hadoop to SQL Server using common T-SQL.  I architected the SSIS project.  Flowed the data into Master Data Services entities.  Ran the data through some custom logic to parse and clean the addresses.  Then called an SSIS component in C# to call an internal web service to geocode the Latitude and Longitude.  Then called another web service using C# to send in Lat & Lon to get Geo Tagging.  Then pushed the data into SQL Server ODS tables.

However, around that time, the Hadoop cluster went down.  I troubleshot the server for hours and hours and hours, deep into the nights.  The thing about Hadoop: if it stops working, there are so many configuration files to go through and investigate.  If one file has one entry with an extra space, the entire thing stops working.  I looked through every folder and many files, interrogating everything.  And took tedious notes along the way.  And searched many a website/blog attempting to fix it.  Permissions.  Config files.  User accounts.  Error logs.  Host files.  You name it.  I was told to hold off on the issue, as I had to jump back on my actual assignment.  I couldn't let it rest and would work on it after hours.

Then I saw something.  In one of the web UIs, you could see the jobs: success, failed, killed, etc.  There it was.  After picking through the logs, I saw the job was still running.  And so were 150 other jobs, which I had initiated during my development.  The server got backed up.  Found a site on how to kill the process.  And proceeded to execute a command to kill 1 process at a time, for 150+ processes.  Restarted Ambari, bam!  I could execute Hive with no errors on the command line.  Then flowed some Pig scripts through Visual Studio 2015 using the WebHCat server and, sure enough, back in business.  Solved it~!
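For the record, I don't recall the exact commands, but YARN's standard job commands are along these lines (the application id below is made up):

yarn application -list                                   # show running/pending applications
yarn application -kill application_1490000000000_0001    # kill one job by its id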
I like to solve difficult problems, especially the ones that others gave up on.  Those are the juicy problems that are not easy to find.  That took some meticulous troubleshooting over many, many hours.  I rolled off the Hadoop project before the Data Warehouse was completed, but I created the SSIS files to handle the Dim and Fact tables as well as refresh the Tabular Model.  That is what you call a great project.

And there you have it~!

3/26/2017

Will automation take people's jobs?

When you drive through a toll booth, there's nobody waiting to take your money, make change, or offer small talk.  Just a bunch of empty booths, with a sign indicating you'll be charged through your license plate.  A mysterious camera scans your plate, determines the charge, and the bill finds its way to your mailbox.  Or to the pre-purchased account on your windshield.

I drove to the ATM and took out some cash.  No teller necessary.  When I called customer service, a friendly voice instructed me which buttons to push, authorized my credentials, and performed the transaction.

Will automation take people's jobs?

That's a good question.

One possible theory: artificial intelligence will wipe out humanity, or put us in cages at the zoo, or will outperform humans to such a degree that we become obsolete.  People speak of a new form of socialism or guaranteed basic income where people will enjoy luxury activities in their idle time.

Some questions come to mind:


  • Do you think the banks will forgive people's outstanding debt?
  • Who will pay for people's healthcare?
  • Who will pay for people's food?
  • Who will pay for people's housing?
  • Why would people reproduce if their offspring have no chance of employment?
  • Will people enter debtors' prison for unpaid debt?

I don't think we know at this point.  Let's examine some current trends.

  • We know many jobs have already been automated.
  • We know that prices are going up sharply for basic goods and services.
  • We know that many incomes are flat or decreasing.
  • We know that job competition is stiff for low wage earners.
  • We know that some high paying jobs have been pushed offshore.
  • We know people are working more hours.
  • We know people are working well past retirement age.
  • We know average household debt is approximately $12,000.
  • We know many households are upside down on their mortgages and many have entered the foreclosure process.
  • We know that student loan debt has risen substantially and is no longer a candidate for write-off during bankruptcy.
  • We know the cost of healthcare is rising and there's a penalty for not having any.

Let's take out our magic crystal ball and attempt to synthesize these observations.

Are the current trends benefiting society as a whole?

If we circle back to automation and artificial intelligence, will they have an impact on society?


  • Will somebody step in and subsidize payments for a good chunk of society, and if so, how will that get financed?
  • What will people do 24 hours a day with no chance of employment, ever?
  • How will the logistics work to house and feed and care for all these unemployed people, forever?
  • With less money flowing through the economy, what impact will that have on businesses?

This is a tricky sticky subject and I'm surely no expert in the field.  I'm just asking basic questions on possible scenarios and how they could play out.  And pointing out some trends and food for thought to see which way we're heading.

One thing to keep in mind: if the purpose of a business is to maximize profit and decrease costs, and automation and Artificial Intelligence are definite ways to accomplish those, what impact will that have on the economy, humanity and the planet?  Are there any safety nets in place, or in the works, to ensure a soft landing if automation were to cause ripple effects?  Will humanity continue its trend of upward mobility, from the jungle, into the grasslands, onto farms, into cities?

Perhaps time will tell.

3/19/2017

The Data is Priceless

We hear the drum that data is the new oil.

IBM owns The Weather Channel.  Surely those weather data points are valuable.

Microsoft owns LinkedIn.  Surely that data is valuable.  Just about every employed person is on LinkedIn, with their complete work history, timelines, places, and job descriptions.  How much could the data alone be worth?  Priceless.

What are some other data points that could be purchased?  That's what investors should focus on.  That data is worth more than diamonds, oil, or land.

In my humble opinion.

3/18/2017

Quick Ideas for New Software Apps

I was thinking of some apps for the smartphone.

Like a diet app: when it determines your location is a bakery or fast food joint, it sends an electronic zap to your phone.  Call it "Shock Diet".

Here's a handy feature: a sensor in your mailbox that sends a notification when mail is delivered.  How many times have you walked to the mailbox, opened the box, nothing?  Five times in a day.  That would save time and money.  But then again, I'm surprised we still have snail mail.  Call it "MailboxAlert".

How about an app that can scan your food, tell you how many calories, how many sit ups you'll need to do to work it off, how many days it will cut short your life.  Call it "DietaryGenie".

How about an app so when you fly, it tells you what city/state/province you are flying over at that exact moment.  Call it "FlyOverU".

How about an app that tells you where things are in a store.  Have you ever walked around a giant box store, wandering aimlessly, with no salespeople in sight?  Call it "SalesPersonGhostTown".

How about an app that does your waiting on hold for you.  "We are experiencing longer than usual wait times; your call should be handled a week from Tuesday."  Call it "IfWeActuallyCared We'dHireSufficientStaff".

How about an app that rates the office staff of a place you visit.  You know, the ones that couldn't care less who you are or why you're there, who make you wait for long periods while they surf their smartphones.  Call it "TooLazyForARealJob".

How about an app that monitors the free items at food places.  You know, the people who take 25 sugar packets to stock up at home, or salt, napkins, creamers.  How about an app that limits these petty thieves.  Call it "Ain'tStealingHereNoMore".

That's all I can think of.  I should have those apps completed by lunchtime.

Thanks for reading~!

9 Things to a Happier Life

Get rid of cable.  First of all, it ain't cheap.  Second, it turns the mind to mush.  Third, think of all the better things you could be doing.  Get a part time job, read a book, walk in nature, fix the house, anything is better.

Get rid of your landline.  For all the negative aspects of smartphones, they really do assist in your day-to-day activities.  Of course you can talk on them, get voicemail, read emails, find directions, etc.  I don't know how we existed without them.  Why would you need a landline, other than to host a fax machine?  And get rid of your musket and covered wagon too.

Outsource the stuff you don't want to do.  Sure, it may cost money, but if you can free up your time to do things you enjoy, it's worth it.

Do the stuff you've been putting off.  Nobody's getting younger.  There's only so much time.  People get health issues.  Fact of life.  Don't wait until the end, there won't be time.

Enjoy the ride.  Life is a marathon.  There is no imaginary place off in the future where everything magically becomes great.  That theme drives just about every marketing effort since the beginning of time.  That illusion will cost you.  Life may not be perfect, but it's all we have.  You don't have to stop and smell the roses; enjoy the treasures of life in the small details of everyday living.

These are the good old days.  Guess what, taxes will probably go up for the rest of our lives.  Aches and pains become more frequent.  People will not always be around.  Life has been a struggle going back to the days in the jungle.  Life is what it is.  It will never be perfect.  Best enjoy life and be thankful for what you have.  You may not always have it.

Do nice things.  It's easy to turn a blind eye.  Throw that rubbish on the street.  Cut someone off in traffic.  It doesn't take much skill or effort to be self-absorbed.  Give someone a compliment.  Do more than what's asked.  Show up early, stay late.  The world could use some kindness.

Get a dog.  Dogs are God's present to mankind.  They are loyal, caring and furry.  We grew up with cats.  Cats are great, but they aren't dogs.  Dogs are tremendous.  We have 3 spoiled dogs.

Get rid of fear.  Some people say there are only two forces in the Universe, Love and Fear.  Fear is a lower vibration.  We find it everywhere.  Television, news, you name it.  People are afraid of just about everything, including fear itself.  Fear is mostly an illusion.  Except when you get that feeling that a tiger is about to eat you while you're drinking from the watering hole.  That fear is real.  It's the false fear that weighs us down.  And turns us into not-so-nice people.  Fear keeps people afraid.  When people are afraid, they become docile, and it makes you old before your time.  Stay young.  Don't fear fear.  It's just an illusion.

And there you have it~!

3/14/2017

AWS Data Lake Hadoop Hive with DBVisualizer Project

About midway through the 2nd week of an 8-week project.  I'm working for a large insurance company located in downtown Boston.  What technologies am I working on for this project?  I work on Operational Reports for the Actuarial department.  They have a source database and a team that gets the data into the AWS Data Lake, into Hadoop Hive tables.  We connect using an IDE called DBVisualizer and write custom SQL statements.  Also some Power BI and Tableau development.

I spent some time researching Hive optimization techniques.  There's partitioning, bucketing, indexing, and writing better SQL code, but there are other options as well.  They recommend using Sort By rather than Order By, specifying the order of your Group By fields, avoiding nested sub-queries, and using Between rather than <= and >=.  A quick sketch of a couple of these appears below.
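These are illustrative only; the table and column names below are made up:

-- Sort By orders within each reducer, avoiding Order By's single-reducer bottleneck
SELECT claimid, lossdt
FROM claims
SORT BY lossdt;

-- Between instead of a pair of range comparisons
SELECT claimid
FROM claims
WHERE lossdt BETWEEN '2017-01-01' AND '2017-03-31';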

Found a few good links I read:

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.0.2/ds_Hive/optimize-joins.html

http://stackoverflow.com/questions/32370033/hive-join-optimization

https://www.justanalytics.com/blog/hive-tez-query-optimization

https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-optimize-hive-query

https://hortonworks.com/blog/5-ways-make-hive-queries-run-faster/

https://www.justanalytics.com/blog/hive-tez-sql-query-optimization-best-practices

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.6/bk_performance_tuning/content/ch_query_optimization_hive.html

Basically, it's full life cycle report development: gather specs, map the fields, write the queries, validate the data with the Business, deploy to production, document, maintain and enhance.  I've worked for an insurance company before, so I understand the basic concepts such as Inforce, Written Premium, Earned Premium, Claim Payments, etc.

I do enjoy working in different regions with different clients, people, projects, challenges, scenery and weather.  I guess that's one good thing about consulting, never the same day twice.

And there you have it~!

3/01/2017

Getting Started with Docker

Microsoft now offers SQL Server on Linux.  Now that's big news.  Here's a blog post from the team:  https://blogs.microsoft.com/blog/2016/03/07/announcing-sql-server-on-linux/#sm.00016g1jw81e4bdoku6pmahks7tll

I read this link that has a download available for Public Preview:

https://www.microsoft.com/en-us/sql-server/sql-server-vnext-including-Linux



The first step is to install Docker for Windows 10 using this URL:  https://docs.docker.com/docker-for-windows/install/


I clicked the Stable channel, downloaded the file, and ran the install.


Install complete!



Docker has started...


In the task tray, there is a whale icon; right-click it to see version and settings:


There are several settings on this page, which is easy to use and similar to the Hyper-V Settings I've used in the past.

From the Advanced tab, I set the Memory to 2816 and clicked Apply; Docker resets.  As a note, I originally selected 4096 and it threw an insufficient-memory error.


It sets a default subnet address and subnet mask, and you can modify the DNS server if needed:


Following the steps from this post:  https://docs.microsoft.com/en-us/sql/linux/sql-server-linux-setup-docker, we open our trusty Command Prompt and check the Docker version to verify it installed correctly (you can also use PowerShell):
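From memory, the version check is simply:

docker --version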


Still within Command Prompt, we initiate the Pull request:
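The screenshot is gone, but the pull command, using the image named later in this post, would be along these lines:

docker pull microsoft/mssql-server-linux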


Downloading bits:


Extracting:

Completed.  Typed in: docker info
 


Per the instructions on the website, type in (substituting your own strong SA password for the placeholder):

docker run -e 'ACCEPT_EULA=Y' -e 'SA_PASSWORD=<YourStrongPassword>' -p 1433:1433 -d microsoft/mssql-server-linux

It creates a VHDX file which can be opened in Hyper-V on Windows 10:



Looking at Hyper-V, it loaded the new server as MobyLinuxVM:


From within Hyper-V, click Connect:


The VM did not load, so I uninstalled Docker (stable) and downloaded the beta version.  Then I initiated another pull, this time using PowerShell:


I poked around on some of the Docker blog posts and learned quite a bit.  I will use PowerShell to work with Docker going forward.

In time, I'll go back and get SQL Server working on a Docker Hyper-V VM.  Seems like a cool way to download pre-built containers, distribute and maintain images.

Thanks for reading~!

2/28/2017

What is your Most Valued Asset

What is your greatest asset?  Most financial advisors will tell you your home is your most valued asset.  Or your cars.  Or airplanes.  Or portfolio.

I'd say those are indeed great assets.  However, incorrect.

Your most important asset is your mind. 

It's completely expandable and elastic.  Our brains have no limits.  We use a fraction of our brain capacity for the duration of our lives.

If we could visualize a person's brain, some might be 500 pounds overweight, because they are fed garbage or never used.  Similar to eating junk food, we feed our brains hours of television, never reading books or attempting to learn new skills.

Yet others may be similar to a high-powered racecar, because they are exercised and finely tuned.  Like the award-winning sculpted body of Arnold Schwarzenegger.

The key takeaway, each of us has a brain.  It's our choice how we develop it.  We can exercise our minds by learning a musical instrument or foreign language.

The task of memorizing facts is admirable, yet not required for many occupations.  I'd suggest most jobs could be learned within a few weeks and vary only slightly over time.  People get into a groove and become deathly afraid of change.  How beneficial is that for the mind?

Once you determine that your brain is your best asset, and you take steps to develop, grow and maintain, you'll soon realize that the restrictions you face in life are self imposed.

If you depend on teachers, family or life circumstances for your outcome in life, you become dependent on the system.  And blame everyone within distance for your troubles.

Sorry to break the bad news.  You determine the outcome in life.  And it starts with training the mind.  Because the mind is your most valued asset.  Learning is the key to training the mind.  And there are no limits on learning.  The more you learn, the more in demand you become.  The more in demand, the more opportunities.  The more opportunities, the more freedom.  It also becomes apparent that the more in demand you become, the more salary you earn.

With more money, sure you can purchase a home, cars or airplane. Yet it's the mind that builds the foundation.

So are you going to feed your brain junk food, or train it to change your life?

Your move.

In Hot Pursuit of Artificial General Intelligence

Artificial Intelligence is making strides in the world today.  AI is baked into everyday web sites to predict, classify, cluster and churn through data.  The thing to remember is this: today's AI is considered "weak".

Computers are not self-aware, they are not living beings, and they do not have personalities.  AI that had those qualities would be considered "general", or AGI.

The experts know the truth of the matter and that is, AGI is a long way off, if not impossible.

The reasons are many.  We are attempting to mimic the human brain, yet nobody really understands its true inner workings.  And by the way, do we mimic a male or female brain?  Nobody seems to know.

Is personality based on DNA and genes handed down through generations, or is personality derived from culture?  No concrete answers.

Believe it or not, Humans may not be the most intelligent species in the Universe.  Attempting to mimic the Human thought process may not be a high enough achievement, sorry to break the news.

If AGI beings were developed and were to interact with Humans, they would need to understand our dynamics.  Humans tend to not base their lives on logic; we are capable of behaviors such as envy, greed, revenge, favoritism.  It may be difficult for AGI beings to understand how we tick, and they may find our behavior quite bizarre.

Your behavior does not compute.  Input parameters do not align with expected output.  The Human processors must have a bug or two and are a few versions past due on their service packs.

Corporations are manmade inventions that possess certain characteristics of Humans.  So too could AGI beings: autonomous creations that mimic humans, yet lack accountability.  Do we have guardrails in place to handle downstream anomalies that may arise?

The wheel was a great invention.  As was electricity.  AI is a technology that has been chasing the mainstream for 50+ years.  Like any tool, it can benefit or hinder mankind.

We've already stated that Humans tend to behave in patterns that defy logic.  And perhaps the main concern is who controls the tools, and what their intentions are.

Either way you slice it, the pursuit of AGI will continue until solved.

And so it goes~!

2/21/2017

The 10 Worst College Majors



My Anthropology / archeology degree ranked #1.

After graduation, I worked temp jobs at minimum wage.  Luckily, I was a self-taught programmer from age 13 and clawed my way into an IT department.

2/18/2017

Tableau Parameter Modification Using Calculated Fields and Filters with Conditions

Working on a Tableau project recently, we discovered a bug regarding Parameters.  The base Dashboard had a Parameter for Year and Quarter.  Each populated based on a list:

Year:
2016
2017
2018
2019
2020

Quarter:
Q1
Q2
Q3
Q4

Looking through the code, the Year filter was pointing to the incorrect database field from Salesforce.  Thus, when the Parameters changed, the reports displayed incorrect data.

A Calculated Field set the Year field to the passed in Parameter:



In order to change the Parameters, I located the correct field to filter on.  Then created a Calculated Field to obtain the Year as follows:




Then added the new Calculated Field to the Filters, opened it, and under "Condition" added the following logic to filter the data set where the Year part equals the passed-in Parameter "Year":
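The original screenshot is lost, but the condition formula would be something along these lines, with [Date Field] standing in for the actual Salesforce date field (wrap the parameter in INT() if it's defined as a string list):

YEAR([Date Field]) = [Year]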




For the Quarter field, it's very similar.  Created a new Calculated Field for Quarter, to obtain the Quarter fragment of the date field:






Next, add the new Calculated Field to the Filters pane, open it to "Condition", and add the following logic:
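Again, the screenshots are gone; a sketch of the two formulas, using the same hypothetical [Date Field], might look like this.

The Quarter calculated field:

DATEPART('quarter', [Date Field])

The filter condition:

DATEPART('quarter', [Date Field]) <= INT(MID([Quarter 1], 2))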




Essentially, it strips the "Q" character off the Q1, Q2, Q3, or Q4 value passed in from the Parameter [Quarter 1] which the user selects, converts it to an Integer, and filters the Quarter date field where the Quarter is less than or equal to the Parameter value minus the "Q".

So if the user selects Q3, we strip out the "Q", resulting in 3 as an Int, and filter the date Quarter field for 1, 2 or 3, since those are equal to or less than 3.  It excludes Quarter 4 because 4 is higher than 3.

And these code modifications resulted in accurate results in the Tableau Dashboard when the user changes either the [Year] or [Quarter 1] parameter.

Hope that helps in your Tableau development.  Thanks for reading~!

2/15/2017

Intro to Statistical Learning Notes from Online Course

When thinking about machine learning there's a lot going on.
 
Inference attempts to understand the relationship between the predictors and the results.  If we send in a value of 10 for parameter 1, the result is something.  If we send in a value of 20, the result is something else.
 
Prediction attempts to fit the model such that the relationship between the predictors and the results can identify future accurate results.
 
Both of these are included in Supervised Learning, which typically has Predictors and a Response.  Linear Regression and Logistic Regression are classic versions.  Newer techniques include GAMs, Bootstrapping and Support Vector Machines.
 
The alternate approach is known as Unsupervised Learning.  UL also has Predictors but no Response.  Basically, it attempts to organize the data into buckets to understand relationships and patterns, known as Clustering or Cluster Analysis.
 
There is actually another approach known as Semi-Supervised Learning, which combines the two.
 
Another point of interest is the differentiation between Flexibility and Interpretability.
 
Some methods are restrictive and inflexible, yet easy to interpret, such as Least Squares or the Lasso.  These are typically associated with Inference and Linear Models.
 
Opposite methods are flexible, like thin plate splines, yet more difficult to interpret.  Flexible models are associated with Splines and Boosting methods, and seeing the relationship between predictor and results is rather difficult.
 
Parametric Methods have a two-step approach: 1. assume the relationship of the data points is linear; 2. apply a procedure to fit or train the model using training data.  One possible effect is overfitting the model, when the results are too accurate because they account for the noise or errors too closely.
 
Non-Parametric Methods attempt to estimate the data points as closely as possible and typically perform better with more data.  The Thin Plate Spline is one method for fitting the data.  It too can be overfit.
 
Another topic is Quantitative and Qualitative.
 
Quantitative involves numerical values, as the word itself ("quantity") helps you remember.  These are Regression problems, such as Least Squares Linear Regression.
 
Qualitative has Classes or Categories.  These classes are sometimes binary as in True/False, Yes/No, Male/Female, or Group1, Group2 or Group3.  These are Classification problems such as Logistic Regression.
 
The main takeaway is there is no one silver bullet to apply to every data set.  It's the responsibility of the analyst to decide which approach works best for a particular situation as results can vary.
 
For Regression problems, the Mean Squared Error or MSE can determine the quality of the results.  It's most useful on test data rather than the training data.  The lower the MSE the better, as fewer errors translate to more accuracy.
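As a small R sketch of the idea (train and test here are assumed data frames with columns x and y):

fit <- lm(y ~ x, data = train)         # fit on the training data
pred <- predict(fit, newdata = test)   # predict on the held-out test data
mse <- mean((test$y - pred)^2)         # test MSE: lower is better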
 
There are Reducible Errors, which are the function-and-parameters part of the equation, and Irreducible Errors, the downstream errors known as Epsilon.  Reducible Errors can be tweaked; Irreducible Errors can not.
 
One way to offset the Reducible Errors is to account for Bias and Variance.  Flexible models tend to have higher variance, while inflexible models tend to have lower variance.  All regression models will contain some variance or error; fit the noise too closely and the result is overfitting.
 
The Bayes Classifier falls under the Classification spectrum and is based on Conditional Probability.  The segment of the chart where the probability is exactly 50% is known as the Bayes decision boundary, and the lowest possible error rate is termed the Bayes error rate, similar to the irreducible error.  Since the classifier is based on classes, it always chooses the most likely class.  Although this method is highly accurate, it's difficult to apply in real life scenarios.
 
K-Nearest Neighbors attempts to estimate the conditional distribution and then classify based on the highest estimated probability.  Although a simpler method, it is fairly accurate compared to the Bayes Classifier.
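A sketch of K-Nearest Neighbors in R, one common route being the class package (train.X, test.X and train.Y are assumed objects):

library(class)                                 # provides knn()
pred <- knn(train.X, test.X, train.Y, k = 3)   # classify each test row by its 3 nearest neighbors
table(pred, test.Y)                            # confusion matrix against the true labels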
 
This blog post attempts to summarize the course I'm attending online, Stanford Statistical Learning.  I'm paraphrasing and identifying the key concepts in an effort to organize and remember.  I use this technique to learn and self-teach, and in no way are these my original thoughts.  I'm reading from the assigned book for the course, "An Introduction to Statistical Learning" (Springer Texts in Statistics), found here, and the authors deserve all the credit!  The course can be found here, and I highly recommend it.

Stay tuned for more blog posts and thanks for reading~!

2/14/2017

Attending Stanford Statistical Learning MOOC

I signed up to attend the Stanford Statistical Learning course.  It's free and self paced.  There's a lot to learn.

The course is structured in a series of lectures.


Each lecture, one of the two professors explains various aspects of Machine Learning.

Then you can view the Lecture Slides at your own pace.


Then the R Sessions, as this course uses the R statistical language.


And to get up to speed on the R Statistical Language there are e-books available for download and consumption.


There's also a tab to review your progress, as there are quizzes and the passing score is 50%, as well as a Discussion tab to see previous questions and answers on course-related material.

Overall, I'm enjoying the class so far. 

First, you have to get an understanding of the concepts of Machine Learning.  Linear Models, Classification, Clustering, Supervised and Unsupervised Learning. 

Second, the course uses advanced math to explain how the algorithms work.  If you aren't up to speed on advanced math, it's a bit of a challenge. 

Third, you learn the R Statistical language from reading the e-books.

Fourth, you learn to tie the Machine Learning to Math to Statistics using the R Language in your chosen IDE for the complete picture.

Suffice it to say, the course is a challenge even for those who have worked with data for many years.  Perhaps that is why Data Science is a challenge and the supply of qualified technical talent is below what's needed.

With that said, I believe Machine Learning is the hot topic today, even surpassing Big Data.  When combined, it's quite an arsenal of skills.  Plus working with data files and databases.  And translating business questions into projects that generate outcomes to produce insights, viewable in Data Visualizations.  And the Models generated for re-use, combined with real-time calls from web applications to produce statistical probabilities, are kind of huge.

I'm going to continue to go through the course and learn as much as possible.  Realistically, just trying to get the big picture as a foundation, then fill in the holes over time.

And there you have it~!

2/13/2017

Intro to Visual Studio 2015 R Ingesting Data Files

This blog post is the second in a series on Intro to Visual Studio 2015 R.  In this post, we'll discuss data and how to connect to files.
 
We start off in the Visual Studio 2015 IDE.  From the R Interactive window, we enter the command:
 
getwd() to get the "working directory" of our project.
 
Once we know that, we can add our files to that folder, or we can set the working directory using the command below (note that R on Windows wants forward slashes, or doubled backslashes, in paths):
 
setwd("C:/Users/jbloom/Desktop/Statistics/data")
 
Now that we've defined our folder, we can add our files here.
 
Back in the IDE, we enter our command to ingest the contents of our file.  In this case, we start off with a CSV, or comma-separated values, file:
 
mydata = read.csv("Income1.csv")
 
The variable "mydata" contains the contents of our file.  If we want to view the contents, we type the command:
 
fix(mydata)

This spawns a new pop up window displaying the data:


You can double-click a cell and change the value by copy, paste, delete, or typing manually; close the window, reopen it, and you'll see the new value(s) still appear.

Now we try a .txt file:

Auto = read.table("Auto.txt", header = T, na.strings = "?")

fix(Auto)


To see its dimensions, type:

dim(Auto)

we get back:

[1] 397 9

To see the first 4 rows:

Auto[1:4,]


To get rid of Null values (NA):

Auto = na.omit(Auto)

dim(Auto)

[1] 392 9

You may notice 5 fewer rows than before, due to the rows containing NA values.

To display the field names:

names(Auto)

We get:


[1] "mpg" "cylinders" "displacement" "horsepower" "weight" "acceleration" "year" "origin"
[9] "name"    

And that sums up our Intro to Visual Studio 2015 R Ingest Data Files.
 
To reference the first blog in the series, check out:

http://www.bloomconsultingbi.com/2017/02/intro-to-visual-studio-2015-r-project.html

Thanks for reading~!

Intro to Visual Studio 2015 R Project

Last Friday I attended a team meeting where the presenter spoke on R for Microsoft.  I had seen a similar talk a few months ago during the same team meeting.  However, I didn't realize you could build R projects in Visual Studio.
 
So to get started, download the R framework for Visual Studio 2015 here.
 
Run the package, then open the Visual Studio 2015 IDE, click New Project, and you'll see a new entry for "R":
 
 
Create the project, and you'll see the Solution was created:
 
 
In the R Interactive window, you can get Help by entering ? and then the topic:
 
 
 
The Help window displays information on Array:
 

 
Attempted to create a Vector, which is similar to an Array, similar to a Collection back in our Visual Basic days, unbound in that it has no predefined max limits:
 
 
Defined the variable "x" as a vector, using the c() combine function, with the values 1,2,3,45.  Then Enter.  To read back the contents of vector x, simply type x, then Enter.
 
Calculations can be performed using our stored vector x.  Here we declare variable "y" equal to vector x times vector x.  By doing so, it knows the number of values is the same and multiplies element by element: x[1] * x[1] = 1 * 1.  Then x[2] * x[2] = 4, x[3] * x[3] = 9, and so on:
 
 
We can display the length of our Vectors with the length() command:
 
 
To List our variables type ls():
 
 
To remove our variables type rm(x,y):
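The screenshots didn't survive, so here's a minimal sketch of the whole sequence described above (values as in the post):

x <- c(1, 2, 3, 45)   # create a vector with the c() combine function
x                     # print its contents
y <- x * x            # element-wise multiply: 1, 4, 9, 2025
length(x)             # length of a vector
ls()                  # list the variables in the workspace
rm(x, y)              # remove the variables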
 
 
Next, we can declare a Matrix.  What is a Matrix?  ? matrix
 
 
And to assign a new variable x to a Matrix, 2 columns and 2 rows, with values 1,2,3,4:
 
 
 
Square root:
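With the images gone, a sketch of the matrix and square root steps, per the descriptions above:

x <- matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2)   # 2x2 matrix, filled column-wise
sqrt(x)                                          # element-wise square root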
 
 
And finally, from the sample code from the Stanford Statistical Learning class I'm attending:
 
 
Next, we can plot to the screen; create 10 random x values and 10 random y values:
 
 
Results:
 
 
Add labels, with type "p" for points, "l" for lines, "b" for both:
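A sketch of the plotting commands (rnorm is what the course uses for random values):

x <- rnorm(10)                         # 10 random values
y <- rnorm(10)
plot(x, y, type = "b",                 # "p" points, "l" lines, "b" both
     xlab = "this is the x-axis", ylab = "this is the y-axis")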
 
 
And that sums up our Intro to Visual Studio 2015 R Project.
 
 
Thanks for reading~!

