4/21/2017

Compare Option in DBVisualizer

I've been working with a tool called DBVisualizer.  This tool allows development against a variety of databases; I happen to be working against AWS Data Lake Hive tables.  After you configure the connection, you can go to the tables tree view, expand it, highlight all the objects, right-click, and choose Script Objects, to a file or a window.  I selected the "Create" option and ran it: 544 objects to script, which runs for a while.  The result is a file containing all the table objects.
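
If you'd rather pull the same DDL from the Hive side instead of through the DBVisualizer UI, Hive can script a table definition for you.  A minimal sketch, with placeholder database and table names:

-- "mydb" and "some_table" are placeholders, not actual objects from the environment.
USE mydb;
SHOW TABLES;                     -- list everything in the schema
SHOW CREATE TABLE some_table;    -- emits the full CREATE TABLE statement for one table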


From there, we connect to another environment.  Assuming I had access to the Production environment, I would perform the same steps and generate a second file.


From there, you can click the Tools dropdown, choose the Compare option, select your two new files, and see the differences.




Now to view the differences: green indicates new objects.




This is a handy feature when developing, as the objects sometimes do differ between environments, and it's no fun to deploy a report that's been fully validated, only to have it fail in production against different objects: missing views or tables, fields renamed or missing.


And there you have it~!

4/20/2017

Hive SQL Date Functions Cheat Sheet


Since I've been working with Hive SQL lately against the AWS Data Lake, I assembled a quick list of key date functions to speed up development:





SELECT
    from_unixtime(unix_timestamp(TO_DATE(CURRENT_DATE), 'yyyy-MM-dd'), 'MM-dd-yyyy') AS TodaysDate,
    from_unixtime(unix_timestamp(DATE_ADD(CURRENT_DATE, -(DAY(CURRENT_DATE) - 1)), 'yyyy-MM-dd'), 'MM-dd-yyyy') AS FirstDayThisMonth,
    from_unixtime(unix_timestamp(LAST_DAY(DATE_ADD(CURRENT_DATE, -(DAY(CURRENT_DATE) - 1))), 'yyyy-MM-dd'), 'MM-dd-yyyy') AS LastDayThisMonth,
    from_unixtime(unix_timestamp(DATE_ADD(CURRENT_DATE, -(DAY(CURRENT_DATE) + 1)), 'yyyy-MM-dd'), 'MM-01-yyyy') AS FirstDayPriorMonth,
    from_unixtime(unix_timestamp(DATE_ADD(CURRENT_DATE, -DAY(CURRENT_DATE)), 'yyyy-MM-dd'), 'MM-dd-yyyy') AS LastDayPriorMonth,
    from_unixtime(unix_timestamp(TO_DATE(CURRENT_DATE), 'yyyy-MM-dd'), '01-01-yyyy') AS FirstDayThisYear,
    from_unixtime(unix_timestamp(TO_DATE(CURRENT_DATE), 'yyyy-MM-dd'), '12-31-yyyy') AS LastDayThisYear,
    from_unixtime(unix_timestamp(concat(cast(YEAR(CURRENT_DATE) - 1 AS string), '-01-01'), 'yyyy-MM-dd'), 'MM-dd-yyyy') AS FirstDayPriorYear,
    from_unixtime(unix_timestamp(concat(cast(YEAR(CURRENT_DATE) - 1 AS string), '-12-31'), 'yyyy-MM-dd'), 'MM-dd-yyyy') AS LastDayPriorYear,
    DATE_ADD(CURRENT_DATE, -90) AS TodayMinus90Days,
    from_unixtime(unix_timestamp(DATE_ADD(CURRENT_DATE, -(DAY(CURRENT_DATE) + 1)), 'yyyy-MM-dd'), 'MMM') AS PriorMonth3Char,
    DATEDIFF(TO_DATE(FROM_UNIXTIME(UNIX_TIMESTAMP())), TO_DATE(CSI.LOSSDT)) < 90 AS CheckForXDaysAgoTrueFalse
FROM
    eis_app.CLAIMSSUMMARYINFO CSI
LIMIT 1
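
That last column is really a filter pattern more than a SELECT column.  Here's the same check used in a WHERE clause against the same table; just a sketch, and the 90-day window is arbitrary:

SELECT CSI.*
FROM eis_app.CLAIMSSUMMARYINFO CSI
WHERE
    -- keep only rows with a loss date within the last 90 days
    DATEDIFF(TO_DATE(FROM_UNIXTIME(UNIX_TIMESTAMP())), TO_DATE(CSI.LOSSDT)) < 90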

4/12/2017

Thanks for Reading all these Years

Here's a blog post to mark 300,000 page views for this site.  It says the first post was back in 2010, which could be accurate, as I've done a few social media purges over the years and got rid of a bunch of stuff.  It says there are 1,208 blog posts on the site currently.

I originally didn't want ads on the site.  Added them as it was the thing to do.  You hear of people making $5k a month in ad revenue.  Remove four zeros, more like it.

I grew up behind a keyboard, not sure the exact year, but it was an original IBM PC: no hard drive, just two floppies, a color monitor, a dot matrix Epson printer, no mouse, and PC DOS.  Graphics weren't really a thing; we had 1200 baud modems connecting to local BBS boards.  That was way before the internet revolution, a PC in every home, a smartphone on every belt clip.

I didn't major in computers, just took a few courses here and there.  Got into IT in 1996 or so.  Report writer, SQL, client-server programmer.

Then web.  Then dot net.  Then Java.  Some project management.  Then Supervisor.  Then BI guy.  Then consultant.


We all have different motivators.  Some go for money.  Some go for fame.  I like to solve the problems that others can't.  Money is okay as a driver, up to a point.  If you don't enjoy the daily grind, you can't succeed.

Programming is a continuous battle of micro problems to be solved every single day, and like gravity constantly pulling us back to earth and inertia slowing us down, we get ambushed by the flood of new technology.  And more often than not, you don't get to choose the technology of your liking, at least for my career anyway.  I've had a tough time breaking into cutting edge technology, not sure why.

As far as blogging, I still enjoy it.  It's no longer fresh, where I can pump out 8 to 10 posts in an hour.  When you write, it just flows.  You don't know what you're going to write about until it's on the screen and you go back and read it for the first time.  I feel people get offended by some posts, and I've had to filter back a lot of content, which reduces the quality, no longer writing out of spontaneity.

When it's raw, that's the best stuff.  Political correctness has removed our sense of being human, and quality has gone with it, as we inch ever closer to our robot future selves, devoid of emotions, talking in monotone.  If you look to television for content generation, all the good shows happened in the past.  Same with movies, and music also.  The quality was just better then.  I don't watch TV anymore nor do I go to the movies.  And I don't read the papers or listen to the radio.

There seems to be a complete drought of fresh content that stimulates the mind with new thoughts and ideas.  Society is stale, like a fishbowl that hasn't been cleaned.  Some days I feel like stirring it up.  Other days I seek a new bowl.  Either way, I continue writing my stories, for nickels in revenue per month, without the slightest indication that anyone reads them.  As far as the 300k blog reads go, perhaps; I do know that every Saturday night, around 2 a.m., somebody in Russia scrapes the site and reads 200-plus pages in under an hour, so that inflates the numbers.  No way to block it, so who cares.

To summarize, for those who do read the blog, I appreciate it.  And for my 12th grade English teacher who thought I wasn't paying attention in the back of the class: I make a deliberate effort to write "a lot" with a space, not "alot".  I was listening the entire time, so thanks!


~JB

4/02/2017

Hadoop Project

Much to my surprise, the guy slotted for the Hadoop project left the company.  And so did the next in line.  I spoke with my wife: I wanted to work on Hadoop.  Sure, I had learned it four years prior, but never got a shot at it.  I emailed the boss and, sure enough, I got the project.
The first week, the client offered to stand up the Hadoop cluster.  Except we learned Hadoop was no longer supported on the Windows operating system.  I was on vacation the second week of the project, so I took the laptop to the cabin and got Hadoop working on Microsoft Windows 10 Hyper-V, no easy feat, and configured it to allow remote connections from the laptop.  After doing some research, it turned out Visual Studio 2015 had three new components for working with Hadoop.  I played around with them, got flat files to flow to HDFS, and sent Pig scripts which worked and pushed data to Hive tables.  That was all I needed to return on week 3 of the project.

Upon return, I was brought in to assist in the troubleshooting, and we reinstalled Hadoop one or two more times before finally getting it stood up on Linux: a master node with three data nodes.  I ported my Visual Studio 2015 source code to the server, connected it with Git, and the project was humming along nicely.  Data flowed from Excel to text files, was pushed from a shared folder to HDFS, massaged by a Pig script, then loaded into Hive tables and on into Hive ORC tables.  Then I figured out how to install Polybase on SQL Server 2016, allowing a seamless flow of data from Hadoop to SQL Server using common T-SQL (rough sketches of both of those pieces are below).

I architected the SSIS project.  Flowed the data into Master Data Services entities.  Ran the data through some custom logic to parse and clean the addresses.  Then called an SSIS component in C# to hit an internal web service to geocode the latitude and longitude.  Then called another web service in C# to send in lat and lon and get geo tagging back.  Then pushed the data into SQL Server ODS tables.

However, around that time, the Hadoop cluster went down.  I troubleshot the server for hours and hours and hours, deep into the nights.  The thing about Hadoop: if it stops working, there are so many configuration files to go through and investigate, and if one file has one entry with an extra space, the entire thing stops working.  I looked through every folder and many files, interrogating everything, and took tedious notes along the way.  And searched many a website and blog attempting a fix.  Permissions.  Config files.  User accounts.  Error logs.  Host files.  You name it.  I was told to hold off on the issue as I had to jump back on my actual assignment, but I couldn't let it rest and would work on it after hours.

Then I saw something.  In one of the web UIs, you could see the jobs: success, failed, killed, etc.  There it was.  After picking through the logs, I saw the job was still running.  And so were 150 other jobs, which I had initiated during my development.  The server got backed up.  I found a site on how to kill the processes and proceeded to execute a command to kill one process at a time, for 150+ processes.  Restarted Ambari, bam!  I could execute Hive with no error on the command line.  Then I flowed some Pig scripts through Visual Studio 2015 using the WebHCat server and sure enough, back in business.  Solved it~!
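
For reference, the text-to-ORC hop in Hive is just plain DDL.  A rough sketch of the shape, with made-up table names, columns, and HDFS paths rather than the project's actual objects:

-- Staging table over the tab-delimited files Pig landed in HDFS (placeholder names and path).
CREATE EXTERNAL TABLE IF NOT EXISTS stage_claims (
    claim_id STRING,
    loss_dt  STRING,
    amount   DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/data/stage/claims';

-- Final ORC-backed table, loaded straight from the staging table.
CREATE TABLE IF NOT EXISTS claims_orc STORED AS ORC
AS SELECT * FROM stage_claims;

The Polybase piece on SQL Server 2016 boils down to registering the cluster as an external data source and mapping an external table over the HDFS location, after which ordinary T-SQL works against it.  Again just a sketch, with placeholder names and a placeholder name node address:

-- Name node address, paths, and object names are all placeholders.
CREATE EXTERNAL DATA SOURCE HadoopCluster
WITH (TYPE = HADOOP, LOCATION = 'hdfs://namenode:8020');

CREATE EXTERNAL FILE FORMAT TabDelimited
WITH (FORMAT_TYPE = DELIMITEDTEXT, FORMAT_OPTIONS (FIELD_TERMINATOR = '\t'));

CREATE EXTERNAL TABLE dbo.ClaimsFromHadoop (
    ClaimId VARCHAR(50),
    LossDt  VARCHAR(10),
    Amount  FLOAT
)
WITH (LOCATION = '/data/stage/claims', DATA_SOURCE = HadoopCluster, FILE_FORMAT = TabDelimited);

-- From here it's ordinary T-SQL against the Hadoop data:
SELECT TOP 10 * FROM dbo.ClaimsFromHadoop;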
I like to solve difficult problems, especially the ones that others gave up on.  Those are the juicy problems that are not easy to find, and this one took some meticulous troubleshooting over many, many hours.  I rolled off the Hadoop project before the data warehouse was completed, but I created the SSIS packages to handle the Dim and Fact tables as well as refresh the Tabular Model.  That is what you call a great project.

And there you have it~!

Get Sh#t Done!