3/14/2017

AWS Data Lake Hadoop Hive with DBVisualizer Project

About midway through the 2nd week of an 8-week project. I'm working for a large insurance company located in Downtown Boston. What technologies am I working on for this project? I work on Operational Reports for the Actuarial department. They have a source database and a team that loads the data into an AWS Data Lake as Hadoop Hive tables. We connect using an IDE called DBVisualizer and write custom SQL statements. There's also some Power BI and Tableau development.
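For anyone curious about the plumbing: DBVisualizer talks to Hive through the HiveServer2 JDBC driver. A rough sketch of what that looks like is below; the host, port, and table name are placeholders, not the client's actual setup.

-- Hypothetical HiveServer2 JDBC URL for the DBVisualizer connection
-- (host, port, and database are placeholders):
--   jdbc:hive2://<hive-server-host>:10000/default

-- A few quick sanity checks once connected:
SHOW DATABASES;
SHOW TABLES;
DESCRIBE FORMATTED my_table;   -- my_table is a made-up name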

I spent some time researching Hive optimization techniques. Hive offers partitioning, bucketing, indexing, and simply writing better SQL, among other options. Common recommendations: use Sort By rather than Order By (Order By forces every row through a single reducer, while Sort By only sorts within each reducer), be deliberate about the order of your Group By fields, avoid nested Sub-Queries, and use Between rather than separate <= and >= comparisons.
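Here's a minimal sketch of a few of these techniques in HiveQL. The table, columns, and bucket count are made up for illustration, not the client's actual schema.

-- Partitioning + bucketing declared at table-creation time:
CREATE TABLE premium_fact (
  policy_id    STRING,
  written_prem DECIMAL(12,2),
  txn_date     DATE
)
PARTITIONED BY (report_year INT, report_month INT)
CLUSTERED BY (policy_id) INTO 32 BUCKETS
STORED AS ORC;

-- Sort By instead of Order By: sorts within each reducer
-- instead of funneling all rows through a single reducer.
SELECT policy_id, written_prem
FROM premium_fact
WHERE report_year = 2017 AND report_month = 3   -- prunes partitions
SORT BY policy_id;

-- Between instead of paired <= / >= on the same column:
SELECT policy_id, txn_date
FROM premium_fact
WHERE report_year = 2017
  AND txn_date BETWEEN '2017-01-01' AND '2017-03-31';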

A few good links I read along the way:

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.0.2/ds_Hive/optimize-joins.html

http://stackoverflow.com/questions/32370033/hive-join-optimization

https://www.justanalytics.com/blog/hive-tez-query-optimization

https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-optimize-hive-query

https://hortonworks.com/blog/5-ways-make-hive-queries-run-faster/

https://www.justanalytics.com/blog/hive-tez-sql-query-optimization-best-practices

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.6/bk_performance_tuning/content/ch_query_optimization_hive.html

Basically it's full life-cycle report development: gather specs, map the fields, write the queries, validate the data with the Business, deploy to production, document, maintain and enhance. I've worked for an insurance company before, so I understand the basic concepts such as Inforce, Written Premium, Earned Premium, Claim Payments, etc.
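As a refresher on one of those concepts: Earned Premium is the portion of the Written Premium corresponding to the part of the policy term that has already elapsed. A rough pro-rata version in HiveQL might look like the sketch below; the policy table and its columns are hypothetical, and least()/current_date assume Hive 1.1/1.2 or later.

-- Pro-rata earned premium: written premium scaled by the fraction
-- of the policy term elapsed so far. Table/column names are made up.
SELECT
  policy_id,
  written_prem,
  written_prem
    * datediff(least(current_date, expire_date), effective_date)
    / datediff(expire_date, effective_date) AS earned_prem
FROM policy
WHERE effective_date <= current_date;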

I do enjoy working in different regions with different clients, people, projects, challenges, scenery and weather. I guess that's one good thing about consulting: never the same day twice.

And there you have it~!
