3/19/2017

The Data is Priceless

We keep hearing the drumbeat that data is the new oil.

IBM owns The Weather Company.  Surely those weather data points are valuable.

Microsoft owns LinkedIn.  Surely that data is valuable.  Just about every person who is employed is on LinkedIn, with their complete work history, timelines, places and job descriptions.  How much could the data alone be worth?  Priceless.

What other data sets could be purchased?  That's what investors should focus on.  That data is worth more than diamonds, oil or land.

In my humble opinion.

3/14/2017

AWS Data Lake Hadoop Hive with DbVisualizer Project

I'm about midway through the 2nd week of an 8-week project, working for a large insurance company located in downtown Boston.  What technologies am I working on for this project?  I work on Operational Reports for the Actuarial department.  They have a source database and a team that gets the data into Hadoop Hive tables in an AWS Data Lake.  We connect using an IDE called DbVisualizer and write custom SQL statements.  There is also some Power BI and Tableau development.

I spent some time researching Hive optimization techniques.  The usual ones are partitioning, bucketing, indexing and writing better SQL code, but there are other options as well.  The articles I read recommend using SORT BY rather than ORDER BY when a total ordering isn't needed, paying attention to the order of your GROUP BY fields, avoiding nested sub-queries, and using BETWEEN rather than <= and >=.
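To make the SORT BY versus ORDER BY point concrete, here is a minimal sketch.  It just writes two HiveQL queries to a file, which could then be pasted into DbVisualizer or run with hive -f.  The table policy_premium, its columns and the file name are hypothetical, made up for illustration and not taken from the client's schema.

```shell
# Write a small HiveQL sketch to a file. Table, columns and file name are hypothetical.
cat > hive_sort_sketch.hql <<'EOF'
-- ORDER BY gives one total ordering, so every row passes through a single reducer.
SELECT policy_state, policy_id, written_premium
FROM policy_premium
WHERE effective_date BETWEEN '2017-01-01' AND '2017-03-31'
ORDER BY written_premium DESC;

-- SORT BY orders rows within each reducer only. DISTRIBUTE BY sends all rows
-- for one state to the same reducer, so each state's rows come out sorted.
SELECT policy_state, policy_id, written_premium
FROM policy_premium
WHERE effective_date BETWEEN '2017-01-01' AND '2017-03-31'
DISTRIBUTE BY policy_state
SORT BY policy_state, written_premium DESC;
EOF
echo "wrote hive_sort_sketch.hql"
```

The second query does not return one globally sorted result, so it only fits reports that need ordering within a group.  Whether it is actually faster on a given cluster is something to measure rather than assume.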

Here are a few good links I read:

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.0.2/ds_Hive/optimize-joins.html

http://stackoverflow.com/questions/32370033/hive-join-optimization

https://www.justanalytics.com/blog/hive-tez-query-optimization

https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-optimize-hive-query

https://hortonworks.com/blog/5-ways-make-hive-queries-run-faster/

https://www.justanalytics.com/blog/hive-tez-sql-query-optimization-best-practices

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.6/bk_performance_tuning/content/ch_query_optimization_hive.html

Basically it's full life cycle report development.  Gather specs, map the fields, write the queries, validate the data with the Business, deploy to production, document, maintain and enhance.  I've worked for an insurance company before, so I understand the basic concepts such as Inforce, Written Premium, Earned Premium, Claim Payments, etc.

I do enjoy working in different regions with different clients, people, projects, challenges, scenery and weather.  I guess that's one good thing about consulting, never the same day twice.

And there you have it~!

3/01/2017

Getting Started with Docker

Microsoft now offers SQL Server on Linux.  Now that's big news.  Here's a blog post from the team:  https://blogs.microsoft.com/blog/2016/03/07/announcing-sql-server-on-linux/#sm.00016g1jw81e4bdoku6pmahks7tll

I read this link that has a download available for Public Preview:

https://www.microsoft.com/en-us/sql-server/sql-server-vnext-including-Linux



The first step is to install Docker for Windows 10 using this URL:  https://docs.docker.com/docker-for-windows/install/


I clicked the Stable channel, downloaded the file and ran the installer.


Install complete!



Docker has started...


In the system tray there is a whale icon; right-click it to see the version and settings:


There are several settings on this page, which is easy to use and similar to the Hyper-V settings I've used in the past.

From the Advanced tab, I set the Memory to 2816 MB and clicked Apply, after which Docker restarted.  As a note, I originally selected 4096 and it threw an insufficient memory error.


It sets a default subnet address and subnet mask, and you can modify the DNS server if needed:


Following the steps from this post:  https://docs.microsoft.com/en-us/sql/linux/sql-server-linux-setup-docker we open our trusty Command Prompt and check the Docker version to verify it installed correctly (you can also use PowerShell):


Still within Command Prompt, we pull the SQL Server image:


Downloading bits:


Extracting:

 
 
Once the pull completed, I typed in: docker info
 


Per the instructions on the website, type in the following, with a strong SA password after SA_PASSWORD= (left blank here):

docker run -e 'ACCEPT_EULA=Y' -e 'SA_PASSWORD=' -p 1433:1433 -d microsoft/mssql-server-linux
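The command-line steps in this walkthrough (version check, pull, info, run) can be collected into one small script.  This is only a sketch under a few assumptions: the run helper and the DRY_RUN variable are my own hypothetical additions so the script prints the commands instead of executing them by default, the password is a placeholder, and the script assumes a POSIX shell.  The docker commands themselves can be typed directly into Command Prompt or PowerShell.  It uses double quotes because Command Prompt does not treat single quotes as quoting characters.

```shell
# Image name from the Microsoft instructions. Docker image names are lowercase.
IMAGE="microsoft/mssql-server-linux"
# Placeholder only (an assumption): export SA_PASSWORD with a real strong password first.
SA_PASSWORD="${SA_PASSWORD:-<your-strong-password>}"

# run: print the command in dry-run mode (the default), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

run docker --version      # verify the install
run docker pull "$IMAGE"  # download the SQL Server on Linux image
run docker info           # show daemon and host details
run docker run -e "ACCEPT_EULA=Y" -e "SA_PASSWORD=$SA_PASSWORD" -p 1433:1433 -d "$IMAGE"
run docker ps             # a started container should be listed here
```

Setting DRY_RUN=0 before running makes the script execute the commands for real.  The -p 1433:1433 option publishes SQL Server's default port on the host, and -d runs the container in the background.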

It creates a VHDX file, which can be opened in Hyper-V on Windows 10:



Looking at Hyper-V, it loaded the new server as MobyLinuxVM:


From within Hyper-V, click Connect:


The VM did not load, so I uninstalled Docker (stable) and downloaded the beta version.  Then I initiated another pull, this time using PowerShell:


I poked around on some of the Docker blog posts and learned quite a bit.  I will use PowerShell to work on Docker going forward.

In time, I'll go back and get SQL Server working on a Docker Hyper-V VM.  Seems like a cool way to download pre-built containers, distribute and maintain images.

Thanks for reading~!