8/09/2016

My First Go at Microsoft Parallel Data Warehouse

I've had the opportunity to work on Microsoft Parallel Data Warehouse the past few months.  There are a few differences.  First, we use Visual Studio to connect to Dev, QA and Production, so no SQL Server Management Studio.  And there's a plug in for Team Foundation Server which is nice to check code into TFS.  Getting the versions to line up took a bit of effort.
 
Next, you don't right click and view table options like SSMS.  There's no select top 100, or look at thee scripts to generate Create, Alter, etc.  And when you create tables, the syntax is different.  And the reason is you are actually connecting to a network of interconnected SQL Server nodes on a hardware box.  And the data gets split and resides on specific nodes.  So when you issue a query, it's smart enough to know where the data lives, shuffles data around to get the complete data set.  All in all, it's supposed to be faster.  I haven't figured out how the backups are occurring but that's outside the scope of my assigned tasks.
 
And we pull the data from the PDW into an SQL Server Analysis cube which is partitioned and seems to load okay each day.
 
Overall, it's not too different where you are completely lost, you can sit down and get busy right away, assuming you learn the new syntax and such.  T-SQL doesn't change much, you can create temp tables and real tables on the fly, add statistics and query hints and specify the distribution options for performance increases.
 
I imagine the PDW was a pre-cursor to Hadoop with distributed nodes and parallel processing and polybase queries, as it can read data from traditional SQL Server and Hadoop together, which Azure has now.

Overall it's a solid product from Microsoft and glad to get some good experience.

No comments:

Post a Comment

We Interrupt this Broadcast