7/11/2018

Azure Data Lake Store & Azure Data Lake Analytics with SSIS

Started out with DOS programming. Still writing DOS commands today, because they still solve problems.

Lately, though, it's been some newer technology. In the Microsoft stack, that means traditional SSIS along with Azure Data Lake Store and Azure Data Lake Analytics.

SSIS has some neat components for sending data to Azure, such as the Azure Data Lake Store Destination, which you place after a Source component in a data flow to send data to the Data Lake. There's also a component for uploading files to Azure: the Azure Data Lake Store File System Task.

Once the data lands in Azure Data Lake Store, you can move it elsewhere. In our project, the data flows into Azure Data Lake Analytics U-SQL tables. I had experimented with a separate Visual Studio project to move the data in two hops, but today I found another way: create a second Azure connection, an Azure Data Lake Analytics (ADLA) connection manager in SSIS. You connect it to an Azure Data Lake Analytics Task, which offers three ways to supply the U-SQL script: embedded directly, from a file, or from a variable. When the task runs, it submits a U-SQL job; once the job is queued, it appears in the Azure Portal, where you can watch it process in real time.
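As a rough illustration of the variable option, the U-SQL text can be assembled before the task runs and handed over as a single string. The sketch below uses Python only to show the shape of such a script; the database, table, column names and input path (SalesDb, dbo.Sales, Id/Region/Amount, /staging/sales.csv) are hypothetical, and in the package itself the text would sit in an SSIS string variable rather than come from Python.

```python
def build_load_script(database: str, table: str, input_path: str) -> str:
    """Return U-SQL text that reads a CSV file and inserts its rows into a table.

    Hypothetical example: the column list (Id, Region, Amount) and all names
    passed in are placeholders, not taken from a real project.
    """
    return "\n".join([
        f"USE DATABASE {database};",
        "",
        # EXTRACT reads the CSV into a rowset; the built-in CSV extractor
        # can skip a header row via skipFirstNRows.
        "@rows =",
        "    EXTRACT Id int,",
        "            Region string,",
        "            Amount decimal",
        f'    FROM "{input_path}"',
        "    USING Extractors.Csv(skipFirstNRows: 1);",
        "",
        # INSERT appends the extracted rows to the managed U-SQL table.
        f"INSERT INTO {table}",
        "SELECT Id, Region, Amount FROM @rows;",
    ])


if __name__ == "__main__":
    print(build_load_script("SalesDb", "dbo.Sales", "/staging/sales.csv"))
```

Generating the text in one place keeps the table name and input path as parameters, which is the same idea as driving the script from a variable instead of embedding it.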

The database was created ahead of time in a separate U-SQL job, which specifies the schema, table name, index, and partition. The U-SQL job submitted from SSIS then reads the data from Azure Data Lake Store (in our case a CSV file) and ingests it into the database table. For automated runs, the table will need to be truncated beforehand so that each load doesn't append duplicate rows.
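The one-time setup job and the truncate step might look roughly like the sketch below, again with Python only assembling the U-SQL text. All names are hypothetical. The table uses U-SQL's clustered index plus DISTRIBUTED BY HASH clause; an explicit PARTITIONED BY clause is left out to keep the sketch short, so treat this as an assumption about the table layout rather than the project's actual DDL.

```python
def build_create_table_script(database: str, schema: str, table: str) -> str:
    """One-time U-SQL DDL: database, schema, and a table with index and distribution.

    Hypothetical example schema (Id, Region, Amount); run once, ahead of the loads.
    """
    return "\n".join([
        f"CREATE DATABASE IF NOT EXISTS {database};",
        f"USE DATABASE {database};",
        f"CREATE SCHEMA IF NOT EXISTS {schema};",
        "",
        f"CREATE TABLE IF NOT EXISTS {schema}.{table}",
        "(",
        "    Id int,",
        "    Region string,",
        "    Amount decimal,",
        # U-SQL managed tables need a clustered index and a distribution scheme.
        f"    INDEX idx_{table}_Id CLUSTERED (Id ASC)",
        "    DISTRIBUTED BY HASH (Id)",
        ");",
    ])


def with_truncate(qualified_table: str, load_script: str) -> str:
    """Prefix a load script with TRUNCATE TABLE so reruns replace rows instead of appending.

    qualified_table is a three-part name (database.schema.table), so the
    TRUNCATE works even before any USE DATABASE in the load script.
    """
    return "\n".join([f"TRUNCATE TABLE {qualified_table};", "", load_script])


if __name__ == "__main__":
    print(build_create_table_script("SalesDb", "dbo", "Sales"))
    print(with_truncate("SalesDb.dbo.Sales", "// load statements go here"))
```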

What's nice about this solution is that it's all contained within a single Visual Studio project and a single package, which should streamline development and maintenance. I figured out the solution in chunks and strung the pieces together today. Next up: point the data source at a real database instead of a CSV file, and apply naming conventions, standards, and error handling to turn it into a reusable pattern.

Although Azure has been around for a while now, and I did in fact teach a course on this stuff two years ago, you don't really get your feet wet until you're in the trenches: building solutions, poking around the Azure Portal settings for hours, assigning IP addresses and user roles, installing dependent files. I will say this: I document the crap out of everything, with screenshots along the way, detailing every step for later use and knowledge hand-off. It must be a carryover from a decade of blogging. If you have to go through the sweat of learning, why not document the journey? It doesn't slow things down; it reinforces the learning in real time, because having to write about something stamps it into your brain so you retain it better. Nobody has the time or energy to document after the project is complete, so do it in real time. It doesn't have to be perfect, and sometimes you go down the wrong trail and scrap things. You learn as you go. Passing tests to earn certifications is great, but in my humble opinion, teaching yourself and solving tough problems fast is still a better indicator of top programmers.

35 years behind the keyboard now, starting with DOS and BASIC(a) on the IBM PC.  I still enjoy programming computers.