9/29/2018

SSIS & U-SQL Project Using Multi-Hops to Flow Data to Azure

Building out an Azure solution, using SSIS to pull from source database, apply some data messaging with dynamic Derived Columns to swap out "\n", flow data to Azure Data Lake Store using SSIS component, pick up the CSV file using U-SQL .Csv Extractor, into U-SQL database table.

Solid concept, yet it's also automated, to build out each independent SSIS file, based on source table / field name / field type, Derived Column's also built dynamically for any Strings (Varchar  Char) fields, Data Lake Store file path dynamic, as well as U-SQL dynamically built for each table, and dynamically swap back the "\n" characters,

If you've ever looked under the hood of an SSIS package, the XML is a complex set of code, yet it has patterns, and those patterns can be duplicated.  How is this accomplished?

Using an SSIS Script Component using c#, reading from Config file and Package Variables.  It also builds unique dtsConfig file, same naming convention of package DTSX file.

Files can be set to run using series of parent packages, executed from DTExec BAT file.

I've used Visual Studio 2015 with all the required installs and Add-Ons, and then tried VS2017, and prefer the newer version.  Have ability to create U-SQL projects to write U-SQL, execute jobs from on-premise VS, view job progress in Azure Solution Explorer, and verify data on Azure side, make sure it lands in correct place, data in correct format, and lands in U-SQL, and view all previous job, both success and failure.  No ability to set breakpoints on U-SQL code in real time, so execute view, modify script, repeat.

VS integrates with source control.

One thing to keep in mind, are Nulls.  When creating tables and writing U-SQL scripts, must append "?" indicate Nulls allowed.  And when pushing data to Azure Data Lake, it transposes Nulls to "\N", so you have to handle them in the U-SQL code, although a built Null Escape function handles this.  You do need to handle special characters embedded within source raw columns such as tabs, commas, new line feeds, etc. or the column counts will be off and throw errors.

U-SQL is quite powerful, as you can nest statements, use variables, reference DLL assemblies, and have a variety of formats to work with, such as CSV, Text, JSON file source data, and variety of data formats like Avro, ORC, Parquet and Text.  You can source data from Azure Data Lake Store, Blob Storage as well as other Azure data storage types.  So it goes beyond traditional Relational Database requirements of knowing Structure ahead of time, so semi-structured and Un-Structured data is okay.

After job runs, you can view run times, description of errors and it points to the specific error with "#' sign, and keeps exact script ran for future reference, and of course, its in the Cloud so others can view what jobs are running in real time.

With U-SQL databases, you can specify Database name, apply Schema names, Table names, Column names.  Data Types are standard, although there's no direct mapping from Source system to CSV to Database to SQL Server. 

You can write your own custom functions, in c#, upload the DLL to Azure, register and reference code to handle specific scenarios.

Because you can ingest JSON, you can pull in Streaming data from Azure Blob Storage or Azure Data Lake Store, pull the data in, parse, apply rules, and send data along the trail for storage and analysis.  This is nice for IoT data, reading in lots of data from mini-packets from sensors and devices out in the field.

As you know, storing data is not terribly expensive in Azure, so archiving data from On-Premise data can save money in the long run, make the data available for Ad-Hoc reporting, as well as integrate into other data sets like Azure SQL Data Warehouse,

U-SQL is a nice addition to the Azure platform, it provides coders much flexibility, easy ramp up time and handles Unstructured data.  Basically provides a language to do the heavy lifting once data reaches the cloud.

Combing with Azure Data Factory opens up lots of opportunity, flexibility and choices.  I'm just waiting to sink teeth into building out that next.

Thanks for reading~!