Business Intelligence is the extraction of value and insight through the analysis of data. Artificial Intelligence is a layer above, crunching data into models and finding insights, value and predictions. In both BI and AI, we derive value from data by assembling, cleansing, positioning, joining and consolidating it into logical units. And we do that using code. Code is simply logical commands interpreted by a computer processor. Those commands are currently written by humans, assisted by machines. At some point, the humans will not be necessary. Computers will write the instructions, based on logic derived over time. They will possibly write numerous commands, in real time, to derive the "best" insight based on current knowledge, and they will infer things based on that data and past experience. At that point, the machines will be smarter than humans, because they are fast, logical and can present multiple scenarios simultaneously, in real time. That is the true goal of "intelligent machines". Humans are the stepping stone to get there. At some point, our role will diminish as computers leap past us. That time may be closer than we think.

Will there be bias in the data decisions? Maybe. Do humans apply bias currently? You betcha. Our brains are pre-wired to be biased: we synthesize volumes of incoming data and apply our filters (our opinions and experience accumulated over time) to derive fast knowledge, so we don't get eaten by the lion. Humans are biased because of that legacy survival system. Machines are not inherently biased; they are not fighting for survival. They are non-participating bystanders with no skin in the game, applying rational logic based on data patterns. Machines know binary concepts, true or false, and if we add Quantum Mechanics, that opens up a lot of possibilities: true, false and maybe.
It seems that introducing QM will blast through the current limitations of too much data and too many possibilities, streamlining humongous, complex data into a set of simplified, weighted possibilities. Until then, we can still build careers in the programming and data space with rudimentary models in specific domains. QM will be the game changer.
We have been building and supporting Data Warehouses for two decades plus. Lots of standardization and design patterns have been fleshed out. With the advent of the Cloud, new tools and architectures provide new options to developers. With that change has come discussion comparing traditional SQL Server Integration Services (SSIS) and Azure Data Factory for Extract, Transform and Load (ETL). SSIS can pick up, transform and push data with specific design patterns to handle multiple scenarios across a variety of data sources. Packages are run in a variety of ways to automate and orchestrate data loads, assisting with the heavy lifting required to populate and refresh data warehouses.

Azure Data Factory is a newer tool for developers to get data into Azure. You can set up Pipelines to lift and shift the data in a secure environment, and they can be scheduled, automated, logged and monitored over time. Once data lands in Azure, in Azure Blob Storage or Azure Data Lake Store, the ETL transformations need to cleanse the data and then move it to your landing zone, Azure SQL Data Warehouse.

Within Azure Data Factory, you have the ability to use Azure Databricks to apply those transformations. Azure Databricks is an Azure service that sits atop Apache Spark, which opens up in-memory processing and a variety of languages including Scala, Python, R, Java and SQL. Programmers use Notebooks to interact with the data, selecting a single language or combining multiple languages to process it. These jobs can be automated on a schedule, resulting in a push to your Azure SQL Data Warehouse. This mirrors our traditional SSIS ETL and transformation process.

However, there is another approach within the Azure Portal, using a similar methodology. Instead of using Azure Databricks from within Azure Data Factory, you also have the option to use Azure Data Lake Analytics (ADLA). ADLA uses the U-SQL language.
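The cleansing step described above, the work a Databricks notebook would do between the raw landing zone and the warehouse, can be sketched in plain Python (one of the languages Databricks notebooks support). The column names and cleansing rules below are hypothetical, a minimal illustration rather than a real pipeline:

```python
import csv
import io

def cleanse_rows(raw_csv: str):
    """Cleanse raw extract rows before loading the warehouse.

    Hypothetical rules: trim whitespace, reject rows missing the
    customer_id business key, and normalize the amount to a float.
    """
    reader = csv.DictReader(io.StringIO(raw_csv))
    cleansed = []
    for row in reader:
        customer_id = (row.get("customer_id") or "").strip()
        if not customer_id:
            continue  # reject rows with no business key
        try:
            amount = float((row.get("amount") or "").strip())
        except ValueError:
            continue  # reject rows with a malformed amount
        cleansed.append({"customer_id": customer_id, "amount": amount})
    return cleansed

raw = "customer_id,amount\n 42 , 19.99 \n,5.00\n43,abc\n44,7.50\n"
print(cleanse_rows(raw))
```

In a real notebook this logic would run over Spark DataFrames rather than in-memory strings, but the shape of the work, filter the bad rows and standardize the good ones, is the same.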
U-SQL combines many flavors of familiar languages, like SQL, C# and LINQ, and the language is quite powerful. You can string a series of commands together to pull in data, mount it, apply transformations and filters, and place the data into a variety of formats and locations within Azure. U-SQL handles structured data, unstructured data, CSV and text files, JSON files, and IoT and streaming data, using the Avro, ORC, Text and Parquet storage formats. The output can be sent back to Azure Data Lake Store, to Azure SQL Database and even to your Azure SQL Data Warehouse. A nice thing about U-SQL is the ability to write, modify and execute code from Visual Studio 2015 and 2017, which integrate nicely with Team Foundation Server and Git source code repositories. The jobs are executed on the Azure cloud platform; you can specify the number of Analytics Units to process the query, and the results are stored for an audit trail, recording success or failure along with error codes, the exact query and a timestamp. U-SQL is a great language addition to the Azure Data Platform. And here's a video of the Azure Data Factory / Azure Databricks use case on YouTube:
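The extract-transform-output pattern described above can be sketched as a minimal U-SQL script. The file paths and schema here are hypothetical placeholders, not from a real job:

```sql
// Extract a raw CSV file from Azure Data Lake Store.
// Path and columns below are hypothetical.
@sales =
    EXTRACT CustomerId int,
            Region     string,
            Amount     decimal
    FROM "/raw/sales.csv"
    USING Extractors.Csv(skipFirstNRows: 1);

// Apply a transformation: aggregate sales by region.
@byRegion =
    SELECT Region,
           SUM(Amount) AS TotalAmount
    FROM @sales
    GROUP BY Region;

// Write the result back to the lake for loading downstream.
OUTPUT @byRegion
    TO "/curated/sales_by_region.csv"
    USING Outputters.Csv(outputHeader: true);
```

The declarative SQL shape with C#-style types and expressions is the flavor combination the paragraph above describes; swapping the built-in Extractor or Outputter changes the input or output format.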
So regarding the discussion between traditional SSIS and Azure Data Factory: each has benefits depending on the use case. And when using Azure Data Factory, you have two good options for transformations: Azure Databricks or Azure Data Lake Analytics.
Winding down vacation, some much needed downtime. With that said, this blog will be about tech from here on out. No more random posts about non-tech stuff; technology blogs should be about technology, and I won't be doing side projects anymore. I hope you've enjoyed the non-traditional posts. Going forward, just tech. Thanks for reading~!