Current Technology Foreshadowed in Movies

The way to introduce new technology into the mainstream is to blend it in gradually.

Who remembers Star Trek?  The Jetsons?  Lost in Space?  War Games ("we're in!")?  Short Circuit (artificial intelligence)?

Lots of current technology was predicted right before our eyes.



Quantum Mechanics will Blast through the BI & AI Limitations

Business Intelligence is the extraction of value and insight through the analysis of data.  Artificial Intelligence is a layer above, crunching data into models and finding insights, value and predictions.

We find value in data by assembling, cleansing, positioning, joining and consolidating it into logical units.  That applies to both BI and AI.

And we do that using code.  Code is simply logical commands interpreted by a computer processor.  Those commands are currently written by humans, assisted by machines.  At some point, the humans will not be necessary.  Computers will write the instructions, based on logic derived over time.  They will possibly write numerous commands, in real time, to derive the "best" insight based on current knowledge.  They will also infer things based on that data and past experience.

At that point, the machines will be smarter than humans, because they are fast, logical and can present multiple scenarios simultaneously, in real time.

That is the true goal of "intelligent machines".  Humans are the stepping stone to get there.  At some point, our role will diminish, as computers leap past.  That time may be closer than we think.

Will there be bias in the data decisions?  Maybe.  Do humans apply bias currently?  You betcha.  Our brains are pre-wired to be biased: to synthesize volumes of incoming data, apply our filters (our opinions and experience over time), and derive fast knowledge, so we don't get eaten by the lion.  Humans are biased due to a legacy survival system.  Machines are not inherently biased; they are not fighting for survival.  They are non-participating bystanders, with no skin in the game.  They apply rational logic based on data patterns.  Machines know binary concepts, true or false; add Quantum Mechanics and a lot of possibilities open up: true, false and maybe.

It seems that introducing QM will blast through the current limitations of too much data and too many possibilities, streamlining humongous, complex data into numerous simplified, weighted possibilities.  Until then, we can still build careers in the programming / data space with rudimentary models in specific domains.  QM will be the game changer.


Azure Data Factory with Azure Databricks or Azure Data Lake Analytics U-SQL

We have been building and supporting Data Warehouses for two decades plus.  Lots of standardization and design patterns have been fleshed out.

With the advent of Cloud, new tools and architectures provide new options to developers.

With that change, there's been discussion comparing traditional SQL Server Integration Services and Azure Data Factory for Extract, Transform and Load.

SSIS can pick up, transform and push data with specific design patterns, to handle multiple scenarios across a variety of data sources.

Packages are run in a variety of ways, to automate and orchestrate data loads, assisting in the heavy lifting required to populate and refresh data warehouses.

Azure Data Factory is a new tool for developers to get data to Azure.  You can set up Pipelines to lift and shift the data in a secure environment, which can be scheduled, automated, logged and monitored over time.

Once data lands in the Azure Portal, using Azure Blob Storage or Azure Data Lake Store, the ETL transformations need to cleanse the data, then move it to your landing zone, Azure SQL Data Warehouse.

Within Azure Data Factory, you have the ability to use Azure Databricks to apply those transformations.

Azure Databricks is an Azure Service which sits atop Apache Spark, opening up in-memory processing and a variety of languages including Scala, Python, R, Java and SQL.

Programmers use Notebooks to interact with the data, by selecting a single language or combining multiple languages to process the data.  

These jobs can be automated in a schedule, resulting in a push to your Azure SQL Data Warehouse.

This simulates our traditional SSIS ETL data process and transformation.
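As a toy illustration of the cleanse step such a notebook might perform before the push downstream (a real Databricks notebook would typically use PySpark; this is plain Python with invented column names):

```python
import csv
import io

# Minimal sketch of a cleanse step: trim whitespace and drop rows
# missing the key column, before loading the data downstream.
def cleanse(raw_csv: str, key: str) -> list:
    rows = list(csv.DictReader(io.StringIO(raw_csv)))
    cleaned = []
    for row in rows:
        row = {k: (v or "").strip() for k, v in row.items()}
        if row.get(key):          # drop rows missing the join key
            cleaned.append(row)
    return cleaned

raw = "Id,Name\n1, Alice \n,Bob\n2,Carol\n"
print(cleanse(raw, "Id"))
```

The same shape scales up in a notebook: read, scrub, filter, then write to the warehouse on a schedule.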

However, there is another approach within Azure Portal, using similar methodology.  

Instead of using Azure Databricks from within Azure Data Factory, you also have the option to use Azure Data Lake Analytics.  ADLA uses the U-SQL language.  

U-SQL combines many flavors of familiar languages, like SQL, C# and LINQ, and the language is quite powerful.

You can string a series of commands together to pull in data, mount it, apply transformations and filters, and place the data into a variety of formats and locations within the Azure Portal.

U-SQL handles structured data, unstructured data, CSV, Text files, JSON files, IoT and Streaming Data, using Avro, Orc, Text and Parquet data storage types.  

And the output can be sent back to Azure Data Lake Store, Azure SQL Database and even to your Azure SQL Data Warehouse.
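That extract, transform, output flow can be sketched in a few lines.  The real thing is U-SQL against Azure paths; here's a rough Python analogue over in-memory CSV, with made-up fields and data:

```python
import csv
import io

# U-SQL's shape: EXTRACT a rowset from a file, SELECT/transform it,
# then OUTPUT the result to a destination.
def extract(text):
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows, city):
    return [r for r in rows if r["City"] == city]

def output(rows, fields):
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

src = "Id,Name,City\n1,Alice,Tampa\n2,Bob,Boston\n"
result = output(transform(extract(src), "Tampa"), ["Id", "Name", "City"])
print(result)
```

In U-SQL the same three steps are EXTRACT ... USING Extractors.Csv(), a SELECT over the rowset, and OUTPUT ... USING Outputters.Csv().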

A nice thing about U-SQL is the ability to write, modify and execute code from Visual Studio 2015 and 2017, which integrates nicely with Team Foundation Server and Git source code repositories.

The jobs are executed on the Azure Portal cloud platform.  You can specify the number of Analytics Units to process the query, and results are stored for an audit trail: Success and Failure along with Error Codes, plus the exact query and a timestamp.

U-SQL is a great language addition to the Azure Data Platform.


So regarding the discussions between traditional SSIS and Azure Data Factory, each has benefits depending on the use case.

And when using Azure Data Factory, you have two good options: 
Azure Databricks or Azure Data Lake Analytics.

Thanks for reading~!


Just Tech, No More Fluff

Winding down vacation, some much needed downtime.

With that said, this blog will be about tech going forward. 

No more random blogs about non tech stuff.

Technology blogs should be about technology.

I won't be doing side projects going forward.

So I hope you've enjoyed the non-traditional posts.

Going forward, just tech.

Thanks for reading~!


SSIS & U-SQL Project Using Multi-Hops to Flow Data to Azure

Building out an Azure solution: using SSIS to pull from a source database, apply some data massaging with dynamic Derived Columns to swap out "\n" characters, flow data to Azure Data Lake Store using the SSIS component, then pick up the CSV file using the U-SQL .Csv Extractor, into a U-SQL database table.

Solid concept, and it's also automated: each independent SSIS file is built out based on source table / field name / field type, Derived Columns are also built dynamically for any string (Varchar, Char) fields, the Data Lake Store file path is dynamic, the U-SQL is dynamically built for each table, and the "\n" characters are dynamically swapped back.

If you've ever looked under the hood of an SSIS package, the XML is a complex set of code, yet it has patterns, and those patterns can be duplicated.  How is this accomplished?

Using an SSIS Script Component written in C#, reading from a Config file and Package Variables.  It also builds a unique dtsConfig file, with the same naming convention as the package DTSX file.

The files can be set to run using a series of parent packages, executed from a DTExec BAT file.
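To illustrate the pattern (the real generator is a C# Script Component emitting full DTSX XML; the mini-template, table names and paths below are made up), here's the template-substitution idea in Python:

```python
from string import Template

# Toy version of template-driven package generation: one XML template,
# one substitution per source table, emitting one package body per table.
PACKAGE_TEMPLATE = Template(
    '<Package Name="Load_$table">\n'
    '  <Source Query="SELECT $columns FROM $table" />\n'
    '  <Destination Path="/landing/$table.csv" />\n'
    '</Package>\n'
)

tables = {
    "Customers": ["Id", "Name"],
    "Orders": ["Id", "Total"],
}

packages = {
    t: PACKAGE_TEMPLATE.substitute(table=t, columns=", ".join(cols))
    for t, cols in tables.items()
}
print(packages["Customers"])
```

Swap the toy template for real DTSX XML and the dictionary for a metadata query against the source system, and 500 packages is just a bigger loop.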

I used Visual Studio 2015 with all the required installs and add-ons, then tried VS2017, and prefer the newer version.  You have the ability to create U-SQL projects to write U-SQL, execute jobs from on-premise Visual Studio, view job progress in the Azure Solution Explorer, and verify data on the Azure side: make sure it lands in the correct place, in the correct format, in U-SQL, and view all previous jobs, both successes and failures.  There's no ability to set breakpoints on U-SQL code in real time, so it's execute, view, modify the script, repeat.

VS integrates with source control.

One thing to keep in mind are Nulls.  When creating tables and writing U-SQL scripts, you must append "?" to indicate Nulls are allowed.  And when pushing data to Azure Data Lake, it transposes Nulls to "\N", so you have to handle them in the U-SQL code, although a built-in Null Escape function handles this.  You do need to handle special characters embedded within source raw columns, such as tabs, commas and new line feeds, or the column counts will be off and throw errors.
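A toy Python sketch of that escape/unescape round trip (the escape tokens here are illustrative choices, apart from the "\N" null marker mentioned above):

```python
# Sketch of the two fixes described above: escape embedded control
# characters before writing CSV, and translate "\N" back to a real null.
ESCAPES = {"\n": "\\n", "\t": "\\t"}

def escape_field(value):
    if value is None:
        return r"\N"              # null marker, as the Data Lake emits
    for raw, esc in ESCAPES.items():
        value = value.replace(raw, esc)
    return value

def unescape_field(value):
    if value == r"\N":
        return None
    for raw, esc in ESCAPES.items():
        value = value.replace(esc, raw)
    return value

print(escape_field("line1\nline2"))   # embedded newline escaped
print(unescape_field(r"\N"))          # back to a real null
```

Without that round trip, an embedded newline splits one source row into two CSV rows and the column counts go sideways.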

U-SQL is quite powerful: you can nest statements, use variables, reference DLL assemblies, and work with a variety of source formats, such as CSV, Text and JSON files, and a variety of data storage formats like Avro, ORC, Parquet and Text.  You can source data from Azure Data Lake Store, Blob Storage and other Azure data storage types.  So it goes beyond the traditional relational database requirement of knowing the structure ahead of time; semi-structured and unstructured data is okay.

After a job runs, you can view run times and descriptions of errors (it points to the specific error with a "#" sign), and it keeps the exact script that ran for future reference.  And of course, it's in the Cloud, so others can view what jobs are running in real time.

With U-SQL databases, you can specify the Database name and apply Schema names, Table names and Column names.  Data types are standard, although there's no direct mapping from source system to CSV to database to SQL Server.

You can write your own custom functions in C#, upload the DLL to Azure, then register and reference the code to handle specific scenarios.

Because you can ingest JSON, you can pull in streaming data from Azure Blob Storage or Azure Data Lake Store, parse it, apply rules, and send the data along the trail for storage and analysis.  This is nice for IoT data: reading in lots of mini-packets from sensors and devices out in the field.
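The parse-and-apply-rules step is simple to picture.  A small Python sketch, with invented device names and a hypothetical threshold rule:

```python
import json

# Sketch: parse mini-packets from sensors, apply a rule, route readings.
packets = [
    '{"device": "pump-1", "temp": 71.2}',
    '{"device": "pump-2", "temp": 98.6}',
]

THRESHOLD = 90.0   # hypothetical alert rule

alerts, normal = [], []
for raw in packets:
    reading = json.loads(raw)
    (alerts if reading["temp"] > THRESHOLD else normal).append(reading)

print(alerts)
```

In the Azure flow, the packets come off Blob Storage or the Data Lake Store and the routed output lands downstream for storage and analysis.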

As you know, storing data is not terribly expensive in Azure, so archiving On-Premise data can save money in the long run, make the data available for Ad-Hoc reporting, and integrate it into other data sets like Azure SQL Data Warehouse.

U-SQL is a nice addition to the Azure platform: it provides coders much flexibility and easy ramp-up time, and handles unstructured data.  It basically provides a language to do the heavy lifting once data reaches the cloud.

Combining it with Azure Data Factory opens up lots of opportunity, flexibility and choices.  I'm just waiting to sink my teeth into building that out next.

Thanks for reading~!


In With the New U-SQL

I worked on a solid Microsoft Business Intelligence project, went okay, then off to another.  Parallel Data Warehouse, for a while, then off to another.  Then a Hadoop - PolyBase - Microsoft BI - MDS - Tabular project, went okay, then off to another.  Then a quick Tableau project, went okay, then off to another.  Then a SQL against AWS Hadoop project, went okay, then off to another.

Where are we at now?  Building out an Archive project, using Microsoft SSIS, a C# script component to automate, Azure Data Lake Store and Azure Data Lake Analytics components, and U-SQL Databases.

You can see, the projects differ every time.  Get up to speed, become expert level, deliver, then change technology.

Traditional Data Warehousing never excited me; I was late to the party, and it will slowly lose luster and support, for a variety of reasons.  I think the Cloud is the place to be.

U-SQL is a newer language that combines SQL, C#, LINQ and ETL.  It runs as a job and can be kicked off from the on-premise Visual Studio IDE or in the Cloud.  It's dynamic, and works with structured and unstructured data.

I think the language will mature, the IDE install will be streamlined, the reporting on U-SQL Databases will be available to more Reporting tools and Intellisense will improve.

U-SQL reminds me of a combination of Hive and Pig, in the familiar Visual Studio IDE we've known for years.  It executes in the Azure Cloud, and stores past jobs, both successful and failed, with run times and error messages.

Quite tasty.


Current State of Machine Learning and Artificial Intelligence

Machine Learning has three main types.

For some, recommendation engines on websites are valuable tools, perhaps suggesting good movies similar to ones you've already watched, based on preferences.

For some, the clustering of homes on real estate sites is a good tool: what is my home value, based on similar homes within the region?

And for others, classifying images is a good tool, perhaps recognizing your friends in a picture on social media, without having to enter the tags in manually.

Recommendation, Clustering and Classification: three solid types of machine learning, with a variety of algorithms to get to each goal.

There are also Neural Networks, which are useful for artificial intelligence: they have weights that get adjusted depending on incoming variables, across multiple layers, producing an end result as a weighted probability.
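The layered weighted-probability idea can be sketched in a few lines of Python.  The weights below are arbitrary, not from any trained model:

```python
import math

# Tiny feedforward sketch: two layers of weights, sigmoid activations,
# ending in a single weighted probability.
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights):
    # one output per weight row: weighted sum of inputs, squashed to (0, 1)
    return [sigmoid(sum(i * w for i, w in zip(inputs, row))) for row in weights]

hidden_w = [[0.5, -0.2], [0.3, 0.8]]   # 2 inputs -> 2 hidden units
output_w = [[1.2, -0.7]]               # 2 hidden -> 1 output

hidden = layer([1.0, 0.0], hidden_w)
probability = layer(hidden, output_w)[0]
print(round(probability, 3))
```

Training is the part this sketch omits: nudging those weights, over lots of labeled examples, until the output probability lines up with reality.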

Some common themes: lots of data to train the model, learning from known data.  Some data needs to be labeled ahead of time, which is "Supervised Learning".  The opposite is having the model learn on its own, without the luxury of "tagged" data ahead of time, called "Unsupervised Learning".

There are also models that learn from outcomes, to reinforce learned behavior: the event happened, this was the result, track that for future reference.

Some Neural Nets are very large, hundreds of layers deep, to get fairly precise results.  They do require more compute power and memory, and take time to process.

There is a slew of languages and tools to use when working with machine learning and artificial intelligence: on-premise, on virtual machines and in the Cloud.

The fact is, we have bigger data sets, better compute power, more RAM and machine chips that can crunch more data faster.  In the past, typically only large institutions had access to these types of machines, so Universities or large computer organizations were the only places to work on this technology.  Now you can program this on a laptop in your living room if you desire.  So things have trickled down into the hands of the many.  Thus, faster progress, cleaner solutions, better results.

The results have gotten better in image recognition, speech recognition, language-to-language translation tools (some in real time) and many more.  At this point machine learning is a useful tool to assist humans in everyday activities.  It has not gotten to the point where AI is replacing everyday jobs, in most cases.  But the tide will shift at some point, when people are competing with smart machines for everyday jobs.  And when robots take on specific tasks, that too will automate some of the workforce.  And when smart machines can perform complex tasks in real time with minimal error, we will take notice, as computers do not necessarily require vacation time, health insurance or 401k matching contributions, and they work 24/7 without complaining or forming Unions.

So, we have come a long way: computers can crunch data and translate it into meaningful information for consumption, to lower costs and automate menial tasks.  Where do we go from here?  More automation, easier-to-use tools, better integration across domains, access in everyday tasks, embedded IoT devices that assist in real time.  Artificial Intelligence should not be interpreted strictly as computers rising up and taking over mankind.  Surely anything is possible; at this point in time, we are still telling the machines what to do.  And they do their specific tasks in a sort of black box, and we do not necessarily know exactly how the results were derived.

With Moore's Law, things will only get faster, cheaper, easier to use and proliferate through everyday society.  That should keep us busy for a while, for sure.


Hop Aboard the Data Bandwagon

It does appear the Cloud is the place to be; it's certainly gained traction the past few years.  There are a few key players to choose from.  They offer mostly the same services; it just depends on what flavor you prefer, how well it integrates into your current stack, and your developer pool's skills and availability.

It seems Machine Learning is still hot, along with Artificial Intelligence, although it bypassed many folks in the data skills pool, requiring new languages and tools like Python, R, Scala, Spark, Notebooks and algorithms.

Big Data didn't solve all our problems, but it's a nice addition to the data stack, has some good hooks to get data in and out, as well as reporting, although it won't be our traditional transaction database as expected.

Organizations still want their data, in readable format, in a timely manner, with accuracy expected.  Although hiccups in the Extract Transform and Load still seem to play havoc on our daily reporting needs.

Data Science still seems hot, although just because you have a talented Data Scientist doesn't mean they can write standard SQL or knock out some traditional reports.  The two seem mutually exclusive and don't necessarily overlap, as new college grads jump straight into DS with little to no knowledge of the traditional data ecosystem.

Data Engineer is a hot position, joining disparate data sets for others to work with.

Data Management has probably risen the fastest, as new GDPR rules necessitate solid data practices.  With that, knowing where all the data resides, what it contains and how to access it has pushed Metadata into the spotlight, with Data Catalogs to handle such requests.

We are no longer at the point where knowing SQL or Access or Excel can guarantee you a data position.  Data skills have proliferated, grown, splintered, gone sideways and every which way.  The Data Ecosystem has exploded and many newcomers have entered the arena: developers, architects, software vendors and applications.

Throw in Blockchain, Agile Methodology, Streaming, IoT and Domain Knowledge, and you'll clearly have your hands full for the next decade or so.

Suffice it to say, data is hot.  Anyone entering the workforce who's looking for a solid career should look no further than data.  It's the bread and butter of every department in every organization.  There are some good companies foaming at the mouth to get talent in the door, hit the ground running, and add value across the board.

So hop aboard the Data Bandwagon.


Automated Solution to Repetitive Coding

What if you were tasked with creating an SSIS package?  To pull from a database, send the data to Azure Data Lake Store as CSV.  Then pick up that data, flow to a U-SQL database table.

Well, you'd probably need a source ODBC connection to the database.  And an SSIS package component to flow to Azure Data Lake Store.  And another to flow to U-SQL db.

So, once you go through the effort to complete the task, let's say you had to build 500 similar SSIS packages.  Well, if it takes 2 hours per package, that's 1,000 hours of package building; even if you don't make any errors, you may get done this month or next.

Or.  You could automate.  As in, build out each package, line by line, using an SSIS package Script component.  Maybe use XML template for repeatable code snippets, pass in data as variables, and create the 500 SSIS packages, in under a minute.

Well, that's what the team's been working on.  And it's been quite tasty.  Building c# code certainly is fun, different from writing standard SQL reports.  One of the great things about coding, you certainly get a different perspective every so often as projects change.

I've been working with SSIS packages since they were DTS packages, not DTSX.  And pulling data from databases for a very long time.  Except flowing to Azure is sort of new.  I was lucky enough to figure out the solution; there's some documentation on the web, blogs and such, but for the most part, you have to figure it out yourself.  Those are the juicy projects, and this is my 2nd one in the past few years.  The other project was the Hortonworks Hadoop Enterprise Data Warehouse using Visual Studio components, PolyBase and Master Data Services.

Being an expert in anything is a daunting task; things change, and nobody knows everything.  When you approach a project with fresh eyes, you stay open-minded to see different options: try this, try that, see what works, document as you go.  And perhaps you become an expert for a moment, until the next project begins.

Programming is still a decent job.  I work out of my house, and have for the past 5 years actually.  You begin the day before most people, and you end the day after most people.  There's no commute.  There's no off button.  Yet it seems to work just fine.  If you can work fast, if you're responsive, if you can deliver solutions to tough problems, what more can you ask for?

And so it goes~!


Supporting Public Facing Java Applications

I once supported a web application, public facing, many users.  First task was to add Cookies & Captcha.  After that, a re-branding of the website to update with different color schemes for consistent look and feel.

It was actually written in Java using IBM VA Java, and I was hired as a Senior Java Developer.  Luckily I had learned it on my own years back; they were just looking for someone who knew that flavor of Java.

In fact, it hooked to the Mainframe on the back end.  So we had copybook frameworks which mapped to the back end, which connected to Mainframe COBOL and returned data, pieces at a time.

Then we migrated to use Web Services with XML, and then migrated the entire application to JDeveloper java.

Also, the application had hooks into the IVR system, which I supported, to accept credit card payments over the phone.  It was interesting to test the application using breakpoints: calling in the number and seeing the code stop on a specific line of Java code.

Lastly, it also hooked into the Kiosks which we had a few around town, which also accepted payments, and it too was written in Java connecting to the Mainframe back end.

It was fun to program in Java, although we had internal teams moving to Struts and I didn't get to work on much of that.

I also supported other applications in Java; it was a high-profile position, as all were public facing.  In addition, I was the Business Objects Admin for a while, introduced Crystal Reports to the ecosystem, and had a stint of Project Management.

Overall an excellent job.  Lasted more than 4 years.  Not too shabby~!


SSIS Automation using SSIS

Do you ever sit down and write code?  For hours?  Never get up?  Even to eat or use the rest room?

That's what I've been up to the past few days.  Writing c# code.  In SSIS script component.

It's quite tasty.  It's actually programming.  Within SSIS.  In the data space.  It's an SSIS package to create other SSIS packages.  Using custom code, XML templates.  Reading from source system.  And Excel file template.  It outputs ".DTSX" packages.  To automate the process.  Rather than write each package one at a time.  Automation is key.

Also writing U-SQL scripts.  Create, truncate and dropping tables in Azure Data Lake Analytics U-SQL database.  And pushing CSV files up to Azure Data Lake Store.

Sure is fun to be heads down in the code.  I'll be there in 5 minutes.  2 hours later, where are you?  I'll be there in 5 minutes.

And so it goes~!


Azure Programming in U-SQL

Here's something I noticed recently.

I enjoy working in the Microsoft space.

When you work in the Microsoft space of technology, things are fairly consistent.  Drop-down menus.  Command buttons.  Documentation.  It's fairly uniform across most applications.

When you compare that to open source or another vendor owned software, perhaps the fonts are different, or the screens are jumbled and difficult to read.

In other words, when you work with the same software apps by Microsoft, you intuitively know where things are on the screen, you know what to expect when you install an application, you have confidence the app is going to work as expected and most likely, it will integrate with other products of same vendor.

Now keep in mind, if you've ever tried to install Visual Studio, for instance, you are aware of the number of dependencies and extra installs required just to get everything up and humming, like .NET frameworks and such.  And there are a bunch of different places to download the software.  And try messing with the GAC or Registry settings: no easy task.

All that is a side note.  What I'm talking about is ease of use, the common themes across products and their interaction with other apps.

We're not talking about deprecating good products, like Visual Studio 6, forcing developers to move to .NET object-oriented languages.

Speaking of new languages, I've been working with U-SQL for the past few months and I will say it's a fantastic language.  You can develop the code in Notepad, copy-paste into the Azure portal, click submit and watch it run in real time; it shows the code, the errors, graphs and execution plans.

Also, you can take that same code and run in Visual Studio then hop over to Azure portal and watch the job run in real time.  You can also see the Azure objects from Visual Studio, export data in VS or Azure, and watch jobs run in VS or view objects from Solutions Explorer.

It's amazing to write code on a VM, execute the code to pull from on-premise data, push to Azure Data Lake Store in the Cloud, mount that data using U-SQL and send it to a U-SQL database table.  Hybrid programming is the future.  When you execute a job, you can see the Estimated Cost / Charges on the web page.

This environment is so open and fluid and really opens up a lot of options, and it has the look and feel you get with similar vendor products.  I've been working with Microsoft products since 1996 professionally with Visual Studio 3,4,5,6, then ASP then .net, SQL Server, MSBI and now Azure.

Azure Cloud Programming has a lot of good features and opportunity.  Bite off a particular area to work in, I recommend U-SQL.  Good stuff.


Attending the Real BI Conference at MIT in Boston

Wrapping up day 2 of the Real Business Intelligence Conference, See Beyond the Noise, held at MIT in Boston.  Overall, it was a great event and I'm glad I attended.


The nice thing about the conference was the venue.  Just being at MIT, you sort of feel smarter.

Next, the quality of speakers was great, with a mix of topics ranging from Futurism to GDPR to Real World Data Project success stories.

Food, snacks, great.

I like the single auditorium, single track approach.  Lots of conferences have multiple tracks running in parallel, forcing you to choose between options.

The conference also had a community feel to it: getting to meet people from all over the world and have quality discussions on a variety of topics around technology.

The two-day approach was nice.  Some conferences run 4 or 5 days and are very technical; by the end of the week, your brain is fried, you don't remember much, and you actually skip some of the sessions to give the mind a rest.

This conference was concise and personal, with quality speakers and content.

If you're interested in attending next year, the website is:  https://www.eventbrite.com/e/2019-real-business-intelligence-conference-tickets-47210074604

The most surprising topic was GDPR, which is basically a world mandate to protect EU users' data, the rules, and the implications surrounding non-compliance.  Seems like a good opportunity for consulting firms to hold businesses' hands in getting compliant.

The main theme in almost every talk: "Ask better questions."  Overall, very happy to have attended the Real BI Conference 2018 at MIT in Boston.  I even got a photo with Howard Dresner, who coined the term "Business Intelligence" in 1989; his firm hosted the event.

Thanks for reading~!


URL References for U-SQL Getting Started

Here's a basic list of URLs to get familiar with U-SQL, Microsoft's fairly new language to access Azure Data Lake.  It's sort of a blend of C#, LINQ, SQL, window functions and a bunch more:

Get started with U-SQL in Azure Data Lake Analytics

Plug-in for Azure Data Lake and Stream Analytics development using Visual Studio

Develop U-SQL scripts by using Data Lake Tools for Visual Studio


Operators (U-SQL)

SQL Sampling Methods

SAMPLE Expression (U-SQL)

Introduction to the U-SQL Tutorial (version 0.7)

U-SQL Language Reference

Introducing U-SQL – A Language that makes Big Data Processing Easy


Analyze Website logs using Azure Data Lake Analytics

Overview of Azure PowerShell

Install Azure CLI 2.0

Accessing diagnostic logs for Azure Data Lake Analytics

Use the Vertex Execution View in Data Lake Tools for Visual Studio


3 Hot Tech Trends to Disrupt Everything

Artificial Intelligence on the edge.  The latest cutting-edge technology.  In other words, let the AI model reside in Internet of Things device sensors: ingest data, ping the model, look for anomalies, fire off a message to home base to alert.  Models can be created locally, pushed to the edge where they reside, and updated over time.  Seems like a good distributed AI model in real time.

We have devices to monitor people's vitals in real time, send messages back to home base, to alert if need be.  Combine the two, you've got some serious monitoring ability.

Due to security concerns, people have suggested embedding "chips" into children, so they are easily trackable.  There's been some push back from advocates; it borders on ethical concerns: do we want to cross the line on people's basic freedoms?

We can embed chips in cows, perhaps, and monitor them from a distance.  It seems like a stone's throw away; humans could be next.  First people volunteer, then it's offered as a service, similar to Flu Shots.

The Internet of Things had security concerns out of the gate; it opens up vulnerabilities.  Someone could tap into your home security through your thermostat, and once in, scan your files, embed a Trojan Horse, even Ransomware.  These are not good, yet they are real threats.  Suffice it to say, if someone wants to hack you, they can usually find a way, as any device connected to the internet is suspect.

With talk of cyber currency overtaking traditional paper money, we could soon see the disappearance of physical wallets.  Then all transactions will be documented in real time, with an audit trail, using the upcoming technology Blockchain: basically a distributed ledger system to handle transactions.  It uses a technique to add new transactions to the chain by collectively validating the hash key, which is unique and created by hashing the prior block.  If your transaction is valid, it will be added to the chain and committed, and can never be altered, modified or deleted.  This should provide a valid history of all transactions.  With that said, financial transactions would no longer need to go through traditional methods, where the money is placed on hold until the nightly batch pushes the monies here and there.  It will be instant.  This can and will be applied to currency, voting, stock transfers, healthcare records, just about anything and everything.  This will disrupt all industries.
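The hash-chaining idea can be shown in a few lines.  A toy Python sketch, not a real ledger (no consensus, no signatures, invented transactions):

```python
import hashlib
import json

# Each block stores the hash of the prior block, so altering any
# committed transaction breaks every later link in the chain.
def block_hash(block):
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def append(chain, transaction):
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"prev": prev, "tx": transaction})
    return chain

def valid(chain):
    return all(
        chain[i]["prev"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )

chain = []
append(chain, {"from": "alice", "to": "bob", "amount": 5})
append(chain, {"from": "bob", "to": "carol", "amount": 2})
print(valid(chain))        # True

chain[0]["tx"]["amount"] = 500   # tamper with history
print(valid(chain))        # False
```

That tamper-evidence is the whole trick: you can't quietly rewrite an old transaction, because every subsequent block's stored hash stops matching.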

Taking the IoT example, what if they add microphones and cameras to people.  That would surely open up new avenues for monitoring, instead of current methods of smart phones, smart listening devices and cameras littered throughout society.  It would be a tighter mechanism for sure.

So it would appear, the latest hot trends in technology surround Artificial Intelligence, on the Edge, using Internet of Things, along with Blockchain.  These three technologies are primed to disrupt everything.


Close the IT Skills Gap by Encouraging STEM to Girls

Mention storing data in the Cloud a few years ago, and people didn't trust it.  Fast forward: the Cloud is where it's at for all your data needs, in addition to the arsenal of associated technologies to piece together to form infinite possible solutions.  Sometimes it takes a while to adopt change.

So one of the hot topics today is the talent shortage.  There just aren't enough qualified people to fill open positions throughout the world.  Although there is more emphasis on STEM programs at younger ages, the stigma of being a nerd still exists in schools.

Another hot topic today is the lack of women in IT.  Traditionally, IT was staffed by men, and some of the culture revolved around either the "good old boy" mentality or the "bro" mentality.  The damage was twofold, as it prevented women from entering the field and caused some women to exit it.

So how do we kill two birds with one stone?  By getting more women into the IT profession: removing the legacy "bro" mentality and teaching girls about STEM early on.  Math, Statistics, Science, Project Management, Management, Innovation, Creativity and every other skill in between.

Another factor at play that needs to be addressed: women in technology need to be assured equal pay for equal work.  There's no reason to covertly discriminate against a specific gender in this day and age, when all people are created equal.

So the solution to the IT shortage across the globe is to involve girls in STEM programs at an early age, remove any stigma associated with "intelligence", and pay people fair value across the board.


Hologram Keyboards Connected to Virtual Cloud Computers

If I had to use a thumb mouse to perform my job functions, I'd have to find a new career.  I'm unable to use that input device long term.  Which brings up the question of input devices.

From punch cards, to keyboards, to the mouse, to smartphone swiping, to smart watches, to voice recognition using AI Natural Language Processing.

We went from dumb terminals, to PCs, to laptops, to smart devices and pads, to smart watches.

Computers are no longer the "main" devices to connect to the web and interact.

What if we took the dumb terminal approach: host the PC virtually in the Cloud and connect via any device?  That would reduce the cost of PCs, laptops, etc.  Access your computer from any device, anywhere, any time.

What if keyboards were not physical, as in holograms?  Simply start up your hologram keyboard, connect through your internet connection via wireless network or smartphone, then connect to your virtual hard drive in the cloud, which contains all your programs.

Seems like one plausible next step in the evolution of PC on every desktop.

Likewise, what if the hologram keyboard could connect to IoT devices out in the wild?

It would surely open up new opportunities, and markets.


Orgs Need a New Role to Manage New Cloud Offerings

Listening to the Microsoft Build live stream this week, it's clear that technology is changing.  AI and Azure are the hot topics.  It seems everything is moving to the Cloud.  And AI is to find its way into all products.  There were many new announcements, which many folks have reported on through industry sites and blogs.

It seems that Microsoft now does everything; they have tentacles in every technology.  And they are driving, and partnering with, a lot of industry leaders to create new products, as well as opening technology to the masses.

Blockchain, IoT, Machine Learning Models, Drones, PowerBI, Databases of all varieties, Live Code Sharing, and everything in between.

The presentations are fairly high level, yet profound, in that with a few clicks you can create web apps, push to Azure and be live in no time flat.

Given the number of new products, the evolution of existing products, and the integration between new and existing products, I would venture to say the newness is mind boggling.  And because nobody can know everything, you may have to identify a sector to master and become an expert with deep knowledge, or go wide and learn the basics across a broad sector.

In addition, I would venture to say we need a new role: someone within the organization to keep tabs on the available features and how they integrate with legacy and new features, and to provide expert advice to internal teams, including upper management.  The reason: technology has exploded, splintered and fragmented, and given the frequency of new products, features and integration mechanisms, we need an expert person or team to keep current with all the latest trends.  By staying abreast of current technology, you gain leverage: producing newer technology and newer features, and migrating off legacy systems to save costs and reduce complexity.

This person or team may also keep tabs on competing offerings from other Cloud vendors, for integration purposes and such.

We need a resident expert on Cloud offerings as a new position within organizations.  I believe the Partner Program does a good job of this now; what I'm talking about is embedding the role within your org.  The reason: coders already have enough task load to meet agile sprint deadlines, keep internal teams and customers happy, and meet their internal goals as well as ongoing career goals.  Having to shoulder the heavy burden of knowing everything may be the camel that broke the straw's back, or something to that effect (LoL).

Suffice it to say, technology is the hot job market of today, and tomorrow!


Artificial Intelligence Winter is Over, Spring has Sprung

Listened to Microsoft Build this a.m.

Future is Azure + Office365.

Looking to host Azure as the World's Computer.

Includes Intelligent Edge, serverless, event-driven, "ubiquitous" computing.

The Azure stack is just a year old, they said.

Runs on Linux and Windows.

Using AzureML to cross languages.

Open Source Azure IoT Edge.

Need a data page to identify where data is derived.

50 Data Centers across the globe.

Azure IoT Edge plus AzureML models allow alerts to be sent based on embedded sensors, identifying issues in real time.  Demo of a camera device in drones for the Intelligent Edge.

A commercial drone license was required to fly in the auditorium.

Stream info from pipes to AzureML: it finds an anomaly and sends it in real time to a laptop, via a model developed in the Cloud.  This scales in the real world, saving companies time and expense.  Then update the AzureML model and redeploy fast.
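That demo pattern — stream sensor readings, score them against a model, alert in real time — can be sketched without any Azure services at all.  Below is a minimal pure-Python stand-in using a rolling z-score; the real pipeline would run a trained AzureML model on the edge device instead.

```python
from collections import deque
from statistics import mean, stdev

def detect_anomalies(stream, window=20, threshold=3.0):
    """Flag readings more than `threshold` standard deviations from a rolling baseline."""
    baseline = deque(maxlen=window)
    alerts = []
    for i, reading in enumerate(stream):
        if len(baseline) >= 2:
            mu, sigma = mean(baseline), stdev(baseline)
            if sigma > 0 and abs(reading - mu) > threshold * sigma:
                alerts.append((i, reading))  # in production: push an alert off the device
                continue                     # don't let the spike poison the baseline
        baseline.append(reading)
    return alerts

# Simulated pipe-pressure sensor: steady around 50, one burst anomaly.
readings = [50.1, 49.8, 50.3, 50.0, 49.9, 50.2, 91.7, 50.1, 49.7, 50.0]
print(detect_anomalies(readings, window=5))  # [(6, 91.7)]
```

Skipping the anomalous reading when updating the baseline keeps one spike from poisoning the statistics, a common trick in streaming detectors.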

Not just insights: create frameworks to send to developers, to "commoditize" the technology, allowing all developers to have it in hand and bake it into custom applications.

Azure Cognitive Services now has 35 tools, which can be customized: bring your own labeled data and deploy models where you need them, in your applications.  Vision, speech and language are a few examples.

In order to democratize these AI devices, they announced a speech SDK and reference kits.  Embed and deploy in any device, on both the consumer and industrial side.

Conversational AI: bots were talked about 2 years ago.  Orgs need the ability to brand customer-facing agents, and to converse over multiple AI agents.

Who Owns the Data - the Chief Data Officer

Who owns the data? 

The data gets captured from front-end systems perhaps, or from web log files, or downloaded off the web, from Hadoop clusters, perhaps CSV or JSON files, Streaming Analytics data sent from IoT mini-burst packets, OData feeds, archives & backups, or good old legacy data.

So it would appear IT owns the data, because it resides in files, databases and mainframes that sit on a shelf in the on-site data center, or at a centralized location elsewhere.

Or perhaps it resides in the Cloud.  If so, the vendor stores the data and is responsible for backups and concurrency across the globe, so does the vendor own it?  Well, they only capture or store the data.  So does the organization that owns the Cloud actually own the data?  Or the vendor?

Yet the data ends up in ETL jobs, converted into Data Warehouses, Data Models, Reports, Visualizations, Machine Learning models, etc.  So does the developer who cleanses the data, pushes it to new systems, models it, reports on it and aggregates it own the data?

How about the Business Units?  They know the business model, or at least their piece of the puzzle.  Does the Business own the data?  What about data residing on file shares across the network: does IT own that, or the business?

What about insights derived from the data, who owns that?

I'd say it needs to roll up to the Chief Data Officer, a fairly new role that intersects the IT CIO and the Business, and everything in between, and reports to the CFO or CEO.  Or the Data Competency Center, which performs similar if not identical roles.

The CDO is responsible for the entire data stack, from data creation to ingestion, storage, mashing, reporting and data science.  He or she can matrix other departments for skills, domain knowledge and assistance as needed, including the hiring of consultants.  The CDO works with IT and accounting to purchase software, align for cost savings, and document data across the entire org, as well as how and when data flows through the entire ecosystem.

Who owns the data?  I venture to say the Chief Data Officer owns the data.

Blockchain Must Overcome Similar Challenges as Hadoop

For Blockchain to be considered a global, enterprise-level database (ledger), it must scale at the transaction level in real time, ensure security based on tokens (incremental keys) that guarantee authenticity, and be transparent.

Hadoop tried to create real-time transactions to mimic traditional databases, yet MapReduce limited its ability.  It wasn't until MapReduce was pushed up a level, to become just another tool in the toolbox, that we began to see improvements in query speed.  I'm not sure whether inserts into Hive databases ever matched standard OLTP databases, although I have not been keeping up to date on this.

So for Blockchain to scale enterprise-wide, it will need to overcome the challenges Hadoop faced.  Hadoop was typically contained within an org or in the Cloud, whereas Blockchain is scattered across the globe, so distances are potentially greater.  And I imagine once a record is placed on top of the stack, the other nodes must be notified to establish the agreed-upon contract and know it's legit.

Also, the bandwidth must be able to handle thousands of transactions per second to mimic OLTP databases, which handle insertions via locks and such.
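For a rough sense of where the bottleneck sits, here's an illustrative micro-benchmark (my own toy, not a real ledger): local hash-chained appends are cheap in Python, easily thousands per second; the expensive part is the consensus round-trips across distant nodes.

```python
import hashlib
import time

def append_tx(prior_hash, tx):
    """Naive ledger append: hash the transaction together with the prior hash."""
    return hashlib.sha256((prior_hash + tx).encode()).hexdigest()

n = 100_000
head = "0" * 64
start = time.perf_counter()
for i in range(n):
    head = append_tx(head, f"tx-{i}")
elapsed = time.perf_counter() - start
# Hashing alone is fast; the hard part is agreeing on `head`
# across thousands of nodes scattered around the globe.
print(f"{n / elapsed:,.0f} local appends/sec")
```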

So Blockchain must handle increased volumes, across great distances, negotiate valid contracts and update across the chain, potentially in real time.  And since these contracts could be used for stock trades, currency exchanges and voting polls, it will need to be 100% accurate, secure and transparent.

A tough order to fill.  Let's watch as time progresses and see how things pan out.


Accumulators of Data are New Gatekeepers of Reality

What is reality?

The Sun revolves around the Earth, as everyone knows; those who disagree will be excommunicated and beheaded.  That was reality a while ago.  Now we all know the Earth revolves around the Sun.

We all have a basic idea of what reality is and is not.  And for those who voice opinions that don't mesh with current dogma, we no longer excommunicate; we place straitjackets on them.

How does society obtain the correct version of reality?  Many ways, actually.

Family upbringing.  News and media outlets.  Movies.  Arts.  Sciences.  Schools.  Playground banter.

We pick up clues and eventually assimilate into practical, sound-minded, upstanding people of the community.

Only problem: what if the information we obtain is not 100% accurate?  Well, it's based on facts.  Facts determined by whom?  Scientists.  Who funds the scientific programs?  Which programs are allowed through the filter and which are not?  Who maintains the gatekeeper role to decide which facts are allowed and which are not?

You see, I have a lot of free time.  Sometimes I look out the window.  Sometimes it's raining and other times it's not.  What's interesting is this.  I noticed that it rained on all the major holidays.  Why?  Because everyone was stuck inside their houses.  I watched this recur for 10 years.  Without fail, it rained every single holiday, including election day.  I know this to be true.

For some reason, I downloaded some weather data from the web to do some analysis, and sure enough, viewing the holidays for the past 10 years, there was no indication of rain or precipitation.  How could that be?  I saw it rain, my clothes got wet, it happened.

Yet according to the data, it never rained.  So which is accurate: that which is documented, or that which is experienced?  Well, history tells us that majority rules.  Reality is shaped and played out based on community agreement.  In this case, I was overridden.  My view of reality was incorrect.  Or was it?

Perhaps I simply downloaded an outdated or incorrect data set.  Maybe.  Maybe not.

So instead of forming our view of the world through the textbooks we read in classrooms, our new reality will be based on what's in the data.  This would dictate that those who keep and store the data are the new gatekeepers of our view of reality.  And possibly, we could alter history with a few delete statements here and there, a few update statements, and perhaps a few insert statements.  Stranger things have been known to happen.

So I put it to you: how important is the accumulation of data going forward?  Time will tell.  And if it's raining out, and it's a major holiday, I say it never rained.

And there you have it~!


Introducing a Simple Framework for Working with Data

This week I blogged about 5 new features for data.  It starts off simple, each idea building upon the previous, to form the building blocks of Strong Artificial General Intelligence, a grandiose concept indeed:

Tag Data at Time of Inception - integrate a framework such that data gets tagged upon inception, using an XML tree-like structure to capture metadata for external use

Open Data Set Framework - standards applied to generic data sets for public or private consumption

Open Reporting Tools - a generic report reader that seamlessly ingests Open Data Sets, allowing any user to work with data to find insights

Global Data Catalog - Cloud-based storage of metadata for consumption by Open Data Set ingestion

Automated Machine Learning / Artificial Intelligence Ingestion - dynamically scan the Global Data Catalog for purposes of Unsupervised Machine Learning Ingestion, to automatically build and refresh Data Models in real time to answer specific questions in any domain
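The first item — tagging data at inception — could be as simple as emitting an XML sidecar the moment the data set is born.  A hypothetical sketch in Python (the element names and layout are my own invention, not any standard):

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

def tag_dataset(name, fields, source_ip):
    """Emit a metadata sidecar describing a data set at the moment it is created."""
    root = ET.Element("dataset", name=name)
    ET.SubElement(root, "created").text = datetime.now(timezone.utc).isoformat()
    ET.SubElement(root, "origin", ip=source_ip)
    schema = ET.SubElement(root, "schema")
    for field_name, field_type in fields:
        ET.SubElement(schema, "field", name=field_name, type=field_type)
    ET.SubElement(root, "audit")  # revisions, updates and deletes get appended here
    return ET.tostring(root, encoding="unicode")

sidecar = tag_dataset(
    "financial_transactions",
    fields=[("customer_id", "int"), ("amount", "decimal"), ("posted_at", "datetime")],
    source_ip="10.0.0.12",
)
print(sidecar)
```

An ingestion process could then read the sidecar's schema without ever touching the data itself.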

Programmers have built frameworks for a variety of languages.  Frameworks serve the ecosystem by organizing concepts and techniques into reusable patterns.  I'm not sure why the world of data has steered clear for so long.  I'm proposing a new foundation: a series of non-threatening concepts that, when combined, will produce results greater than each individual line item.

Remember to tell 'em who you heard this from first, before it's gobbled up and redistributed as someone else's idea.  Jon Bloom.

As always, thanks for reading~!





How to Survive the Rise of Automation, Intelligence and Robotics

The great chasm that divides society will be one of knowledge, and how that translates to marketable skills.

With the rise of automation, many manual tasks will be performed by robots and/or algorithms.  The reason: human capital is not cheap; automation is.

Once a computer model is trained in a specific domain at an expert level, average people would be no match for its speed, accuracy and documented audit trail.

In order to survive the next economy, one must have knowledge and the ability to translate it into a necessary skill that's in demand.

A Data Scientist could train a machine learning model by feeding it information about court cases going back 500 years.  The model would learn the logistics, the exceptions and the probability of outcomes over time, and be a source of information going forward, so long as it's updated over time and verified for accuracy.
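As a toy illustration of that idea (with a handful of made-up records, nothing like a real corpus of 500 years of case law), outcome probabilities fall straight out of historical frequencies:

```python
from collections import Counter, defaultdict

# Hypothetical historical records: (case_type, outcome)
history = [
    ("contract_dispute", "plaintiff"), ("contract_dispute", "defendant"),
    ("contract_dispute", "plaintiff"), ("contract_dispute", "plaintiff"),
    ("patent", "defendant"), ("patent", "defendant"), ("patent", "plaintiff"),
]

def outcome_probabilities(history):
    """Estimate P(outcome | case_type) from historical frequencies."""
    by_type = defaultdict(Counter)
    for case_type, outcome in history:
        by_type[case_type][outcome] += 1
    return {
        case_type: {o: n / sum(counts.values()) for o, n in counts.items()}
        for case_type, counts in by_type.items()
    }

model = outcome_probabilities(history)
print(model["contract_dispute"]["plaintiff"])  # 0.75
```

A real system would layer in features like jurisdiction, judge and precedent, but the principle — learn outcome probabilities from history — is the same.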

That translates to reduced demand for those in the legal profession, such as researchers.  Imagine having tons of valid info at your fingertips in real time, scanning millions of court cases on the fly.

Now ripple that scenario to other professions and you very quickly see the impact automation will have on society.

Throw in robots, self-driving vehicles, transportation and logistics, food service, education, and many more industries that will be severely impacted.

With fewer individuals able to find gainful employment, less money flowing through the economy, and perhaps a slowdown in GDP, the stress and burden on society could increase as costs and consumer debt rise.  The picture becomes a bit more bleak.

There's mention of Basic Income, yet if you begin to review what a global welfare system would look like, you very quickly see many holes.  Who will finance a great chunk of society?  Would crime and the black market increase?  What would people do during idle time?  Will population increase or decrease?  What chance will offspring have to become educated and find employment?

However, those who have quantifiable, legitimate skills that are in demand will find work.  Perhaps in technology, or a service that requires on-site tasks, or something creative that specifically requires humans.  They will have pick of the litter, luxuries not available at lower rungs, as their skills will be in demand.

Looking at things from this perspective, you would imagine any youngster frantically learning everything they can get their hands on, as their future could depend on such knowledge and skills to stay afloat down the road, when automation and robotics make their way into mainstream society.

And there you have it~!


To Reach Artificial General Intelligence We Must First Tag All the Data at Time of Creation

What are the basic steps a report writer performs to do their job?

  1. Obtain requirements by mapping out required Fields, Aggregates, Filters
  2. Write SQL Statement(s) using Tables, Views, Joins, Where Clauses, Group By, Having clauses
  3. Validate Data
  4. Push to Production

What if we applied Self Service to this process?

  1. Users specify requirements by mapping out required Fields, Aggregates, Filters
  2. Table or View joins are already created in the background; the user selects fields, aggregates, filters, etc.
  3. Data is validated prior to model deployment, so the data should be accurate
  4. The model uses Production data; users can save off the Self Service report and schedule it to run on a frequency

What if we applied a Semi-Automated Self-Service process to deliver reports?

  1. All data elements, tables, views, fields and existing reports, with report title, report use / function, existing fields and parameters, get stored into a Metadata repository, similar to a Data Dictionary or Data Catalog, ahead of time.
  2. The user specifies what problem they are trying to solve
  3. The system pulls specific fields from the pool of available fields that correspond to answering the asked question
  4. The report self-generates for user consumption
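Step 3 above — matching the user's question to fields in the metadata repository — could start as naive keyword matching.  A hypothetical sketch (the catalog contents and keywords are invented for illustration):

```python
# Hypothetical metadata repository: field name -> description keywords.
catalog = {
    "order_total":   {"keywords": {"sales", "revenue", "total"}, "table": "orders"},
    "order_date":    {"keywords": {"date", "month", "quarter"},  "table": "orders"},
    "region_name":   {"keywords": {"region", "territory"},       "table": "regions"},
    "employee_name": {"keywords": {"employee", "staff"},         "table": "employees"},
}

def fields_for_question(question, catalog):
    """Pull the catalog fields whose keywords appear in the user's question."""
    words = set(question.lower().split())
    return sorted(f for f, meta in catalog.items() if meta["keywords"] & words)

question = "show me total revenue by region and month"
print(fields_for_question(question, catalog))
# → ['order_date', 'order_total', 'region_name']
```

From the matched fields, the pre-built joins could then assemble and self-generate the report.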

What if we applied Weak Artificial Intelligence to deliver reports?

  1. The user specifies what problem they are trying to solve
  2. The AI processes the request and pulls the associated data to support an answer to the question
  3. The user receives an instant response with a high-probability correct answer

What if we applied Strong Artificial Intelligence to deliver reports?

  1. The AI system would generate its own questions
  2. The AI system would know where to find its answers
  3. The AI system would solve its own problems, unassisted by human intervention

How do we get to Strong AI?

My guess: AI systems require data which is labeled or tagged, to perform Unsupervised Machine Learning, to build and run models, and to derive a fair degree of accuracy and probability.  Most of the world's data is not tagged.  It also doesn't mash well, out of the box, with other data sets.  For example, if you have a data set of financial transactions for specific customers, how do you join that data set to a data set of home values over time?  There aren't any pre-defined keys that you are aware of.

So what if we tag the data at time of creation, sort of like a self-referencing, self-documenting XML file associated with a data set or SQL table?  You basically create a WSDL-like description of the high-level data structure, along with an audit trail to track changes over time, any revisions, updates or deletes to the record set, perhaps the IP address of where the data was born, time stamps, etc.

Any ingestion process could read this new self-defining, WSDL-type file and determine what the data set consists of: field names, field types, etc., such that it could automatically deduce the contents of the data without having to ingest everything.  By doing so, the AI ingestion process could read a global encyclopedia of archived data sets, continually added to over time, and pull in any required data set for consumption, add it to the model, and refresh, in order to derive an answer to a question with a high degree of accuracy based on probability.
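A sketch of that ingestion side, assuming a hypothetical self-describing XML sidecar (the layout is invented for illustration, not a real WSDL): the process deduces the schema and decides whether the data set is worth pulling, without reading the data itself.

```python
import xml.etree.ElementTree as ET

# A hypothetical sidecar, as might sit beside a data set in a global catalog.
sidecar = """\
<dataset name="home_values">
  <created>2018-05-01T00:00:00+00:00</created>
  <schema>
    <field name="zip_code" type="string"/>
    <field name="assessed_value" type="decimal"/>
    <field name="valuation_date" type="datetime"/>
  </schema>
</dataset>"""

def describe(sidecar_xml):
    """Deduce a data set's shape from its sidecar, without ingesting the data itself."""
    root = ET.fromstring(sidecar_xml)
    return {
        "name": root.get("name"),
        "fields": {f.get("name"): f.get("type") for f in root.iter("field")},
    }

def useful_for(sidecar_xml, wanted_fields):
    """An ingestion process pulls only data sets that carry the fields it needs."""
    return set(wanted_fields) <= set(describe(sidecar_xml)["fields"])

print(describe(sidecar)["name"])                            # home_values
print(useful_for(sidecar, ["zip_code", "assessed_value"]))  # True
```

Scanning thousands of such sidecars is cheap, which is what would let an automated model builder shop a global catalog for inputs.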

What I'm saying is: by tagging the data at creation time with an externally consumable file, the AI ingestion system is empowered to pull in specific data sets it finds useful to support a model that answers questions.  This open data framework is flexible enough to support automation, and would provide the building blocks of Artificial General Intelligence at a rudimentary level, with room to grow into full-blown Artificial General Intelligence (AGI).

Similar Post: 

Self Describing Data Tagged at Time of Creation