4/24/2018

To Reach Artificial General Intelligence We Must First Tag All the Data at Time of Creation

What are the basic steps a report writer performs to do their job? 
  
  1. Obtain requirements by mapping out required Fields, Aggregates, Filters 
  1. Write SQL Statement(s) using Tables, Views, Joins, Where Clauses, Group By, and Having Clauses (see the sketch after this list) 
  1. Validate Data 
  1. Push to Production 
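
To make step 2 concrete, here is a minimal sketch of the kind of SQL a report writer might produce. The schema (orders, customers) and every column name are invented for illustration.

```python
# Hypothetical report query for step 2. Each requirement from step 1
# maps to a SQL clause: fields -> SELECT, aggregates -> SUM/GROUP BY,
# filters -> WHERE. Table and column names are invented.
REPORT_SQL = """
SELECT c.region,                                     -- field
       SUM(o.order_total) AS total_sales             -- aggregate
FROM   orders o
JOIN   customers c ON c.customer_id = o.customer_id  -- join
WHERE  o.order_date >= '2018-01-01'                  -- filter
GROUP  BY c.region
HAVING SUM(o.order_total) > 0;                       -- having clause
"""
```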
  
What if we applied Self Service to this process? 
  
  1. Users specify requirements by mapping out required Fields, Aggregates, Filters 
  1. Table or View Joins are already created in the background; users select fields, aggregates, filters, etc. (see the sketch after this list) 
  1. Data is validated prior to model deployment, so in reality the data should be accurate 
  1. The model uses Production data; users can save off the Self Service report and schedule it to run on a frequency 
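
As a rough sketch of how such a Self Service layer might assemble a query, here is one possible implementation. The view name, spec format, and function are all assumptions, not an existing tool.

```python
# Hedged sketch of a self-service report builder: joins are hidden
# behind a pre-built view; the user only supplies fields, aggregates,
# and filters. The view and spec format below are invented.

def build_report_sql(view, fields, aggregates, filters):
    """Assemble a SQL statement from a self-service report spec."""
    select_cols = list(fields)
    for func, column, alias in aggregates:
        select_cols.append(f"{func}({column}) AS {alias}")
    sql = f"SELECT {', '.join(select_cols)} FROM {view}"
    if filters:
        sql += " WHERE " + " AND ".join(filters)
    if aggregates and fields:
        sql += " GROUP BY " + ", ".join(fields)
    return sql

# A saved report like this could be scheduled to run on a frequency.
print(build_report_sql(
    view="v_sales",                       # joins resolved in background
    fields=["region"],
    aggregates=[("SUM", "order_total", "total_sales")],
    filters=["order_date >= '2018-01-01'"],
))
```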
  
What if we applied a Semi-Automated Self Service process to deliver reports? 
  
  1. All data elements, tables, views, fields, and existing reports, along with report title, report use / function, existing fields, and parameters, would all get stored ahead of time into a Metadata repository, similar to a Data Dictionary or Data Catalog 
  1. Users specify what problem they are trying to solve 
  1. The system would pull the specific fields, from the pool of available fields, that correspond to answering the asked question (see the sketch after this list) 
  1. The report would self-generate for user consumption 
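
Here is a minimal sketch of steps 1 through 3, assuming a tiny in-memory catalog; a real Metadata repository would hold far more (report titles, functions, parameters) and use smarter matching than keyword overlap.

```python
# Hedged sketch: a toy "data catalog" plus a naive keyword match that
# pulls fields corresponding to the user's question. Catalog entries
# and tags are invented for illustration.
import re

CATALOG = {
    "order_total": {"table": "orders",    "tags": {"sales", "revenue"}},
    "region":      {"table": "customers", "tags": {"region", "geography"}},
    "order_date":  {"table": "orders",    "tags": {"date", "trend"}},
}

def fields_for_question(question):
    """Step 3: pull fields whose tags appear in the asked question."""
    words = set(re.findall(r"\w+", question.lower()))
    return [name for name, meta in CATALOG.items() if meta["tags"] & words]

# Steps 1-4: the user states the problem, the system picks the fields,
# and the report could then self-generate from them.
print(fields_for_question("What is our sales revenue by region?"))
# -> ['order_total', 'region']
```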
  
What if we applied Weak Artificial Intelligence to deliver reports? 
  
  1. Users specify what problem they are trying to solve 
  1. The AI would process the request and pull the associated data to support an answer to the question 
  1. Users receive an instant response, with an answer that has a high probability of being correct (see the sketch after this list) 
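
A weak-AI flavor of the same idea might score the question against known report intents and attach a confidence to its answer. Everything below (the intents, the overlap scoring) is an invented stand-in for a trained model.

```python
# Hedged sketch of the weak-AI flow: score each known report "intent"
# against the question and return the best answer with a probability
# estimate. A real system would use a trained model, not keyword overlap.
import re

INTENTS = [
    ({"sales", "revenue", "region"}, "run report: sales by region"),
    ({"orders", "trend", "month"},   "run report: order trend by month"),
]

def answer(question):
    words = set(re.findall(r"\w+", question.lower()))
    scored = [(len(words & keys) / len(keys), reply)
              for keys, reply in INTENTS]
    confidence, best = max(scored)
    return best, confidence   # instant response + probability it's right

print(answer("Show me sales revenue by region"))
# -> ('run report: sales by region', 1.0)
```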
  
What if we applied Strong Artificial Intelligence to deliver reports? 
  
  1. The AI system would generate its own questions 
  1. The AI system would know where to find the answers 
  1. The AI system would solve its own problems, unassisted by human intervention 
  
How do we get to Strong AI? 
My guess: AI systems require data that is labeled or tagged in order to perform Machine Learning, to build and run Models, and to derive answers with a fair degree of accuracy. Most of the world's data is not tagged. It also doesn't mash up well, out of the box, with other data sets. For example, if you have a data set of financial transactions for specific customers, how do you join that data set to a data set of home values over time? There aren't any pre-defined keys that you are aware of. 
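
To make the missing-key problem concrete, here is a toy example with invented columns: nothing in the raw records says the two data sets can be joined until a human (or a tag written when the data was created) declares the link.

```python
# Invented columns: "cust_zip" and "postal" describe the same concept,
# but nothing in the raw data says so, so no join key can be inferred.
transactions = [{"txn_id": 1, "amount": 120.50, "cust_zip": "30301"}]
home_values  = [{"postal": "30301", "year": 2018, "value": 285000}]

# The link has to be asserted from outside the data itself:
joined = [{**t, **h} for t in transactions for h in home_values
          if t["cust_zip"] == h["postal"]]
print(joined)
```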
So if we tag the data at time of creation, sort of like a self-referencing, self-documenting XML file associated with a data set or SQL Table, we basically create a WSDL-style description of the high-level data structure, along with an audit trail to track changes over time, including any revisions, updates, or deletes to the record set, and perhaps the IP address of where the data was born, time stamps, etc. 
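
Here is one way such a self-describing file could be written alongside the data at creation time, sketched in Python as XML. Every element and attribute name is an assumption, not an existing standard, and the IP address is a documentation placeholder.

```python
# Hedged sketch: emit a WSDL-style descriptor next to the data set,
# recording structure, origin, and an audit trail. All element and
# attribute names are invented for illustration.
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

now = datetime.now(timezone.utc).isoformat()

desc = ET.Element("dataset", name="financial_transactions")
ET.SubElement(desc, "origin", ip="203.0.113.7", created=now)  # where born
fields = ET.SubElement(desc, "fields")
ET.SubElement(fields, "field", name="txn_id", type="int", role="key")
ET.SubElement(fields, "field", name="amount", type="decimal")
ET.SubElement(fields, "field", name="cust_zip", type="string",
              concept="postal_code")        # glue for cross-set joins
audit = ET.SubElement(desc, "audit")
ET.SubElement(audit, "event", action="insert", rows="1", at=now)

ET.ElementTree(desc).write("financial_transactions.meta.xml")
```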
Any ingestion process could read this new self-defining, WSDL-type file and determine what the data set consists of (field names, field types, etc.), such that it could automatically deduce the contents of the data without having to ingest everything. By doing so, the AI ingestion process could read a global encyclopedia of archived data sets, continually added to over time, and pull in any required data set for consumption, adding it to the model and refreshing it, in order to derive an answer to a question with a high degree of accuracy based on probability. 
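
Continuing the sketch, an ingestion process could read only that descriptor file, never the rows themselves, to deduce the contents and decide whether the set helps answer the current question. The file format here is the hypothetical one sketched above.

```python
# Hedged sketch: deduce a data set's contents from its descriptor alone
# and decide relevance without ingesting any rows. Assumes the invented
# .meta.xml format from the previous sketch.
import xml.etree.ElementTree as ET

def describe(meta_path):
    """Read field names, types, and concepts from the descriptor."""
    root = ET.parse(meta_path).getroot()
    return {f.get("name"): {"type": f.get("type"),
                            "concept": f.get("concept")}
            for f in root.iter("field")}

def useful_for(meta_path, needed_concepts):
    """True if the data set covers every concept the model needs."""
    have = {m["concept"] for m in describe(meta_path).values()
            if m["concept"]}
    return needed_concepts <= have

# A model needing postal codes could pull this set in automatically:
print(useful_for("financial_transactions.meta.xml", {"postal_code"}))
```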
What I'm saying is that by tagging the data at creation time with an externally consumable file, the AI ingestion system is empowered to pull in the specific data sets it finds useful to support a model that answers questions. This open data framework is flexible enough to support automation, and it would supply the building blocks for Artificial General Intelligence at a rudimentary level, with room to grow into full-blown, true Artificial General Intelligence (AGI). 
Similar Post: Self Describing Data Tagged at Time of Creation