Introducing a Simple Framework for Working with Data

This week I blogged about 5 new features for data.  The series starts off simple, with each idea building upon the previous one, to form the building blocks of Strong Artificial General Intelligence, a grandiose concept indeed:

Tag Data at Time of Inception - integrate a framework such that data gets tagged upon inception using an XML tree-like structure to capture metadata for external use

Open Data Set Framework - standards applied to generic data sets for public or private consumption

Open Reporting Tools - a generic report reader seamlessly ingests Open Data Sets, allowing any user to work with data to find insights

Global Data Catalog - Cloud Based storage of Metadata for consumption by Open Data Set ingestion

Automate Machine Learning Artificial Intelligence Ingestion - dynamically scan the Global Data Catalog for Unsupervised Machine Learning ingestion, to automatically build and refresh Data Models in real time and answer specific questions in any domain

Programmers have built Frameworks for a variety of languages.  Frameworks serve the ecosystem by organizing concepts and techniques into re-usable patterns.  I'm not sure why the world of Data has steered clear for so long.  I'm proposing a new foundation: a series of non-threatening concepts that, when combined, will produce results greater than each individual idea.

Remember to tell 'em who you heard this from first, before it's gobbled up and re-distributed as someone else's idea.  Jon Bloom.

As always, thanks for reading~!





How to Survive the Rise of Automation, Intelligence and Robotics

The great chasm that divides society will be one of knowledge, and how that knowledge translates to marketable skills.

With the rise of automation, many manual tasks will be performed by Robots and / or Algorithms.  Reason being, human capital is not cheap, automation is.

Once a computer model is trained in a specific domain, at expert level, average people would be no match for its speed, accuracy and documented audit trail.

In order to survive the next economy, one must have knowledge and the ability to translate that into a necessary skill that's in demand.

A Data Scientist could train a machine learning model by feeding it information about court cases going back 500 years.  The Model would learn the logistics, the exceptions, the probability of outcomes over time, and be a source of information going forward, so long as it's updated over time and verified for accuracy.

That translates to reduced demand for those in the legal profession, in areas like research.  Imagine having tons of valid info at your fingertips, in real time, scanning millions of court cases on the fly.

Now, ripple that scenario to other professions and you see very fast the impact automation will have on society.

Throw in Robots and Self Driving Vehicles, and Transportation and Logistics, Food Service, Education and many more industries will be severely impacted.

With fewer individuals able to earn gainful employment, less money flowing through the economy, and perhaps a slowdown in GDP, the stress and burden on society could increase.  As costs and consumer debt rise, the picture becomes a bit more bleak.

There's mention of Basic Income, yet if you begin to review what a global welfare system would look like, you see very quickly there are many holes.  Who will finance a great chunk of society?  Would crime and the black market increase?  What would people do during idle time?  Will population increase or decrease?  What chance will offspring have to become educated and find employment?

However, those that have quantifiable, legitimate skills that are in demand would find work.  Perhaps in technology, or a service that requires on-site tasks, or something creative that requires humans specifically.  They will have the pick of the litter, luxuries not available at lower rungs, as their skills will be in demand.

Looking at things from this perspective, you would imagine any youngster frantically learning everything they can get their hands on, as their future could depend on such knowledge and skills, in order to stay afloat, down the road, when automation and robotics make their way into mainstream society.

And there you have it~!


To Reach Artificial General Intelligence We Must First Tag All the Data at Time of Creation

What are the basic steps a report writer performs to do their job?

  1. Obtain requirements by mapping out required Fields, Aggregates, Filters
  2. Write SQL Statement(s) using Tables, Views, Joins, Where Clauses, Group By, Having clauses
  3. Validate Data
  4. Push to Production

What if we applied Self Service over this process?

  1. Users specify requirements by mapping out required Fields, Aggregates, Filters
  2. Table or View Joins were already created in the background; the User selects fields, aggregates, filters, etc.
  3. Data was validated prior to model deployment, so in reality the data should be accurate
  4. Model uses Production data; the User can save off the Self Service report and schedule it to run on a frequency

What if we applied a Semi-Automated-Self-Service process to deliver reports?

  1. All data elements (tables, views, fields, and existing reports with report title, report use / function, existing fields and parameters) would get stored in a Metadata repository, similar to a Data Dictionary or Data Catalog, ahead of time.
  2. User specifies what problem they are trying to solve
  3. System would pull specific fields, from the pool of available fields, that correspond to answering the question asked
  4. Report would self generate for user consumption
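
The field matching in step 3 could be sketched in Python as below.  This is only a minimal illustration under stated assumptions: the catalog contents, the name `fields_for_question`, and the naive keyword match are all hypothetical, and a real Metadata repository would need far richer matching.

```python
# Hypothetical metadata repository: one entry per field, with a plain-text description.
CATALOG = [
    {"table": "sales", "field": "order_total",
     "description": "total dollar amount of each sales order"},
    {"table": "sales", "field": "order_date",
     "description": "date the sales order was placed"},
    {"table": "customers", "field": "region",
     "description": "sales region of the customer"},
    {"table": "hr", "field": "hire_date",
     "description": "date the employee was hired"},
]


def fields_for_question(question, catalog):
    """Return table.field names whose description shares a keyword with the question."""
    # Crude keyword extraction: lowercase, strip punctuation, keep words longer
    # than 3 characters. This cutoff is an assumption made for the sketch.
    words = {w.strip("?.,!").lower() for w in question.split()}
    keywords = {w for w in words if len(w) > 3}
    matches = []
    for entry in catalog:
        description_words = set(entry["description"].lower().split())
        if keywords & description_words:
            matches.append(f'{entry["table"]}.{entry["field"]}')
    return matches


selected = fields_for_question("What were total sales by region?", CATALOG)
print(selected)
```

The point of the sketch is that the user never names a table or a join; the repository descriptions carry enough meaning for the system to propose candidate fields.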

What if we applied Weak Artificial Intelligence to deliver reports?

  1. User specifies what problem they are trying to solve
  2. AI would process the request and pull associated data to support an answer to the question
  3. User receives an instant response with a high probability of being the correct answer

What if we applied Strong Artificial Intelligence to deliver reports?

  1. AI system would generate its own questions
  2. AI system would know where to find the answers
  3. AI system would solve its own problems unassisted by human intervention

How do we get to Strong AI?

My guess: AI Systems require data which is labeled or tagged, to perform Unsupervised Machine Learning, to build and run Models, and to derive answers with a fair degree of accuracy based on probability.  Most of the world's data is not tagged.  It also doesn't mash well, out of the box, with other data sets.  For example, if you have a data set of financial transactions of specific customers, how do you join that data set to a data set of home values over time?  There aren't any pre-defined keys that you are aware of.

So if we tag the data at time of creation, with something like a self referencing, self documenting XML file associated with a data set or SQL Table, you basically create a WSDL-style description of the high level data structure, along with an audit trail to track changes over time, including any revisions, updates or deletes to the record set, perhaps the IP address of where the data was born, time stamps, etc.

Any ingestion process could read this new self defining WSDL type file and determine what the data set consists of (field names, field types, etc.), such that it could automatically deduce the contents of the data without having to ingest everything.  By doing so, the AI ingestion process could read a global encyclopedia of archived data sets, continually added to over time, and pull in any required data set for consumption, to add to the model and refresh it, in order to derive an answer to a question with a high degree of accuracy based on probability.
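
To make the idea concrete, here is a rough Python sketch of an ingestion step that reads only the descriptor, not the data.  No such standard format exists yet, so the element and attribute names (`dataset`, `field`, `key`, `audit`) and the function `read_descriptor` are assumptions invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical self-describing file that travels with a data set.
# Every element and attribute name here is illustrative, not a real standard.
DESCRIPTOR = """
<dataset name="customer_transactions" created="2018-04-01T09:30:00Z" origin_ip="203.0.113.7">
  <field name="customer_id" type="int" key="true"/>
  <field name="zip_code" type="varchar(10)" key="true"/>
  <field name="amount" type="float(9,2)"/>
  <audit>
    <change who="etl_job" when="2018-04-02T01:00:00Z" action="update"/>
  </audit>
</dataset>
"""


def read_descriptor(xml_text):
    """Summarize a data set from its descriptor alone, without touching the rows."""
    root = ET.fromstring(xml_text.strip())
    fields = {f.get("name"): f.get("type") for f in root.findall("field")}
    keys = [f.get("name") for f in root.findall("field") if f.get("key") == "true"]
    changes = len(root.findall("./audit/change"))
    return {"name": root.get("name"), "fields": fields,
            "keys": keys, "audit_entries": changes}


summary = read_descriptor(DESCRIPTOR)
print(summary)
```

An ingestion process could scan many such summaries and decide which data sets are worth pulling in, based on field names, types and declared keys.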

What I'm saying is, by tagging the data at creation time with an externally consumable file, the AI ingestion system is empowered to pull in specific data sets it finds useful to support a model to answer questions.  This open data framework is flexible enough to support automation, and would support the building blocks of Artificial General Intelligence at rudimentary levels, with room to grow into full blown true Artificial General Intelligence (AGI).

Similar Post: 

Self Describing Data Tagged at Time of Creation


Self Describing Data Tagged at Time of Creation

It used to be, here's a 4 million dollar project, implement a software system, go live.  Uh, what about the reporting?  Oh, we should probably deliver some reports, 3 months after go live.

Reason being, data never got respect.  The Rodney Dangerfield of IT.  Nobody took consideration of the data, the downstream reporting aspect.  No wonder the data was so difficult to work with: not joining to other data sets, 20 tables to get a query to work, left outer, right outer, inner, cross, why are these queries taking so long to execute, why don't these numbers match the other reports, or these numbers are great, except they're a month stale, sure would have been great at quarter end.

So how do we fix things?  Go to the root of the matter.  Tag the data at inception, at time of creation.  How do we tag it?  By self referencing the key aspects pertinent downstream, perhaps in an XML tree-like structure, self referencing and self describing.  Future developers or automatic ingestion programs could add to the tree downstream: original value, original field type, original date, then who modified it, when, what the new value is, perhaps why, etc.

Self describing data, down to the database -> schema -> table -> field -> row -> exact field.

Make it generic, use a standard format template framework, using XML for each field, table, schema, database.
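
As a rough sketch of what tagging a single field might look like, the Python below builds an XML node at creation time and appends to it on each later change.  The tag names (`field`, `original`, `history`, `change`) and the helper functions are hypothetical, picked only to mirror the original value / who / when / why items described above.

```python
import xml.etree.ElementTree as ET


def create_field(name, field_type, value, created_by, created_at):
    """Tag a field at creation: keep the original value, type, author and timestamp."""
    field = ET.Element("field", name=name, type=field_type)
    original = ET.SubElement(field, "original", by=created_by, at=created_at)
    original.text = str(value)
    ET.SubElement(field, "history")  # later changes get appended here
    return field


def record_change(field, new_value, who, when, why=""):
    """Append a change to the field's tree instead of overwriting the original."""
    change = ET.SubElement(field.find("history"), "change", by=who, at=when, why=why)
    change.text = str(new_value)
    return field


def current_value(field):
    """The latest change wins; fall back to the original value if nothing changed."""
    changes = field.find("history").findall("change")
    return changes[-1].text if changes else field.find("original").text


amount = create_field("amount", "float(9,2)", 120.5, "order_app", "2018-04-01T09:30:00Z")
record_change(amount, 99.99, "jsmith", "2018-04-03T14:00:00Z", why="price correction")
print(ET.tostring(amount, encoding="unicode"))
print(current_value(amount))
```

Because changes are appended rather than overwritten, the original value and the full trail of who changed what, when and why remain readable by anything downstream.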

And make it consumable by a generic report reader.

But we'd have to modify our storage databases, to handle this new self describing XML format.  Okay, modify them.  That's considered progress.

So what are the benefits of tagging data at time of creation?

  • Accountability
  • Audit Trails
  • Consumable data ingestion without tagging time for Machine Learning (unassisted by human intervention)
  • Consistent Patterns
  • Transparent Data Sets
  • Public Open Data Sets
  • Store data for life
  • Certify Data Sets

What if we stored the type of data and the key elements available for joining to other data?  We could create programs to read the WSDL-style self referencing files, have the program perform the joins unassisted, and throw that data into a model, without having to tag the data, for unsupervised machine learning processing.
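
Here is one way the unassisted join could look, as a minimal Python sketch.  It assumes each data set declares its joinable keys in its descriptor; the dictionaries, the sample rows, and the names `shared_keys` and `join_on` are made up for illustration.

```python
def shared_keys(desc_a, desc_b):
    """Find the join keys that both descriptors declare."""
    return sorted(set(desc_a["keys"]) & set(desc_b["keys"]))


def join_on(rows_a, rows_b, keys):
    """Inner join two lists of row dictionaries on the given key fields."""
    index = {}
    for row in rows_b:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    joined = []
    for row in rows_a:
        for match in index.get(tuple(row[k] for k in keys), []):
            joined.append({**match, **row})
    return joined


# Hypothetical descriptors plus sample rows (all values invented for the example).
transactions = {
    "keys": ["customer_id", "zip_code"],
    "rows": [
        {"customer_id": 1, "zip_code": "33701", "amount": 120.5},
        {"customer_id": 2, "zip_code": "33602", "amount": 75.0},
    ],
}
home_values = {
    "keys": ["zip_code"],
    "rows": [{"zip_code": "33701", "median_home_value": 250000}],
}

keys = shared_keys(transactions, home_values)
combined = join_on(transactions["rows"], home_values["rows"], keys)
print(keys, combined)
```

Nobody told the program to join on zip code; it found the common key by comparing the two descriptors.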

How about exposing that data via Web Service, so people can pull that data in OData format, or JSON format, or XML format, you name it?

And how about receiving data directly from Internet of Things (IoT) devices, processing events and micropings of data across the internet?

And how about reading from blockchain distributed ledgers across the globe, exposed via Web Service?

Or how about sending data to a Web Service, and storing the returned data contents back to your system?  Here's a row of data, with specific characteristics, what is the result?  Let's store that off, and build models off that.

So that opens up a market for providing pre-built Web Service data, for subscription or per data item fee.

The potential is unlimited.

Self Describing data, tagged at time of creation.  Even Rodney Dangerfield got some respect eventually.

See similar post: http://www.bloomconsultingbi.com/2018/04/apply-structure-to-data-sets-for.html


Apply Structure to Data Sets for Accountability to Ripple Across Globe

When I began coding in Visual Basic 4, 5 and 6, there were many advantages, such as rapid application development.  The code was flexible, in that anyone from entry level people to accountants could write it, and even highly complex code could be written and maintained over time.  The downside: the code was not object oriented and so not considered a true language by the core programmers.  It was soon shelved for the most part, although code still exists in production in many shops, as well as VBA embedded in Office products.

When I programmed Java J2EE, the in thing was frameworks like Struts or Hibernate, and the list has grown tremendously.  The nice thing about frameworks is the consistency, so you could dive into someone else's code and get up to speed fairly quickly.  A framework also segregates things into nice buckets, like front end UI or middle tier layer or back end database layer, so people can become experts in specific niches.  Each iteration of a framework seems to correct some things and possibly cause new issues or obstacles, and there's always the backwards compatibility issue and having to completely re-write applications.

Let's take a look at the world of data.  Data gets created and stored in dozens of applications including Microsoft SQL Server, Microsoft Access, Oracle, Sybase, MySQL, and the list goes on and on.  We started with VSAM flat files on the mainframe, but have grown into Cloud based solutions, Graph databases, NoSQL key / value databases, and Hadoop storage using HDFS or storage in the cloud across clusters of servers.  Data is data.  There are different types for strings and numbers and blobs; there's XML and JSON to send data to other systems, places, people; there's Excel to manipulate, store and transport data; and we can access reports via web, mobile, email, network drives, PDF, and we can schedule those reports to run at specific times.

When XML first arrived on the scene, there was mention of replacing Object Oriented coding patterns, perhaps premature speculation.  Except you could flow data using XML and apply a transformation using XSLT.  And for web page developers, we have CSS to apply standard styles to elements so they display in a certain way, a way to organize and formalize the structure and output of web pages.  Although no design or architecture is bullet proof enough to handle every possible scenario, these are good efforts to apply standards to technologies.

Back to the world of data, why don't we have similar standards?  Why don't we have the ability to tag our output data sets with specific information?  For example, where was the data created, in what system, date time stamp, format of data upon inception.  And then have tags along the way, so when the data gets sent to others, we have an audit log of who touched the data, when, how, and perhaps why, maybe throw in some IP addresses, some user ids, etc.  Perhaps embed these attributes within the data, as self referencing data viewable to others along the trail.  So we have some accountability.  This would benefit the proliferation of data, especially in the world of open data sets.  Not just a text file describing the data, but row level and field level attributes as well.

In addition to audit trails, perhaps apply attributes such as a field's type description, like text:varchar(27), float(9,2), etc., so the data could be plopped into another reporting tool without modification or effort.  Perhaps have a generic report format, with a generic report reader, that can read any data set, with some features to modify the data once in the tool, like group by this field, sort by this, filter by this.  Yet it would be generic and not specific to Windows or Linux or Web or Mobile.  It would be completely transportable and completely dumbed down, such that it would simply appear in the report reading / writing tool and the user could be off and running, with the knowledge of where the data was derived from and who touched it along the way.
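
A bare-bones version of such a reader might look like the Python sketch below.  It assumes type tags written like varchar(27) or float(9,2); the function names `parse_type` and `report`, and the sample rows, are hypothetical and only meant to show the group by / sort / filter idea.

```python
import re
from collections import defaultdict

# Matches tags such as "int", "varchar(27)" or "float(9,2)".
TYPE_PATTERN = re.compile(r"^(\w+)(?:\((\d+)(?:,\s*(\d+))?\))?$")


def parse_type(tag):
    """Split a type tag into its name and numeric arguments (length, precision, scale)."""
    match = TYPE_PATTERN.match(tag.strip())
    if match is None:
        raise ValueError(f"unrecognized type tag: {tag!r}")
    name, first, second = match.groups()
    return name.lower(), tuple(int(x) for x in (first, second) if x is not None)


def report(rows, group_by, measure, where=None):
    """Filter rows, total the measure per group, and return the groups sorted by name."""
    totals = defaultdict(float)
    for row in rows:
        if where is None or where(row):
            totals[row[group_by]] += row[measure]
    return sorted(totals.items())


# Sample rows invented for the example.
rows = [
    {"region": "East", "amount": 10.0},
    {"region": "West", "amount": 5.0},
    {"region": "East", "amount": 2.5},
]

print(parse_type("varchar(27)"), parse_type("float(9,2)"))
print(report(rows, "region", "amount"))
print(report(rows, "region", "amount", where=lambda r: r["amount"] > 4))
```

The reader knows nothing about where the rows came from; the type tags and field names are enough for it to group, sort and filter any data set handed to it.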

Lastly, we could offer some type of certification for the data.  This set of data was validated on such and such date, by whom, what the data consists of, and is available on the web, for free, or perhaps for purchase, bundled and ready to go.  Think about the downstream ramifications: we tagged this set of data with key attributes, it is validated and certified, and you can obtain this data set to load into your machine learning systems or artificial intelligence algorithms for perhaps unsupervised AI.  And if others exposed their data as well, you could have unsupervised AI talking with other AI all day every day, creating and updating models in real time with a certain degree of accuracy and accountability via audit logs.  Throw in Internet of Things (IoT), and you have devices talking with servers and proliferating that data to others in real time, for a pinball effect of data bouncing all over the place in a networked digital world based on machine learning AI models.

See similar post: http://www.bloomconsultingbi.com/2018/04/self-describing-data-tagged-at-time-of.html


Hand Writing to become Dead Language similar to Latin

I don't speak Latin.  Nor do I understand hieroglyphics.  That's because they aren't taught in everyday schools, except Latin in medical schools.

How many times per day do you write something with a pen or pencil?  For me, probably never.  Except at the bank, filling out deposit or withdrawal slips.  Or perhaps signing my name on a credit card machine using a fictitious pen applied to the screen.

Seriously, I don't write anything with pen and paper anymore.  There's no need.  I jot down notes using Google Keep; it's accessible on smart phone and laptop.  I keep a to-do list, notes from appointments, ideas that pop into my head, lists of pros and cons on a variety of subjects and decisions.

I feel that standard writing will soon go away.  You're already seeing signs of this regarding cursive handwriting.  Typing is the current trend.  And that's probably not the definitive input device, as voice commands are already quite common, on the smart phone as well as car voice commands.

Yep, basic handwriting may go the way of Latin: perhaps still taught in schools, but as a dead language.  Handwriting could become a thing of the past.

If you have comments or questions on this blog post, send your hand written comment, we'll be sure to reply as soon as possible.

And so it goes~


Massive Data Breach of Significant Proportions

There's a lot of commotion in the news, justly so, regarding data privacy.  Many corporations are selling data.  Some of this data could be considered private, as in medical, as in perhaps HIPAA data.  That's sort of a no-no.

You have to look back at the Equifax data breach to realize the implications.  Before doing reports, I approved loans, as in reading credit reports from Equifax, TransUnion and TRW.

Credit reports tell us everything about a person: their shopping habits, payment records, previous addresses, any bankruptcy/foreclosure/unpaid student debt, etc., in a nice, easy to read format.  I could ask the customer their basic demographics over the phone, enter them into the mainframe terminal at my desk, pull the credit report, and have a decision with an average talk time of under 2 minutes.  That's average, meaning many calls were under 2 minutes.

What does the app + credit report tell us?  It tells us their potential repayment probability.  Throw in some good old fashioned experience plus hunches, and you have the credit decision to approve or decline; you'll receive a letter in the mail in 7 to 10 business days explaining the reason why.

Of course, there were over-rides to the system, but not many, maybe 10%.  So we agreed with the computer model 90% of the time.  And the model bounced up the demographic info such as age, address, length of time on job, along with the credit report, and spit out an answer almost immediately.  It would compare against a model of past customers and their payment history.  We do the same thing now, yet it's called data science.

My boss said, from reading the customer's credit report, you should know if they drink whiskey, beer or milk.  In other words, we should have a good picture in our minds of who this person is based on the credit report and demographics.

Back to the main story: with a data breach of credit reports, that data can be constructed to form a financial picture of every person in that data set, not to mention all the credit card info, loan info, social security numbers, addresses and previous addresses, aliases, you name it.

It's a massive breach of significant proportions.  And that data is swimming through the hands of the dark web, being sold and resold.  How about that, food for thought.

And so it goes~!

For Data to Truly Become Valuable

Data is the new Oil.  Data is the new Gold.  Data is the new energy from the Sun.

If so, why haven't citizens latched on to something of such great wealth?  Seems only businesses and organizations are reaping the benefits of connecting the dots through data to find insights.

Why haven't average Joes reaped the benefit of data intelligence?

Primarily, they don't have access to their own personal data, yet their data is everywhere.  People don't know where their data is, how to access it, how to consolidate it and how to report on it, to gain insight.

If data is going to really grow into its hype, it needs to trickle into the living rooms of everyday people.  Then, it will truly be of value to everyone.

Currently, only businesses are reaping the rewards, using the people's data, sometimes without their knowing.

And so it goes~!

Throwing in a picture of a deer, cute!


Compete with Autonomous Beings

Autonomous Intelligence is the future.  And here's why.

When you look at any product or service, the most expensive part of the equation is human capital resource.

In the ongoing effort to reduce costs and increase profit, from caveman days to printing press to steam engine to factory lines to personal computer to now, the bottom line is what counts.

When you consider movies based on the future, we see some patterns emerge.  Let's take Star Wars as example.  You have some humans directing things from above, and you have Android beings carrying out orders.

Androids are portrayed as non gender specific, they don't use toilets, and they don't stop during the day at the bank to make a withdrawal, because they don't purchase goods or services.  In fact, they own no belongings, so no money flows into the economy from them buying goods and services.

In fact, the only things that belong to an Android are its shell or hardware, the software used to run the thing, and its energy supply.

No sick days.  No insurance.  No bathroom breaks.  No maternity leave.  No pensions.  No yearly costs of living increases.  No Unions.  No nothing.  A fixed asset with predictable life span amortized over time with some ongoing maintenance.

How can humans compete with that?  This is just one scenario based on one movie series.  Obviously there's some wiggle room for error and standard deviation.  However, suffice to say, from a cost savings standpoint, humans may be sent out to pasture along with the printing press and steam engine.

Perhaps time will tell.


Train Legacy Programmer or Bring in Expert

Hey there's a hot new technology.  Cutting edge.  Seems really cool.

Yet you work full time for an organization.  That organization doesn't jump into cutting edge technology, for a variety of reasons.  So you continue to work on code that was hot 5 years ago.

And then, you get wind that a new project is starting.  They'd like to use new technology.  Does the org let you get your feet wet and tackle the new project with new technology?

Or do they bring someone in from the outside that already has experience?

What do you think?

Train legacy programmers or outsource cutting edge technology?

How does one get experience using new technology without having experience using new technology?  That's another age old question.

From what I've seen, some people claim to have experience when they don't.  Others have experience, yet if you look under the hood of the code at their "experience" project, it's a nightmare.  Yet they get the work, at a much higher rate, and the full timer gets to support the app after it's thrown over the fence.  And get the bugs out, down the road.  While the guy / gal who wrote it is long gone, making a hefty chunk of change.

This is an age old question in the industry and affects lots of programmers.  Yes, the full timer has some degree of stability and job security, steady paycheck.  It's a question each full timer must ask.

And so it goes~!

Go Deep or Wide

We all know what we know.  Some are good at what they do.  Yet that's all they do.

The best baseball player in the league, may not know how to balance a checkbook.

The best financial guru may not be a great tennis player.

We know what we know.

You know.

It's when you realize how much you don't know that you become aware.  Once aware, you realize just how big the world is.  And how much there is to learn.

We cannot know everything.  Yet we can learn about what interests us.  Or what's useful to fortify our position.

This surrounds the age old question.  Do we focus on specific items to learn to become expert, or do we go wide and learn a little about a lot, high level knowledge?

One thing to keep in mind: if you go deep on a specific topic or area, when things change, you are no longer the expert and have to re-learn.  If you go wide, you never become expert in a specific thing.

So expert level in specific niche, or high level knowledge about lots of subjects.

And there you have it~!


Sports Prepares Us for Adult Career Life

When you play sports the goal is to win.  Could be individual sport or team sport.

When you play individual sports, like swimming or tennis, your individual effort determines your outcome.  That means two things: nobody to lift you up & nobody to pull you down.  Yet, you may be part of a team that competes against another team in individual events.

When you play a team sport, such as baseball or soccer, you have members of your team, each with different positions.  Everyone knows their assigned responsibility and works as a unit to overcome the opponent.

People who grew up playing sports, in my opinion, have something extra they bring to the workforce.  Workers who play or played sports have mental toughness.  That toughness shows through in their everyday work, as well as crunch time when pressures rise.  Like a big deadline, big demo, big sales call.  Sports players tend to rise to the occasion and push the nervousness to the side in order to take care of business.

Team players also know the goal is to win as a team, such that, you help out when possible and ask for help when needed.  There are small battles that need attention every day, but the end goal is to win the effort.

Sports players also tend to bounce back after tough days as they know we all have hiccups along the way.  The trick is to view the event, learn from it, and move on.  Tomorrow is another day.

I remember working in the branch of a finance company.  One day I did particularly well and sold a lot of loans, to exceed the monthly quota.  Next day, the boss said yesterday's sales are over, today is a new day, start from scratch.  And that's the mentality you need in the workforce, never rest on yesterday's wins, never accept defeat from yesterday's loss.  Next day start anew.  To bring your A-game.

People that grew up playing sports tend to have a good attitude in the workforce overall, in my opinion.  Sports can teach us a lot about the business world.  Playing sports in youth prepares us for the battles of life, including a 40+ year career.  Do you witness this in the workforce - yes or no?


Real Time Learning

With the advent of cloud architecture, we no longer do specific things.  We do a variety of things in a variety of ecosystems for a variety of clients with a variety of offerings.

So you work with data.  You may build and design databases.  You may load databases.  You may report on databases.  You may build cubes.  You may build solutions using web based products.  You may build solutions using on-premise products.  You may build hybrid solutions across a variety of vendors and a variety of technologies.  You may work with relational databases.  You may work with key-value databases.  You may work with large data sets, location data, disparate data sets, you may work with graph databases.  You may embed your solutions within other applications.  You may build dashboards that pull data across the internet.  You may scrape data from the web in real time.  You may build models stored in the cloud.  You may call other people's models.  You may build machine learning applications.  You may build Artificial Intelligence applications.  You may do all of the above, or pieces of each.

You no longer simply do a specific thing.  You do many things.  And those many things will evolve and change over time.  We are working in a moving environment.  Things are changing in real time.

When someone asks if you have x years in a specific technology, that's really not the question.  The question is how do you apply past experience to the latest technology.  Because nobody knows every technology.  And even if you did, it's already changed.  Counting specific years is outdated.  Real time learning is where it's at.