Data Catalogs Will Create Structure and Fortify Your Org

We've seen the rise of the report writer, then the OLAP Cube data modeler, then the Self Service proliferation of data to the masses, followed by big data, cloud data, streaming data, machine learning algorithmic data, unstructured and semi-structured data, and even graph data.

We've seen the rise of the Chief Data Officer, a much-needed position that fills the gap over who owns the data.  IT doesn't own the data; the business does, or so we're told.  The CDO regulates the data, decides which vendor products to use, determines what staffing is needed, whether to build in-house or purchase pre-canned software, which licensing and maintenance agreements to sign, and which systems to shut down because they're outdated, broken, or cost hogs.  And lastly, the CDO documents the flow of data, from low-level processes to the high-level enterprise, in diagrams and data dictionaries.

In my opinion, Data Dictionaries have risen to the top of the heap, or at least they should.  That's because Data Governance should be a huge priority, right behind security, and because new laws mandating data privacy are coming down the pike.  Not only that, Data Dictionaries are the glue that bonds the entire organization.

You see, the data flows in and out of every department.  What makes the data folks valuable is that they look at the entire picture, document it, and build data flows to extract and roll up ALL the data.  That insight into the deep bowels of the org is what provides the value.  They see all the tentacles: the nasty processes (sometimes manual, many times actually) and the business rules that were never documented or are buried in some Excel or Access application nine layers deep.

For a company to flourish, gain a competitive advantage, and comply with upcoming data mandates, it needs to be cognizant of the concept of the Data Catalog.

A Data Catalog knows where all the data lives, in real time.  It knows where the duplicate fields are, it knows the data types, it knows where the data was created, and perhaps how it flows through the system, with each hop along the way.  A basic data dictionary provides a glimpse of this: it's easy to spin up and not easy to maintain, but it does the job.
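To make the "knows the data types and duplicate fields" idea a bit more concrete, here is a minimal sketch in Python. It assumes a single SQLite database with a hypothetical two-table schema (`customers` and `orders`) invented purely for illustration, and the helper names `build_catalog` and `find_duplicate_fields` are also hypothetical. It reads table and column metadata from SQLite's `sqlite_master` table and `PRAGMA table_info`, then flags column names that show up in more than one table. Keep in mind this is a one-time scan, not the real-time tracking or lineage a full Data Catalog product offers.

```python
import sqlite3
from collections import defaultdict


def build_catalog(conn):
    """Return {table: [(column, declared_type), ...]} for every user table."""
    catalog = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        catalog[table] = [(col[1], col[2]) for col in columns]
    return catalog


def find_duplicate_fields(catalog):
    """Map each column name found in two or more tables to the tables holding it."""
    locations = defaultdict(list)
    for table, columns in catalog.items():
        for name, _declared_type in columns:
            locations[name].append(table)
    return {name: tables for name, tables in locations.items() if len(tables) > 1}


# Hypothetical example schema, created in memory for the demo.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, email TEXT, created_at TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                         email TEXT, total REAL);
    """
)

catalog = build_catalog(conn)
duplicates = find_duplicate_fields(catalog)
print(duplicates)  # {'customer_id': ['customers', 'orders'], 'email': ['customers', 'orders']}
```

A shared column name is only a hint: `customer_id` in `orders` is a legitimate foreign key, while `email` stored in both tables is the kind of duplicate a data steward would want to review.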

The more advanced Data Catalogs are perhaps cloud-based and have built-in intelligence, such as machine learning, to regularly scour the internal data ecosystem and extract metadata from every corner of the business.  They can scan database logs to look for heavily used tables, queries, or fields.
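Log scanning can be sketched in a similar spirit. The snippet below assumes a hypothetical query log that is simply a list of SQL strings (real database logs differ by vendor and need their own extraction step) and counts how often each table is referenced after `FROM` or `JOIN`. The regular expression is deliberately naive, a stand-in for the proper SQL parsing a commercial catalog would use: it ignores subqueries, CTE names, quoted identifiers, and comma-separated `FROM` lists.

```python
import re
from collections import Counter

# Naive pattern: capture the identifier that follows FROM or JOIN (optionally schema.table).
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def table_usage(query_log):
    """Count table references across a list of SQL query strings."""
    counts = Counter()
    for query in query_log:
        # Lowercase names so "Orders" and "orders" are counted together.
        counts.update(name.lower() for name in TABLE_REF.findall(query))
    return counts


# Hypothetical query log for the demo.
log = [
    "SELECT * FROM orders WHERE total > 100",
    "SELECT c.email FROM customers c JOIN orders o ON o.customer_id = c.customer_id",
    "select count(*) from Orders",
    "SELECT name FROM products",
]

usage = table_usage(log)
print(usage.most_common(2))  # [('orders', 3), ('customers', 1)]
```

Counting references is the simplest usage signal; a catalog could layer on execution counts, run times, or column-level references once it has a real parser behind it.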

Companies need structure around their data, which encompasses the processes, the people, and the flow of business.  Data Catalogs fill a hole in the data ecosystem by documenting the data, so IT and the Business know and understand it well enough to extract knowledge, gain insight, run the business, lower costs, and streamline processes.  Documenting processes perhaps seems rather boring, which is why some vendors have gone a step further and automated chunks of it.  Having the documentation updated automatically takes the heavy lifting off full-time resources, so they can tend to other issues and tasks.  And with automation, it gets done regardless of roadblocks.

One last thing: having a Data Catalog helps the Data Governance team do its job, which is to structure the org's data, dictate definitions, hierarchies, and processes, decide when and how new elements are added or deprecated, and create a holistic, company-wide roadmap of the data ecosystem.  As mentioned on this blog in the past, new laws are here from Europe, perhaps Canada, and soon others, requiring your org to have structure in place to keep customer data private and out of the hands of hackers, to purge data when it's no longer needed, and to account for third-party handlers of data.

So getting a Data Catalog is not just a cool new concept; it should be baked into the heart of every organization to help steer the direction of the org, tap into its pulse, and make assessments at a set frequency, so you can pivot on a dime if the situation demands.

Data Catalogs structure your organization's data and unite departments.  They're the glue that bonds the org together.

Thanks for reading~!