Ramifications of Ignoring Sensitive Data within Data Ecosystems

I have worked with data since 1996, starting with Crystal Reports and Crystal Info version 5, so data has been my main focus for a while now.  I saw the rise of OLAP cubes, big data, and now the Cloud.  Data is now the lifeblood of many orgs, or soon will be, once they realize its hidden value.

Self-service has opened up the vaults of data to everyday employees, letting them manage and find the pulse of an organization.  However, there is something to be cognizant of: sensitive data.

Security must be baked into the flow of data, and by that we mean protecting sensitive data.  Some data is not meant to be viewed by everyday eyes.  Examples include data covered by HIPAA, PCI, and GDPR, PII (Personally Identifiable Information), and internal sensitive data (salary, sales quotas, etc.).

When creating models for data warehouses or consumable data, some forethought is required.  Rules must be established up front, documented, and managed by the Data Steward and/or Data Governance Committee.

This requires established and defined processes for documenting the data model, in the form of a Data Dictionary and/or Data Catalog.  This segment has really taken off, and lots of products are available both on-site and in the Cloud, alongside simple Excel files shared on a network drive, SharePoint, Office365, etc.

In addition, data must be identified if it falls into one of the sensitive buckets.  Should everyone see Social Security numbers, address information, medical information, credit cards, salary, etc.?  This data should NOT be viewed by everyone, and there are laws that reinforce this, with pre-defined penalties that vary and can be quite expensive.
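As a rough illustration of identifying sensitive data, here is a minimal sketch of a data dictionary in which every column carries a sensitivity tag.  The table names, column names, and the `Sensitivity` categories are all hypothetical, chosen only to mirror the buckets mentioned above; a real Data Catalog product would have its own way of recording this.

```python
from dataclasses import dataclass
from enum import Enum


class Sensitivity(Enum):
    """Hypothetical sensitivity buckets; adjust to your own governance rules."""
    PUBLIC = "public"
    INTERNAL = "internal"   # e.g. salary, sales quotas
    PII = "pii"             # e.g. Social Security numbers, addresses
    PCI = "pci"             # e.g. credit card numbers
    HEALTH = "health"       # e.g. medical information


@dataclass(frozen=True)
class ColumnEntry:
    table: str
    column: str
    description: str
    sensitivity: Sensitivity


# Hypothetical entries; in practice these would come from the Data Dictionary.
DATA_DICTIONARY = [
    ColumnEntry("customer", "customer_id", "Surrogate key", Sensitivity.PUBLIC),
    ColumnEntry("customer", "ssn", "Social Security number", Sensitivity.PII),
    ColumnEntry("customer", "card_number", "Credit card number", Sensitivity.PCI),
    ColumnEntry("patient", "diagnosis", "Medical diagnosis", Sensitivity.HEALTH),
    ColumnEntry("employee", "salary", "Annual salary", Sensitivity.INTERNAL),
]


def sensitive_columns(entries):
    """Return 'table.column' names for every entry that is not tagged PUBLIC."""
    return [
        f"{entry.table}.{entry.column}"
        for entry in entries
        if entry.sensitivity is not Sensitivity.PUBLIC
    ]


if __name__ == "__main__":
    for name in sensitive_columns(DATA_DICTIONARY):
        print(name)
```

Keeping the tag next to the column definition means the same list can feed both the documentation and whatever security rules are built on top of it.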

Once identified, data should be "masked" from prying eyes, based on a pre-defined security model, and processes must be in place to handle data that is compromised.  Not only that, with new GDPR rules, there must be processes in place to handle "data requests" from customers located in the European Union.
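To make the masking idea concrete, here is a small sketch of role-based masking.  The roles, column tags, and the "keep the last four characters" rule are assumptions made up for this example, not a standard; many databases and BI tools offer their own masking features that would replace this in practice.

```python
# Hypothetical security model: which sensitivity tags each role may see unmasked.
ROLE_CLEARANCE = {
    "hr_analyst": {"pii", "internal"},
    "billing": {"pci"},
    "report_viewer": set(),
}

# Hypothetical column tags; columns missing from this map are treated as sensitive.
COLUMN_TAGS = {
    "name": "public",
    "ssn": "pii",
    "card_number": "pci",
    "salary": "internal",
}

# How many trailing characters stay visible per tag (assumption: 0 unless listed).
KEEP_LAST = {"pii": 4, "pci": 4}


def mask_value(value, keep_last=0):
    """Replace characters with '*', optionally leaving the last few visible."""
    text = str(value)
    if keep_last <= 0 or len(text) <= keep_last:
        return "*" * len(text)
    return "*" * (len(text) - keep_last) + text[-keep_last:]


def mask_row(row, role):
    """Return a copy of row with every column the role may not see masked."""
    allowed = ROLE_CLEARANCE.get(role, set())
    masked = {}
    for column, value in row.items():
        tag = COLUMN_TAGS.get(column, "untagged")  # unknown columns are masked
        if tag == "public" or tag in allowed:
            masked[column] = value
        else:
            masked[column] = mask_value(value, KEEP_LAST.get(tag, 0))
    return masked


if __name__ == "__main__":
    row = {"name": "Ada", "ssn": "123-45-6789", "salary": 85000}
    print(mask_row(row, "report_viewer"))
    print(mask_row(row, "hr_analyst"))
```

Note the default-deny choice: a column nobody has tagged yet, or a role nobody has defined, results in masked output rather than exposed data.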

That means, if a customer places a data request, the organization must have a process to provide the customer's data, how it's being used, whether it is provided to third parties, and how long it is being held.  This is no easy task.

You must know every database in which the customer's data resides, which could be the sales database, marketing database, financial database and everything in between.
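One way to picture this is an inventory that records, for each system, which table holds customer data and which column identifies the customer.  The sketch below uses in-memory SQLite databases as stand-ins for the sales, marketing, and financial systems; every table and column name is hypothetical, and a real data request would also need to cover usage, third parties, and retention.

```python
import sqlite3

# Hypothetical inventory: system name -> (table, customer key column).
INVENTORY = {
    "sales": ("orders", "customer_id"),
    "marketing": ("email_list", "customer_id"),
    "finance": ("invoices", "customer_id"),
}


def collect_customer_data(connections, customer_id):
    """Return {system: [row dicts]} for every system listed in the inventory."""
    report = {}
    for system, (table, key_column) in INVENTORY.items():
        conn = connections[system]
        conn.row_factory = sqlite3.Row  # lets us turn rows into dicts
        # Table and column names come from the trusted inventory, not user input;
        # the customer id itself is passed as a bound parameter.
        cursor = conn.execute(
            f"SELECT * FROM {table} WHERE {key_column} = ?", (customer_id,)
        )
        report[system] = [dict(row) for row in cursor.fetchall()]
    return report


def demo_connections():
    """Build three throwaway in-memory databases with made-up rows."""
    sales = sqlite3.connect(":memory:")
    sales.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, total REAL)")
    sales.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(1, 42, 19.99), (2, 7, 5.0)])

    marketing = sqlite3.connect(":memory:")
    marketing.execute("CREATE TABLE email_list (customer_id INTEGER, email TEXT)")
    marketing.execute("INSERT INTO email_list VALUES (?, ?)", (42, "pat@example.com"))

    finance = sqlite3.connect(":memory:")
    finance.execute("CREATE TABLE invoices (invoice_id INTEGER, customer_id INTEGER)")
    finance.execute("INSERT INTO invoices VALUES (?, ?)", (900, 7))

    return {"sales": sales, "marketing": marketing, "finance": finance}


if __name__ == "__main__":
    for system, rows in collect_customer_data(demo_connections(), 42).items():
        print(system, rows)
```

The hard part in practice is keeping the inventory complete, which is where the Data Dictionary or Data Catalog earns its keep.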

And if the customer wants their data removed, there must be processes in place to eradicate the data from all places, once no longer needed.
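A removal process could be sketched along these lines, again with in-memory SQLite standing in for real systems.  The inventory, the table names, and the `retain` flag (marking systems where the data is assumed to still be needed) are hypothetical; which records may actually be kept is a question for the legal department, not for code.

```python
import sqlite3

# Hypothetical inventory. "retain" marks systems where, by assumption,
# the data is still needed and must not be deleted yet.
ERASURE_INVENTORY = [
    {"system": "sales", "table": "orders", "key": "customer_id", "retain": False},
    {"system": "marketing", "table": "email_list", "key": "customer_id", "retain": False},
    {"system": "finance", "table": "invoices", "key": "customer_id", "retain": True},
]


def erase_customer(connections, customer_id):
    """Delete the customer's rows from each system and report what happened.

    Returns {system: {"deleted": row_count, "retained": bool}} so the outcome
    can be logged for audit purposes.
    """
    results = {}
    for entry in ERASURE_INVENTORY:
        system = entry["system"]
        if entry["retain"]:
            results[system] = {"deleted": 0, "retained": True}
            continue
        conn = connections[system]
        with conn:  # commits on success, rolls back if the delete raises
            # Identifiers come from the trusted inventory; the id is a bound parameter.
            cursor = conn.execute(
                f"DELETE FROM {entry['table']} WHERE {entry['key']} = ?",
                (customer_id,),
            )
        results[system] = {"deleted": cursor.rowcount, "retained": False}
    return results
```

Returning a per-system summary, rather than deleting silently, gives the organization something to show if it is ever asked to prove the request was honored.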

Imagine what this entails.  It sounds overwhelming because it is.  It requires the business to work in tandem with IT and legal departments, and up the chain of command, because not complying can be costly.

The rise of data-driven organizations has spread data to all corners of the company.  And the amount of data created is growing exponentially.

So we have competing mandates: use data to gain competitive advantage, yet establish processes to safeguard sensitive data and to escalate and report data breaches.

We must remain diligent in creating pre-approved processes to handle our data, the lifeblood of the org.  No longer can we throw together half-baked data initiatives in constant emergency mode to handle immediate requests.

We must have C-level buy-in, with executives who evangelize the importance of creating pre-defined processes to handle the many variations of established rules and laws that safeguard customer data.  These rules must trickle down from the top.  The penalties for non-compliance can be great and should not be taken lightly or swept under the rug.

What does this mean for data professionals?  We need to be aware of these rules and bake them into new projects and/or retrofit existing data ecosystems.  Will this cost money?  Yes.  Will this take time from existing projects?  Yes.

At this point, we must take sensitive data seriously, because we can expect more rules down the road and there's no better time to prepare than now.