10/31/2018

Ramifications of Ignoring Sensitive Data within Data Ecosystems

I've been working with data since 1996, starting with Crystal Reports and Crystal Info version 5, and data has been my main focus ever since.  I saw the rise of OLAP data cubes, then big data, and now the Cloud.  Data is now the lifeblood of many orgs, or soon will be, once they realize its hidden value.

Self Service has opened up the vaults of data to everyday employees, letting them manage it and find the pulse of an organization.  However, there is one thing to be cognizant of: sensitive data.

Security must be baked into the flow of data, and by that we mean sensitive data.  Some data is not meant to be viewed by everyday eyes.  Some examples are HIPAA, PCI, PII (Personally Identifiable Information), and GDPR data, as well as internal sensitive data (salary, sales quotas, etc.).

When creating models for data warehouses or consumable data, some forethought is required.  Rules must be established up front, documented, and managed by the Data Steward and/or Data Governance Committee.

This requires established and defined processes for documenting the data model, in the form of a Data Dictionary and/or Data Catalog.  This segment has really taken off, with lots of products available both on-site and in the Cloud, as well as simple Excel files shared on the network, SharePoint, Office 365, etc.

In addition, data must be identified when it falls into one of the sensitive buckets.  Should everyone see Social Security numbers, address information, medical information, credit card numbers, salaries, etc.?  This data should NOT be viewed by everyone, and there are laws that reinforce this, with pre-defined penalties that vary and can be quite expensive.

Once identified, data should be "masked" from prying eyes based on a pre-defined security model, and processes must be in place to handle data that is compromised.  Not only that, with the new GDPR rules, there must be processes in place to handle "data requests" from any customer with established citizenship in Europe.
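As a rough illustration of both steps, identifying sensitive columns and then masking them, here is a minimal sketch in Python.  The column names, masking rules, and the tiny inline "data dictionary" are hypothetical examples, not a prescription; a real implementation would typically lean on the masking features of the database or BI platform itself.

import pandas as pd

# Hypothetical mini data dictionary: which columns are sensitive, and how to treat them.
SENSITIVITY = {
    "ssn":         "mask_last4",   # PII
    "credit_card": "mask_last4",   # PCI
    "salary":      "suppress",     # internal sensitive
}

def mask_sensitive(df):
    """Return a copy of df that is safe for general self-service consumption."""
    masked = df.copy()
    for column, rule in SENSITIVITY.items():
        if column not in masked.columns:
            continue
        if rule == "mask_last4":
            # keep only the last four characters, e.g. ***-6789
            masked[column] = "***-" + masked[column].astype(str).str[-4:]
        elif rule == "suppress":
            masked[column] = None   # hide the value entirely
    return masked

customers = pd.DataFrame({
    "name":        ["Jane Doe"],
    "ssn":         ["123-45-6789"],
    "credit_card": ["4111111111111111"],
    "salary":      [95000],
})
print(mask_sensitive(customers))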

That means, if a customer places a data request, the organization must have a process to provide the customer's data, how it's being used, whether it is provided to third parties, and how long the data is being held.  This is no easy task.

You must know every database in which the customer's data resides, which could be the sales database, marketing database, financial database, and everything in between.

And if the customer wants their data removed, there must be processes in place to eradicate the data from all places, once it is no longer needed.
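To make concrete what that inventory-driven process might look like, here is a minimal sketch.  Every database name, table, and column below is hypothetical, and SQLite stands in for whatever engines actually hold the data; the point is simply that a maintained inventory drives both the "show me my data" and the "erase my data" requests.

import sqlite3  # stand-in engine; real systems would use their own drivers

# Hypothetical inventory of every place customer data lives.
# Keeping this list complete and current is the Data Steward's hardest job.
DATA_INVENTORY = {
    "sales.db":     [("orders",            "customer_email")],
    "marketing.db": [("campaign_contacts", "email")],
    "finance.db":   [("invoices",          "billing_email")],
}

def handle_data_request(customer_email, erase=False):
    """Collect (or delete) every record tied to one customer across all systems."""
    findings = {}
    for db_path, locations in DATA_INVENTORY.items():
        conn = sqlite3.connect(db_path)
        for table, column in locations:
            if erase:
                conn.execute(f"DELETE FROM {table} WHERE {column} = ?", (customer_email,))
            else:
                rows = conn.execute(
                    f"SELECT * FROM {table} WHERE {column} = ?", (customer_email,)
                ).fetchall()
                findings[(db_path, table)] = rows
        conn.commit()
        conn.close()
    return findings

# Report first; erase only after legal sign-off and retention checks.
# handle_data_request("jane.doe@example.com")
# handle_data_request("jane.doe@example.com", erase=True)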

Imagine what this entails.  Sounds overwhelming, because it is.  It requires companies to work in tandem with IT and legal departments, up the chain of command, because not complying can be costly.

The rise of data-driven organizations has spread data to all corners of the company.  And the amount of data created is growing exponentially.

So we have competing mandates: use data to gain competitive advantage, yet establish processes to safeguard sensitive data and to escalate and notify when data breaches occur.

We must remain diligent in creating pre-approved processes to handle our data, the lifeblood of the org.  No longer can we throw together half-baked data initiatives in constant emergency mode to handle immediate requests.

We must have C-level buy-in, with leaders who evangelize the importance of creating pre-defined processes to handle the many variations of established rules and laws that safeguard customer data.  These rules must trickle down from the top.  The penalties for non-compliance can be great and should not be taken lightly or brushed under the rug.

What does this mean for data professionals?  We need to be aware of these rules and bake them into new projects and/or retrofit existing data ecosystems.  Will this cost money?  Yes.  Will this take time from existing projects?  Yes.

At this point, we must take sensitive data seriously, because we can expect more rules down the road, and there's no better time to prepare than now.



10/24/2018

Hadoop Changes Course with Recent Merger of Top Two Vendors

When I hopped aboard the Hadoop "express" around 2012, I really thought it would take over the world of data.  There was so much hype and talk about the expectations, progress, and adoption.  Yet, when speaking with industry folks, there weren't many use cases in production; it was mostly used as a storehouse to keep all the data, at relatively low cost, across clusters of commodity hardware.

And as we know, it was primarily Java back then, and Java folks weren't traditionally data folks, and data folks didn't know or care to know Java.  Catch-22.  Add the administration side, managing volumes of hardware infrastructure, including the software required to keep the thing running, and there was no easy path from legacy SQL developers into the world of Hadoop.

And the hype continued, with lots of new players, connectors, and new Apache products; then the third of the top three vendors providing proprietary Hadoop expertise and consulting sort of dropped off.  And now, it appears the two rivals, Cloudera and Hortonworks, have teamed up in a buyout / merger, consolidating into a single industry-expert Hadoop vendor.

Keep in mind, there's also a Cloud-based version of Hadoop from Microsoft, called HDInsight, which contributes open-source code to the Apache projects.

I guess everyone is waiting to see the next steps in the life cycle of Hadoop: how the customers will react, and how people will know which flavor to run, maintain, and support going forward.  All in all, Hadoop is a great product, designed for a specific purpose: to store large volumes of data across commodity hardware, fault tolerant, and able to integrate with other products.  However, it never matured to the point of accepting up-to-the-minute, transaction-level production databases, as in inserts, updates, and deletes in real time.  Therefore, Hadoop is now another tool in the Data Tool Belt.  A great product, for specific use cases.

And so it goes~!



10/23/2018

Azure Data Box Basics

Azure Data Box is a new addition to the Azure data ecosystem.  It allows large volumes of data to be moved into Azure.  It was introduced as a Preview in 2017.  

You can find Azure Data Box in the Azure Portal:






To get started, click here.


10/22/2018

Connect to Azure Analysis Services Tabular Model from Uploaded PBIX file

Decided to explore Azure Analysis Services today.

To summarize the steps to connect to Azure Analysis Services (a scripted sketch of the resource group step follows the list), we: 


  • Connected to Azure
  • Created a Resource Group
  • Created an Azure Analysis Services server
  • Pointed to an uploaded Power BI PBIX file
  • The Power BI file contained the sample Model
  • Downloaded drivers to connect to the Azure Analysis Services model from on-premise
  • Added a user as Admin to the AAS model
  • Set a Firewall rule to allow the user's IP Address to tunnel through  
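
For reference, here is what the very first provisioning step looks like if scripted rather than clicked through the portal.  This is just a minimal sketch using the Azure Python management SDK; the subscription id, resource group name, and region are placeholders, and everything after this (the Analysis Services server itself and the model upload) was done in the portal, as the screenshots below show.

from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

subscription_id = "<your-subscription-id>"      # placeholder
credential = DefaultAzureCredential()           # picks up az login / environment credentials
resource_client = ResourceManagementClient(credential, subscription_id)

# Everything for this POC lands in one resource group, so it can all be
# blown away in a single swipe at the end.
rg = resource_client.resource_groups.create_or_update(
    "rg-aas-poc",                               # hypothetical resource group name
    {"location": "eastus2"},                    # hypothetical region
)
print(rg.name, rg.properties.provisioning_state)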


Here are the step-by-step screenshots:

Create single resource group:




Resource Group Added:




Add Analysis Services, select from different pricing options:




Azure Analysis Services added:




Attempt to add Model using Power BI pbix file:




Error: the file / model must be a higher version:





Fix this by grabbing a newer version of Power BI Desktop.  So, we downloaded the latest version of Power BI Desktop and installed it...



Installing Power BI Desktop:





Installing...




Success...




We opened the downloaded Contoso PBIX file with the newer version of Power BI Desktop, then saved the file, with no other changes...




Attempted the upload into Azure Analysis Services again; it accepted the PBIX file this time and loaded its Model, shown here...









Added a table, created a diagram...




The online editor allows the model to be opened in a variety of tools, shown here: Visual Studio project, Power BI Desktop, or Excel...




Can view the table Relationships in the online editor...



And here we see the latest saved Model, after the file was uploaded from Power BI Desktop into Azure Analysis Services...






The model is functional...




We selected a few measures, and we see actual data in the browser...




We can also view the underlying DAX code in the online editor...




From the Azure portal, we can see the Azure Analysis Services info...





In order to connect from on-premise, we need to download the MSOLAP drivers...




Installed drivers...




Now we need to add a user from our local Active Directory to Azure Analysis Services.  We type in the user's email, and it auto-generates an email to that person.  The user simply clicks the button and gets added as an Azure Analysis Services Admin...




We need to add a Firewall rule to allow the on-premise user to tunnel through and see the model in Azure; added my local user…




We grab the connection URL from Azure, place it in SQL Server Management Studio 2017 (SSMS), and connect to Azure Analysis Services.  Here you can see the Model, Connections, and Tables...
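
As a quick reference, the server name copied from the portal follows a predictable pattern; here is a tiny sketch of composing it (the region and server name below are assumptions).  The resulting URI is what gets pasted into SSMS, and later into Power BI Desktop.

region = "eastus2"                  # region the server was created in (assumption)
server_name = "myaasserver"         # Azure Analysis Services server name (assumption)
server_uri = f"asazure://{region}.asazure.windows.net/{server_name}"
print(server_uri)                   # e.g. asazure://eastus2.asazure.windows.net/myaasserver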




Now we can connect to the Azure Model from Power BI Desktop.  Created a new report and added the same Azure connection here...





We didn't specify the exact database, as it's optional...




Can see the Model in Azure using Power BI Desktop...


Added a few measures and row(s), to build a quick report:  





Basically, it's pulling from Azure Analysis Services into on-premise Power BI Desktop, connected using the AD account that was added to the Azure model (after confirmation), with the firewall opened for the specific IP address...

And the nice thing about creating a single Resource Group is that we can blow all the artifacts away in a single swipe...




And that completes our Azure Analysis Services POC to:


  • Upload a Power BI sample Model to an Azure Analysis Services Tabular Model
  • Create model using online tool
  • Add user to connect from on-premise
  • Set firewall rule to open specific IP Address
  • Create connection(s) from client applications: SSMS and Power BI Desktop

We should be able to connect from SSRS as well, using the same connection and user info.  Why would we do this?

One reason is to "productionalize" a useful Power BI report for the Enterprise: maybe the report has useful information and vetted calculations, and we want to lock down users with row-level security and leverage the power of the Azure Analysis Services Tabular format, which supports the DAX language, for easy consumption from a variety of applications either in the Cloud or on-premise.

And of course, you can follow similar steps to connect to Azure Analysis Services models that were not built from Power BI Desktop PBIX files as well.

Thanks for reading~! 



Reference URLs:


Tutorial: Add a sample model from the portal
https://docs.microsoft.com/en-us/azure/analysis-services/analysis-services-create-sample-model



Download Power BI Desktop:
https://www.microsoft.com/en-us/download/details.aspx?id=45331



Download sample PBIX file ( ContosoSalesForPowerBI) :
https://www.microsoft.com/en-us/download/details.aspx?id=46801



MSOLAP Data Providers:
https://docs.microsoft.com/en-us/azure/analysis-services/analysis-services-data-providers

Install Microsoft Machine Learning Server & R Client

Today we installed Microsoft Machine Learning Server and R Client.  We documented the process:






























And some reference links:

Machine Learning Server


Install Microsoft R Client on Windows

Quickstart: Run R code in R Client and Machine Learning Server
https://docs.microsoft.com/en-us/machine-learning-server/r/quickstart-run-r-code



Thanks for reading~!