Happy Everything!

Merry Christmas & Happy Hanukah & Happy New Year!!!

From the Blooms!


Domain Knowledge is Key to Successful Data Projects

Data is a primary factor in decision making.  It comes in all shapes, sizes, and formats.  Standalone data is good, but it's only part of the story.

What is domain knowledge?  It's how the business operates.  It may be industry specific, company specific, or department specific.  All the rules that exist, including product information, customer information, market knowledge, and legacy information, fall under the domain knowledge umbrella.

What sits between domain knowledge and the tangible information nuggets used for decision making?  Obviously the data, blended with domain knowledge in the form of business rules.

The trick is to know the data inside and out, and to know the domain.  How is that information obtained?  By meeting with the business and asking them what they do.  Yes, I see that, can you explain this, what do you mean when you say that, how does this happen, what are the exceptions?  You won't find that information in some document or manual, yet that's the gold mine.

You see, that information is dissected and translated into rules.  Those rules get spliced into the flow of the data, typically in the Extract, Transform and Load process.  You'd be surprised how many business rules are nestled in Excel files that call other workbooks, with VLOOKUPs that call other VLOOKUPs nine layers deep, referencing a file on the accountant's desktop, none of it documented or in a source code repository.
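To make that concrete, here's a small, purely hypothetical sketch of what one of those buried rules might look like once it's pulled out of the spreadsheet and expressed in the ETL layer (the table and column names are invented for illustration):

-- HYPOTHETICAL BUSINESS RULE, FORMERLY BURIED IN A VLOOKUP CHAIN:
-- "Northeast orders over $10,000 get the 'Preferred' service tier,
--  unless the customer is flagged as delinquent."
SELECT
    o.OrderID,
    o.CustomerID,
    o.OrderAmount,
    CASE
        WHEN o.Region = 'Northeast'
             AND o.OrderAmount > 10000
             AND c.IsDelinquent = 0
            THEN 'Preferred'
        ELSE 'Standard'
    END AS ServiceTier   -- the rule now lives in documented, versioned code
FROM staging.Orders AS o
JOIN staging.Customers AS c
    ON c.CustomerID = o.CustomerID;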

Having a solid data framework is not enough.  Having documented business processes to extract the domain knowledge is key, in addition to data governance, a data dictionary, data stewards, master data management, change management, and documented code.

Documented code is also key.  Have you ever opened a stored procedure 6,000 lines long, calling views that call views, with temp tables calling temp tables and cursors that loop through tables, without a single line of comments to let the next developer know what's happening?  As you may be aware, SQL developers have an upward trajectory and can find new jobs fairly quickly, so you may be the 4th developer to touch the code; everyone else is gone, and you have no idea where to begin.

When I started my first coding job, the senior developer would sit at my desk, open the code, and walk through it line by line, picking out flaws: why are you doing this, when over here you did it like that?  Why are you trying to be fancy with the code?  Better to write a function in 25 lines that's easy to understand than 5 lines nobody can interpret.  Comment everything, trap all errors and flow them upstream, and make it easy for the next maintenance coder to maintain this thing.

Domain knowledge is an underrated asset held by internal employees, infrequently documented and littered throughout the code across various departments.  Getting to that information is the task of a data consultant, and then translating it into business rules that get documented and applied in code.

If you can do that, you can solve the client's problems: deciphering business rules into code to obtain nuggets of gold information, to make decisions and grow the business.  That's the way I see it.

And here's my dog Sammie, eating a shoe~!


Data Catalogs will Create Structure and Fortify your Org

We've seen the rise of the report writer, then the data modeling OLAP cube developer, then the self-service proliferation of data to the masses, on to big data, cloud data, streaming data, machine learning and algorithmic data, unstructured and semi-structured data, and even graph data.

We've seen the rise of the Chief Data Officer, a much-needed position to fill the gap around who owns the data.  IT doesn't own the data, the business does, so we're told.  The CDO regulates the data, decides what vendor products to use, determines what staffing is needed, whether to build in-house or purchase pre-canned software, what licensing and maintenance agreements to sign, and what systems to shut down because they're outdated, broken, or cost hogs.  And lastly, the CDO documents the flow of data, from low-level process to high-level enterprise, in diagrams and data dictionaries.

In my opinion, Data Dictionaries have risen to the top of the heap, or at least they should have, because Data Governance should be a huge priority, right behind security, and because new laws are coming down the pike to enforce data privacy.  Not only that, Data Dictionaries are the glue that bonds the entire organization.

You see, the data flows in and out of every department.  What makes the data folks valuable is that they look at the entire picture, document it, and build data flows to extract and roll up ALL the data.  That insight into the deep bowels of the org is what provides the value, because they see all the tentacles: the nasty processes, sometimes manual (many times, actually), and the business rules never documented, or layered in some Excel or Access application nine layers deep.

In order for a company to flourish, have competitive advantage, and to soon comply with data mandates, every organization needs to be cognizant of the concept of the Data Catalog.

A Data Catalog knows where all the data lives, in real time.  It knows where the duplicate fields are, it knows the data types, it knows where the data got created, and perhaps how it flows through the system, with each hop along the way.  Some data dictionaries provide a glimpse; they're easy to spin up, not easy to maintain, but they do the job.
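As a rough illustration of the kind of metadata a catalog gathers, a query like this (SQL Server flavor, assuming SQL Server 2017+ or Azure SQL for STRING_AGG) lists column names that show up in more than one table, along with the data types involved:

-- COLUMN NAMES THAT APPEAR IN MORE THAN ONE TABLE, A COMMON SIGN OF DUPLICATED FIELDS
SELECT
    COLUMN_NAME,
    COUNT(DISTINCT TABLE_SCHEMA + '.' + TABLE_NAME) AS TableCount,
    STRING_AGG(DATA_TYPE, ', ') AS DataTypes
FROM INFORMATION_SCHEMA.COLUMNS
GROUP BY COLUMN_NAME
HAVING COUNT(DISTINCT TABLE_SCHEMA + '.' + TABLE_NAME) > 1
ORDER BY TableCount DESC;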

The more advanced Data Catalogs are perhaps cloud based, with built-in intelligence such as machine learning, to scour the internal data ecosystem regularly and extract the metadata from every corner of the business.  They can scan database logs to look for heavily used tables, queries, or fields.
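On SQL Server, for instance, some of that usage information is already exposed through DMVs; a catalog (or a curious human) might sample something along these lines to spot the heaviest queries (requires VIEW SERVER STATE or VIEW DATABASE STATE permission, depending on platform):

-- TOP 10 QUERIES BY EXECUTION COUNT SINCE THEIR PLANS WERE CACHED
SELECT TOP 10
    qs.execution_count,
    qs.total_worker_time,
    SUBSTRING(st.text, 1, 200) AS query_text_sample
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.execution_count DESC;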

Companies need structure around the data, which encompasses the processes, people, and flow of the business.  Data Catalogs fill a hole in the data ecosystem by documenting the data, so IT and the business know and understand it, in order to extract knowledge, gain insight, run the business, lower costs, and streamline processes.  Documenting processes seems rather boring, perhaps; that's why some vendors have gone a step further and automated chunks of it.  Having the data updated automatically takes the heavy lifting off full-time resources, so they can tend to other issues and tasks.  And by automating, it gets done regardless of roadblocks.

One last thing: having a Data Catalog helps the Data Governance team do their job, which is to structure the org's data; dictate definitions, hierarchies, processes, and when and how new elements are added or deprecated; and create a holistic, company-wide roadmap of the data ecosystem.  As mentioned on this blog in the past, new laws are here from Europe, perhaps Canada, and soon others, to make sure your org has structure in place to keep customer data private and out of the hands of hackers.  And to purge data when no longer needed.  And to account for 3rd party handlers of data.

So the Data Catalog is not just a cool new concept; it should be baked into the heart of every organization to assist and steer the direction of the org, to tap into the pulse and make assessments on a set frequency, so you can pivot on a dime if the situation demands.

Data Catalogs structure your organization's data and unite departments.  They're the glue that bonds the org together.

Thanks for reading~!

The Modern Data Ecosystem is the New Kid in Town

What better than to create something out of nothing?  As kids we learn to build with blocks.  Such a rudimentary game, yet the underlying principles are timeless.

You've got different shapes and sizes.  You can stack high, wide, whatever you want.  These building blocks are the underlying foundation for today's shift in technology.

ETL is a block.  So are source systems.  Staging databases too.  And Data Warehouses.  Self-contained units of substance.  You can build each block traditionally, on-premise.  Or you can take pieces and apply the same block in the cloud.

Source system on-premise.  ETL in the Cloud: just lift and shift (yes, another overused term today).  There are lots of connectors to lots of data sources, run from on-premise or the Cloud.  Just pick up the block and move it to the Cloud.  Store it in a Data Lake, use bits and pieces as needed.

From there, the Staging block fits nicely, many options to choose from, same with Data Warehouse.  Merge with other data sets.  Options are limitless.

Each stage along the trail is simply a block, and the way technology is evolving, it doesn't matter where it lives.  Technology has broken each piece into re-usable patterns with no predefined home.

A good example is SQL Server Managed Instance, which basically took the high-end version of SQL Server and ported it to the Cloud.  And of course, there are other options that run as a service.  There is a solution for almost every scenario.  Or take bits and pieces and apply them to your unique environment.

But not only that.  That's the easy part.  All the blocks play nicely with the other parts.  It's a smorgasbord of components that adapt and connect with other pieces of technology.  The vendors are staying a step (many steps) ahead of the developers, with high-level architecture in mind.  The new Modern Data Warehouse has links to the traditional Data Warehouse, yet the MDW has grown exponentially.

It really is mind-blowing when you think back to the days of a single reporting tool against a single database, with a single SELECT statement running against production and slowing down the entire application for the entire company.  Is someone running a large query or month-end in production?

Those who work in data and have carved out a niche in a single specific technology, whether DBA, query tuning, or report development, will need to revamp their game.  The world of data is in completely new territory, and there's no turning back.  The Modern Data Ecosystem is the new kid in town.


Getting Started with Azure Data Studio Extension SERVER DASHBOARD

Azure Data Studio is a new product similar to SQL Server Management Studio, yet way cooler.  You can manage traditional SQL Server databases and write T-SQL queries, and it adds new features as well.

Azure Data Studio has a plethora of Add-In features.  

In this demo, we're going to outline a new feature: the Server Dashboard for Azure Data Studio.

You can find the tutorial steps here as well.

After opening Azure Data Studio, we click the "Extensions" button on the left-hand side panel and select "Server Reports":

Next, we execute a statement to alter our database and set QUERY_STORE = ON:
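The statement itself is just the standard Query Store switch (substitute your own database name):

-- ENABLE QUERY STORE SO THE SERVER REPORTS WIDGETS HAVE DATA TO SHOW
ALTER DATABASE [YourDatabase] SET QUERY_STORE = ON;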

I had to add permissions on the local SQL database to execute the statement; after that, it did run:
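Enabling Query Store requires ALTER permission on the database, so something along these lines, run by an admin, should cover it (the user name is a placeholder, and adding the user to db_owner is the blunter alternative):

-- RUN AS AN ADMIN IN THE TARGET DATABASE; 'DemoUser' IS A PLACEHOLDER
GRANT ALTER TO DemoUser;
-- OR, MORE BLUNTLY, MAKE THE USER A DATABASE OWNER
ALTER ROLE db_owner ADD MEMBER DemoUser;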

Next, in Azure Data Studio, another cool new feature: select View from the menu, then "Command Palette...".  A window opens with a list of commands in alphabetical order; type "SETTINGS" and select "Preferences: Open User Settings":

Next, select Dashboard Database Widgets:

Next, follow the web tutorial instructions:

In Azure Data Studio, you will see 2 panes, left and right.  The left-hand side is not editable (you can't eat it!), so what I had to do was copy the contents from the left-hand pane, paste them into the right-hand pane, and splice carefully, or the JSON will show a red highlight indicating it's not formatted correctly:

When you close out, it will prompt you to save, or you can press Ctrl+S:

Next, back in the Servers window, select the Server, right click, select "Manage":

A window opens to the right, containing our new widget "SERVER DASHBOARD":

Not much activity on this database; if there were, you could analyze deeper.

And that concludes our demo of the Azure Data Studio Server Dashboard extension.  There are many other extensions to download and install as well, so this should keep us busy for a while.

Thanks for reading~!


Getting Started with Dynamic Data Masking in Azure SQL Database

Azure SQL Database provides Dynamic Data Masking out of the box. 

This blog reviews the commands required to set up Dynamic Data Masking, and bundles in Row Level Security as well.

First, we logged into the Azure Portal with valid credentials and a subscription.  Then we selected Azure SQL Database:

Next, we created a new database.  The wizard prompted for information:

Next, we created a new Resource Group for this demo and created a new Server and Database:

Our newly created Azure SQL Database was provisioned:

We already had the new Azure Data Studio downloaded and installed on the laptop.  It's a new tool similar to SQL Server Management Studio (SSMS).

We connected to our newly created Azure SQL Database.  Azure SQL Database requires a firewall rule, so we needed to add our current IP address to allow access from the remote laptop:
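The portal's "Set server firewall" blade handles this with a couple of clicks; as an alternative, the same server-level rule can be created with T-SQL from the master database (the rule name and IP address below are placeholders):

-- RUN IN THE master DATABASE OF THE LOGICAL SERVER; REPLACE THE PLACEHOLDER IP WITH YOUR OWN
EXECUTE sp_set_firewall_rule
    @name = N'AllowMyLaptop',
    @start_ip_address = '203.0.113.42',
    @end_ip_address   = '203.0.113.42';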

Using traditional SQL, we ran some scripts to set up test data and Row Level Security before testing the Data Masking functionality, as follows:

------------------------------
-- DROP USERS IF THEY EXIST
DROP USER IF EXISTS TestUser;
DROP USER IF EXISTS TestUser2;
DROP USER IF EXISTS TestUser3;
-- DROP TABLE IF EXISTS
DROP TABLE IF EXISTS dbo.Membership;
------------------------------
-- CREATE DATABASE TABLE FOR TESTING
CREATE TABLE dbo.Membership
(
    MemberID  int IDENTITY PRIMARY KEY,
    UserName  varchar(100) NULL,
    FirstName varchar(100) NULL,
    LastName  varchar(100) NULL,
    Phone     varchar(12)  NULL,
    Email     varchar(100) NULL,
    SSNbr     varchar(11)  NULL
);
------------------------------
-- INSERT 3 ROWS, 1 FOR EACH USERNAME
INSERT dbo.Membership (UserName, FirstName, LastName, Phone, Email, SSNbr) VALUES
    ('TestUser2', 'Roberto', 'Tamburello', '555.123.4567', 'RTamburello@contoso.com', '123-45-6789'),
    ('TestUser3', 'Janice',  'Galvin',     '555.123.4568', 'JGalvin@contoso.com.co',  '9876-54-321'),
    ('TestUser',  'Juan',    'Mu',         '555.123.4569', 'ZMu@contoso.net',         '555-44-3211');
GO
------------------------------
-- CREATE A SCHEMA FOR SECURITY OBJECTS
CREATE SCHEMA Security;
GO
------------------------------
-- CREATE A FUNCTION TO TEST ROW LEVEL SECURITY
CREATE FUNCTION Security.fn_securitypredicate(@Username AS sysname)
    RETURNS TABLE
    WITH SCHEMABINDING
AS
    RETURN SELECT 1 AS fn_securitypredicate_result
    WHERE @Username = USER_NAME() OR USER_NAME() = 'TestUser2';
GO
------------------------------
-- CREATE A SECURITY POLICY THAT REFERENCES THE ABOVE FUNCTION
CREATE SECURITY POLICY FilterDemo
    ADD FILTER PREDICATE Security.fn_securitypredicate(UserName)
    ON dbo.Membership
    WITH (STATE = ON);
------------------------------
-- CREATE TEST USER 1
CREATE USER TestUser WITHOUT LOGIN;
GRANT SELECT ON dbo.Membership TO TestUser;
-- CREATE TEST USER 2
CREATE USER TestUser2 WITHOUT LOGIN;
GRANT SELECT ON dbo.Membership TO TestUser2;
-- CREATE TEST USER 3
CREATE USER TestUser3 WITHOUT LOGIN;
GRANT SELECT ON dbo.Membership TO TestUser3;
------------------------------
-- RUN SELECT STATEMENT AS USER 'TestUser': RETURNS 1 ROW, AS ROW LEVEL SECURITY IS IMPLEMENTED
EXECUTE AS USER = 'TestUser';
SELECT USER_NAME() AS UserQuery, 'Run as User=TestUser' AS QueryType, * FROM dbo.Membership;
REVERT;
------------------------------
-- RUN SELECT STATEMENT AS USER 'TestUser2': RETURNS 3 (ALL) ROWS, AS ROW LEVEL SECURITY IS IMPLEMENTED
EXECUTE AS USER = 'TestUser2';
SELECT USER_NAME() AS UserQuery, 'Run as User=TestUser2' AS QueryType, * FROM dbo.Membership;
REVERT;
------------------------------
-- RUN SELECT STATEMENT AS USER 'TestUser3': RETURNS 1 ROW, AS ROW LEVEL SECURITY IS IMPLEMENTED
EXECUTE AS USER = 'TestUser3';
SELECT USER_NAME() AS UserQuery, 'Run as User=TestUser3' AS QueryType, * FROM dbo.Membership;
REVERT;
------------------------------
The first query returns 1 row, run as "TestUser".

The second query returns 3 rows, run as "TestUser2".

The third query returns 1 row, run as "TestUser3".

The reason: we implemented a function to handle Row Level Security.  Row Level Security is another security feature available out of the box in Azure SQL Database.  You can see the LastName and SSNbr fields are not masked yet.

To get Dynamic Data Masking, we add the following code snippets:
------------------------
-- APPLY DATA MASKING TO LASTNAME FIELD OF MEMBERSHIP TABLE
ALTER TABLE dbo.Membership
    ALTER COLUMN LastName varchar(100) MASKED WITH (FUNCTION = 'default()');
------------------------
-- APPLY DATA MASKING TO SSNBR FIELD OF MEMBERSHIP TABLE
ALTER TABLE dbo.Membership
    ALTER COLUMN SSNbr varchar(100) MASKED WITH (FUNCTION = 'partial(0,"XXX-XX-",4)');
------------------------
-- RE-RUN SELECT STATEMENT AS USER 'TestUser': RETURNS 1 ROW, 2 FIELDS ARE MASKED AS EXPECTED
EXECUTE AS USER = 'TestUser';
SELECT USER_NAME() AS UserQuery, 'Run as User=TestUser' AS QueryType, * FROM dbo.Membership;
REVERT;
------------------------
-- RE-RUN SELECT STATEMENT AS USER 'TestUser2': RETURNS 3 (ALL) ROWS, 2 FIELDS ARE MASKED AS EXPECTED
EXECUTE AS USER = 'TestUser2';
SELECT USER_NAME() AS UserQuery, 'Run as User=TestUser2' AS QueryType, * FROM dbo.Membership;
REVERT;
------------------------
-- RE-RUN SELECT STATEMENT AS USER 'TestUser3': RETURNS 1 ROW, 2 FIELDS ARE MASKED AS EXPECTED
EXECUTE AS USER = 'TestUser3';
SELECT USER_NAME() AS UserQuery, 'Run as User=TestUser3' AS QueryType, * FROM dbo.Membership;
REVERT;
------------------------
Here's the output from all 3 queries.  You can see the Dynamic Data Masking on the LastName and SSNbr fields.

------------------------
-- TO GET A LIST OF FIELDS THAT HAVE DYNAMIC DATA MASKING, RUN THIS SCRIPT
SELECT c.name, tbl.name AS table_name, c.is_masked, c.masking_function
FROM sys.masked_columns AS c
JOIN sys.tables AS tbl
    ON c.[object_id] = tbl.[object_id]
WHERE c.is_masked = 1;
------------------------
Source code for this demo was found here. Thanks for reading~!


Using Microsoft Data Migration Assistant

Worked with the Microsoft Data Migration Assistant.  A simple and powerful application.

Simply download it, open the program, click + for a new project, and select Migration...

Connect to Source Database...

Select the source database you wish to migrate from the available list...

Had to create an Azure SQL Database destination in the Azure Portal to migrate to...

Connect to Destination Database...

Select the objects to migrate...

The database scripts get generated automatically...

Next, execute the scripts by opening the Azure SQL Database from SQL Server Management Studio:

We can see the table got created in Azure SQL Database...
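A quick sanity check from SSMS or Azure Data Studio (not part of the tool itself, just a simple catalog query) confirms it:

-- LIST THE MOST RECENTLY CREATED TABLES IN THE DESTINATION DATABASE
SELECT name, create_date
FROM sys.tables
ORDER BY create_date DESC;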

You can store off the scripts for re-use, and you can push the data to the target database from within the application as well.

A nice product offering from Microsoft to migrate data to and from various places.

Thanks for reading~!

Getting Started with Azure Serverless Functions

Today we decided to test out Azure Functions.  Functions are standalone, serverless pieces of code that reside in Azure.  Here's a get started link.

In Azure Portal, we select "Create a Resource" --> "Serverless Function App":

Selected Serverless Function App...configured:

Objects were created:

Going to Resource Group, we select our newly created objects:

Viewing Dashboard:

Viewing objects in Microsoft Azure Storage Explorer:

Back in Azure, we can see the Services information:

Create a new Function:

More functions to choose from:

And more functions to choose from:

We selected "HTTP Trigger" which should send back a Response based on a web Request.

New Function:

It dynamically created the output code; we select the hyperlink to copy the URL:

Opened a new browser and appended "&name=Jon", which resulted in:

There's also a console which expands so you can test the Function: simply copy the URL string, add a parameter "name" with the value "JonB", and be sure to change the method to "Get" rather than "Post", as Get allows query string parameters.  Ran it: success!

You can also view the "logs" in real time:

Going back to Microsoft Azure Storage Explorer, we now have more data to view:

Then we deleted the Resource Group which removed all artifacts associated with this example.

Again, here's the link I followed to get started.

As always, thanks for reading~!