4/28/2015

Easy ETL Using Power Query

Power BI seems like the logical next step for Business Intelligence in the world of Microsoft.

It's found in Excel.  It's easy to use and it's powerful, with tons of connectors.  You can build models, which can be pushed to Tabular SSAS, and pull from on-premises data sources as well as the cloud.

I've been working with the M language for a few months now.  I love it.  It's easy to use, flexible, and fun to write.  Power Query blends wizards with custom coding, database design, data modeling, reporting, graphs, and pivot tables, enabling self-service BI, in the cloud, with auto refreshes, even against SSAS databases.  And you can export to an Excel worksheet or to Power Pivot.  One of the best features is the ability to massage data; as we all know, most data must be cleansed, parsed, merged, appended, etc., and Power Query is fully capable of handling those ETL tasks.

I think it will challenge traditional SSIS.  And traditional SSAS.  And traditional SSRS.

It just keeps getting better with every new version.

Now I will say that I lost some source code, because the interface isn't connected to TFS and I write the code in Notepad++ with the M language add-in instead of within Power Query.

So the Power Query plug-in provides data connectivity to a variety of data sources, stores data in Excel or Power Pivot models, lets you perform ETL with custom code or wizards, saves that code for later use, and supports reporting from within Excel through Pivot Tables or Pivot Charts.

It does everything a business intelligence tool needs.  And it's accessible in the cloud.

I see this as the future of BI in the Microsoft space.

4/23/2015

Team Foundation Server OData Feed via Power Query

Did you know that you can pull data from your Team Foundation Server through an OData service?  Yep.  Here's the link to get started:

https://tfsodata.visualstudio.com/

I inherited a project to do just that.  The client was using the Excel plug-in for TFS, yet the process was manual.  My task was to pull data from the OData feed through Power Query, build a Power Pivot model, and base reports and dashboards on it.

After connecting to TFS, there were three objects we were going after: Feature, User Story, and Task.  We found code on the web to pull the data recursively, because a Feature has a Links column which, when expanded, provides links to Parent records, Child records, and others.

Once you have that Id, you call a function to grab that record, and do the same again to get the Task.
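For illustration, here's a rough sketch of that first approach in Power Query's "M" language.  The feed root, the entity set name, the column names (WorkItemType, Links, TargetId) and the helper fnGetWorkItem are all assumptions for the example, not the project's actual code:

let
    // Assumed feed root; substitute your own collection URL
    Source    = OData.Feed("https://tfsodata.visualstudio.com/DefaultCollection"),
    WorkItems = Source{[Name = "WorkItems"]}[Data],
    Features  = Table.SelectRows(WorkItems, each [WorkItemType] = "Feature"),
    // Expand the Links column to surface the Ids of related (Parent/Child) work items
    Expanded  = Table.ExpandTableColumn(Features, "Links", {"TargetId", "LinkType"}),
    // fnGetWorkItem is a hypothetical helper that fetches one work item by Id;
    // calling it row by row is what made this approach so slow
    Children  = Table.AddColumn(Expanded, "Child", each fnGetWorkItem([TargetId]))
in
    Children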

However, there were performance issues; the query took many hours to run before timing out.

So after rearchitecting the Power Query, there were now three separate calls to TFS: one for Feature, one for User Story, and one for Task, followed by Left Outer Joins to pull them all together.  The trick was to go after only the Child records.  And that ran in about 11 minutes.
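A minimal sketch of the rearchitected approach, again with an assumed feed root and assumed column names (WorkItemType, Id, ParentId):

let
    Source    = OData.Feed("https://tfsodata.visualstudio.com/DefaultCollection"),
    WorkItems = Source{[Name = "WorkItems"]}[Data],
    // Three separate pulls, one per work item type
    Features  = Table.SelectRows(WorkItems, each [WorkItemType] = "Feature"),
    Stories   = Table.SelectRows(WorkItems, each [WorkItemType] = "User Story"),
    Tasks     = Table.SelectRows(WorkItems, each [WorkItemType] = "Task"),
    // Left Outer Joins, keyed on the Child records' ParentId
    StoriesWithTasks = Table.NestedJoin(Stories, {"Id"}, Tasks, {"ParentId"}, "Tasks", JoinKind.LeftOuter),
    FeaturesWithAll  = Table.NestedJoin(Features, {"Id"}, StoriesWithTasks, {"ParentId"}, "UserStories", JoinKind.LeftOuter)
in
    FeaturesWithAll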

But then there was another problem. It turns out that the query they used in the TFS Plug in for Excel had fields not contained in the WorkItems object.

After scanning the web for a long time, I determined it was not possible, yet I needed someone to agree.  So I took a chance and emailed someone from Microsoft, and believe it or not, they responded within an hour.  And yes, the fields we needed are not exposed, and they do not support the product or have the resources to expand the functionality.

So after letting the Project Manager and my boss know, it was decided to build out the Visual Studio project ourselves and add the fields.  A .NET developer was brought in, got up to speed, and we worked together to add the fields, which took less than a day.

As things go in programming, there's always one issue after another.  The next issue: when I pulled a report from the Excel TFS plug-in, the computer hung, shut down, and never came back.  Logging in as admin, I was able to pull the needed files to a thumb drive, then drove to the office and picked up another laptop.  I was down for about 7 hours getting the new laptop into working condition.

The next problem was that the OData Visual Studio project would not compile on the new laptop.  The error was a DLL conflict, as there were two TFS DLLs with the same method.  After researching, I changed the reference in the VS 2013 project to point to the GAC instance of the DLL, and the project compiled.  The next problem was that the localhost page would not render from the Visual Studio IIS, but recreating the Virtual Web solved that issue.

At 10:30 last night, the Power Query was pulling data through the VPN client, from the local Visual Studio project pointed at the client's TFS server, within 30 minutes.  And that includes the 25 new fields added to the VS project that aren't included in the WorkItems object.

Now, all that's left is to build the 8 Pivot table reports.  Validate the data.  And ship.

This project offered many challenges, including doing .NET again after years away, as well as writing complex code in the Power Query "M" language.

Things are looking up~!

4/14/2015

SQL Server Replication vs Service Broker

I'm currently working on a task to compare and contrast SQL Server Replication versus Service Broker.  There's lots of good information out there on the web, and I've been accumulating it in order to present a final document.

SQL Server Replication

Replication is a good option for moving large (raw) data sets and SQL objects (Tables, Functions and Stored Procedures) from one instance of SQL Server to another.  This can be achieved over Local and Wide Area Networks, wireless, VPN and the internet.

Definitions:
Publisher - hosts the source database.
Distributor - is the "pass-through" database.
Subscriber - is the final destination of the data.

There are 4 basic types of Replication:

Snapshot Replication - selected data is captured from the Publisher SQL Server and sent to the Subscriber server.  This is a good scenario if your database doesn't change much or is smaller in size.  It's also the easiest type to set up and maintain.

Transactional Replication - this is for Server to Server scenarios that require high throughput.  Transactional consistency is guaranteed within a publication, as the data is transferred in the order it was saved or updated.  Data is read from the Transaction Log files, and every change to the data is captured, including Inserts, Updates and Deletes.  Both DML changes and DDL schema changes are captured and replicated.  There can be multiple Subscribers to a single Publisher.

Merge Replication - the process of distributing data from Publisher to Subscriber, typically used for mobile applications or distributed server applications like a POS system, or for combining data from multiple sites.  It's the most difficult type to set up and maintain.

Peer to Peer Replication - built on the foundation of Transactional Replication and basically used for 'real time' synchronization.  Enterprise Edition allows bi-directional replication.

Installation
To create SQL Server Replication, you can use the built-in wizards to assist with the installation.  You define which database will become the Publishing Database, Distribution Database and Subscription Database(s).  Optimally, they should all reside on separate servers; however, this is not mandatory.  Deployment of Replication is a combination of SQL Server Management Studio (SSMS) wizards and scripts for automation purposes.
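As a rough idea of what the scripted side looks like, here is a minimal sketch of creating a transactional publication on the Publisher.  The wizards generate equivalent scripts; the database, publication, table and server names below are placeholders, and the Distributor is assumed to already be configured:

--Enable the database for publishing
USE SalesDB
GO
EXEC sp_replicationdboption
    @dbname = N'SalesDB',
    @optname = N'publish',
    @value = N'true'
GO
--Create the publication and add an article (table) to it
EXEC sp_addpublication
    @publication = N'SalesPub',
    @status = N'active'
EXEC sp_addarticle
    @publication = N'SalesPub',
    @article = N'Orders',
    @source_object = N'Orders'
GO
--Point a push subscription at the Subscriber
EXEC sp_addsubscription
    @publication = N'SalesPub',
    @subscriber = N'SUBSCRIBERSERVER',
    @destination_db = N'SalesCopy',
    @subscription_type = N'push'
GO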

SQL Agents
There are SQL Agent jobs that do certain tasks, including the Snapshot Agent, Log Reader Agent and Distribution Agent.  You can set your Agents to run all the time or at specific intervals.  And you will need to create Users to run the SQL Agent Jobs. 

Advantages:

  • Multiple Subscriber databases can point to a single Publishing database.
  • Transactional consistency.
  • Potentially less coding required to set up and install Replication, using wizards and T-SQL scripts.
  • Replication has built in solutions for Synchronization, Monitoring and Recovery.
  • Ability to adjust the frequency of the Replication by adjusting when and how often the SQL Agents run.
Downsides:
  • Increased administration time depending on the number of objects moved and subscribers.
  • There's no automatic failover for Disaster Recovery.
  • There is no easy way to Transform (ETL) the data during transit.
  • You can apply filters to limit which data gets copied, although there could be a performance hit on high-traffic servers.

Service Broker

Service Broker is suitable when you are not just trying to maintain identical copies of the data, but rather each application maintains its own data and the apps engage in an exchange of logical messages over a business conversation.  When and how these messages are sent or received is completely driven by the app, rather than the database.  Service Broker solves receiver management issues by activating queue readers as required when messages arrive.

Installation
To install and configure Service Broker, one must have a good understanding of the hierarchical objects which comprise the architecture.  This can become quite complex, as there are no built-in wizards or front-end utilities.
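To give a feel for that hierarchy, here is a minimal sketch of the objects in one database, plus sending a single message.  All names are placeholders, and the database is assumed to already have Service Broker enabled:

--Message type, contract, queues and services
CREATE MESSAGE TYPE [//Demo/RequestMessage] VALIDATION = WELL_FORMED_XML;
CREATE CONTRACT [//Demo/Contract] ([//Demo/RequestMessage] SENT BY INITIATOR);
CREATE QUEUE InitiatorQueue;
CREATE QUEUE TargetQueue;
CREATE SERVICE [//Demo/InitiatorService] ON QUEUE InitiatorQueue;
CREATE SERVICE [//Demo/TargetService] ON QUEUE TargetQueue ([//Demo/Contract]);
GO
--Open a conversation between the two services and send a message
DECLARE @handle UNIQUEIDENTIFIER;
BEGIN DIALOG CONVERSATION @handle
    FROM SERVICE [//Demo/InitiatorService]
    TO SERVICE '//Demo/TargetService'
    ON CONTRACT [//Demo/Contract]
    WITH ENCRYPTION = OFF;
SEND ON CONVERSATION @handle
    MESSAGE TYPE [//Demo/RequestMessage] (N'<Request>Hello</Request>');
GO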
 
Advantages:
  • All messages in a dialog are received in the order sent, with no gaps in the message sequence, even if the server goes down in the middle of a transaction.
  • Database integration enhances application performance and simplifies administration.
  • Message ordering and coordination for simplified application development.
  • Loose application coupling provides workload flexibility.
  • Related message locking allows more than one instance of an application to process messages from the same queue without explicit synchronization.
  • Automatic activation allows applications to scale with the message volume (see the activation sketch after these lists).
  • Asynchronous Queues enable the flexible scheduling of work, which can translate to big improvements in both performance and scalability.
  • Service Broker allows the ability to transform and manipulate the data as it's moved between databases.
  • Service Broker has the ability to initiate transfer of data on Triggers.
  • Data is encrypted between servers to protect the connection.
  • You can use certificates to secure the machine endpoints with SSL encryption.
  • Servers do not have to reside within the same network.
  • Service Broker allows custom Audit tables to track the flow of data.
  • Service Broker allows immediate data transfers with asynchronous transactions.
  • Service Broker allows each database side to own its own schema so they do not have to match.
  • Service Broker allows multiple servers talking to a single back end service as well as forwarding.
  • Service Broker has efficient transactions by leveraging SQL Server's internal transactions.
  • Service Broker Queues are database structures, so they are backed up via standard SQL Server backup procedures.  They gain all the benefits of incremental and differential backup and restore processes provided by SQL Server.
  • Service Broker provides a way to send messages when the receiving queue is not available.
  • Service Broker allows for Prioritization of Queues.
  • Service Broker has the ability to Multicast messages (SQL Server 2012) which enables a single initiator service to send messages to multiple target services.
Downsides:
  • Service Broker requires you to do a fair amount of coding.
  • Although Service Broker can be much faster, it is not tuned for large data transfers.
  • Service Broker could experience performance issues, if the data transfer rate is high.
  • Service Broker has no built-in mechanism for monitoring and debugging.
  • Service Broker does require additional configuration in order to work with Always On Availability Groups.
  • Adding more data centers or servers requires lots of new code on both sides, preventing easy scale-out.
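Here is a rough sketch of the automatic activation mentioned in the advantages above, continuing the placeholder names from the earlier Service Broker example: a stored procedure drains the queue, and the queue is altered so Service Broker starts up to four readers as messages arrive:

CREATE PROCEDURE dbo.ProcessTargetQueue
AS
BEGIN
    DECLARE @handle UNIQUEIDENTIFIER, @body XML;
    WHILE (1 = 1)
    BEGIN
        --Wait up to 5 seconds for the next message, then exit if none arrives
        WAITFOR (
            RECEIVE TOP (1)
                @handle = conversation_handle,
                @body = CAST(message_body AS XML)
            FROM TargetQueue
        ), TIMEOUT 5000;
        IF @@ROWCOUNT = 0 BREAK;
        --Process @body here, then end this side of the conversation
        END CONVERSATION @handle;
    END
END
GO
ALTER QUEUE TargetQueue WITH ACTIVATION (
    STATUS = ON,
    PROCEDURE_NAME = dbo.ProcessTargetQueue,
    MAX_QUEUE_READERS = 4,  --readers scale with message volume
    EXECUTE AS OWNER
);
GO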
So choosing the correct tool for transferring data between SQL Server instances depends on your unique situation, business and configuration rules, and coding skills.  However, by understanding the benefits and downsides of each, you'll be better equipped to make the right choice.

4/04/2015

Create SQL Server Proxy Account to Run SSIS Package

When scheduling SSIS Packages to run in SQL Server 2012 Integration Services, you may need to create a Proxy account to mimic the rights of a specific user, so the package runs successfully.

I can't remember where exactly I found this code, but here it is:



--Script #1 - Create a credential to be used by the proxy
USE master
GO

--Drop the credential if it already exists
IF EXISTS (SELECT 1 FROM sys.credentials WHERE name = N'SSISProxyCredentials')
BEGIN
    DROP CREDENTIAL [SSISProxyCredentials]
END
GO

CREATE CREDENTIAL [SSISProxyCredentials]
WITH IDENTITY = N'DOMAIN\USER'
   , SECRET = N'Password'
GO
----------------------------------------------
--Script #2 - Create a proxy account
USE msdb
GO

--Drop the proxy if it already exists
IF EXISTS (SELECT 1 FROM msdb.dbo.sysproxies WHERE name = N'SSISProxy')
BEGIN
    EXEC dbo.sp_delete_proxy @proxy_name = N'SSISProxy'
END
GO

--Create a proxy that uses the credential created above
EXEC msdb.dbo.sp_add_proxy
    @proxy_name = N'SSISProxy',
    @credential_name = N'SSISProxyCredentials',
    @enabled = 1
GO

--To enable or disable the proxy, use this command
EXEC msdb.dbo.sp_update_proxy
    @proxy_name = N'SSISProxy',
    @enabled = 1  --@enabled = 0 disables it
GO
----------------------------------------------
--Script #3 - Grant the proxy account to SQL Server Agent subsystems
USE msdb
GO

--View all the subsystems of SQL Server Agent with this command
--Note that the SSIS subsystem id is 11
EXEC sp_enum_sqlagent_subsystems
GO
----------------------------------------------
--Grant the created proxy to the SQL Agent subsystem
--You can grant the proxy to as many subsystems as needed
EXEC msdb.dbo.sp_grant_proxy_to_subsystem
    @proxy_name = N'SSISProxy',
    @subsystem_id = 11  --subsystem 11 is SSIS
GO

--View all the proxies granted to all the subsystems
EXEC dbo.sp_enum_proxy_for_subsystem
GO
----------------------------------------------
--Script #4 - Grant proxy access to security principals
USE msdb
GO

--Grant the proxy account access to security principals, which can be
--a login name, a fixed server role, or an msdb role
--Please note, members of the sysadmin server role are allowed to use any proxy
EXEC msdb.dbo.sp_grant_login_to_proxy
    @proxy_name = N'SSISProxy'
   ,@login_name = N'DOMAIN\USER'
--,@fixed_server_role = N''
--,@msdb_role = N''
GO

--View logins granted access to proxies
EXEC dbo.sp_enum_login_for_proxy
GO
----------------------------------------------
