User Groups are the Grass Roots of Technology

You know something?  The first Apple computer was presented at the Homebrew Computer Club.

User Groups are part of the programming community.  I remember my father going to User Groups in the 1980's in Tampa and bringing home free software for us to try.

And with Meetup, you can find several local User Groups meeting throughout the month.

Here's what I found out though.  Finding a meeting space is not an easy task.

How do User Groups find a place to hang out for a few hours and discuss programming topics?

Well, what I've seen is many groups have sponsors who allow time and space for events.  Perhaps it's a recruiting firm.  Or a business that somebody works at.  Or whatever.

When I tried to track down a place to hold a User Group meeting, here's what I saw.

I placed calls to several places and left voice messages.  Some called back.  "We rent our spaces out in 4-hour blocks at a cost of $400."  Uh, we're a non-profit organization and there's no way we could pay that kind of cash each month.  Don't you have anything for free, in an effort to disseminate information, get people excited about new technology and create a grassroots culture that attracts some of the brightest minds out there?  No, we don't do that.

I let the User Group go and somebody else picked up the reins and did a good job reviving it, hosting meetings at a recruiter's office.

User Groups are the grassroots movement for the technology industry.  They've been there from the beginning.  If businesses want to attract top talent, they have to volunteer space in which the community can grow.  Because community is like a snowball rolling down a mountain.  It builds momentum and everyone benefits.

And there you have it~!


Getting Started with Tableau Software Step 3

Welcome to Step 3 in our series 'Getting Started with Tableau Software'.

In case you missed Step 1, you can find it here.  And Step 2 here.

So let's get started.

In order to create a Dashboard, we go to the Dashboard drop down menu and select 'New Dashboard'.

A new Dashboard opens on the screen and its name appears in the bottom left-hand corner of the screen:

So we right-click and rename it to 'Dashboard Demo':

In the top left-hand corner, you can choose from the available items; here we see the worksheet created in Steps 1 & 2, 'Internet Sales':

You simply click on 'Internet Sales' and drag it to the canvas:

Click on the button 'Presentation Mode' or press F7 and a new window opens up:

In this mode, the dashboard is entirely dynamic.  You can select or deselect 'Product Key' from the Filter section, or click on the 'Product Key' below and it highlights on the canvas.

Next, we can create a Story.  Simply click on the Story drop-down menu, select Story and a new window appears.  We can rename it in the bottom left-hand corner to 'Story Demo'.

From this page, we see our Worksheet and Dashboard available.  In this case we selected the 'Dashboard Demo' and dragged it onto the canvas:

Click on the button 'Presentation Mode' or press F7 and a new window opens up:

And that basically sums up Step 3 in our series 'Getting Started with Tableau Software'.  Thanks for reading~!

Getting Started with Tableau Software Step 2

Welcome to Step 2 in our series 'Getting Started with Tableau Software'.

In case you missed Step 1, you can find it here.

So let's get started.

From the worksheet, you can click on the button highlighted in red to see the raw data that makes up the Chart visualization:

There's a 'Copy' button on the right; when clicked, it copies the raw data so you can paste it into Excel:

Next, to create a 'Filter', pick a field, in this case 'Product Key', and drag it to the 'Filters' box.  A dialog box appears, where you can apply rules.  In this case we chose 'Select from list' and picked the Product Keys that begin with '2':

And we see the 'Product Key' filter:

The Chart changes accordingly to limit the data set to the items selected in the filter:

And you can see the filter on the right hand side of the screen, where you can select different filter items:

Above, you can see we selected a few additional Product Keys and they instantly appeared on the Graph:
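Under the hood, this kind of filter is just a predicate applied to each row.  Here's a minimal Python sketch of the idea; the sample rows and field names are made up for illustration and are not the actual AdventureWorks schema:

```python
# Toy rows standing in for the FactInternetSales data; the field
# names and values here are illustrative only.
rows = [
    {"ProductKey": 214, "OrderQuantity": 3},
    {"ProductKey": 310, "OrderQuantity": 1},
    {"ProductKey": 225, "OrderQuantity": 2},
]

# Keep only rows whose Product Key begins with '2', like the filter above.
filtered = [r for r in rows if str(r["ProductKey"]).startswith("2")]
print([r["ProductKey"] for r in filtered])  # [214, 225]
```

Tableau builds this kind of predicate for you from the dialog box, and re-applies it instantly as you check or uncheck items.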

If you hover over an item or right-click, different menus appear with different settings and options like 'Exclude', 'Hide', 'Create Set' or 'View Raw Data'.  Likewise, you can do the same thing when selecting the green area in the chart above.

Also, there's a box 'Show Me' which allows you to change the Chart type on the fly:

With a single click we can change to:

Or a 'Bubbles' chart:

Thanks for reading Step 2 in our series 'Getting Started with Tableau Software'.

Getting Started with Tableau Software Step 1

To get started with Tableau, the first thing to do is download the client software (the current version is 9.3) from their website.

After the download completes, begin the install.

Once installed, load the software.  You will be prompted to begin the evaluation now, begin it later, or register the product.  Choose the appropriate option.

Once registered, open the software.  First you'll need to connect to a data source.  I connected to a local instance of SQL Server 2012 with AdventureWorksDW2012.  Enter your credentials and Tableau connects to the data source and takes you to a screen displaying the connection as well as the available tables:

Cool.  The next thing to do is drag some tables to the canvas; I selected the 'FactInternetSales' fact table:

Next, I set the Sort Fields to 'A to Z ascending':

In the upper right hand corner of the canvas, there's an option to connect to your data source 'Live' or 'Extract':

I selected the 'Extract' option:

Notice the message displayed 'Extract will include all data.'

Next, there's an option to add 'Filters':

After clicking the 'Add...' link, a dialog box pops up; click 'Add...' and we see:

I chose the 'Product Key' field.  The Filter dialog box appears:

I selected all and clicked 'OK'.

Next, I saved the Data Source file and it opened in Worksheet mode.

Here you can see familiar Dimensions and Measures.  So I added 'Order Quantity' to the report, and it automatically sums the field.  Next I added 'Product Key' to the Rows.  And it automatically created a nice Chart:
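The automatic Sum that Tableau applies is essentially a group-by aggregation: total the measure for each value of the dimension on the Rows shelf.  A rough Python sketch of that idea, with made-up sample data (not the real AdventureWorks rows):

```python
from collections import defaultdict

# Made-up order rows; Tableau effectively computes SUM(Order Quantity)
# grouped by Product Key when these fields are placed on the shelves.
orders = [
    {"ProductKey": 214, "OrderQuantity": 3},
    {"ProductKey": 225, "OrderQuantity": 2},
    {"ProductKey": 214, "OrderQuantity": 5},
]

totals = defaultdict(int)
for row in orders:
    totals[row["ProductKey"]] += row["OrderQuantity"]

print(dict(totals))  # {214: 8, 225: 2}
```

Each bar in the chart corresponds to one entry in that totals dictionary.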

Next, at the bottom left-hand side of the screen, I right-clicked the tab to rename it, calling it 'Internet Sales':

Next, I clicked the Save button to save the Tableau Workbook as 'Internet Sales':

You can easily change the color of the Chart by selecting the 'Color' option; here I changed it to Green:

Then you can easily change the Chart type to another option by selecting something from the 'Automatic' drop-down box:

Here I selected the 'Area' type:

And of course, Save your work.

So that's 'Step 1' of how to get started with Tableau Software.

Stay tuned for more blog posts on Getting Started with Tableau Software.

Thanks for reading~!


Domain Knowledge is Important but not the Most Important Skill Required

To work in the data space, you have to have a lot of skills.

Programming.  Databases.  Reporting.  Dashboards.  ETL.  Data Modeling.  SQL.  And an ancillary list of "other" skills.

And we tend to think of a Data Scientist as a blend of Math/Science/Logic + Programming/Database/Visualizations/Storytelling + Domain Knowledge.

And people will ask you, so how much experience do you have in X industry?  As in Banking, or Healthcare, or Utilities, or Government.

I've been working in IT since around 1995 or so.  And worked in a variety of sectors.  Including Healthcare, Government, Finance, Timekeeping Software, Credit Card Processing, Homeowner Insurance, Marketing, Education, Utilities, Lighting company, Taxi Software company, Travel company, Banking and Retail.

Guess what.  Each industry has specific business rules.  Guess what else.  You can learn the business rules rather quickly.  Guess what else.  Lots of industry processes overlap.

When you get to a new project, you get onboarded.  And in doing so, you dig into the business rules.  And soon you become the expert in those business rules.  And sometimes you find holes in the business rules.  And sometimes you know ALL the business rules of an org.  They may have people that know bits and pieces of the entire picture, but the BI person, who touches every part of the org, knows ALL the rules and how each affects the other.  And how the information flows throughout the org.

I once worked in a place where the business rules were hidden.  Actually, there are many orgs that have people who purposely hide the business rules in their heads for job security.  And at that one place, the owner, or gatekeeper of the big picture, got super promoted.  In many instances, those people who hold the info hostage for personal gain are very difficult to extract out of the picture.  They truly have job security.

But as a BI consultant, it may take a day, or a week, or years, but eventually you learn the logic too.  A person who does this for a living is able to extract that knowledge.  So to say that a person needs 10 years of business experience in a specific domain, that may very well be true.  But then again, if a consultant learns how to learn the business rules, domain knowledge isn't really the most important skill required.

I will say though, if you can take a specific domain and make that your specialty, well, that would give you a leg up.

And there you have it~!


Smart Machines and Robots May Be Inflexible in the Future

In society, you are free.  Free to do as you please.  As long as you don't harm yourself or others.  And don't incite riots or threaten people in high positions.

Other than that, you can sort of do what you want.  Just keep in mind, you're not going to win any popularity contests by spewing out things people don't want to hear.

People spend a great deal of effort avoiding certain subjects, heads in the sand, ostrich style.

The other issue: lots of orgs are scanning and capturing your tweets, posts, blogs, everything really.  They form these social profiles of everyone.  And those can and will be used against you.  They can cost you your job.  All sorts of implications.

So let's say we take one of the Artificial Intelligence bots, let it scan the entire internet and form its own conclusions.  One thing it may pick up is the fact that lots of people have lots of opinions.  How do you determine what's acceptable and what's not?

Well, one group force-fed a Bot to think that Hitler was a good guy.  The ramifications aren't that serious, except it exposes a great weakness in the process of training a Bot.  It learns over time, based on information.  You feed it, it learns; you feed it some more, it learns more.

How is a Bot supposed to learn what's okay and what's not?  There are so many exceptions.  People say that learning the English language is so difficult because words that sound the same can have different meanings and spellings, for example To, Too, Two, and There, Their, etc.

Learning the patterns of the planet must be a lot more complicated.  Especially the norms of each society.  In other words, the culture, which is made up of the written and unwritten rules that underpin the values of a society.

A Bot may learn that people drive autos on the right side of the road, yet see images from England, for example, where they drive on the left-hand side.  And how does it know when to use the metric system vs. American units of measurement?  So many details to learn.

Most of it can be learned through context.  Like a dictionary of items to events to places and time and sentiment.  A complex web of neural networks of acceptable behavior.

So who trains the models?  The computer programmers?  No, they only apply the business requirements from the business.  And where do they get the requirements?  From upper management.  And where do they get theirs?  From the board and shareholders.  Or from mandates from the Government or the Military.  It could go high up the chain, actually.

Yet who authorized them?  Just because we do things a certain way, why should we dictate the rules?  Just because we have the technology and resources and knowledge and ability?  How can we ensure that the proper morals and ethics are applied evenly and consistently?  And to whom do the intelligent machines owe their allegiance?

Who basically owns the responsibility of ensuring ethical and moral smart machines?  That's a good question.

Just like any new technology, it can be used for good and/or evil.  Could you imagine, years from now, an army of robots patrolling; who issues their orders?  Simply an invisible layer of autonomous beings hiding the order givers behind the curtains.

How do you feel about IVR systems and traffic lights?  They automate to some degree, yet they're very inflexible.  I sure hope these new smart machines are a bit more tolerant and easier to work with than our current systems.

Yes, automation and robots can and will automate jobs.  But they could also patrol the place and keep humans in line.  I'm not sure robots respond to bribes or friendship history to get you off the hook.  But then again, they may apply the rules evenly, so no discrimination based on color, creed, sexuality or what have you.

So there you go.  Covered a lot of ground in this post.  Thanks for reading~!


Data Quality Services Presentation and Thoughts

I went to the SQL Server BI User Group in Tampa last night.  Great presentation on DQS, or Data Quality Services.

It's a product from Microsoft, bundled into the SQL Server install along with SSRS and MDS.  It installs a client on the server along with three databases.  Essentially, you create a Knowledge Base repository, where you customize rules for each and every field you wish to track.  Then a power user can run a project, point it to the Knowledge Base, and it will find the issues and log them somewhere for someone to go back and correct, or base a decision on how to handle them.  Stuff like duplicate data, missing data, incorrectly spelled data, synonyms, etc.
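To make the idea concrete, here's a toy sketch of the kinds of rules a Knowledge Base encodes.  This is not actual DQS syntax; the field names, the valid-value list and the sample records are all my own illustrative assumptions:

```python
# A toy "knowledge base": the domain of valid values for one field.
valid_states = {"FL", "GA", "NY"}

records = [
    {"id": 1, "state": "FL"},
    {"id": 2, "state": ""},      # missing value
    {"id": 3, "state": "Flor"},  # not in the known domain
    {"id": 1, "state": "FL"},    # duplicate id
]

# Run the "project": flag issues for a human to review, like DQS
# logging problems off for a data steward.
issues = []
seen_ids = set()
for rec in records:
    if not rec["state"]:
        issues.append((rec["id"], "missing state"))
    elif rec["state"] not in valid_states:
        issues.append((rec["id"], "unknown state"))
    if rec["id"] in seen_ids:
        issues.append((rec["id"], "duplicate id"))
    seen_ids.add(rec["id"])

print(issues)
```

The real product layers on synonym handling, spelling correction and the steward sign-off workflow, but the core shape is the same: rules per field, plus a log of exceptions for a human to resolve.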

You can also call DQS from SSIS components.  I asked the speaker a question: in SSIS, as the data flows through, when it finds data that needs a human pair of eyes, does it stop the flow of data?  Apparently not.  The data gets mashed back into the ODS after it's cleaned up, and SSIS picks it up from there to move it to the data warehouse.  Another approach would be to use a Staging database, which is how we do it.

I asked a question about that too: is there a way to flag the data as it flows in from the source?  You don't want to send the same records up to Azure for validation every time the ETL runs; that seems like a waste of money.  He suggested a technique: create a hash based on the concatenation of the data, and check it before sending to Azure to see if the data changed.  Someone in the audience asked if 'CheckSum' could be used.  Sometimes, but it's not as flexible as another option, which handles upper- and lower-case changes as well as increased field lengths.
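The hash technique the speaker suggested can be sketched like this.  The field names, the delimiter and the choice of SHA-256 are my own assumptions for illustration; the point is simply to send a row to the cloud service only when its content has changed since the last run:

```python
import hashlib

def row_hash(row):
    # Concatenate the field values in a stable order with a delimiter,
    # then hash.  Any change in the data (including case) changes the hash.
    joined = "|".join(str(row[k]) for k in sorted(row))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

# Hashes saved from the previous ETL run, keyed by record id.
previous = {1: row_hash({"id": 1, "city": "Tampa"})}

incoming = [
    {"id": 1, "city": "Tampa"},    # unchanged -> skip, no Azure call
    {"id": 2, "city": "Orlando"},  # new -> send for validation
]

to_send = [r for r in incoming if previous.get(r["id"]) != row_hash(r)]
print([r["id"] for r in to_send])  # [2]
```

Because the hash is computed over the exact bytes, an upper/lower-case change or a lengthened field produces a different hash, which is the flexibility the speaker was pointing at.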

The speaker showed us quite a bit on how to leverage Microsoft Azure to pull in data from "Melissa Data".  Seems kind of cool and easy to use, with some costs associated with it.

The entire process needs sign-off to allow the data steward to be part of the life cycle of the data, typically found in a Master Data Management shop with a board of people who make key decisions on how to interpret and handle data.

In thinking about data quality, I often wonder if it will trickle into self-service data quality, as in PowerBI.  It would make sense that many shops have similar data patterns, and perhaps there's a way to re-use some of the knowledge bases in the cloud, or a way to import them into your PowerBI notebook or something.

Because we all know that having accurate reports is key.  And with the rise of self-service, it would be nice if those reports were correct, as well as IT's enterprise-level reports that use DQS.

Just a thought.

And there you have it~!

Every Company Is now in the Data Business

Name a business.  Any business.  Doesn't matter what they do.  I bet they have some software applications.  And I bet those software applications store off their data into some type of repository.  And I bet some of that data could potentially be used to increase sales, reduce costs or streamline processes.  Just a hunch.

Because every company, organization or team is now in the data business.  Like it or not.  Data matters.  Data is an asset.  Data can be leveraged to help your organization compete and thrive in the years ahead.

Or not.  We've always done it this way.  I have my secretary print out my emails so I don't have to connect to a PC.  I'm really not that fond of all this IT gibberish, we'll keep doing things the way we've always done them.

For me, it started in the mid 1990's, when the management team wasn't getting good reports from IT.  So they asked for a volunteer to create some production reports.  I volunteered.

Soon managers from all over were calling to ask for reports.  I was sort of a junior level programmer, entry level salary, yet they were calling to get information, which I could provide, in the form of Crystal Reports.

Except, being young and ambitious, who wants to stick around making an entry-level salary while the consultant sitting next to you is earning 3 times as much to do the same thing?

I went in search of more experience.  Eventually the salary would have to start climbing.

My next job was to create Crystal Reports against Oracle databases for healthcare clients.  The reports were used in hospitals for timekeeping; the company was called Kronos.  I created a lot of reports to keep track of hours from different angles.  That was a fun job.  Except I was still at an entry-level salary.

So I went to the Electric company, to program in Visual Basic against Oracle. And that was a fun job.  A real IT department with really smart people.  I met a bunch of consultants who taught me a lot.  Except my salary was still entry level.

After they announced a buyout by another company, I was the first to find another job.  Finally, a substantial bump in salary.  Now I'd be writing reports for a credit card processing company, using a product called Actuate.  And that was a fun job.

The bottom line: all companies have data.  They all need reports, and Business Intelligence.  My IT career started out in data, writing reports for upper management.  And I realized sooner rather than later how important the data was.  Sure, the front-end apps were slick and fancy, but management was more interested in the data.  Back then, data was the red-headed stepchild, one step below documentation writer.  But over time, people saw the light.  And now data is the new oil, the new electricity, the new asset for people to leverage.

I feel like I've been there from the beginning.  Probably because I was, 20 years ago.  Still trying to get the numbers to match after all these years.

And there you have it~!


How do we get to the next step of the Information Age?

Databases store information, typically transaction data, from a front-end application.  Each vendor has a proprietary database.  The report developer has to learn each database, typically by studying the visual data dictionary diagrams, which is very painful.  Perhaps over time, they push out a plethora of reports based on custom SQL statements, jumping through hoops to apply proper table joins, views and stored procedures, which can become quite large and messy.  And all the knowledge is in-house.  That gives the report developer some status, as he or she can 'get' the information you need.

However, each org typically has more than one proprietary database, each with a unique database schema.  How does one merge the two or more databases to get complete insight into the organization?  Very difficult.  It typically requires an ETL developer, a data modeler, a data warehouse architect, a team of developers and someone who knows the business domain.

Next, let's add in data from outside the organization, like the cloud.  For example, social media like Facebook, Twitter or LinkedIn.  Or how about Google Analytics?  How does that data get integrated into a cohesive whole?  Very carefully.

You can see right off the bat that proprietary vendor databases have created an entire industry of report writers, database developers and business intelligence tools.

Now, let's add in Hadoop.  It can handle the volume you're looking for: just shove all your data into Hadoop, then apply the "links" to integrate the disparate data sets for your big-picture analytics.  Still, setting up a Hadoop ecosystem is not an easy task.  It requires a Hadoop admin and a Hadoop developer, and you have to pick an out-of-the-box Hadoop distribution, load it onto commodity servers and begin.

However, Microsoft offers a cloud-based solution on the Azure platform.  You can store data in HDInsight Hadoop, SQL Database or SQL Data Warehouse, use AzureML to build models in a WYSIWYG cloud-based IDE, and use plenty of adapters to bring in data from other cloud offerings as well as on-premise sources or data born in the cloud.  And do your reports in PowerBI.

So the pieces for bringing together 'all' your data are now available.  But the inherent issue of merging different data sets still exists, requiring an ETL developer or a business domain knowledge expert.  ETL still seems to be the bottleneck blocking easy automation of full life cycle analytics.  You have to apply business rules and figure out how to mash customer x from the Salesforce database to Google AdWords to Microsoft Dynamics to the Call Center data.  There's no easy process for this, unless you bake a solution into each part along the trail.  Typically it's done in-house.
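To illustrate the kind of mashing involved, here's a toy sketch that matches customer records from two hypothetical sources on a normalized email address.  The source names, field names and sample data are all made up; real entity matching is far messier (fuzzy name matching, addresses, survivorship rules):

```python
# Two made-up source systems holding pieces of the same customer.
crm = [{"email": "Ann@Example.com", "name": "Ann"}]
ads = [{"email": "ann@example.com", "clicks": 12}]

def norm(email):
    # Normalize the match key so trivial differences don't break the join.
    return email.strip().lower()

# Index one source by the normalized key, then fold the other source in.
by_email = {norm(c["email"]): dict(c) for c in crm}
for a in ads:
    key = norm(a["email"])
    if key in by_email:
        by_email[key].update({k: v for k, v in a.items() if k != "email"})

print(by_email)
```

Even this tiny example shows where the business rules creep in: which field is the match key, how to normalize it, and which source "wins" when both have a value.  Multiply that by every system along the trail and you see why ETL is still the bottleneck.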

We now have all the pieces required for full life cycle analytics.  However, there's still a few more problems to solve before complete automation can occur.

In the meantime, the amount of data continues to expand, thanks to Social Media and the Internet of Things.  Perhaps in the near future the missing link of automated ETL will appear, and then we'll take all that data and apply machine learning in near real time.  Then we'll be one step closer to a cross-domain artificial intelligence brain used to predict probable futures based on past events, segregate things into buckets, find anomalies and outliers in the data, and create recommendation engines without the assistance of people.  That should increase sales, reduce costs and create efficiency across the board.  And propel humanity into the next stage of the Information Age / Revolution.

And there you have it~!

The Microsoft Data Platform has come a long way

In the mid 1990's I was a Microsoft Visual Basic developer against Oracle databases.

At that time, the reporting solution was to use Crystal Reports, which was bundled with VB, or to use Microsoft Access.

Access was great back then, because it allowed you to store data, pull in data from flat files through ODBC, write custom queries or use the WYSIWYG editor, and finally write reports.

Better yet, you could access Microsoft Access through OLE, as in, call it from Visual Basic, where the developer had access to just about every piece of functionality.

Likewise, it also had the ability to write complex custom VBA code within Access.  We had some cool logic that was reused, similar to real object-oriented languages.  As you may or may not know, back then Visual Basic was not a true Object Oriented language, which was its major downfall, and in fact VBA was a subset of VB, so it really was not Object Oriented either.  That made Java an attractive language back then, as it was OO and worked on Microsoft Windows, Unix and embedded systems.  We also had JavaScript, which we used for client-side validation in classic ASP apps.

I worked on one app written in Visual Basic, that seamlessly called MS Access and the user was unaware there were two apps running.

Granted, you had to compact the database periodically as the size grew, sort of like a 'defrag' to reduce the size.

I worked at one company where they implemented an early form of Instant Messenger using MS Access.  With 150 people on the phones, they could issue a broadcast message to the group and it would appear on everyone's PC; they also built an ActiveX control to ping the Access database every 5 seconds.

Unfortunately for me, I was tasked with porting the MS Access database to Oracle 8i on a Unix box.  When the app went live and all 150 people pinged the Oracle server every 5 seconds, it brought down the Unix server.  I was given the label 'Unix killer' and spent time in management's office explaining what had just happened, after reverting everything.  They eventually purchased an off-the-shelf instant messenger app.

Also back then, I developed classic ASP websites and placed a copy of Access on the hosted server which served up the ASP pages for my websites.  It was a cost-effective approach to storing data remotely with easy backups, as I'd copy the db periodically.  I think the Access database got hacked at one point though.

Security wasn't row-level back then; in fact, the security wasn't that robust at all.

Enter stage left, PowerBI.  If you look closely, you can see similarities between MS Access and MS PowerBI.

  • Store data
  • Pull in data via ODBC or native drivers
  • Write queries through WYSIWYG editors or the M Language
  • Write reports
  • Call from remote apps using OLE

What does PowerBI have that Access didn't?

  • It has PowerBI.com to host files.
  • It can call other data sources in the Cloud.
  • It can call data sources back on-premise.
  • You can schedule those data refreshes.
  • Built in security, integrates with Active Directory.
  • Maps.
  • Natural Query Languages.
  • Stand alone IDE for Power BI Desktop.
  • Same functionality works in Excel.
  • Branding.
  • Works with Tabular Model SSAS cube data.
  • Great for Pivot tables.
  • Great Dashboards.
  • Drill down capabilities.
  • Linked reports/dashboards.
  • R language integration.

PowerBI has come a long way in a short time, with organic growth from within the community.  I think there was a hiccup at first by moving PowerBI into Office365, but the move to an independent, uncoupled web PowerBI was a smart move.

And the changes occur weekly, so you have to keep up all the time, although backwards compatibility still works in Excel.  I'm not sure there are any competing vendors that have Natural Query Language out of the box.  And the Mapping is really a great feature.

So to summarize, Microsoft was late to the party in the Reporting world back in the mid 1990s.  However, they caught up fast and now have a tentacle in every facet of Business Intelligence and Analytics.

This blog post didn't mention other tools on the Azure platform, like HDInsight Hadoop, Streaming Analytics, Elastic Azure Data Warehousing, AzureML Machine Learning (putting the ability to create models using WYSIWYG cloud-based editors into the hands of mere mortals), the tight integration with Visual Studio in the cloud and on-premise, and DocumentDB.  And of course, SQL Server has caught up in a big way regarding relational databases and is now considered a top-notch, enterprise-ready product which incorporates the R language in the next version.  Each product is an industry leader in its domain.

Not only has MS caught up, they are definite leaders in just about every category of data and analytics.  They've leveraged a lot of functionality from PowerBI's predecessor, MS Access, along with new technologies, such that they have many products with great features covering all aspects of data.

I wonder if, at some point, they'll collapse some more features into PowerBI, like Machine Learning, DocumentDB, a built-in HDFS system to store flat files, as well as support for Linux.

The Microsoft Data Platform has come a long way.

Read my blog from February 2015: http://www.bloomconsultingbi.com/2015/02/microsoft-innovation-full-court-press.html

And 'Microsoft Transformed Over Time': http://www.bloomconsultingbi.com/2015/05/microsoft-transformed-over-time.html