Some More Fuzzy Logic

I've been playing with some Fuzzy Logic code.

Basically, you tell it to match on DomainName (from the user's email address) plus their Country and it looks for matches based on that.

Then comes the Fuzzy part: you give it the Company name to search for, and it looks for similar names, scoring each match as a percentage.


DECLARE @SimilarThreshold FLOAT;
-- Jaro-Winkler returns a value between 0 and 1; the closer to 1,
-- the more similar the strings are. This variable allows us to ignore
-- matches with a lower score.
SET @SimilarThreshold = 0.825;

So you set the threshold to 0.825 and it only returns rows whose match score is at or above that value.
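
Here's a rough sketch of the idea in Python rather than T-SQL, just to show the moving parts. It is not the function I'm actually running: the field names (company, domain, country) and the sample rows are made up, and the Jaro-Winkler below is the textbook version with a 0.1 prefix weight over at most 4 leading characters.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity between two strings, in the range 0..1."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only count as matching if they are within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that line up in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro score boosted for a shared prefix of up to 4 characters."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return score + prefix * prefix_scale * (1 - score)


def find_matches(target, candidates, threshold=0.825):
    """Exact match on domain + country, fuzzy match on company name."""
    results = []
    for row in candidates:
        if row["domain"].lower() != target["domain"].lower():
            continue
        if row["country"] != target["country"]:
            continue
        score = jaro_winkler(target["company"].lower(), row["company"].lower())
        if score >= threshold:
            results.append((row["company"], score))
    return sorted(results, key=lambda r: r[1], reverse=True)


# Made-up sample rows, for illustration only.
target = {"company": "Acme Corp", "domain": "acme.com", "country": "US"}
candidates = [
    {"company": "Acme Corporation", "domain": "acme.com", "country": "US"},
    {"company": "Zenith Ltd", "domain": "acme.com", "country": "US"},
    {"company": "Acme Corp", "domain": "acme.com", "country": "CA"},
    {"company": "Acme Corp", "domain": "other.com", "country": "US"},
]
print(find_matches(target, candidates))
```

Raising the threshold trims the result list, lowering it lets weaker matches through, which is the knob I'm describing above.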

I've been running the code against 375k rows and it takes about half an hour, give or take.

After it renders results, I can identify the percentage of matches found, and adjust the threshold accordingly.

Actually, I'd prefer to keep the threshold higher rather than lower, to make this code airtight.

However, once I get the figures in place, then I can begin testing lower percents as well as do some Fuzzy logic for other fields, like IP Address, Address, etc.

This really opens up a lot of doors when trying to match disparate data sources across servers that otherwise would not be able to mash.

Fun stuff for a Friday!


Face to Face Contact Obsolete

Life is a little bit like poker.

Guard your hand, protect your cards, be aggressive when bluffing.

If you ever read this blog, you'll see that there's a huge amount of transparency.

Which is the way things are headed in real life.

Secrecy is a thing of the past.

However, some people still use cold war tactics.

They hide their true motives, they cloak their agendas, they pretend to help while they steal you blind.

It's getting hard to tell the good guys from the bad guys, to be honest.

The dividing line has diminished, everyone's out for themselves.

Passive aggressive is the new norm.

People are superficial and there's no real substance.

Conversations tiptoe around the weather, television shows and sports.

Has the world of internet overridden true community?

With people's short attention spans and unquenchable thirst for newness and excitement, do humans stand a chance in the new digital world?

Has face to face contact become obsolete?

As people stare at their smart phones in place of conversation.

Personally, I think everyone's out for themselves one way or another.


Some Links on New BI Technologies

I was watching Chris Webb's video from the Nordic PASS session, "Chris Webb - The best Microsoft BI tools you've never heard of", and discovered a new technology called "Open Data Protocol" or "OData".

You can view its site here: http://www.odata.org/introduction

I imagine lots of this open data can be consumed via Power Pivot.

Chris describes the protocol as something like an "ODBC" for the internet, which makes sense.

He also talks about a new product called Microsoft Codename "Data Explorer".

Website is here: http://www.microsoft.com/en-us/sqlazurelabs/labs/dataexplorer.aspx

He equates it to SSIS for Excel.

You can connect to a variety of data sources and apply a series of repeatable steps.

Next topic was NodeXL, where you can import data from a variety of sources.

And another link mentioned was NodeXLGraphGallery, which has some good graphs.

Another free download is Layerscape, developed by Microsoft Research.

It takes data from Excel and visualizes it, powered by WorldWide Telescope.

So to summarize, Business Intelligence is growing really fast.

I'm glad we have organizations like PASS to help disseminate information easily.  You can watch some great video clips from the Nordic site here.

And thanks to Chris for the good info tidbits!



Fuzzy Logic

Today I'm going to work with Fuzzy Logic.

What that does is look for matches of data based on a threshold.

So you can specify what percentage is an acceptable match.

I've done this before on a contract through Bloom Consulting, where the logic resided within an SSIS component.

However, today I'll be working with Fuzzy Logic within T-SQL; the function is called Jaro-Winkler.

The goal is to find similar matches of data, so we can plug some missing values in some tables.

And by doing that we will essentially close the loop on some otherwise disparate data sources.

Wish me luck, I'm going in!
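
To picture what "plugging missing values" means, here is a small Python sketch. It is only an illustration, not the T-SQL I'll be using: it swaps in the standard library's difflib ratio as a stand-in for Jaro-Winkler, and the company names and the account_id field are invented.

```python
import difflib


def fill_missing_ids(rows, reference, cutoff=0.825):
    """Fill a missing account_id by fuzzy-matching company names.

    rows      -- list of dicts with "company" and "account_id" (None if missing)
    reference -- dict mapping a known company name to its account_id
    cutoff    -- minimum similarity (0..1) required to accept a match
    """
    names = list(reference)
    filled = []
    for row in rows:
        row = dict(row)  # copy, so the input rows are left untouched
        if row["account_id"] is None:
            # difflib's ratio is a stand-in here, not Jaro-Winkler.
            best = difflib.get_close_matches(
                row["company"], names, n=1, cutoff=cutoff
            )
            if best:
                row["account_id"] = reference[best[0]]
        filled.append(row)
    return filled


# Invented sample data.
reference = {"Contoso Ltd": 101, "Northwind Traders": 102}
rows = [
    {"company": "Contoso Ltd.", "account_id": None},
    {"company": "Northwind Trader", "account_id": None},
    {"company": "Fabrikam", "account_id": None},
]
for row in fill_missing_ids(rows, reference):
    print(row)
```

Rows with no candidate above the cutoff stay empty, which is what you want: a blank is better than a wrong value.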


Never Give Up!

I know a guy who was diagnosed with a terrible disease at an early age, around 21.

He could have given up, sat on the sidelines of life, chosen the easy path.

Except he didn't.

He chose to accept it and deal with it and live with it.

Although he knew deep in the back of his mind that it was not true.

Although nobody questioned the diagnosis.

Nobody bothered to offer assistance.

Or help in any way.

He was left alone to deal with life with his disability.

He tried to see other doctors to get another evaluation.

None of them bothered to listen or help.

His primary care physician didn't help.

He had to pay extra for life insurance because of the added risk.

Meanwhile, my friend never gave up, kept pushing forward.

Until one day, he called a doctor and scheduled an appointment.

The doctor listened to his story from start to finish.

And what he said was amazing.

After twenty years of living with a debilitating disease, the doctor said it was highly unlikely to be the case and was confident it was not true.

I don't think people can realize what a struggle it was for my friend.

Everybody gave up on him, nobody bothered to help.

And after twenty years, my friend found the truth.

Which he knew all along.

So here's to my friend and his newfound life.

Sometimes hope does pay off.

And miracles do happen.

Never give up!


ROI Riddle

How much business was generated based on a specific campaign?

Good question.  How do you know that?

You'd have to have data for all your campaign info.

As well as your Lead info.

And determine which Leads were Qualified.

And of the Qualified Leads, which ones converted into sales?

So what percentage converted, sliced by product, region, etc.
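
As a back-of-the-napkin sketch in Python, the calculation looks something like this. The lead records and field names (campaign, region, qualified, converted) are made up for illustration; the real data would come from the campaign and lead systems.

```python
from collections import defaultdict


def conversion_by(leads, key):
    """Share of Qualified Leads that converted into sales, grouped by `key`."""
    qualified = defaultdict(int)
    converted = defaultdict(int)
    for lead in leads:
        if not lead["qualified"]:
            continue  # unqualified leads are left out of the percentage
        qualified[lead[key]] += 1
        if lead["converted"]:
            converted[lead[key]] += 1
    return {group: converted[group] / qualified[group] for group in qualified}


# Made-up leads, for illustration only.
leads = [
    {"campaign": "Webinar", "region": "East", "qualified": True, "converted": True},
    {"campaign": "Webinar", "region": "West", "qualified": True, "converted": False},
    {"campaign": "Webinar", "region": "East", "qualified": False, "converted": False},
    {"campaign": "Email", "region": "East", "qualified": True, "converted": True},
    {"campaign": "Email", "region": "West", "qualified": True, "converted": True},
]
print(conversion_by(leads, "campaign"))
print(conversion_by(leads, "region"))
```

Swap the key to slice the same leads by product, region or anything else you captured.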

It can be done, just takes time.

And I have time.

So I meet with people on the Analytics side.

On the Business side.

And the Technology side.

To document the processes.

And look for the gaps / holes.

And find solutions.

To produce a workable high level analysis from start to finish.

Every touch point where a customer stops along the way.

Once the architecture is in place, it's just a matter of consolidating the data sources.

And creating a Tabular Model for the big wigs to slice and dice.

And that's how it's done.

It won't be long now until I have something to show.

And the riddle of ROI will be solved.


BI = Business + Technology + Analytics

Business Intelligence is comprised of three items:

Business:
How does the business work?
How do Apps work?
How do Processes work?

Technology:
What Servers, Databases, Tables, Fields are available?
How does Data get Joined between sources?
What gaps are there in the Data?

Analytics:
What Questions / Hypotheses are to be answered?
How do we find Insights from the data / information?

Traditional Reporting consisted of Technology plus some business knowledge.
Business Intelligence added more of the Processes / Business lingo to the equation.
Data Science encapsulates more of the three at deeper levels.


Business Intelligence Is More than Just Data

When people think of Business Intelligence they think of different things.

Some people will rattle off the top 10 vendors in the BI space.

Others will differentiate the key functionality between Cloud BI, Mobile BI, Big Data, Traditional BI, In Memory BI, ETL, Data Governance, etc.

Still others will get into the nitty gritty aspects of how the functionality works.  How to best optimize queries, how to mash data, how to get Unstructured Data onto a report.

And yet still others will talk about the Business Rules for an organization.  How data flows between departments, between systems, between databases, and how it all integrates into a cohesive whole.

Business Intelligence is not just data into information to produce actionable tasks.

It's about Collaboration.  Discussions about technology.  About Business Processes.  About Analytics.

It's about communication.  Discovery.  Detective work.  Finding the truth.

It blends art and science, faith and facts, and micro and macro.

What else could you ask for?


Using Data to Answer a Hypothesis - Step by Step

In order to solve a problem you must first state your hypothesis.

Once you determine the questions to be answered, you must ask yourself some basic questions.

Who, What, Where, When, How and maybe even Why.

So if you are tasked to find out your customer base for example, here's how I'd approach it.
  • Who: Who is your Customer?
  • What: What Products did they purchase?  How many units?
  • Where: Where do the customers reside?  (Region, Territory, Country, State, City)
  • When: When did they purchase products?  How long was the interval between first touch and closed sale?
  • How: How did the customer find our product?  (Campaign, Google Search Word, Advertisements)
Those questions would help to answer the "Dimensions" part of the equation, so you can slice and dice.

The "Measures" or numeric values would be:
  • Sum(Purchase Amount)
  • Average(Purchase Amount)
  • Count(Customers) 
Next step would be to identify the Data Sources: where does my data reside?  What fields would help to answer my questions?  What information is just noise and can be disregarded?

Next step would be to ETL, Extract, Transform and Load your data, assuming your data has already been cleansed and data governed.

Next step would be to create a workable Query or Queries to pull the data into Dimensions and Measures, with the final resting home to be a Cube for slicing and dicing.
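
To make the Dimensions and Measures idea concrete, here's a tiny Python sketch that does what a cube does on a very small scale. The purchase rows and field names are invented for illustration; in practice this would be a query feeding the cube.

```python
from collections import defaultdict


def measures_by(purchases, *dimensions):
    """Group purchases by the given Dimensions and compute the Measures."""
    groups = defaultdict(list)
    for purchase in purchases:
        key = tuple(purchase[d] for d in dimensions)
        groups[key].append(purchase)

    result = {}
    for key, rows in groups.items():
        amounts = [row["amount"] for row in rows]
        result[key] = {
            "sum": sum(amounts),                       # Sum(Purchase Amount)
            "average": sum(amounts) / len(amounts),    # Average(Purchase Amount)
            "customers": len({row["customer"] for row in rows}),  # distinct customers
        }
    return result


# Invented sample purchases.
purchases = [
    {"customer": "A", "region": "East", "product": "Widget", "amount": 100.0},
    {"customer": "A", "region": "East", "product": "Widget", "amount": 50.0},
    {"customer": "B", "region": "East", "product": "Gadget", "amount": 30.0},
    {"customer": "C", "region": "West", "product": "Widget", "amount": 20.0},
]
print(measures_by(purchases, "region"))
print(measures_by(purchases, "region", "product"))
```

Each extra Dimension you pass in is one more way to slice the same Measures.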

That's how I would approach it anyway, as my expertise is from the Business Intelligence side.

If you were a Data Scientist, I think you would follow the same steps, except you may be pulling from Big Data or Unstructured Data.

And you may be applying Models for Statistical or Mathematical predictions and calculations.

In the end you should either have answered your question / hypothesis, or discovered that you need to modify your approach, find more data, revise your models, etc.

And document along the way, your assumptions and findings, so you can later reproduce the experiment.

That's the way I see it anyway.

Yearly Review

So this year we're trying something different.

We've given the responsibility of evaluation to the brightest minds.

Our pets.

Here's what they had to say.

Mickey the Cat: Jon has done well to provide a stable environment.  He provides boxes frequently for me to sit in.  He plays with me in the morning.  Although he is not the actual person who feeds me, he does a decent job scooping the litter box.  Room for improvement: I could use some more time on the porch to watch the lizards.

Charlie the Cat: Over the past year, Jon has been an excellent provider, although he tends to spend the majority of time with the canines.  Keep the treats coming and there won't be any issues.

Sunshine the Cat: I usually hide upstairs until it's feeding time so I don't have any complaints except the two dogs tend to sniff me a lot.

Maddie the Dog: Overall Jon is outstanding at what he does, which is to let us out in the morning to do our business.  He is generous with the treats, and he's usually good for a walk in the evening.  I like to go on car rides and find he is very accommodating on that front.  I am impressed with the new swimming pool, however, I feel that the swims should be daily, not occasionally so there's room for improvement.

Chloe the Dog: I would say I'm treated fairly, there's been talk of a Union, however, for the record I'm not a part of that.  Jon has the capacity to think like a dog, so he understands what we want, which is either some petting, some treats, a walk or swim or car ride or more treats.  Jon shows potential and I feel with the right attitude, he will go far.

Overall Rating: 4 out of 5 Paws

Signed: The Pets


Assembling Data Sources Into Single Query

I've been putting together the mother of all queries.

Joining data from 2 separate data sources, then combining that with Salesforce, then combining that with a final data source.

Let me say there's a lot of data.

And let me say the query is pulling from multiple servers and multiple databases.

And the queries take a long time to run.

So I've been piecing them together, slowly.

Mostly in Temp tables, with indexes.

So today I moved some of the data closer to the source, and now the query runs a lot faster.

So I can actually see the data results after a few minutes as opposed to hours.

Which means tomorrow I should be able to dissect the relevant information to answer some questions we posed as hypotheses.

My goal is to answer the questions, and document the methods in which the data was gathered.

As there will always be exceptions, assumptions and workarounds.

However, in the end, we should have a global view of the data mashed with multiple data sources to then find insight.

And give the uppers an ability to manage the business in a way they were never able before.

And put the data into a Tabular Model cube for interrogation.

That's really what it's all about.


Programmers for Life

I was active on LinkedIn today.

And I noticed something unusual.

Many people I worked with in the past are really successful.

Managers, Directors, VPs, SVP, Senior Programmers, Architects, etc.

I wonder if you do something for long enough, the natural progression is upward.

It doesn't necessarily mean that you are a great programmer, to become a VP.

Sometimes it's the exact opposite, they promote those who can't program because they do less damage higher up.

Except some people are really smart and deserve to be in charge of departments and such.

I'm glad to see these people climb the ladder of success.

I tried being a Supervisor and I found it to be a no-win situation.

My skills were drying up, there's no method for motivating people at gov't jobs, the pay was lousy, my peers were making 20k more than I was, there was pressure from above to keep up with the work queue, there was no training, there was no way to say "no" to a customer no matter how difficult or infeasible the request, there's no way to fire people at gov't jobs, I felt little support from my peers (if anything there was competition), and lastly, I felt there was almost no recognition for doing good work.

So I'm back to being a programmer.  And I'm luvin' it!

And my hat's off to all those programmers who stayed in the trenches throughout their careers.

It ain't easy solving problems day in and day out.

And so it goes!

Shadow IT

We've seen 'Shadow IT' brought out into the open.

So what exactly is it?

From my perspective, the business users go out and hire their own people, typically contractors, to write some application which IT was not able to deliver.

Sometimes those shadow IT contractors build applications in languages not supported by the IT infrastructure.

And they are not limited to the change management processes.

Or naming conventions.  Or documentation rules if any.

And sometimes the rogue contractors leave and IT is left to support the applications.

And to their surprise, they've just inherited a pile of interconnected crap held together by duct tape.

Other times, the business will have one of their own people maintain their printers, add users, hardware, software, a sub-IT department.

And other times I've seen these rogue developers create intricate interconnected Access applications that utilize corporate data and then add more data to their micro databases.  Then you'll be in a meeting and someone whips out a report, and nobody knows the source of the data.

Then you do some digging and find out there are 20 interconnected Access databases that are running entire departments, under the radar, with little or no support.

And it would take years to convert to real packages.

And sometimes these apps are supported by a lone wolf who has built their job security into this mess and there's no way to get rid of them.  Because only they know how it works and if you fire them and something goes wrong, disaster.

So everyone talks of Shadow IT as a good thing; I say take it with a grain of salt.

The business can really put you into some hot water if you're not careful.

And there you have it!


Become A Computer Programmer

Are you an introvert?  Do you like to solve problems?

Then you may want to become a computer programmer.

Every day you can take on challenges that no one else wants.

Find simple solutions to complex problems.

Learn new technology on a daily basis.

Solve production bugs while under pressure.

Become an expert at viewing the Big Picture.

At the same time be able to be in the trenches and dig through the weeds.

Get to know your customers because once they figure out you can solve their problems, they'll be contacting you every 30 seconds about their latest emergency.

You see, being a Computer Programmer is about the best occupation you could ask for.

It's never too late!  Sign up today!  Money back guarantee if not completely satisfied after 20 years.

Cost or Asset?

What kind of IT shop do you work in?

Do you have:

  • Quality Assurance department?
  • Business Analysts?
  • Project Managers?
  • Change Managers?
  • Incident Managers?
  • Asset Managers?
  • Documentation Specialists?
  • Deployment Managers?
  • Software Architects?
  • Data Architects?
I'd say that new companies starting out could forgo at least some of them.

Now let's be conservative and assume a salary of $50,000 per position: forgo all ten and you just saved $500,000 a year.

And be realistic at $75,000 and that's $750,000 a year, so you're now talking some big money / savings.
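
If you want to play with the numbers yourself, the arithmetic is a one-liner. A quick Python sketch, assuming all ten positions from the list are dropped and using salary only (no benefits or overhead):

```python
# The ten positions from the list above.
ROLES = [
    "Quality Assurance", "Business Analyst", "Project Manager",
    "Change Manager", "Incident Manager", "Asset Manager",
    "Documentation Specialist", "Deployment Manager",
    "Software Architect", "Data Architect",
]


def annual_savings(salary_per_position, positions=len(ROLES)):
    """Yearly salary saved if `positions` roles are not filled."""
    return salary_per_position * positions


print(annual_savings(50_000))  # conservative salary
print(annual_savings(75_000))  # more realistic salary
```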

The job duties will need to get dispersed to other people however, most likely the programmers.

Do you think all the people are overhead, or do they return value?

Cost or asset?

You decide!


New Employees Are Happy

Sure a new employee is helpful, friendly, enthusiastic, eager, positive, etc.

It's only after he's been there a while.

And seen the egos, the lack of assistance, the information hoarders, the gossipers, the backstabbers, the work dumpers, the lazy ones, that he gets a real feel for the work environment.

And then it becomes an effort of how to handle the new environment.

You can do what I used to do, and that is to switch jobs every time you cross paths with some jackass.

Except that's bound to happen at every work place.

So the best bet is to do your job, ignore the morons and make the best of it.

And there you have it.


Namespaces for Public Community Data

When we talk about Community data, we often think of sets of data available for download from a particular institution.

However, when mashing data, I think it would be optimal to stamp that data with where it came from.

Similar to how we put "Namespaces" on web services to identify the unique location on the web.

Do the same with Universal Data.

In addition, instead of downloading that data once, mapping it out and mashing it up with your local data, why can't we do joins across the Internet, similar to Web Services?

That way you could say:

Select u.FirstName, u.LastName, u.UserId, dr.DriversLicNbr
From Server.Database.owner.[User] u
Inner Join [Namespace=DMV].[Server].[Database].[owner].[Driver] dr
On u.UserId = dr.UserId

Basically, a 5 part naming convention instead of a 4 part naming convention.
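
Since that 5 part syntax doesn't exist, here's a small Python sketch of the same idea: stamp the public rows with the Namespace they came from, then join them to local data. The "DMV" namespace, the field names and the sample rows are all hypothetical.

```python
from collections import defaultdict


def stamp(rows, namespace):
    """Tag each row with the Namespace (source) it came from."""
    return [{**row, "Namespace": namespace} for row in rows]


def inner_join(left, right, key):
    """Inner join two lists of dicts on a shared key field."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    return [
        {**left_row, **right_row}
        for left_row in left
        for right_row in index.get(left_row[key], [])
    ]


# Hypothetical local data and public community data.
users = [
    {"UserId": 1, "FirstName": "Ann", "LastName": "Lee"},
    {"UserId": 2, "FirstName": "Bob", "LastName": "Ray"},
]
drivers = stamp([{"UserId": 1, "DriversLicNbr": "D123"}], "DMV")

for row in inner_join(users, drivers, "UserId"):
    print(row)
```

Every joined row carries its Namespace along, so when someone asks where a field came from, the answer is sitting right there in the data.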

Similar to calling a database in the Cloud, we could then call Universal Public Database in the Cloud.

Free the data.  Expose the data for user consumption.  Imagine the world of interconnected data.

Think of the possibilities.  Mashing data from the Federal Level, to the State level, to the County Level and back again.

Such potential!

Consider the Report Writers When Designing Applications

Something I learned early on in the world of IT and programming.

People spend a lot of time up front designing the web application and designing the database.

And then the system goes live.

And whoops, we forgot about the reports.

"Here Mr. Report man, figure out how to get the data out of the database."

Oh by the way, the requirement is to merge our data with all the other data in the company.

Except we didn't think it through well enough to put hooks into the database to allow for this.

So we'll just throw it over the fence and let you figure it out.

And there you have the world of reporting, at least from what I've seen.

It would be great if the people designing the system considered the report writers in architecture and design by looking at the entire ecosystem.

Databases used to exist in silos.

And we paid high priced consultants to converge the data into a Data Warehouse.

Which was rigid, slow and costly with steep upfront costs, software costs and maintenance costs.

We really need to think things through about how the data will be handled downstream.

With tools like Tabular Model and Power Pivot, we can now easily mash data across distant data sets.

However, without a "hook" on common fields, this is still a difficult task.

So going forward, include the lowly report people into your design sessions.

It will save time downstream and the insight obtained will be invaluable.

Not sure if the application people have realized the transition, but the lowly report writer has always had a direct link to the top executives to get the data out of the database.

More so now trying to get ROI and insights.

The org can easily outsource the front end programming and even the database maintenance and support.

Except Data Engineers / Report Writers / Data Scientists will still have jobs long after you've been outsourced.

So get with the program and play nice.

And that's about it!


Repetitive vs Free Form

For some reason I find myself doing the same thing over and over.

I don't listen to the radio so I have a CD in the car.  I listen to the same CD over and over and over for months.

And when searching the web, I only check about 4 or 5 sites.  Over and over.  I almost never venture out into the vast ocean of the internet on a search and discover mission.

When ordering food, I tend to order the same meal without change.

I cut my own hair for 15 years, always the same.  Until recently, when I started getting haircuts from the lady who cuts my wife's hair.

However, some things are very free form.

Like driving to and from work.  I never go the same way twice.  Always trying new routes to see different scenery and watch the different traffic flow patterns.

My workday is never the same twice.  If it were, I'd be looking for a new career / job.

So I suppose some things are repetitive and some are free form.

A bit of conservatism mixed with a bit of wild freedom.

A conservative rebel.

Programming = Freedom

Being a programmer means taking responsibility for your job and your work day.

What that means is the boss is not breathing down your neck.

Telling you what to do, how to do it, when to do it, in what order.

They leave that up to you.

Because they don't micromanage their workers.

Partly because they don't know exactly what programmers do explicitly.

They are at the mercy of the technology.

And that gives us a fair amount of freedom to complete our jobs.

And we have to manage ourselves.

Each day I go to work, I'm responsible to get my work done.

It's up to me to decide what to work on based on priorities.

And I get to decide how to tackle the project.

At what pace.

In the manner which is good for me.

You see, they give us the freedom.

And we produce quality work on time within budget.

It's pretty simple.

Yet there will always be some smart or lazy person who takes advantage of the system.

Milking the projects, blowing smoke up the boss's you-know-what, telling the customers they can't have certain features or technology because they don't want to do it, not because it's not possible.

Anyhow, being a programmer has tremendous freedom inherent in the position.

That's what I like the most about being a programmer.

Oh yeah, and to be of service :)


LinkedIn Recommendation Sort Of

Well, when I left the School Board I asked to be endorsed for some of the work I did.

I got some feedback today.

Except the only problem is somehow I got disconnected from the person.

So when they gave me a "Recommendation" it came in as a message, not a recommendation that can be posted on LinkedIn.

Not sure if I'll bother the person to correct the issue, but here's her email:

Date: 11/08/2012
Subject: RE: Can you endorse me?
Jon, is an awesome project manager and is one of the best critical thinkers we had in MIS. He has the ability to look at situations from all angles to solve problems. 


Why Be a Programmer?

Most people work for a living.

Some people do data entry, some cook french fries, some defend people in court and others type on a computer all day long.

These computer people are called programmers.

And they are the unfortunate souls who must think for a living.

Swimming in details all day every day.

And logic too, how horrendous.

And trying to understand the customer's never ending wish list and whims and changes.

And the technology changes every year or two with steep learning curves.

Why would anyone in the world want to become a computer programmer?

Because they get to think every day.

Because they swim in the details every day.

Because they live and breathe by logic in an illogical world.

Because they get to interact with customers.

Because they never stop learning.

And most of all, because they get to solve problems, different ones every day, using nothing but their wit, creativity and curiosity.

Personally, I'd ask why wouldn't you want to be a programmer?

Best job on planet Earth.

Technology, Business and Value

People think that programmers sit in a 4 x 4 cube and write code all day.  Yet this is clearly not the case.  We are assigned projects, interact with customers, gather specs, WRITE CODE, have meetings, test the code and deploy to production.

Yet when getting a new project, one should really look at it from three angles:  Technology, Business  and Value.  If you become an expert in all three areas, you will become a valuable asset to the org.

Here's some questions on a fictitious project to show an example.

Let's say you are on a conference call with a person from another department.  You were told that they own a set of data.  You would like to interrogate them to see if that data could be of value.  Your existing project is to gather as much data as possible, mash it together, to create a complete customer profile from start to finish.

Technology:
What server is it stored on?
What type of web server?
What is the IP Address?
Is it a relational database or log files?
What protocol is used to transmit data files?
What language are the websites?
How often are the backups available?
How can we obtain your production data?

Business:
What are the business rules for the company?
What are the products we sell?
Who sells our products?
Who creates the products?
Who do they report to?
What is the pricing matrix?
Who heads that department?

Value:
What information are you capturing?
How can we use the data?
What does the data contain?
How can we mash the data?
What insights can be captured?
Can we get the big picture from small details?

I see lots of project meetings where people spend 25% of the time on introductions, 50% re-defining the project when everyone already knows the scope, and then race through the difficult details in the remaining 25%.  A real time waster.

By separating the spec gathering into three distinct buckets, you can really get to the guts of the problem faster.  When speaking with a business knowledge expert, the trick is to get them to open up and explain how things work from the Technology perspective, from the Business perspective and from the Value perspective.

So instead of weaving and wandering through the conversation, you can approach the project systematically with intent.

Get to the point.  Get the information.  And get busy.

Top 13 List of IT Musings

  1. Technology changes often.
  2. Technology often becomes more complex.
  3. The rate at which technology becomes more complex becomes increasingly smaller over time.
  4. Users are never happy, for long periods of time.
  5. Paid training is hard to find.
  6. There's always someone who knows more than you.
  7. Most programmers prefer developing code to maintaining it.
  8. Business problems are constant, we just throw new technology at them.
  9. Some Senior IT people are not that smart, some entry level people are brilliant.
  10. Bullshit artists will always find a home in IT, smooth talking the technologically ignorant.
  11. Your personal online information is already being sold today.
  12. IT will continue to get blamed for all the problems.
  13. And last but not least, you get what you pay for.

Meeting Preparation

Everyone jokes about the wastefulness of meetings.

We sit there while some people babble on about trivial matters and usually go off topic first chance they get.

When people would tell me they were preparing for a meeting, I used to think, what does that mean?

You just show up for your meeting, state your case, get some more action items, then go back to work.

Except sometimes you really do need to prepare for a meeting.

Because people have expectations of discussing specific matters and if you're not prepared it really shows.

So I plan to give more attention to meetings and preparing for them.

I used to think people were full of BS for preparing, some excuse to not do real work.

Except in the mad dash to cross everything off the to-do list, you can actually lose points for showing up unprepared in some instances.

That's all I'm saying!


Report Writer vs Business Intelligence vs Data Scientist

So with all the discussion lately on Reporting, Business Intelligence and Data Science, I've been racking my brain to find a distinction.

And here's what I've come up with for my own personal definitions.

When thinking about data there is the technical side.  One who extracts data into a report is a Report Writer.

The next step is if you know how the business works and combine that with Report Writing, ETL, Data Cleansing, Data Governance and Data Mashing then you are a Business Intelligence Analyst.

The third part of the equation is the Analysis side.  If you know how the business works and combine the technical side of BI and then go a step further by adding Analytics, Data Mining, Forecasting, Predictive Models to derive Hypotheses then you are a Data Scientist.

A Report Writer can gather specs, translate them into code and deliver to the customer.

A Business Intelligence person can explain the report in Business terms and track the data back to its source.

A Data Scientist can explain the report in Business terms and then explain the patterns and meaning behind the report, give recommendations on how to steer the org and slice the data in meaningful ways to provide different levels of insight.

I'm sure we could go into a lot more detail than these simple explanations.

However it boils down to three things:
  1. Technology
  2. Business
  3. Analysis
Can you work with data?  Can you relate the business aspects of the org with the data?  Can you find insight for your organization through technology?

Everything starts with the data.

That's the way I see it.

Data is collected at unbelievable rates.  Costs have become lower for hardware and for software, and huge volumes of data are now available to the "Average Joe Developer".  Many organizations can find Report Writers, Business Intelligence Developers and Data Scientists to sift through the data, mash it with alternate data sets, mine the data for patterns, and predict future behavior / events by finding the Golden Nuggets of Insight just sitting there in the data, waiting to be found, unearthed.

So what are you waiting for, start digging through your data and get the competitive advantage!
