11/30/2012

Some More Fuzzy Logic

I've been playing with some Fuzzy Logic code.

Basically, you tell it to match on DomainName (from the user's email address) plus their Country and it looks for matches based on that.

Then the Fuzzy part: you give it the Company name to search for, and it scores each candidate by how similar the names are, expressed as a percentage.
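
To make the "similarity as a percentage" idea concrete, here is a minimal Python sketch of the standard Jaro-Winkler score. It is not the T-SQL I'm actually running; the function names and the defaults (a prefix scale of 0.1 and a prefix cap of 4 characters, the usual textbook values) are assumptions for illustration.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity: 0.0 (nothing in common) to 1.0 (identical)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters count as matching only if they sit within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Matched characters that appear in a different order are
    # counted, then halved to get the number of transpositions.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1,
                 max_prefix: int = 4) -> float:
    """Jaro score boosted for strings that share a common prefix."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return j + prefix * prefix_scale * (1 - j)


if __name__ == "__main__":
    print(round(jaro_winkler("MARTHA", "MARHTA"), 4))
    print(round(jaro_winkler("DIXON", "DICKSONX"), 4))
```

The comparison is case-sensitive as written, so company names would probably want to be upper-cased or otherwise normalized before scoring.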

DECLARE @SimilarThreshold FLOAT;
-- Jaro-Winkler returns a value between 0 and 1; the closer to 1,
-- the more similar the strings are. This variable allows us to
-- ignore matches with a lower score.
SET @SimilarThreshold = 0.825;

So you set the threshold to 0.825 and it only returns rows whose similarity score is equal to or above that value.
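
Putting the pieces together, the flow is roughly: group candidates by DomainName plus Country, then score Company names only within each group and keep the ones at or above the threshold. Here is a rough Python sketch of that flow. The row layout, the function names, and the sample data are all made up for illustration, and the scorer is Python's built-in `difflib.SequenceMatcher` ratio standing in for Jaro-Winkler to keep the example short, so the actual scores will differ from what the real code produces.

```python
from collections import defaultdict
from difflib import SequenceMatcher

SIMILAR_THRESHOLD = 0.825  # same cutoff as @SimilarThreshold


def similarity(a: str, b: str) -> float:
    # Stand-in scorer (NOT Jaro-Winkler): difflib's ratio, 0.0 to 1.0.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_matches(left, right, threshold=SIMILAR_THRESHOLD):
    """Exact match on (domain, country), fuzzy match on company."""
    # Bucket the right-hand rows so each left-hand row is only
    # compared against rows that share its domain and country.
    buckets = defaultdict(list)
    for row in right:
        key = (row["domain"].lower(), row["country"].lower())
        buckets[key].append(row)

    results = []
    for row in left:
        key = (row["domain"].lower(), row["country"].lower())
        for candidate in buckets.get(key, []):
            score = similarity(row["company"], candidate["company"])
            if score >= threshold:
                results.append(
                    (row["company"], candidate["company"], round(score, 3))
                )
    return results


if __name__ == "__main__":
    # Hypothetical sample rows.
    left = [
        {"domain": "acme.com", "country": "US", "company": "Acme Corporation"},
    ]
    right = [
        {"domain": "acme.com", "country": "US", "company": "Acme Corporaton"},
        {"domain": "acme.com", "country": "US", "company": "Zenith Holdings"},
        {"domain": "acme.com", "country": "DE", "company": "Acme Corporation"},
    ]
    for match in find_matches(left, right):
        print(match)
```

Bucketing on the exact-match fields first keeps the number of fuzzy comparisons down, which matters once the row counts get large.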

I've been running the code against 375k rows, and it takes about half an hour, give or take.

After it returns results, I can see what percentage of rows found a match, and adjust the threshold accordingly.
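
As a rough illustration of that tuning step, here is a small Python sketch that takes the best score found for each row and reports what fraction clears a few candidate thresholds. The function name and the scores are invented for the example; they are not results from my data.

```python
def match_rate(best_scores, threshold):
    """Fraction of rows whose best score is at or above the threshold."""
    if not best_scores:
        return 0.0
    hits = sum(1 for score in best_scores if score >= threshold)
    return hits / len(best_scores)


if __name__ == "__main__":
    # Hypothetical best-match score per source row (0.0 = no candidate).
    best_scores = [0.97, 0.91, 0.84, 0.83, 0.80, 0.62, 0.40, 0.0]
    for threshold in (0.90, 0.825, 0.75):
        rate = match_rate(best_scores, threshold)
        print(f"threshold {threshold:.3f}: {rate:.1%} of rows matched")
```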

Actually, I'd prefer to keep the threshold higher rather than lower, to keep the matches airtight.

However, once I get the figures in place, I can begin testing lower thresholds, as well as doing some fuzzy matching on other fields, like IP Address, Address, etc.

This really opens up a lot of doors when trying to match disparate data sources across servers that otherwise wouldn't be able to mesh.

Fun stuff for a Friday!
