For our latest project, we were tasked with performance issues using Fuzzy Logic in SSIS.
Apparently, Fuzzy Grouping and Fuzzy Lookup tax the server heavily.
Both components do a great job for small data sets, except when the number increase to let's say 13,000,000 rows.
What it does it store everything in TempDB temp tables, using cursors (uhg!) and the package we were supposed to troubleshoot looked 16 times x 4 for 48 passes over the 13 million rows.
It ran in 5 days before we got there. For both Fuzzy Grouping and Fuzzy Lookup.
Our plan of attack, investigate everything.
Hardware, software, SQL Server, Memory usage, Server configuration, background processes, VM settings, Host settings, Indexes, Queries, Locks, Blocks, Latches, Memory allocation, Paritioned tables, tweaks to SSIS components, Parallelization, etc., etc.
The first week we investigated, base lined and documented. Second week also. Then we provided a document to the Client with recommendations and findings and suggestions.
Third week we tried various combinations of suggestions. Fourth week, we were still getting errors while running the package. We ran Perfmon each run to gather metrics.
At the end of the fourth week of the contract, we finally had some success. A combination of all the recommendations reduced the package runtime to just over 11 hours, down from 5 days.
Our job was to get it under 24 hours so they could run it twice on the weekend if necessary.
So we succeeded with the contract in budget and on time.
I still think that Fuzzy logic is not the most efficient tool in the shed, however, it serves a purpose.
I enjoyed the contract. And I believe next week I'll start with a new client, building a Data Warehouse from scratch with some Crystal Reports. Can't wait!
I signed up for the Hortonworks Certified Associate exam last Thursday. Figured if I sign up, I'd have to take the test. And if I tak...
This blog post is in no way an attempt to steal other people's work. It's basically an conglomeration of notes from research I did...
Saw a post today on Twitter, " Microsoft releases CNTK, its open source deep learning toolkit, on GitHub " This is big news. Be...