10/03/2013

Compilation of Research on SSIS Performance and Fuzzy Matching

This blog post is in no way an attempt to steal other people's work.  It's basically a conglomeration of notes from research I did this morning to assist with an upcoming project to troubleshoot slow-performing SSIS packages, specifically ones built around Fuzzy Lookups.
 
 
Fuzzy Lookup and Fuzzy Grouping in SQL Server Integration Services 2005
 
First, the error-tolerant index (ETI) is built. The ETI is the main data structure that Fuzzy Lookup uses at run time.
 
After the ETI is built, all the input rows are processed and the results are written to the destination. SSIS Designer gives you feedback about the progress of the pipeline by displaying the number of rows processed by each component. You can also place a Data Viewer on the pipeline by right-clicking the path between Fuzzy Lookup and the OLE DB destination. This allows you to visualize in real-time which rows Fuzzy Lookup has matched to your input rows. In addition to matching tuples, Fuzzy Lookup outputs a record-level similarity, individual column-level similarities, and a confidence score for each match result. For more information about these scores, see Interpreting the Results later in this paper.
 
Because ETI construction for a very large reference table may take a non-trivial amount of time, Fuzzy Lookup offers the option of storing an ETI on the server and re-using it at a later date. This option takes a snapshot of the reference table and allows you to avoid re-building an ETI every time you run Fuzzy Lookup. If your ETI takes too long to re-build for every run, consider creating it once and re-using it in subsequent runs. To do this, select Store new index on the Reference Table tab, and then specify a table name.
If you would like to store your ETI, but your reference data changes from time to time, you can also enable Maintain stored index. This feature installs a trigger on the reference table that detects modifications and propagates them to the ETI, keeping it up-to-date. If you do not install table maintenance, Fuzzy Lookup will match against a snapshot of the reference table as it existed when the ETI was created.
 
If Fuzzy Lookup is having trouble finding matches that you think it should find (for example, on datasets whose attribute values are predominantly a single short token), one alternative is to set the Exhaustive component property to True. This causes Fuzzy Lookup to ignore the ETI and instead compare the input record directly against each and every record in the reference table. This approach is prohibitively time-consuming for large reference tables, so only attempt an exhaustive search on small reference tables.
In determining best matches, the most important parameter is the MinSimilarity threshold. You can set this custom property by using the Fuzzy Lookup UI. A reference tuple will be returned only if it has a Similarity that is greater than or equal to the MinSimilarity threshold. By setting a high similarity requirement, Fuzzy Lookup will consider fewer candidates and, as a result, may not return any matches. If you set MinSimilarity low, Fuzzy Lookup will consider more candidates and may be more likely to find a match, but the search could take longer.
 
If you want to view more than the single best match for each input, set the MaxOutputMatchesPerInput property to a value larger than one. Fuzzy Lookup will then return up to that many matches for each input row. Note that increasing the value of this property may increase the time it takes to process each input row. For more information about the impact on performance, see Understanding Performance later in this paper.
 
Fuzzy Grouping uses Fuzzy Lookup under the covers to perform the grouping.
 
Understanding Performance:
 
 
Despite the simple interfaces, Fuzzy Lookup and Fuzzy Grouping are complex processes, and understanding their performance requires some analysis. The following sections explain common performance issues and provide sample measurements. As you will see, the primary determinant of Fuzzy Lookup and Fuzzy Grouping performance is data size. For Fuzzy Lookup and Fuzzy Grouping, this means the number of input rows, tokens, and bytes. For Fuzzy Lookup, there are additional considerations regarding the size of the reference data. The secondary determinant is the computing resources available, which includes memory, space on the database server, network bandwidth, and the way these resources are set up.
Setup Considerations
To obtain the best performance from Fuzzy Lookup and Fuzzy Grouping, it is useful to follow some guidelines regarding memory use and architecture.
To minimize memory requirements on your computer:
  • Use the more lightweight DTExec.exe rather than the full SSIS Designer to execute packages in production.
  • Drop unused columns in your pipeline because they require memory.
If you elect to run the SSIS client and the server on the same computer, you will avoid potential network problems but may encounter memory contention. You can use sp_configure to set the max server memory option to 256 (MB) to mitigate memory contention between the two processes. You can also set the MaxMemoryUsage Fuzzy Lookup custom property using the Advanced Editor in the SSIS designer.
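If that suggestion applies to your setup, the T-SQL looks roughly like this (256 MB is just the paper's example figure, not a sizing recommendation for your hardware):

-- Cap SQL Server's memory so the SSIS process has headroom on the same box.
-- 'max server memory' is an advanced option, so expose those first.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 256;
RECONFIGURE;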
All experimental trend-lines reported in the following sections were obtained using the following setup:
  • A server that has an Intel Pentium III 1-gigahertz (GHz) processor, 512 megabytes (MB) of RAM, and Microsoft Windows Server 2003 Enterprise Edition.
  • SQL Server running on the same machine as the SSIS client.
For recurring Fuzzy Lookup tasks in which the reference table is considerably larger than the typical input table, you should consider pre-computing the index. In those cases, re-building the index could dominate the running time of the actual lookup and make it worthwhile to manage the additional table.
By default, Fuzzy Lookup will load the ETI and reference table into available memory before starting to process rows. If you only have a few rows to process in a particular run, you can reduce this time by setting the WarmCaches property to False.
For large Fuzzy Lookup input tasks, you may want to consider using multiple SSIS clients together with a central server. In this setup, the central server hosts the reference table and the pre-computed ETI. Each SSIS client performs Fuzzy Lookup on a piece of the large input table by using the central reference table and ETI. After the Fuzzy Lookups are complete, you re-merge all the output tables produced by the individual SSIS clients. You can also do this in a single package by using the conditional split transform to partition your input into multiple parts and then send each part into a separate Fuzzy Lookup transform. Since SSIS uses multiple threads for such pipelines, the matching process will run in parallel if your system has multiple processors. Note that, as of this writing, the following workaround is necessary to achieve full performance in this scenario. After each conditional split output, you must send the data into a Union All transform before feeding into each Fuzzy Lookup. This forces the pipeline to treat each output asynchronously and thereby use a separate thread for each conditional split output.
Note   This architecture does not apply to Fuzzy Grouping.
 
Measurements
The greatest impact on Fuzzy Lookup and Fuzzy Grouping performance comes from the size of the data used. In the case of Fuzzy Lookup, this means the size of the reference and input tables. In the case of Fuzzy Grouping, this means the size of the input table. The size of the input matters in two ways:
  • The number of rows and columns has the greatest impact on performance. The more data you have, the more resources Fuzzy Lookup and Fuzzy Grouping require. The figures in the following sections show specific data for various scenarios.
  • The average number of tokens per string column on which a fuzzy match is performed also has an impact on performance. Fuzzy transforms are not meant for document retrieval. For longer fields (greater than 20 tokens), it might be more efficient to use the SQL Server full-text indexing features.
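As a point of reference (not from the paper), setting up full-text indexing on such a long column would look roughly like the sketch below; the table, column, key index, and catalog names are all hypothetical.

-- Alternative to fuzzy matching for very long text fields:
-- a full-text catalog plus a full-text index on the description column.
CREATE FULLTEXT CATALOG ProductFtCatalog;
CREATE FULLTEXT INDEX ON dbo.Products (ProductDescription)
    KEY INDEX PK_Products
    ON ProductFtCatalog;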
The following sections present some detailed graphs for Fuzzy Lookup and Fuzzy Grouping performance. These graphs are intended to suggest trends for the transforms and not to give absolute numbers. Some graphs express the proportion of increase or decrease over a baseline measurement. In those cases, the graph can be used to infer roughly what input size increase would lead to a doubling in running time.
 
Fuzzy Lookup Performance
The two main variables that determine Fuzzy Lookup performance are the size of the reference data and the size of the input data. These variables correspond to the two phases of Fuzzy Lookup: building the ETI and actually performing the lookup. In general, these tasks are linear in the size of their respective inputs. Depending on the size of your input, the actual lookup time may be smaller or larger than the ETI building time.
Figure 3 illustrates the time required to build the ETI versus the size of the reference table. As explained earlier, building the ETI can be performed once and the results saved, so as to avoid paying the cost every time. The size of the resulting ETI is at most twice the size of the indexed reference columns. You can track progress by observing the progress messages reported in the Execution Results tab of the SQL Server Business Intelligence Development Studio.
The size of the ETI also affects the run time. The more data the ETI contains, the more expensive it is to use it because it represents a larger search space. Figure 4 illustrates the run time performance for Fuzzy Lookup versus the size of the reference table for a fixed input size. The curve in Figure 4 also depends on the distribution of tokens to various rows which can play a large role in lookup times. Having more memory can mitigate the effects of a large reference table and ETI. At run time, Fuzzy Lookup caches portions of the ETI in memory up to the constraints specified in the MaxMemoryUsage custom property. You must use the Advanced Editor to set this limit.
 
Figure 4. Fuzzy lookup runtime versus reference table size
After data size, the greatest impact on performance comes from the remaining transform parameters, such as number of matches to return, similarity threshold required, and number of columns on which to match.
The more matches Fuzzy Lookup is required to return, the slower the search is. This is because the search continues into the list of candidate tuples until enough matches are found.
The greater the match similarity that is required, the faster the Fuzzy Lookup search will be. This is because Fuzzy Lookup can discard candidate matches more aggressively during searches that require a high similarity.
Fuzzy Grouping Performance
Fuzzy Grouping depends on fewer variables than Fuzzy Lookup. The main variable that affects performance for Fuzzy Grouping is the size of the input data. Figure 5 shows the cost as the input size becomes large. The non-linearity in the early part of the graph is due to the ETI build-time. As the match threshold is increased, execution time eventually drops because no close tuples are found to create groups. This is illustrated in Figure 6. Figure 6 also shows that the distribution of the data plays a large role and may result in some non-linearities in running-time. Be aware that running a large Fuzzy Grouping task will result in large temporary objects on the server that you specify as your temporary connection. During the Fuzzy Grouping pre-processing step, you may track progress by viewing the progress messages reported in the Execution Results tab of the SQL Server Business Intelligence Development Studio.
 
Good for troubleshooting; otherwise it just slows things down.
 
 
Flat files & other Sources
  • Leave Unneeded Flat File Columns as Strings – don't convert them to dates etc. unless absolutely necessary
  • Only Parse or Convert Columns When Necessary – Reorganize the data flow to eliminate the Type Conversion Transform if possible. Even better, if possible, modify the source column data type to match the type needed in the data flow.
  • Use the FastParse Option in Flat File Source – Fastparse is a set of optimized parsing routines that replace some of the SSIS locale-specific parsing functions.
  • Eliminate Unneeded Logging – logging is useful for debugging and troubleshooting but, when deploying completed packages to production, be mindful and careful about the log entries you leave enabled and the log provider you use. Notably, OnPipelineRowsSent is somewhat verbose.
File systems can impact performance when writing to or reading from a network share.
What else is running on the machine?
  Is it using CPU/memory?

Use Lookup Partial or Full Cache Mode
Depending on the requirements and the data, you should choose one of these two modes to speed up the Lookup. Partial cache mode is useful when the incoming data is repetitive and only references a small percentage of the total reference table. Full cache mode is useful when the reference table is relatively small and the incoming data references the full spectrum of reference table rows.
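Whichever cache mode you pick, it also helps to keep the lookup's reference query narrow. A hypothetical example (table and column names are made up):

-- Return only the join key and the column you need back, not the whole
-- table, so the lookup cache stays as small as possible.
SELECT CustomerKey, CustomerAlternateKey
FROM dbo.DimCustomer;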
Lookup, Aggregate and Sort Transforms
While performant for what they do, and important when absolutely necessary, these transforms invariably degrade data flow performance. If possible, eliminate them from your data flows. Sort and aggregate data at the source, and use the MergeJoin Transform instead of the Lookup, if possible.
Increase DefaultBufferMaxSize and DefaultBufferMaxRows
Increasing the values for these two properties can boost performance by decreasing the number of buffers moving through the data flow. However, you should avoid increasing the values too much to the point where the Execution Engine starts swapping out buffers to disk. That would defeat the purpose.
Implement Parallel Execution
Both the Execution Engine for the Dataflow Task and the Execution Engine for the Control Flow are multithreaded.
  • Use EngineThreads Property – controls the number of worker threads the Execution Engine will use. The default for this property is 5. However, as you now know, by simply adding a few components, data flow thread requirements will quickly exceed the default. Be aware of how many threads the data flow naturally needs and try to keep the EngineThreads value reasonably close to it.
  • Set MaxConcurrentExecutables – if there are multiple Dataflow Tasks in the Control Flow, say 10 and MaxConcurrentExecutables is set to 4, only four of the Dataflow Tasks will execute simultaneously. Set, test and measure various value combinations of this property and the EngineThreads property to determine the optimal setting for your packages.
Increase “INSERT” performance:
Turn on Table Lock
This option is available in the OLEDB Destination Editor. Selecting "Table Lock" also enables fast load, which tells the adapter to use the IRowsetFastLoad bulk insert interface for loading.
NOTE:
Fast load delivers much better performance; however, it does not provide as much information if there is an error. Generally, you should turn it off for development and turn it on when deploying to production.
Disable Constraints
This option is also available in the OLEDB Destination Editor by unchecking the "Check constraints" option.
Identify and prioritize the constraints

The most common high priority constraints are, in no particular order:
  • Amount of Memory Installed - Perhaps the most important factor for processing complex data.
  • CPU speed - The raw processing power.
  • Number of CPUs - More CPUs widen the processing pipe.
  • Disk IO speed - Using slow IDE or optimized RAID drives? Important for big data throughput.
  • Network Speed - Impacts cross network loads, extracts and queries.
Measuring with Performance Counters
Performance counters are useful for measuring internal or otherwise hidden processes. Integration Services provides the following useful performance counters.
  • Buffers in use
  • Flat buffers in use
  • Private buffers in use
  • Rows read
  • Rows written
  • Buffers spooled
The most useful of these for performance measurements is Buffers Spooled. Values greater than 0 indicate that the Execution Engine has swapped buffers to disk. You should always try to avoid swapping out buffers to disk, which is usually an indication that you don't have enough memory or that you need to rearrange your data flow.
Best Practice #9 - How DelayValidation property can help you
SSIS uses validation to determine whether the package could fail at runtime, and it does this in two stages. First is package validation (early validation), which validates the package and all of its components before execution starts. Second is component validation (late validation), which validates each component once execution has started.
Consider a scenario where the first component of the package creates an object, such as a temporary table, which is referenced by the second component of the package. During package validation, the first component has not yet executed, so the object does not exist yet, causing a validation failure on the second component. SSIS throws a validation exception and never starts the package. So how do you get this package running in this common scenario?
To help in this scenario, every component has a DelayValidation property (default = FALSE). If you set it to TRUE, early validation is skipped and the component is validated only at the component level (late validation), which happens during package execution.
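As a concrete (and entirely hypothetical) example, if the first task runs something like the statement below to build a staging object at run time, the downstream component that reads it needs DelayValidation = TRUE:

-- Created at run time by the first Execute SQL Task, so it does not exist
-- yet when the package is validated before execution.
IF OBJECT_ID('tempdb..##StagingOrders') IS NOT NULL
    DROP TABLE ##StagingOrders;
CREATE TABLE ##StagingOrders
(
    OrderID   int      NOT NULL,
    OrderDate datetime NOT NULL,
    Amount    money    NOT NULL
);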
Collecting performance counters and using SQL Server to analyze the data
 
SSIS Performance Tuning: Max Default Buffer Rows
 
SSIS Dataflow Performance tuning
Video sqlbits
 
 
 
Avoid reading unnecessary columns.
Write queries instead of selecting entire tables.
Filter data in the source query instead of using a Conditional Split.
8) Beware of Non-blocking, Semi-blocking and Fully-blocking components in general
The dataflow consists of three types of transformations: non-blocking, semi-blocking and fully-blocking. As the names suggest, use semi-blocking and fully-blocking components sparingly to optimize your packages.
Jorg Klein has written an interesting article about this, with a list of which components are non-, semi- or fully blocking.

 10) SQL Server Destination Adapter vs OLE DB Destination Adapter
If your target database is a local SQL Server database, the SQL Server Destination Adapter will perform much better than the OLE DB Destination Adapter. However, the SQL Server Destination Adapter works only on the local machine and only via Windows security. You have to be absolutely sure that your database stays local in the future; otherwise your mapping will not work when the database is moved.
Create benchmarks.
Make a note of the time taken to execute every task and process in the ETL life cycle.
·         Make a note of the time taken by the source only
·         Make a note of the time taken by transformations
·         Make a note of the time taken by destination loading
Know the ecosystem: operating system, network, file system, etc.
 
Goal: More effective, less risky changes
 
For bulk operations, experiment with adjusting the network packet size (the SQL Server default is 4096 bytes); on the network side, consider enabling jumbo frames.


Don’t use the same connection for OLE DB Command operations and bulk inserts; keep the packet sizes different (lower for OLE DB Command, higher for bulk loads).

Check out the blog post “Scaling network traffic with windows” on the MSDN site.
 
Easiest win: tune the queries.
Use WITH (NOLOCK) – reads dirty data, but is faster.
Select only the columns you need; don’t use SELECT * FROM the table.
                This includes lookups: grab only the fields required, and use WITH (NOLOCK).
Push joins upstream into the source query.
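Putting those last few tips together, a trimmed-down source query might look something like this sketch (all table and column names are placeholders):

-- Only the needed columns, the join pushed into the source query, and a
-- dirty read for speed (only acceptable if the load can tolerate it).
SELECT o.OrderID, o.OrderDate, o.Amount, c.CustomerKey
FROM dbo.Orders AS o WITH (NOLOCK)
JOIN dbo.DimCustomer AS c WITH (NOLOCK)
    ON c.CustomerAlternateKey = o.CustomerNumber;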
Splitting execution trees was deprecated in SQL 2008 – don’t use it!
Aggregates: don’t use two or more in the same data flow; rework the query upstream instead.
Make data types as small as possible.
Using CAST or CONVERT in the data source query speeds things up.
Only use the Sort transform when merging two data sources; otherwise push the sorting into an ORDER BY in SQL.
Use GROUP BY in the source query instead of the Aggregate transform.
Use MERGE instead of the Slowly Changing Dimension transform.
Use INSERT INTO instead of a data flow where possible, because all the work is done on the server.
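Those last two points both amount to doing the set-based work on the server. A rough sketch with hypothetical names: a Type 1 dimension update via MERGE, and a server-side load via INSERT ... SELECT.

-- MERGE in place of the Slowly Changing Dimension transform (Type 1 changes only).
MERGE dbo.DimCustomer AS tgt
USING stg.Customer AS src
    ON tgt.CustomerAlternateKey = src.CustomerAlternateKey
WHEN MATCHED AND tgt.CustomerName <> src.CustomerName THEN
    UPDATE SET tgt.CustomerName = src.CustomerName
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CustomerAlternateKey, CustomerName)
    VALUES (src.CustomerAlternateKey, src.CustomerName);

-- INSERT INTO ... SELECT in place of a data flow when source and destination
-- live on the same server.
INSERT INTO dbo.FactSales (OrderID, OrderDate, Amount)
SELECT OrderID, OrderDate, Amount
FROM stg.Sales;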
Destination: use the OLE DB Destination with fast load and Maximum insert commit size = 0.
Drop indexes when loading (but not when the table has only a single clustered index; the time taken to drop and recreate it is not worth it).
Only drop and recreate indexes when the load increases the table size by 100% or more.
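As a sketch (index and table names are made up), the pre- and post-load steps would typically run from Execute SQL Tasks around the data flow:

-- Before the load: drop the nonclustered index.
DROP INDEX IX_FactSales_OrderDate ON dbo.FactSales;
-- ... the data flow loads dbo.FactSales here ...
-- After the load: recreate the index.
CREATE NONCLUSTERED INDEX IX_FactSales_OrderDate
    ON dbo.FactSales (OrderDate);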
Bulk loads lock the destination table – keep this in mind.
If a job is running slowly, check what other jobs are touching the destination table.
When doing reloads, use TRUNCATE, not DELETE.
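For instance, against a hypothetical staging table:

-- Minimally logged and near-instant, but it resets the whole table:
TRUNCATE TABLE stg.Sales;
-- versus DELETE FROM stg.Sales;  -- fully logged, row by row, much slower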

Integration Services: Performance Tuning Techniques
 
The run-time engine
“For the most part, the performance of the run-time engine is most heavily influenced by conditions external to SSIS, such as network bandwidth, and interaction with external systems such as database servers, FTP servers, or email servers.”
 
“the Execute SQL Task is more dependent on the performance of the query execution than on the SSIS run-time engine”
 
The data flow engine
“With data integration solutions, you will likely spend a large part of your performance tuning time optimizing the data flow engine”
 
“there are a variety of settings that you can manipulate to tune the data flow engine’s performance based on the requirements of the data integration operations”
 
Buffer Usage
“the data flow engine manipulates data as it is transferred from source to destination”
 
“SSIS attempts to reuse data from prior buffers as much as possible”
 
Row Transformations
“Because each output row has a 1:1 relationship with an input row, row transformations are also known as synchronous transformations. Row transformations have the advantage of reusing existing buffers”
 
Partially blocking transformations
“Since the number of input records will likely not match the number of output records, these transformations are also called asynchronous transformations”
 
Examples: “Merge, Merge Join, and Union All”
 
“the output of the transformation is copied into a new buffer and a new thread may be introduced into the data flow”
 
Blocking transformations
“must read and process all input records before creating any output records”
 
“considered to be asynchronous”
 
“a new buffer is created for its output and a new thread is introduced into the data flow.”
 
“a special type of asynchronous component” - “RDBMS source component creates two types of buffers: one for the Success output and one for the Error output”
 
Execution Trees
 
“At run time, the data flow engine breaks down Data Flow task operations into execution trees.”
 
“Each tree creates a new buffer” – “additional memory is required”
 
“Note that you will not see the execution trees until you execute the package” – “appear in the Log Events window”
 
Buffer Sizing
“using row transformations where possible to limit the number of buffers that are created and used”
 
“overall goal is to pass as many records as possible through a single buffer while efficiently utilizing memory.”
 
Estimated Row Size
“You can shrink the row size by identifying the smallest possible data types for all of your columns as early in the data flow as possible”
 
“especially important for flat file data sources”
 
DefaultMaxBufferRows
You should not configure this setting without understanding how it relates to DefaultMaxBufferSize.
 
DefaultMaxBufferSize
 
MinBufferSize – while MinBufferSize is not configurable, it acts as the allocation minimum for buffers (for example, the 65,536 bytes mentioned in the log message below).
 
 
Buffer guidelines
In practice, you must test these settings based on your own environment, but you can start with the following general guidelines.
·         Reduce your Estimated Row Size as much as possible by removing any unnecessary columns and configuring data types correctly. Any opportunity that you have to reduce the size of the source data set before operations begin saves memory resources.
·         Start with the SSIS default settings for DefaultMaxBufferRows and DefaultMaxBufferSize. Turn on package logging with the BufferSizeTuning property enabled. This property adds information to the log that shows you where SSIS has adjusted the buffer size. You will see entries like the following
·         If your data integration is similar to Scenario 1, you will see: Rows in buffer type 0 would cause a buffer size greater than the configured maximum. There will be only 383 rows in buffers of this type.
·         If your data integration is similar to Scenario 2, you will see: Rows in buffer type 3 would cause a buffer size less than allocation minimum, which is 65536 bytes. There will be 1365 rows in buffers of this type. Note that 65536 bytes is the MinBufferSize of the machine on which this package executes.
·         Tweak the values for DefaultMaxBufferRows and DefaultMaxBufferSize to get as many records into a buffer as possible. Setting these values too low causes SSIS to create many small buffers instead of fewer but larger buffers; fewer, larger buffers are the better scenario if you have enough memory.
·         As you tweak the DefaultMaxBufferRows and DefaultMaxBufferSize, realize that once the MaxBufferSize is exceeded, the setting for MaxNumberofRows no longer matters because SSIS always scales down the number of records per buffer to maximize memory utilization.   
·         Note that the DefaultMaxBufferRows and DefaultMaxBufferSize are configured separately for each Data Flow task. When you integrate data from multiple data sources using a single Data Flow task, these two settings will only impact the data source components and transformations in that task. Also note that the determination of how many rows per buffer is done per buffer type.
Parallelism
“Parallelism is a great technique to improve the performance of your data integration operations”
 
Configurable settings
“Within SSIS, the control flow for each package is controlled by a setting called MaxConcurrentExecutables, which specifies the maximum number of SSIS threads that can execute in parallel per package. By default, this is set to -1, which translates to the number of logical machine processors plus 2.
If SSIS runs on a dedicated server and you have a lot of operations that run in parallel, you will likely want to increase this setting if some of the operations do a lot of waiting for external systems to reply. On the other hand, if you do not have a dedicated SSIS machine and your data integration application runs alongside several other applications, you may need to reduce this setting to avoid resource conflicts.”
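Since the -1 default resolves to the number of logical processors plus two, it can be handy to check how many processors the instance actually sees. The query below only reports the CPU count; the SSIS property itself is still set on the package:

-- Logical processors visible to SQL Server; MaxConcurrentExecutables = -1
-- resolves to this value + 2.
SELECT cpu_count FROM sys.dm_os_sys_info;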
Measuring Performance
 
Troubleshooting
 
“A great technique to troubleshoot poor performance is to isolate the execution of various parts of your package to find the slowest operations. As you isolate the execution of various operations, you can establish a baseline that you can track over time. To isolate SSIS operations, consider the following steps”
 
1.      Establish the Overall Package Execution Speed - When you execute your package, the overall execution speed is the total amount of time it takes from start to finish.
Overall Execution Speed = Source Speed + Transformation Speed + Destination Speed
 
2.      Isolate the Source and Transformation Speed – You can find out how fast SSIS is reading from a source and performing transformations by using the following steps.
1.      Create a copy of your original package.
2.      Remove the destination associated with the source. If you have more than one destination, isolate one at a time.
3.      Replace the destination with a RowCount transformation.
4.      Measure the execution performance from Source through Transformations to RowCount.
“The result of this measurement is the Source and Transformation Speed. If this is too slow and you have more resources (such as CPU) available, consider applying some of the techniques described above (such as partitioning or introducing new execution trees) to increase the throughput.”
 
“Adding RowCount does not add any measurable amount of performance overhead.”
 
3.      Calculate the Destination Speed – Once you know both the Overall Speed and the Cumulative Source and Transformation Speed, you can calculate how long it takes to load into the destination as follows:
Destination Speed = Overall Speed – (Cumulative Source and Transformation Speed)
For example, with the measurements you have made in steps 1 and 2, you can calculate Destination Speed as follows:
Destination Speed = 5 minutes – (4 minutes) = 1 minute
4.      Isolate the Source Speed – In step 2 you measured the cumulative source and transformation speed. Since this is a cumulative number, you do not know how much time is spent reading data and how much time is spent transforming data. To find out how fast SSIS is reading from a data source, perform the following steps.
1.      Create a copy of your original package.
2.      Remove all transformation(s) and destination(s) associated with a source.
3.      Replace them with a single RowCount transformation.
4.      Measure the execution performance from the Source to RowCount.
The result of this measurement is the Source Speed. This is theoretically the fastest your data flow can work. If this is too slow, you’ll need to focus your attention on optimizing the source adapter so that it returns rows faster. For example, through this process you may find out that it takes 15 seconds to read data from a file data source. When compared to the Overall Execution Speed of five minutes, you realize that only about 5 percent of the execution time is spent reading data. Now you can do some simple math to determine the actual amount of time spent transforming the data.
 
5.      Calculate the Transformation Speed – With the isolated Source Speed and the Cumulative Source and Transformation Speed, you can calculate the Total Transformation Speed as follows:
Transformation Speed = (Cumulative Source and Transformation Speed) - Source Speed
For example, with the measurements that you have made in steps 1 – 3, you can calculate Transformation Speed as follows:
Transformation Speed = 4 minutes – (15 seconds) = 3.75 minutes
You have now identified the major speed parameters summarized in Table 2.
6.      Isolate Individual Transformations – If you need to identify the performance of a specific transformation for further troubleshooting, you can use the following steps.
1.      Create a copy of the package that you used in step 2 to isolate the Source and Transformation Speed.
2.      Remove one transformation.
3.      With all other transformations and destinations intact, replace the removed transformation with a RowCount transformation.
4.      Measure the execution performance.
5.      Compare execution performance with the original Source and Transformation speed. The difference tells you the performance of that transformation.
For example, given that more than 60% of the time is spent on transformations, you focus on finding which transformation is taking the most time. You remove a lookup transformation and find out that the Cumulative Source And Transformation Speed drops from 4 minutes to 2.5 minutes. Comparing the two scenarios, you conclude that your lookup transformation is taking 1.50 minutes.  See Table 3 for the performance summary of the isolated lookup.
Gaining Visibility
As you change and enhance your design, you will also want to take advantage of the ability to monitor and log metadata about package executions.
·         SSIS Logging – SSIS allows you to log both tasks and packages to various logging providers such as XML, SQL Server, or text files. Logging allows you to view the performance of a package over time and to track performance as machine resources change and data volumes increase. Be aware that SSIS provides rich logging support which can be expensive to turn on completely. Review the logging events available and only log those events necessary.
·         SSIS Performance Counters – SSIS provides several performance counters that you can use to gain visibility into how resources are utilized during package execution. For example, you can view the number of rows read and the number of buffers in use as the package executes. One specific performance counter that is particularly useful is Buffers Spooled. If Microsoft Windows® runs out of physical memory during package execution or the process executing the package runs out of virtual memory, SSIS begins to spool buffers to files. Once this occurs, performance will degrade significantly, so it is a good idea to monitor this setting and make sure that you have enough memory for your operations.
·         SQL Server Profiler – When you extract data from or load data into SQL Server, you can use SQL Server Profiler to review the operations and query plans that are happening behind the scenes in SQL Server. As you monitor database activity, you may find tuning opportunities in the SQL Server RDBMS, such as the need to modify your indexing scheme.

 
 

 
 
