Some of the biggest performance tuning wins that I have had over my career have been related to performance tuning queries that are being called from SSIS. So often the actual query performance gets overlooked in the SSIS environment. A few years ago I was able to tune a query that was being called from an SSIS package and reduce the runtime of that package from over an hour, to just 2 minutes. Its not always that big of a win, and there are other performance tuning options besides just tuning the queries, so lets take a look at some of them.
SQL Server Integration Services (SSIS) is one of those components that can help you out with your ETL (Extract, Transform and Load) work, or it can hinder your process if it is a poor performing ETL. There are many things that can easily cause an ETL to run for hours, when it should be running for minutes.
The following tips are intended as things to consider when performance tuning to improve your ETL Performance Using SSIS.
Max Concurrent Executables
The MaxConcurrentExecutables is a setting on the package level, this can be accessed from the properties window for the package.
The default setting of -1 sets the total tasks or total processes running at a time to be the number of processors + 2. On a single processor server, that gives you 3 processes. If everything that the database was doing was processor bound, this would probably be a good number, however sometimes a query running in the package can be waiting on something else, and increasing this number can give you better performance, sometimes it makes things worse. The key here is to play with it a bit to see how things run in your environment, not just a development environment, but also how it performs in your production environment. On a heaver load server, I would leave this closer to the default, on a server with minimal load, I usually increase this to be 2 to 3 times the number of cores on the server.
Use a Sequence Container to Process Tasks in Parallel
Often times when queries or tasks are organize, they are linked so that one runs after the other then after another. Sometimes this is required if one query is modifying data that is needed by the next query, but that’s not always the case. In this example, the time it takes for the package to run is the duration of Query 1 + Query 2 + Query 3 + Query 4.
If the queries are independent, and don’t rely out the output from one to the next then its possible to run them in parallel. Assume that Query 1 builds up some data that is used by Query 2, 3 and 4, but Query 2, 3 and 4 don’t depend on each other at all. Instead you can lay out your sequence container like this. When the package is run Query 1 runs first, then all 3 of the queries in the sequence container get run at the same time. The duration of the package turn into the longest duration of Query 2, Query 3, and Query 4 plus the duration of Query 1.
Depending on the time that it takes for Query 2, Query 3 and Query 4 to run, this could save a huge amount of run time when the package is executed. Sometimes just deleting the constraint between tasks (if you know its safe to do so) can save a huge amount at run time.
Remove Unwanted Columns
Bringing back the data from extra columns just takes more time to transfer the data. Instead of a SELECT *, you should specifically call out the columns you want, not only will be lead to more reliable code, but you will also get a faster running ETL since there will be less data to be transferred.
Performance Tune Your Queries
Your ETL will never run quickly if it has a slow query at its core. You can find those slow queries and pull them into SQL Server Management Studio as a way to work on th he performance.
Lets say you have an ETL that takes 3 minutes to run, but 2 minutes and 50 seconds of that are on one query. Don’t worry about the rest of the performance, take that one query into SSMS and see what you can do to speed it up there.