The use of SQL Server Integration Services (SSIS) for migrating and integrating data to Dynamics CRM allows for a level of adaptability and versatility that is unmatched in the right hands. Additionally, the customizability and versatility of SSIS allows for performance tweaking and enhancements to maximize data throughput. One of the most significant methods of increasing data throughput in SSIS is via leveraging parallel processing.
Parallel processing is simply the concept of an application balancing or spreading the workload of the application over multiple CPUs and multiple threads. Servers nowadays have many physical and logical CPUs which can be utilized in parallel to complete system workloads faster. It is analogous to moving residences. If you were to move to a new residence on your own with no help, it would take you far longer to complete the move to your new place than if you had several people helping you out.
SSIS has been able to utilize multi-threading and parallel processing for some time now, but it would have to be defined by the current set of transformations available like Conditional Splits. With the right query and logic, this can provide massive improvements in transfer speeds and record throughput, but requires configuration of each Conditional Split and query data.
There is an easy and intelligent alternative to manually setting up parallel processing in SSIS. This is through the Balanced Data Distributor add-in package for SSIS, available for free from Microsoft. Simply download the add-in component to your SSIS server and run the installer. Follow the install instructions specified from Microsoft here, and you will have access to the Balanced Data Distributor in the Data Flow toolbox.
To use the Balanced Data Distributor, you simply place one in between your source and your target. The data source going into the BDDs should be singular and the target of the BDDs should be separate target steps or target destinations (that are the same). The BDD will take the input stream and intelligently and automatically distribute the data load between the resultant targets. There is no configuration for the BDD block, the logic it contains is static. It’s that simple.
There are some considerations to take into account when using the BDD, however:
- If your data source doesn’t contain a large number of records, it may not be valuable to use a BDD since the load is easily processed by one CPU.
- If your data needs to be handled in a defined order, don’t use the BDD. No control is given over data sorting through the BDD component.
- If the source query data is very slow, or slower than the target destination in terms of processing, a BDD is not useful since the limiting factor now lies with the source query speed.
- Using the BDD assumes that your server has extra CPU cycles available on other logical or physical cores, so it is not valuable to use the BDD if your server is already maxed on CPU usage.
- The data flow paths to and from the BDD should be identical. If they are not, you are randomly routing your source query data through different data flow paths, and will have different target destination writes as a result.
Keep those in mind, write your packages intelligently, keep in mind the horsepower of the environment you are working with, and if you need assistance, as always the PowerObjects SSIS and CRM experts are just a click away. Go forth and integrate and remember… Happy CRM’ing!