Almost every CRM implementation project has a data component, whether it is migrating data from your legacy system or integrating CRM to/from another database or application. Especially integrations have become more important, the user-friendly user interface of CRM is the preferred application for users and with good integration plan the users can drive their ERP, ordering, inventory management or any back-end system without ever having to view green screens or unfriendly interfaces.
When migrating/integrating huge amounts of data, the speed may become an issue. The direct SQL writes to CRM database are not supported and writing data through web service is slower. The reason for this is that creating a record in CRM database is so much more than just ‘INSERT INTO…’-statement. The first step of optimizing is to understand what happens behind the scenes and can affect the speed:
- CRM entities usually consist of 2 physical tables.
- Cascade rules/Sharing: If created record has any relationships with cascade rules, web service will handle the cascades automatically. For example cascaded sharing will lead to additional records being created in PrincipalObjectAccess table. In case of one-time migrations, disabling the cascade rules while migration runs can save lot of time.
- Record Ownership: If you are inserting records, make sure you are setting the owner as an attribute for create and not as an additional owner assign request. Assigning owner actually takes longer time than creating the record and you would be doubling the processing time.
- Money/Time: Web Service handles currencies and time zones.
- Workflows/Plugins: If the system has any custom workflows and/or plugins, I strongly recommend pausing them for the duration of migration.
Obviously most of the old truths about optimizing SQL/SSIS stand true, these have been well documented elsewhere and I will not concentrate on those today.
However one proven SQL-optimization is rarely considered when pushing data through web service. Does it benefit from multiple CPUs and parallelism?
First of all, I tested very simple insert into custom entity. As a benchmark I tried identical job using a well know third party integration tool. The inserts ran with speed of 80k rows per hour.
I created simple SSIS package:
This ran the inserts with speed of 40k rows per hour. While running the SSIS package, I noticed that only 1 logical CPU was firing in SQL server (2 physical quad-core CPUs). Other 7 CPUs were nearly idle.
So let’s try some parallelism. Instead of chaining results of OLE DB Source directly to Script Component, I split it to multiple script components that will fire up parallel (assuming multiple CPUs)
Finally we are using some more resources from the SQL.
The speed went from 40k rows per hour to 160k rows per hour. So clearly the answer is that even if the inserts are through web service, they still can run parallel and benefit greatly from it. Note that the more your SSIS uses resources, the less there is available for users. You will need to balance your optimization based on whether it is already live CRM and whether the run times are off-hours for the users.
Happy Integrations and CRM’ing!