Hey, what should we do tonight? The same thing we do every night… try to migrate all the data in the world!
Once, we were creating a migration process for a database that had no fields to verify or control changes. Thus, we had to consider a different option, and someone mentioned using HashBytes. A-ha!
HashBytes is a function that allows you to generate an encryption code for any information that is transferred in the string parameter, and where you can make the decision about which algorithm you want to use in the process.
How is the syntax of this function?
This function receives two parameters:
- The first parameter is the name of the algorithm to encrypt. There are 7 algorithms for them (MD2, MD4, MDS, SHA, SHA1, SHA2_256, SHA2_512)
- The second parameter is the string we want to encrypt. We can concatenate multiple values. The only thing required is that the parameter sent is a string.
When I send these parameters, the function returns a varbinary (maximum 8000 bytes).
OK, so we know what this function can do, but how can we use it in our migration process? The solution was to separate the main issues into four items and attack each of them separately.
- How can we identify when a record was created or modified?
- In the Migration/Integration process, we need to validate all the fields of data source because it is not always necessary to move all the information to the target system.
- How can we build the string to generate the hash with Hashbytes and make it an automatic process?
- How can we use the encrypted information in our process?
To solve the first question, we created two new fields to validate and control the changes in our cross-reference table. We did this so we could have control if we needed to create a new record or update the existing record in the target system.
The new fields are PO_HashBytes_ORI, where you save the encrypted code when the migration process is executed or when the record is new/modified. Great! We can now identify our records, but this is just a small part of our bigger issue.
For the second item, the solution was to validate all the fields we need to move in the migration/integration process that had been modified in the records. Once this was done, we could create a new table with the name of the fields of each entity that we needed to move from the source system to the target system.
Next, we populate this table with all the fields that we need to use in the process. To do this, we create a SQL Dynamic to build all the insert sentences necessary to insert all the rows required.
We now have the validation/control fields needed in our process. Next, we need to build strings to execute the HashBytes function.
We need to build this string with excellent performance, so this can occur automatically because the next time someone needs to modify the process, they can use this one instead of the more difficult process of building these strings manually. In order for the above process to go smoothly, we created a function in our database. Below is an example of the function we created (PO_get_hash_fields):
This function has a parameter (@p_table_name) where it receives the name of the table that is necessary to build the string that we use in the other function (HashBytes). With the name of the table (@p_table_name), we get all the necessary fields to build the string (PO_structure_to_update) and make the conversion when the field type is different from char, varchar, nvarchar.
The result of this function is the new string with all fields concatenated and ready to do the conversion in encrypted code.
We’re getting close. Now we need to use all the processes to identify new or updated records. But how can we use all the previous steps to complete our process to identify whether a record is new or updated? The solution was to create a stored procedure that uses the functions to generate the hash and update the field to validate if the record is new or updated.
This stored procedure has 4 input parameters that are necessary to do all the Dynamic SQL that identify if the record is new or modified.
These parameters are:
- @p_table_name_ori: this parameter is used to place the name of the table that we need to get all the fields to generate the hash.
- @p_table_name_PO_Des: this parameter is used to find the name of the cross-referenced table where we have the information of all the records stored, moved from the source system to the target system. Now we need to generate the hash for each record and update in the cross-reference table.
- @p_table_type: this parameter identifies which table we need to process (e.g., Account, Contact, and/or other entities).
- @p_process_type: this parameter is used to identify the process of validating whether the record is new or updated. If the process is updated, we need to place the information in the PO_HashBytes_DES field of the generated hash. When the record is new, we need to place the information in the PO_HashBytes_ORI field of the generated hash.
The stored procedure generates a Dynamic SQL to update a table with the encrypted code generated using the HashBytes and all the fields required in the migration/integration process for each record.
Excellent! We have our new verified/control field for our migration/integration process. Now we can use the HashByte function in the migration/integration packages by generating the hash for each record using the new stored procedure. We can also identify if a record is new or modified by comparing the original field to the new field using the encrypted code through the HashBytes. Now we can migrate databases that do not have any control fields. Wow!