Consider a data migration of a system where you can’t afford down-time: it needs to be invisible for your customers. In such cases, a big bang or tranche-based approach won’t suffice. Only a delta migration strategy can meet this no-down-time requirement.
If you’re an architect or project manager responsible for determining your migration strategy, you may want to read on.
This article explains the delta migration strategy and compares it to other possible strategies. It first dives into the what and how of delta migrations, then compares this strategy to other commonly used strategies, and finally concludes that delta migrations are the only way forward if you can’t afford down-time, provided you can afford the strategy itself!
The idea behind a delta migration is simple: take all the time you need to migrate an initial version of the data, then perform a quick catch-up during the actual migration weekend. The catch-up processes only the changes, the deltas, that occurred between the start of the ‘long’ migration period and the start of the catch-up. The premise is that processing only the changes is fast, which keeps down-time very limited.
A delta migration strategy is a solution when the full migration takes more time than the longest acceptable freeze period. This happens when the data set is large or the freeze period is short.
Delta migrations are only possible if the detection of changes and processing of changes can be done in a time-efficient way. In most migrations this is possible, although it can sometimes be a complex operation.
The source system remains the leading system after the initial migration. Only after successfully processing the delta in the target system does the target system become the leading system.
Delta-detection: source or target?
The most common way to determine deltas is in the source data. We call these deltas ‘source deltas’. The changes between the source data in the delta run and the initial run are determined at the start of the delta run. The delta run then processes these differences.
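One possible way to detect source deltas is to store a fingerprint of every source record during the initial run and compare it with a fresh fingerprint at the start of the delta run. The sketch below is only an illustration: the record layout, the keys and the function names are assumptions, and a real source system may offer better mechanisms, such as change timestamps or change logs.

```python
import hashlib
import json


def fingerprint(record: dict) -> str:
    """Stable hash of a record; sort_keys makes it independent of key order."""
    payload = json.dumps(record, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def snapshot(records: dict) -> dict:
    """Map each entity key to the fingerprint of its record."""
    return {key: fingerprint(rec) for key, rec in records.items()}


def source_deltas(initial: dict, current: dict) -> dict:
    """Compare the fingerprints of the initial run with those of the delta run."""
    initial_keys, current_keys = set(initial), set(current)
    common = initial_keys & current_keys
    return {
        "created": current_keys - initial_keys,
        "deleted": initial_keys - current_keys,
        "updated": {k for k in common if initial[k] != current[k]},
    }


# Hypothetical source data at the two points in time.
source_at_initial_run = {
    "C1": {"name": "Ann", "balance": 100},
    "C2": {"name": "Bob", "balance": 250},
    "C3": {"name": "Cy", "balance": 0},
}
source_at_delta_run = {
    "C1": {"name": "Ann", "balance": 100},  # unchanged
    "C2": {"name": "Bob", "balance": 275},  # updated
    "C4": {"name": "Dee", "balance": 10},   # created; C3 was deleted
}

deltas = source_deltas(snapshot(source_at_initial_run), snapshot(source_at_delta_run))
print(deltas)
```

Only the fingerprints of the initial run need to be kept, not the full source data, which keeps the comparison cheap even for large data sets.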
Another, less common, way is to detect changes in the target data at the end of the second migration run, before loading data into the target system. These deltas are known as ‘target deltas’. The delta run processes all source data and computes the differences between the target data of the initial run and that of the delta run. This approach is useful when the migration system is fast and loading data into the target system is slow.
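Target-delta detection can be sketched as follows: transform all source data again, then compare the result with the target data produced by the initial run, and load only the differences. The mapping rule and the record layout are hypothetical and serve only to illustrate the comparison.

```python
def transform(source: dict) -> dict:
    """Hypothetical mapping rule: the target stores a name and a status only."""
    return {
        "full_name": source["name"].upper(),
        "status": "ACTIVE" if source["balance"] > 0 else "DORMANT",
    }


def target_deltas(initial_target: dict, new_target: dict):
    """Return the entities to load (new or changed) and the keys to delete."""
    upserts = {k: v for k, v in new_target.items() if initial_target.get(k) != v}
    deletes = set(initial_target) - set(new_target)
    return upserts, deletes


source_at_initial_run = {
    "C1": {"name": "Ann", "balance": 100},
    "C2": {"name": "Bob", "balance": 250},
}
source_at_delta_run = {
    "C1": {"name": "Ann", "balance": 120},  # source changed, target data did not
    "C2": {"name": "Bob", "balance": 0},    # status changes in the target
    "C3": {"name": "Cy", "balance": 5},     # new entity
}

# The delta run processes ALL source data, then diffs the target data.
initial_target = {k: transform(v) for k, v in source_at_initial_run.items()}
new_target = {k: transform(v) for k, v in source_at_delta_run.items()}
upserts, deletes = target_deltas(initial_target, new_target)
print(upserts, deletes)
```

In this example C1 is not loaded again: its source data changed, but the migrated result is identical to that of the initial run.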
There are two ways to process deltas:
Migrate all changed main entities in their entirety. Generally, this allows for reuse of mapping rules of the initial migration.
Compute the effects of the changes on the target data. This is usually more complex (and not designed for a full migration), except when the transformation is simple.
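The first option, migrating changed main entities in their entirety, can be sketched as below. The point of the illustration is that the delta run calls the same mapping function as the initial run. The customer structure and the names `map_customer`, `initial_run` and `delta_run` are assumptions made for this example.

```python
def map_customer(customer: dict) -> dict:
    """Hypothetical mapping rule shared by the initial run and the delta run."""
    return {
        "id": customer["id"],
        "name": customer["name"].strip().title(),
        "ibans": sorted(acc["iban"] for acc in customer["accounts"]),
    }


def initial_run(source: dict) -> dict:
    """Migrate every main entity."""
    return {cid: map_customer(c) for cid, c in source.items()}


def delta_run(source: dict, target: dict, changed_ids: set) -> None:
    """Re-migrate each changed main entity as a whole, reusing map_customer."""
    for cid in changed_ids:
        if cid in source:
            target[cid] = map_customer(source[cid])  # replace the whole entity
        else:
            target.pop(cid, None)  # entity no longer exists in the source


source = {
    "C1": {"id": "C1", "name": " ann lee ", "accounts": [{"iban": "NL01"}]},
    "C2": {"id": "C2", "name": "bob ray", "accounts": [{"iban": "NL02"}]},
}
target = initial_run(source)

# Changes in the source after the initial run.
source["C1"]["accounts"].append({"iban": "NL00"})
del source["C2"]

delta_run(source, target, changed_ids={"C1", "C2"})
print(target)
```

Even though only one account was added to C1, the whole customer is migrated again, which avoids writing separate mapping logic for partial updates.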
How to control delta migrations?
Checking the correctness of delta migrations requires closer attention than with other migration strategies.
The initial run is a normal process-all run and can be checked the same way as any migration that uses one run. At the end of the delta run, the data in the target system must correspond with the data in the source system at the start of the run. A complication is that the migration system doesn’t have a copy of the data in the target system at the end of the delta run (nor does it have this data in the intermediate models). The ‘normal’ approach to controlling the migration can therefore not be used. This complication is outside the scope of this article.
Extensions and variations
Several of the techniques for delta migrations we described can be further extended and refined.
Source and target deltas can be combined to further reduce the number of updates in the target data. In a combined approach, the delta run migrates the changed source entities and then determines the deltas in the target data. The migrated data of a main entity can remain unchanged even when the data of the source entity did change. This happens, for example, when only the first or largest value is migrated and the change in the source data has no effect on that value.
Another, more practical, refinement is restricting delta processing to a small number of entities, such as financial transactions. All other data of a customer remains unchanged in the period after the initial run, something you can guarantee by not handling non-financial processes after the initial run. This generally imposes restrictions on the timeframe between the initial run and the delta run.
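The combined approach can be illustrated with the ‘largest value’ case. In this sketch only the changed source entities are migrated, and the result is compared with the target data of the initial run before anything is loaded. The field names and the mapping rule are hypothetical.

```python
def map_entity(source: dict) -> dict:
    """Hypothetical mapping rule: only the largest payment is migrated."""
    return {"largest_payment": max(source["payments"])}


def combined_delta(changed_ids, source: dict, initial_target: dict) -> dict:
    """Migrate changed source entities, keep only those whose target data differs."""
    updates = {}
    for key in changed_ids:
        migrated = map_entity(source[key])
        if migrated != initial_target.get(key):
            updates[key] = migrated
    return updates


# Target data as produced by the initial run.
initial_target = {
    "C1": {"largest_payment": 500},
    "C2": {"largest_payment": 80},
}
# Source data at the delta run; both entities were detected as source deltas.
source = {
    "C1": {"payments": [500, 20, 35]},  # new small payments, largest unchanged
    "C2": {"payments": [80, 120]},      # new largest payment
}

updates = combined_delta(["C1", "C2"], source, initial_target)
print(updates)
```

Both entities changed in the source, yet only C2 leads to an update in the target system.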
If the delta run takes too much time, then a second delta run can be used to process the changes that occurred during the first delta run (and a third delta run, etc). This way, even systems with a very high load and transaction volume (like a credit card system) can be migrated with very limited or no down-time!
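A rough way to reason about repeated delta runs: each run only has to process the changes that arrived during the previous run, so the runs get shorter as long as changes are processed faster than they arrive. The sketch below assumes constant arrival and processing rates, which is a simplification, and all numbers are made up.

```python
def plan_delta_runs(pending_changes: float, arrival_rate: float,
                    throughput: float, max_freeze_minutes: float,
                    max_runs: int = 10) -> list:
    """Return the duration in minutes of each delta run.

    The last run in the list is the first one that fits in the freeze period.
    Rates are expressed in changes per minute.
    """
    if arrival_rate >= throughput:
        raise ValueError("changes arrive faster than they can be processed")
    durations = []
    for _ in range(max_runs):
        duration = pending_changes / throughput
        durations.append(duration)
        if duration <= max_freeze_minutes:
            return durations
        # Changes that arrive during this run are left for the next run.
        pending_changes = duration * arrival_rate
    raise RuntimeError("no run fits in the freeze period within max_runs")


# Hypothetical figures: 600,000 changes since the initial run,
# 100 new changes per minute, 2,000 changes processed per minute.
durations = plan_delta_runs(600_000, arrival_rate=100, throughput=2_000,
                            max_freeze_minutes=5)
print(durations)  # [300.0, 15.0, 0.75]
```

With these assumed figures, the third delta run takes less than a minute and is the only one that has to run during the freeze.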
Delta migrations do not come cheap. Delta-detection is an additional step that must be carefully designed, and testing delta processing will take more time than testing a full migration.
The migration system must be designed and built with delta processing in mind from the start. For example, the migration system must be deterministic or reuse results from the initial run.
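To illustrate the determinism requirement, consider the generation of target identifiers. If the initial run and the delta run assign different identifiers to the same source entity, the delta run cannot update what the initial run loaded. The sketch below shows two possible approaches, a deterministic identifier and a key map saved by the initial run. The namespace string and the function names are assumptions for this example.

```python
import uuid

# Hypothetical namespace for this migration; any fixed UUID would do.
MIGRATION_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "migration.example")


def deterministic_target_id(source_system: str, source_key: str) -> str:
    """Same input always yields the same ID, in every run."""
    return str(uuid.uuid5(MIGRATION_NAMESPACE, f"{source_system}:{source_key}"))


def target_id_from_key_map(key_map: dict, source_key: str) -> str:
    """Reuse the ID assigned in the initial run; assign a new one only if absent."""
    if source_key not in key_map:
        key_map[source_key] = str(uuid.uuid4())  # random, hence stored for reuse
    return key_map[source_key]


# Approach 1: deterministic, nothing needs to be stored between runs.
id_initial_run = deterministic_target_id("crm", "C1")
id_delta_run = deterministic_target_id("crm", "C1")

# Approach 2: non-deterministic IDs, so the key map of the initial run is kept.
key_map = {}
stored_initial = target_id_from_key_map(key_map, "C1")
stored_delta = target_id_from_key_map(key_map, "C1")

print(id_initial_run == id_delta_run, stored_initial == stored_delta)  # True True
```

In the second approach, the key map must be persisted after the initial run and loaded again at the start of the delta run.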
Some migration steps are not well-suited to delta processing. Take grouping, for example: all source entities need to be processed to determine grouping. When dealing with just the changes, grouping cannot be determined.
Such steps make the determination of the changed entities far from straightforward.
Another potential problem is that changes in source data can lead to rejects among the already migrated entities. These rejects must be deleted from the target system in the delta run.
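The handling of such rejects can be sketched as follows. The validation rule (a customer needs a date of birth) and the data layout are hypothetical. What matters is that an entity that was migrated in the initial run but is rejected in the delta run gets removed from the target data.

```python
def is_valid(customer: dict) -> bool:
    """Hypothetical validation rule: a date of birth is mandatory."""
    return bool(customer.get("date_of_birth"))


def apply_delta(target: dict, changed: dict) -> list:
    """Apply changed entities; delete already migrated entities that are now rejected."""
    removed = []
    for key, record in changed.items():
        if is_valid(record):
            target[key] = record
        elif key in target:
            del target[key]  # migrated in the initial run, rejected now
            removed.append(key)
    return removed


# Target data after the initial run.
target = {
    "C1": {"name": "Ann", "date_of_birth": "1980-02-01"},
    "C2": {"name": "Bob", "date_of_birth": "1975-11-30"},
}
# Changed source entities at the delta run.
changed = {
    "C1": {"name": "Ann", "date_of_birth": ""},   # became invalid
    "C3": {"name": "Cy", "date_of_birth": None},  # new, rejected straight away
}

removed = apply_delta(target, changed)
print(removed, sorted(target))
```

C3 never reached the target system, so there is nothing to delete for it; only C1 requires a delete.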
Delta migrations still require a freeze period, albeit a much shorter period than a full run does. During the delta run, the source system is frozen and the target system not yet in production. Transactions that occur during the delta run must be buffered for later processing in the target system. Delta runs reduce the number of buffered transactions but do not fully eliminate them.
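Buffering can be pictured as a simple first-in, first-out queue: transactions that arrive during the delta run are recorded, and replayed in their original order once the target system is in production. The class below is a minimal in-memory sketch with hypothetical names; a real buffer would need to be persistent.

```python
from collections import deque


class TransactionBuffer:
    """Holds transactions received during the freeze, in arrival order."""

    def __init__(self) -> None:
        self._pending = deque()

    def record(self, txn: dict) -> None:
        self._pending.append(txn)

    def replay(self, apply) -> int:
        """Apply all buffered transactions in order; return how many were replayed."""
        replayed = 0
        while self._pending:
            apply(self._pending.popleft())
            replayed += 1
        return replayed


# Hypothetical target data right after the delta run.
balances = {"C1": 100, "C2": 50}


def apply_to_target(txn: dict) -> None:
    balances[txn["account"]] += txn["amount"]


buffer = TransactionBuffer()
buffer.record({"account": "C1", "amount": -30})  # received during the freeze
buffer.record({"account": "C2", "amount": 20})

replayed = buffer.replay(apply_to_target)
print(replayed, balances)
```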
Pros and cons of delta migrations
Pros:
a very short period of down-time, or even no down-time at all;
the migration is almost invisible to consumers;
the performance of the initial run is of little concern.
Cons:
difficult to test;
the target system must allow deletes or updates;
deleting/updating data in the target system is an extra step;
requires an efficient mechanism to determine changes;
requires a more complex control framework;
you spend a lot of your budget on migrating a very small part of the data (i.e. the deltas).
There are several alternatives to delta migrations. The most common ones are migrating in tranches, performance tuning the full migration, and extending the freeze period.
Migrating in tranches is only a viable alternative if you can choose tranches small enough to fit the maximum freeze time. Note that a migration in tranches is not a big-bang migration because the old and new systems are both in production until the last tranche is migrated.
Performance tuning the full migration is only an alternative if the tuned run fits in the freeze period. Performance tuning includes acquiring faster hardware.
Even if a delta run is unavoidable, a fast initial run can help reduce the number of changes that must be handled in the delta run. Remember, the changes to process are the ones from the period that begins at the start of the initial run.
The table below summarises the properties of delta migrations and the alternatives.
DX’s best practice tips for using delta migrations.
When you select a delta migration as your migration strategy, here are some of the best practices that will help make it a success:
Design for delta migration
Design your data migration system to process deltas. Delta-processing is not an add-on feature. It impacts every phase in your project, starting with design!
Hope for the best, prepare for the worst
If a short freeze period is acceptable, make the initial run as fast as possible in the hope that a delta run isn’t necessary.
Reuse existing processing capabilities
Try to use existing stand-in processing and catch-up mechanisms in the target system (if available).
Use a very selective freeze: only allow financial transactions and a very small set of customer-facing processes, such as blocking lost or stolen credit/payment cards. All other processing is frozen and delayed.
Plan for performance
Plan the delta run during a period in the week when the number of changes/transactions is at its lowest. Determine this period based on historical transaction data in the source system. For example, in a credit card system, during the early hours of a Sunday morning very few transactions are performed.
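Finding the quietest period from historical data can be as simple as counting transactions per weekday and hour. The sketch below uses synthetic timestamps; in practice they would come from the transaction history of the source system, and the function name is an assumption.

```python
from collections import Counter
from datetime import datetime, timedelta


def quietest_slot(timestamps) -> tuple:
    """Return the (weekday, hour) with the fewest transactions; Monday is 0."""
    counts = Counter((ts.weekday(), ts.hour) for ts in timestamps)
    # Consider all 168 slots, so hours without any transaction count as zero.
    slots = [(day, hour) for day in range(7) for hour in range(24)]
    return min(slots, key=lambda slot: counts[slot])


# Synthetic history for one week: five transactions in every hour,
# except Sunday 03:00-04:00, which has only one.
start = datetime(2024, 1, 1)  # a Monday
history = []
for offset in range(7 * 24):
    ts = start + timedelta(hours=offset)
    volume = 1 if (ts.weekday() == 6 and ts.hour == 3) else 5
    history.extend([ts] * volume)

print(quietest_slot(history))  # (6, 3), i.e. Sunday at 03:00
```

A real analysis would cover several weeks of data, so that one unusually quiet hour does not distort the result.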
Dealing with short down-times is a challenge when migrating data. This article explored the delta migration strategy for dealing with this challenge.
Do you have another approach? Or a better best practice? Let us know!