Getting started with CopyCat
CopyCat supports two different replication modes that can be used, depending on the situation.
This is the default mode, and it versions prior to 3.08.0, it was the only mode.
In this mode, CopyCat creates triggers that simply log a row in RPL$LOG every time a change occurs in one of the tables being replicated. The type of operation (INSERT, UPDATE or DELETE) is not recorded, because it is determined at replication time. The values of the fields are also not recorded, only the primary key, and therefore, there is no way of knowing which field has changed, if any.
At replication time, the row is then selected from the source DB. If it can't be found, it is deleted from the destination DB. If it is found, then an update is attempted. If the update finds no row, then it is inserted.
This mechanism is robust and fault-tolerant, because even if the databases get out of sync for whatever reason, it will still work (for example, if a row in the source DB gets updated, that update can be turned into an insert if it turns out that the row doesn't exist in the destination DB at replication time).
It also has the advantage that if the same row modified several times, the rows can be bunched up and replicated only once.
This mode doesn't make it possible to guarantee consistency however, because the exact order of events cannot always be reproduced: if you insert a row into table A, setting only a few fields, then insert a row into table B, and then update table A again, giving a value to a field with a foreign key pointing to table B, then the insert into table A will fail at replication time, because it will try to execute the full insert of all the fields, including the field with the primary key, before the insert into table B, so there will be a foreign key violation.
The result of that is that there can be an inconsistent view of the data, it will not be replicated atomically. For some uses, that doesn't really matter, but for reporting or other such situations where it's more important for the data to be consistent than fully up-to-date, that can be a problem.
The other problem with this mode is that since all fields of the table are replicated every time there is a change, there can be a performance hit for tables that have large numbers of fields and/or big blob fields, as the values of these fields will be sent over the wire every time the row changes, even if those fields didn't change.
High-consistency with field change tracking
This is a new feature of CopyCat 3.08.0, and is currently available only for Firebird and Interbase, though it will be implemented for all other supported databases in the upcoming releases.
This mode enables a full tracking of the values of the fields that were changed with every change to the database. That way, when it comes time to replicate, we can know exactly which fields were changed and what values were given, and thus, we can play back precisely the changes that took place in the source database, in exactly the same order, and therefore, the foreign key issues mentioned above no longer apply.
The drawback is that it is entirely incremental, and will not work at all if the two databases become out of sync for whatever reason. To fix such a situation would require a pumping of the entire table or database to ensure that they are identical.
The major advantage however is better consistency and therefore improved reliability. It is HIGHLY RECOMMENDED that you run the entire replication cycle using a single transaction, and that you rollback all the changes if ever there is an error on any row: this is important because some of the later changes in the list may fail and it could lead to a situation where there are subtle differences between the two databases. The only reliable way to use this feature is to replicate atomically.
Consistency is a huge advantage in many situations, because it means that either the replication failed, in which case you will see none of the changes, or it succeeded, in which case they will all be visible, but you can count on there never being anything missing, and you can also be sure that there is no incoherent data in the database (an invoice header with no lines for example).
The other big advantage to this mode is finer grained and easier to manage conflict resolution.
Since we know which fields changed, many conflicts can be automatically resolved by merging the changes from both nodes, in cases where none of the same fields were changed on both sides. In cases that are still conflictual, you are provided the list of fields changed on each side as well as the list of fields changed on both sides, and it is also possible to find out for each field the value that it was changed from and to on each side, making conflict resolution much easier.
If a conflict is detected, the fields that were changed on only one of the nodes can be merged. For instance, if the first node changed fields A, B and C while the seconde node changed fields C and D, if the conflict is resolved in favor of the second node, the values of A and B from the first node will be kept, as they were not conflictual with the change from the second node. Thus, conflict resolution can be much finer-grained.
For more information about conflict management, see: