
Replication Agent errors

Troubleshoot Replication Agent exceptions and errors

Abstract

Should a connection failure occur, you must restart FairCom replication and reestablish connections between the replicas. In some cases (for example, after an extended outage), it may be necessary, and more efficient, to re-sync the local replica with a copy from the source replica.


Error 96 (LOPN_ERR) indicates that a transaction log could not be opened. If the Replication Agent had been using that log before it terminated, it should have recorded the offset of its next position in that log.

Error 76 indicates that the Replication Agent did not record a valid offset in the log: the saved log position does not correspond to the start of a transaction log entry but falls in the middle of one.

An action taken to avoid error 96 can cause error 76. Because the saved offset is specific to the log on the original source server, replacing that log file, for example by copying another file over the original log, can cause error 76. Another possible cause is reconfiguring the agent to connect to a different source server than it used before: if that server happens to have a log with the same name, you can get this error.
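To make the boundary problem behind error 76 concrete, the Python sketch below treats a transaction log as nothing more than a sorted list of entry start offsets. This is a hypothetical simplification, not FairCom's on-disk log format, and is_entry_boundary is an illustrative name. It shows how an offset that was valid in the original log can land mid-entry in a replacement file that has the same name.

```python
from bisect import bisect_left

def is_entry_boundary(entry_starts: list[int], offset: int) -> bool:
    """Return True if `offset` is exactly where some log entry begins.

    `entry_starts` is a sorted list of byte offsets; this is a
    hypothetical model of a log, not FairCom's actual file format.
    """
    i = bisect_left(entry_starts, offset)
    return i < len(entry_starts) and entry_starts[i] == offset

# Offset the agent saved while reading the original log.
saved_offset = 120

original_log = [0, 120, 184]     # entries start at these offsets
replacement_log = [0, 100, 250]  # same file name, different contents

print(is_entry_boundary(original_log, saved_offset))     # True
print(is_entry_boundary(replacement_log, saved_offset))  # False: mid-entry, as with error 76
```

The saved offset itself never changed; only the file it was applied to did, which is why the offset must be paired with the log it was recorded against.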

Replication Agent error 809 can occur under certain conditions. This happens when applications that use FairCom's socket timeout feature generate large amounts of transaction log data for files that are not replicated. The server may then have to scan the logs up to a checkpoint, and that scan can take longer than the connection timeout. The timeout triggers a disconnect, after which the Replication Agent reconnects and continues.

In this case, you can avoid the timeout by increasing the socket_timeout parameter, for example to 30 seconds or more. A value of 300 seconds (5 minutes) should be a reasonable timeout.
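If you script configuration changes, a helper like the hypothetical set_config_value below can raise socket_timeout in a configuration file's text. It assumes the usual FairCom "keyword value" line format with ";" comments. This section does not say which configuration file holds socket_timeout, so treat the file choice as an assumption and confirm it for your installation.

```python
import re

def set_config_value(text: str, keyword: str, value: str) -> str:
    """Set `keyword value` in config text (hypothetical helper).

    Every uncommented line that starts with `keyword` is rewritten;
    lines starting with ';' are treated as comments and left alone.
    If no such line exists, one is appended at the end.
    """
    pattern = re.compile(
        rf"^([ \t]*){re.escape(keyword)}[ \t]+\S.*$",
        re.MULTILINE | re.IGNORECASE,
    )
    line = f"{keyword} {value}"
    new_text, count = pattern.subn(lambda m: m.group(1) + line, text)
    if count:
        return new_text
    # No existing setting: append one on its own line.
    if text and not text.endswith("\n"):
        text += "\n"
    return text + line + "\n"

cfg = "; agent settings\nsocket_timeout 30\n"
print(set_config_value(cfg, "socket_timeout", "300"))
```

Rewriting in place rather than appending avoids leaving two conflicting socket_timeout lines in the file.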

The Replication Agent maintains its own transaction logs and housekeeping files, and removing them can cause problems. Be sure to run FairCom replication in its own directory, independent of any existing servers. You can use the LOCAL_DIRECTORY keyword in the agent's ctsrvr.cfg file to position these files, just as you would in a regular server configuration.

A common situation is that you delete your source server's transaction (.FCS) files and then receive error LOPN_ERR (96) when replication starts. This error indicates that the referenced transaction logs no longer exist. FairCom replication maintains the last committed transaction log position in the REPLSTATEDT.FCS data file on the target server. When resyncing data and starting replication from a fresh state, delete both REPLSTATEDT.FCS and REPLSTATEIX.FCS. The Replication Agent recreates these files as needed after it successfully connects to the target server and begins applying transactions.
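A minimal Python sketch of this fresh-start cleanup follows. It assumes the Replication Agent is stopped and that target_dir is the directory holding the target server's REPLSTATEDT.FCS and REPLSTATEIX.FCS; both assumptions, and the name reset_replication_state, are illustrative rather than part of any FairCom tool. It defaults to a dry run so you can review the paths before anything is removed.

```python
from pathlib import Path

# Replication state files named in the text; they live on the target server.
REPL_STATE_FILES = ("REPLSTATEDT.FCS", "REPLSTATEIX.FCS")

def reset_replication_state(target_dir, dry_run=True):
    """Delete the replication state files found in target_dir.

    Returns the paths that were removed or, when dry_run is True (the
    default), the paths that would be removed, without touching them.
    """
    found = []
    for name in REPL_STATE_FILES:
        path = Path(target_dir) / name
        if path.exists():
            if not dry_run:
                path.unlink()
            found.append(path)
    return found
```

Calling it once with the default dry_run=True and then again with dry_run=False keeps the destructive step explicit.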

Alternatively, you can use a ctreplagent_<agentid>.ini file in the agent's working directory to kick-start replication. Create the file as needed: the entry 1 0 starts replication at position 0 of log 1, and the #current option starts replication at the current transaction log position. If ctreplagent_<agentid>.ini exists, the agent uses it by default. You may delete the file once replication has started successfully and the file is no longer needed.

Note

If this position refers to transaction logs that no longer exist (for example, because they were deleted or rolled off), error 96 occurs.
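The Python sketch below writes the ctreplagent_<agentid>.ini file described above. The function name write_start_position and its parameters are hypothetical, and writing the entry as a single line followed by a newline is an assumption about the file layout beyond the 1 0 and #current entries the text mentions.

```python
from pathlib import Path

def write_start_position(agent_dir, agent_id, log=1, offset=0, current=False):
    """Create ctreplagent_<agent_id>.ini in the agent's working directory.

    With current=True the file holds "#current" (start at the current
    transaction log position); otherwise it holds "<log> <offset>",
    such as "1 0". The one-line layout is an assumption.
    """
    entry = "#current" if current else f"{log} {offset}"
    path = Path(agent_dir) / f"ctreplagent_{agent_id}.ini"
    path.write_text(entry + "\n")
    return path
```

Whichever entry you write, the log it points to must still exist on the source server; otherwise error 96 occurs as noted above.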

An exception log, REPLOGDT.FCS, records any replicated operations that could not be applied to the local replica. It is a standard FairCom data file and should be examined when specific errors are reported. The repadm utility can help you view this log, as explained in repadm Replication Agent administrator.

Informational messages from FairCom replication are redirected to a text log file, ctreplagent.log.

Because FairCom DB uses transaction-based replication, it replicates data to a second system using information from the transaction logs. A common pitfall is replicating data from data files that have since been deleted. This happens when the data files are deleted but their data remains in the existing transaction logs, and the start position is set to begin replication from the first log file: the data found in the transaction logs is then replicated unexpectedly.

If you are resetting a source database, be sure that the data and index files, the source server transaction logs, and the Replication Agent initialization file (ctreplagent_<agentid>.ini) are all in sync. If you delete the data files, also clear the transaction logs on the source server (L*****.FCS, S*****.FCS).
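Before clearing source transaction logs, it can help to list exactly which files match. The sketch below assumes the L*****.FCS and S*****.FCS patterns mean a leading L or S followed only by digits (for example L0000001.FCS); that digit-only interpretation and the name find_transaction_logs are assumptions. It only lists files and leaves any deletion to you.

```python
import re
from pathlib import Path

# Assumed naming: L or S followed by digits and .FCS, e.g. L0000001.FCS.
LOG_FILE_RE = re.compile(r"^[LS]\d+\.FCS$", re.IGNORECASE)

def find_transaction_logs(server_dir):
    """List files in server_dir matching the L*****.FCS / S*****.FCS
    patterns. Nothing is deleted; review the list before removing files."""
    return sorted(
        p for p in Path(server_dir).iterdir()
        if p.is_file() and LOG_FILE_RE.match(p.name)
    )
```

Requiring digits after the leading letter keeps other .FCS files that happen to start with L or S out of the list.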

In V11 and later, the Replication Agent handles HTRN_ERR (520) without requiring CLNIDXX (the ctclnidxx utility, the !CLNIDXX dynamic dump restore option, or the AUTO_CLNIDXX ctsrvr.cfg option).

When files are copied to a target replication server, their index files contain transaction high-water marks that can conflict with the transaction numbering of incoming replicated transactions. When this occurs, HTRN_ERR (520, high transaction mark error) may appear in the replication exception log, most likely for the first transaction replicated to the file.

The Replication Agent now attempts to handle HTRN_ERR by aborting and retrying the transaction. Retries are not attempted if the Replication Agent is configured with the following option:

exception_mode operation

Note

With this change, it is no longer necessary to run a CLNIDXX operation (using the ctclnidxx utility, the !CLNIDXX dynamic dump restore option, or the AUTO_CLNIDXX ctsrvr.cfg option) prior to accessing the target server's copy of the file.
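The retry behavior described above can be modeled as a small decision rule. The Python sketch below is a hypothetical model, not the Replication Agent's code: ReplError, apply_transaction, and the single-retry default are illustrative assumptions, since this section does not say how many retries the agent attempts.

```python
HTRN_ERR = 520  # high transaction mark error

class ReplError(Exception):
    """Hypothetical error carrying a FairCom-style numeric error code."""
    def __init__(self, code: int):
        super().__init__(f"error {code}")
        self.code = code

def apply_transaction(apply, abort, exception_mode, max_retries=1):
    """Apply one replicated transaction, retrying after HTRN_ERR.

    apply() performs the transaction and abort() rolls it back; both are
    caller-supplied stand-ins. No retry happens when exception_mode is
    "operation". Other errors, and HTRN_ERR after the retries are used up,
    are re-raised so the caller can record them (e.g. in an exception log).
    """
    retries = 0
    while True:
        try:
            return apply()
        except ReplError as err:
            abort()  # in this model a failed transaction is always rolled back
            retryable = err.code == HTRN_ERR and exception_mode != "operation"
            if not retryable or retries >= max_retries:
                raise
            retries += 1
```

In this model, an HTRN_ERR on the first attempt succeeds on the retry once the conflicting state is cleared, while exception_mode operation surfaces the error immediately.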

FairCom replication runs as a FairCom server and can experience the same operational errors. Unexpected connection errors commonly result from running FairCom replication in the same location as an already running server, or under the same server name as an existing server. If FairCom replication does not start, or stops unexpectedly, always check the CTSTATUS.FCS log file for errors.

Check the CTSTATUS.FCS file on the source server for additional replication-related messages.
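Because CTSTATUS.FCS can grow long, a filter such as the hypothetical status_errors below may help narrow it down on either server. It assumes only that CTSTATUS.FCS is a plain text file, reads its last lines, and keeps those containing "error" or "_err" (case-insensitive, which also catches names such as LOPN_ERR). The keywords are a rough heuristic, not a parser for FairCom's message format.

```python
from pathlib import Path

def status_errors(status_path, keywords=("error", "_err"), tail=200):
    """Return lines among the last `tail` lines of a status log that
    contain any of `keywords` (case-insensitive substring match)."""
    lines = Path(status_path).read_text(errors="replace").splitlines()
    recent = lines[max(len(lines) - tail, 0):]  # tail=0 yields no lines
    return [ln for ln in recent if any(k in ln.lower() for k in keywords)]
```

Limiting the scan to recent lines keeps the output focused on the latest startup or shutdown rather than on the file's full history.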