Record lock error retry and diagnostics

FairCom replication provides configuration options that control how many times it attempts to lock a record and how long it sleeps between attempts. These options apply when FairCom replication attempts to update a record on the target FairCom server.

  • The lock_retry_count <count> option in ctreplagent.cfg specifies that a record read or update that fails with error DLOK_ERR (42, Could not obtain data record lock) is retried up to <count> times (default 2).

  • The lock_retry_sleep <sleep_ms> option in ctreplagent.cfg specifies that the Replication Agent sleeps for <sleep_ms> milliseconds (default 100) before retrying an operation that failed with error DLOK_ERR.
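The interaction of the two options can be illustrated with a minimal Python sketch. It is not the Replication Agent's code: run_with_lock_retry and make_flaky_operation are hypothetical helpers, and the sketch assumes that "retried up to <count> times" means one initial attempt followed by at most <count> retries, with one sleep before each retry.

```python
import time

DLOK_ERR = 42  # "Could not obtain data record lock", as described above


def run_with_lock_retry(operation, lock_retry_count=2, lock_retry_sleep_ms=100,
                        sleep=time.sleep):
    """Call operation() and retry while it returns DLOK_ERR.

    operation is a hypothetical callable returning 0 on success or an
    error code on failure. Returns (final_code, retries_used).
    """
    code = operation()
    retries = 0
    while code == DLOK_ERR and retries < lock_retry_count:
        sleep(lock_retry_sleep_ms / 1000.0)  # pause before each retry
        retries += 1
        code = operation()
    return code, retries


def make_flaky_operation(failures):
    """Build a stand-in operation that fails `failures` times, then succeeds."""
    remaining = [failures]

    def operation():
        if remaining[0] > 0:
            remaining[0] -= 1
            return DLOK_ERR
        return 0

    return operation
```

Under these assumptions, the default settings add at most 2 × 100 = 200 milliseconds of sleep before the operation is reported as failed.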

Example 1. DLOK_ERR log message

When an update still fails with DLOK_ERR after the retries are exhausted, the Replication Agent logs the following message to ctreplagent.log. This error is not expected because FairCom replication uses blocking locks.

ERR: Unexpectedly failed to update record: error code=42 (diag=<diagnostic_code>)
<diagnostic_code> is one of the following:
  • EQLVREC() - call failed with error DLOK_ERR

  • RWTVREC() - call failed with error DLOK_ERR

  • EQLREC() - call failed with error DLOK_ERR

  • RWTREC() - call failed with error DLOK_ERR
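To see which call is failing most often, the log can be scanned for this message. The following Python sketch is one way to do that. The function name count_lock_failures is hypothetical, and the pattern assumes the message appears on a single line exactly as shown above, possibly preceded by other text such as a timestamp. The exact text that replaces <diagnostic_code> is not specified here, so the sketch captures whatever appears between "diag=" and the final closing parenthesis.

```python
import re
from collections import Counter

# Pattern built from the message format shown above; the diag value is
# captured verbatim because its exact form is an assumption.
PATTERN = re.compile(
    r"ERR: Unexpectedly failed to update record: "
    r"error code=(\d+) \(diag=(.*)\)\s*$"
)


def count_lock_failures(lines):
    """Count failed-update messages, grouped by (error code, diag value)."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[(int(match.group(1)), match.group(2))] += 1
    return counts
```

A typical use would be to open ctreplagent.log, pass the file object to count_lock_failures, and print the resulting counts.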

FairCom DB was also modified to log diagnostic messages in the function that extends a file. That function contains logic that attempts to acquire a lock on the new space. It tries to acquire the lock up to 100 times, sleeping for 10 milliseconds between attempts. The hypothesis is that an RWTVREC() operation could fail with DLOK_ERR when this code exhausts its retry attempts. To determine whether this is the case, the following message is now logged to CTSTATUS.FCS when the lock attempt in this function fails.

extfil: Failed to lock offset 0x<offset> when extending file <filename>: 42
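When checking CTSTATUS.FCS for this message, it can help to extract the offset and file name so that failures can be grouped by file. The Python sketch below parses one line of that form. The function name parse_extfil_failure is hypothetical, and the pattern assumes the message is written on one line exactly as shown, with the offset in hexadecimal.

```python
import re

# Pattern built from the message format shown above.
EXTFIL_PATTERN = re.compile(
    r"extfil: Failed to lock offset 0x([0-9A-Fa-f]+) "
    r"when extending file (.+): (\d+)\s*$"
)


def parse_extfil_failure(line):
    """Return offset, filename, and error code from an extfil message.

    Returns None when the line is not an extfil lock failure message.
    """
    match = EXTFIL_PATTERN.search(line)
    if match is None:
        return None
    return {
        "offset": int(match.group(1), 16),  # hex digits after "0x"
        "filename": match.group(2),
        "error": int(match.group(3)),
    }
```

If such messages appear in CTSTATUS.FCS at about the same time as the error code=42 messages in ctreplagent.log, that is consistent with the file-extension path being the source of the failures.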

A record lock error (for example, error 42) may indicate that the record on the target is already locked by another client. Check whether a rogue replication is accidentally running in the background. You can use FairCom monitor to view current connections and their origins. The command-line ctadmn utility can also provide this information.