V11.5 Release Notes

Corrected checkpoint processing to persist required transaction logs for recovery

This fix corrects a potentially serious issue associated with the following message being logged to CTSTATUS.FCS:

Index buffer on commit node list without update flag...

Background information: Checkpoint processing is a critical component of transaction control. During this phase, a known good state of the database is created and persisted. Checkpoint processing consists of several key phases:

Determining cache page vulnerabilities
Flushing updated cache page information to disk
Writing critical recovery state information to persisted transaction logs
Determining which transaction logs to retain (only logs required for recovery should be persisted)

While checkpoint processing walks the internal lists of updated cache pages, another c-tree thread can update an entry at a position in the list that has already been examined. This is permitted by design: no mutex is held during list processing, which preserves performance at very high transaction rates.

In V8 c-tree Servers, this condition was not always checked correctly, and a buffer could be left on the list even though it had already been flushed. The following status log message was observed immediately before such an event:

Index buffer on commit node list without update flag...

As a result, the server could stop releasing transaction logs altogether, because it determined they were still needed to recover this buffer page. The accumulation of large numbers of transaction logs could eventually consume all available disk space.

Corrective action was introduced toward the end of the V8 line of c-tree Servers and carried into the V10 line. This correction detected and marked such unexpected updated-buffer states so that they would trigger the appropriate release of transaction logs.

However, on V10 servers under high transaction load, the diagnostic message continued to appear, now accompanied by the number of active log files decreasing to 4 and possibly not increasing after that point:

Index buffer on commit node list without update flag...

The number of active log files decreased to: 4

Further, in one case a c-treeACE server was improperly halted during a maintenance window. During the subsequent automatic recovery, error 96 occurred: a transaction log was determined to be required for recovery but did not exist. In this case, the log was needed for an index file buffer, so the affected index could be recovered by rebuilding the indexes of the data file.

It has been determined that these automatic recovery errors can occur in either the data file or the index file phase of recovery. If the data recovery phase fails, data may not be recoverable. If the data recovery phase succeeds, no data is lost. If the index recovery phase fails, changes to indexes may be lost; in this case, the indexes can be rebuilt from the existing data files.

This issue has been corrected in all c-treeACE Servers with build dates after 160527 (May 27, 2016). Investigation showed that the initial checkpoint operation handled the condition correctly. However, a subsequent checkpoint failed to consider the same state information, causing the calculated log requirements to fall out of sync with what recovery actually required. A corrected check condition now releases only the logs no longer needed for recovery and retains all logs that recovery requires.

Note: If the data recovery phase succeeds and the index recovery phase fails, adding the server keyword TRANIDX_LOPN_ERR_CONTINUE YES to ctsrvr.cfg allows automatic recovery to complete. However, the indexes should then be rebuilt to ensure they are in sync with the data.