Product Documentation

FairCom DB V12 Release Notes

Enhanced Detection and Reporting of Filesystem Flush Errors to Protect Data Integrity

Potential silent data loss following fsync() failures

When buffered filesystem I/O is used for data and index files (the FairCom DB default), an I/O error during a filesystem flush request (e.g. fsync()) could result in data loss, depending on the filesystem implementation, even if a later flush request succeeded. This is because the filesystem may invalidate and release the unwritten contents on failure, before the next flush attempt. In addition, although committed transactions were secured to the transaction log, that log data might no longer have been available for recovery once the data/index flush was assumed to have succeeded, leaving the data/index files in a permanently unrecoverable state.

For files under full transaction control, such an error can be treated identically to a failed write error, which is considered fatal in most situations. It is considered safe to terminate on any failed I/O write or flush of transaction controlled data/index files, because all committed transactions have been secured to the logs.

Behavior Change: Starting with code lines dated with a Build after 190314, an fsync() error now behaves similarly to a failed write: it is expected to be a fatal error for a ctTRNLOG file and will cause the FairCom Server to shut down.
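The fail-stop handling described above can be illustrated with a minimal POSIX sketch. This is not FairCom's implementation: the names flush_or_fatal, write_and_flush, FLUSH_OK and FLUSH_FATAL are hypothetical and exist only for this example. The point it shows is that a failed fsync() is reported as fatal and never retried, because a later successful fsync() does not prove the earlier data reached disk.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch; not FairCom identifiers. */
enum flush_result { FLUSH_OK = 0, FLUSH_FATAL = -1 };

/*
 * Flush fd and report any failure as fatal. The fsync() is deliberately
 * not retried: on some filesystems the dirty pages may already have been
 * dropped, so a later fsync() returning 0 says nothing about this data.
 */
enum flush_result flush_or_fatal(int fd)
{
    if (fsync(fd) != 0) {
        fprintf(stderr, "fsync failed: %s\n", strerror(errno));
        return FLUSH_FATAL;
    }
    return FLUSH_OK;
}

/* Write len bytes of buf to path, then flush; any I/O error is fatal. */
enum flush_result write_and_flush(const char *path, const void *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return FLUSH_FATAL;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted before writing: try again */
            close(fd);
            return FLUSH_FATAL;    /* failed write: same handling as failed flush */
        }
        p += n;
        len -= (size_t)n;
    }

    enum flush_result result = flush_or_fatal(fd);
    if (close(fd) != 0)
        result = FLUSH_FATAL;
    return result;
}
```

In this sketch the caller is expected to treat FLUSH_FATAL as a reason to stop and rely on transaction log recovery at restart, rather than continuing with files whose on-disk state is unknown.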

Workaround: Using all of the following ctsrvr.cfg configuration options should avoid this issue for files under transaction control by opening them in a synchronous write mode:

COMPATIBILITY LOG_WRITETHRU is the recommended transaction log write mode. It is recommended to not change or disable this mode.

COMPATIBILITY TDATA_WRITETHRU is strongly recommended for all transaction controlled files and is considered to perform well.

COMPATIBILITY TINDEX_WRITETHRU is the safest option; however, it may impact performance, so performance testing should be done to ensure acceptable throughput is maintained.
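Taken together, the three options above would appear in ctsrvr.cfg roughly as follows. This fragment assumes each keyword goes on its own line and that no other settings in the file conflict with them; check it against the existing configuration before use.

```
COMPATIBILITY LOG_WRITETHRU
COMPATIBILITY TDATA_WRITETHRU
COMPATIBILITY TINDEX_WRITETHRU
```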

Systems that have little idle disk time or that have large c-tree caches relative to system RAM will benefit the most from using these keywords.
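As a rough illustration of what a synchronous write mode means at the operating system level, the sketch below opens a file with the POSIX O_SYNC flag, so each write() returns only after the data has been handed to stable storage and any I/O error surfaces on the write itself rather than on a later flush. Whether FairCom DB uses O_SYNC or another platform mechanism for its WRITETHRU options is an assumption here, and open_writethru and write_all_sync are hypothetical names used only for this example.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Open path for synchronous writes (POSIX O_SYNC).
 * Returns a file descriptor, or -1 on error.
 */
int open_writethru(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_SYNC, 0600);
}

/*
 * Write all len bytes of buf to fd.
 * Returns 0 on success, -1 on the first I/O error. With O_SYNC the error
 * is reported by write() itself, so there is no window in which a failed
 * flush can be mistaken for success.
 */
int write_all_sync(int fd, const void *buf, size_t len)
{
    assert(buf != NULL);

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before writing: try again */
            fprintf(stderr, "synchronous write failed: %s\n", strerror(errno));
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

The trade-off is the one noted for the WRITETHRU options: every write waits for the device, so throughput should be measured under a realistic load.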

Note: The existing COMPATIBILITY CHKPNT_FLUSHALL configuration option should no longer be considered for use in production environments because of potential long-term data integrity issues: the sync() system call does not detect filesystem write errors.
