FairCom RTG V3 Update Guide

Debug Heap Options for Detection of Memory Corruption

FairCom Database Engine makes heavy use of memory operations to optimize performance. To avoid frequent OS memory requests, it maintains a sophisticated memory suballocator: this internal subsystem allocates larger blocks of memory from the OS and suballocates them as needed. All memory gets and puts through this suballocator are tagged, tracked, and checked. On rare occasions, heap memory corruption can occur, eventually resulting in an illegal memory access that terminates FairCom DB operations. Debugging these conditions is extremely complex, so FairCom DB V12 and FairCom RTG V3 add enhanced memory diagnostics that can be enabled when required.

Debugging options can now be enabled when memory corruption is suspected, triggering additional tracking and checking of internal memory suballocations. Use these options only in consultation with your FairCom support team, as they can significantly impact performance in certain use cases.

HEAP_DEBUG_LEVEL <level> with values 0-3 (Default 0):

  • HEAP_DEBUG_LEVEL 0 uses the default memory allocator; no checks are performed.
  • HEAP_DEBUG_LEVEL 1 enables the debug allocator, with some detection of memory buffer write overruns or use-after-free bugs. There is no additional memory overhead, and the CPU overhead is small.
  • HEAP_DEBUG_LEVEL 2 does the checks from HEAP_DEBUG_LEVEL 1 and adds a small redzone before/after each allocation, which is checked at free. This may detect some buffer write overruns, write underruns, or double frees. This option uses an additional 16 bytes of memory on each allocation.

    Levels 1 and 2 generate a CTSTATUS entry and stack trace at free when a memory fault is detected.

  • HEAP_DEBUG_LEVEL 3 uses the system's virtual memory subsystem to immediately detect illegal memory overruns (both reads and writes) by aligning allocations at the end of a 4K page. The following types of errors may immediately generate a segmentation fault (core dump) on the invalid access: memory read/write overrun, double free, and use-after-free.

    Memory write underruns or some small write overruns may instead be detected at free and generate a CTSTATUS message and stack trace.

    Memory allocation failures are logged to CTSTATUS when HEAP_DEBUG_LEVEL 3 is enabled.

    Because Level 3 debugging uses an extra 4K-8K bytes for every allocation, you can limit it to particular allocation size(s) by excluding the other sizes with HEAP_DEBUG_EXCLUSION_LIST <N>.

    Be sure to see the Performance Note below.
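As a rough sizing aid, the following Python sketch turns the per-allocation overheads listed above into a total for a given number of live allocations. It assumes that "4K" means 4096 bytes, that Level 3 applies to every allocation size (no HEAP_DEBUG_EXCLUSION_LIST), and that you supply the allocation count yourself; the function name is illustrative and not part of any FairCom tool.

```python
# Rough sizing aid for heap debugging, using only the per-allocation figures
# stated above. "allocations" is the number of live allocations you expect;
# it is an input you must estimate (hypothetical here).

def heap_debug_overhead(level: int, allocations: int) -> tuple[int, int]:
    """Return the (low, high) extra bytes for a HEAP_DEBUG_LEVEL setting."""
    if allocations < 0:
        raise ValueError("allocations must be non-negative")
    if level in (0, 1):
        return (0, 0)                      # no additional memory overhead
    if level == 2:
        extra = 16 * allocations           # 16-byte redzone per allocation
        return (extra, extra)
    if level == 3:
        # 4K-8K extra per allocation (assumes 4K = 4096 bytes)
        return (4096 * allocations, 8192 * allocations)
    raise ValueError("HEAP_DEBUG_LEVEL must be 0-3")


if __name__ == "__main__":
    live = 100_000  # hypothetical allocation count
    for level in range(4):
        low, high = heap_debug_overhead(level, live)
        print(f"level {level}: {low / 2**20:.1f} to {high / 2**20:.1f} MiB extra")
```

The gap between Level 2 (bytes per allocation) and Level 3 (kilobytes per allocation) is why restricting Level 3 to specific allocation sizes can matter so much.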

HEAP_DEBUG_EXCLUSION_LIST <N> [,<N> ...] - When HEAP_DEBUG_LEVEL 3 (only) is specified, it is possible to have particular allocation sizes use the default (non-debug) allocator to reduce the overall memory overhead of this debugging method. By default, HEAP_DEBUG_LEVEL applies to all allocation sizes. The particular values to specify would typically be recommended by FairCom support based on analysis of prior core dumps.

HEAP_DEBUG_EXCLUSION_LIST numbers and the allocation size ranges they cover:

List #   Allocation size (bytes)
1        <=16
2        17-32
3        33-64
4        65-128
5        129-256
6        257-512
7        513-1024
8        1025-2048
9        2049-4096
10       4097-8192
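The size ranges in this table follow a doubling pattern, so a small helper like the hypothetical one below can tell you which list number a given allocation size falls into. It is a sketch derived only from the table, not a FairCom API.

```python
# Map an allocation size (in bytes) to its HEAP_DEBUG_EXCLUSION_LIST number,
# following the table above: #1 covers sizes up to 16 bytes, and each later
# number doubles the upper bound, ending with #10 (4097-8192 bytes).

def exclusion_list_number(size: int) -> int:
    if size <= 0:
        raise ValueError("size must be a positive number of bytes")
    if size > 8192:
        raise ValueError("sizes above 8192 bytes are not listed in the table")
    number, upper = 1, 16
    while size > upper:        # move to the next range until the size fits
        number += 1
        upper *= 2
    return number


if __name__ == "__main__":
    for size in (8, 16, 17, 100, 4096, 4097, 8192):
        print(size, "->", exclusion_list_number(size))
```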

Limitations: When Level 3 debugging is enabled, memory usage statistics are not tracked for the allocation size ranges (bins) that use the debug allocator.

Performance Note

HEAP_DEBUG_LEVEL 3 degrades performance on machines with 4 or more CPU cores. FairCom does not recommend using Level 3 on such machines where performance is important.

If performance is important, FairCom recommends using heap debug Level 2 on any system where you have seen a memory-related crash. Our testing has not revealed a noticeable performance impact with Level 2, even on a large 72-core test system.

Note 1:

If allocation stack traces are enabled (such as with ctstat -mt +ALL), an additional stack trace is dumped showing where the buffer was allocated.

A use-after-free might cause a segmentation fault (core dump).

Note 2:

HEAP_DEBUG_LEVEL 2 also benefits stability: if a buffer overrun is 8 bytes or less, the heap is NOT corrupted because only the extra redzone memory is modified. The server stays up and only a stack trace is generated.
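To illustrate why an overrun of 8 bytes or less leaves the heap intact, the following Python sketch models a buffer surrounded by redzones that are checked when the buffer is freed. It assumes the 16 bytes of Level 2 overhead are split into 8 guard bytes on each side and uses a hypothetical fill pattern; FairCom's actual redzone layout is not described here.

```python
# Conceptual model of redzone checking; NOT FairCom's implementation.
# Assumes the 16 bytes of Level 2 overhead are split as 8 guard bytes before
# and 8 after each buffer, filled with a hypothetical pattern byte.

GUARD = 8
PATTERN = 0xFD  # hypothetical fill value


class HeapCorruption(Exception):
    """Raised when a write would land outside the buffer and both redzones."""


class RedzoneBuffer:
    def __init__(self, size: int):
        self.size = size
        self.raw = bytearray([PATTERN] * GUARD + [0] * size + [PATTERN] * GUARD)

    def write(self, offset: int, data: bytes) -> None:
        """Unchecked write relative to the user area, like memcpy in C."""
        start = GUARD + offset
        end = start + len(data)
        if start < 0 or end > len(self.raw):
            # In C, this write would modify neighboring heap memory.
            raise HeapCorruption("write extends past the redzone")
        self.raw[start:end] = data

    def free(self) -> bool:
        """Check both redzones; return True if no fault was detected."""
        guards = self.raw[:GUARD] + self.raw[GUARD + self.size:]
        # A stray write that happens to store PATTERN goes unnoticed here.
        return all(b == PATTERN for b in guards)


if __name__ == "__main__":
    buf = RedzoneBuffer(32)
    buf.write(0, b"x" * 40)    # 8-byte overrun: lands only in the redzone
    print("fault detected at free:", not buf.free())
```

In this model, a 9-byte overrun would pass the end of the back redzone, which in C would overwrite neighboring heap memory; the model raises an exception at that point instead.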

Linux

On Linux, the HEAP_DEBUG_LEVEL 3 configuration option requires raising the kernel limit vm.max_map_count to at least twice the number of allocations. FairCom can help you determine the minimum value for this Linux kernel parameter. (The number of allocations generally rises and falls with server activity, so include a safety factor based on how much busier the server might become.) Setting vm.max_map_count too low results in allocation failures once the number of allocations exceeds the limit, which could lead to a server crash.

To change this value until the next system reboot, use the command:

sysctl -w vm.max_map_count=<N>
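Before enabling Level 3, you may want to confirm that the running kernel's limit is high enough. The sketch below reads the current value from /proc/sys/vm/max_map_count (standard on Linux) and compares it with a required minimum that you supply; the helper names are hypothetical.

```python
# Check the running kernel's vm.max_map_count against a required minimum.
# On Linux the current value is exposed at /proc/sys/vm/max_map_count.
import sys
from pathlib import Path

PROC_PATH = Path("/proc/sys/vm/max_map_count")


def read_max_map_count(path: Path = PROC_PATH) -> int:
    return int(path.read_text().strip())


def sysctl_command(required: int) -> str:
    """The temporary (until reboot) command from this section."""
    return f"sysctl -w vm.max_map_count={required}"


def needs_raise(current: int, required: int) -> bool:
    return current < required


if __name__ == "__main__":
    if len(sys.argv) == 2 and PROC_PATH.exists():
        required = int(sys.argv[1])
        current = read_max_map_count()
        if needs_raise(current, required):
            print(f"current {current} is too low; as root run: {sysctl_command(required)}")
        else:
            print(f"current {current} meets the required {required}")
    else:
        print("usage (Linux only): pass the required value as the only argument")
```

The sysctl -w setting lasts only until reboot. Most Linux distributions also read persistent settings from files under /etc/sysctl.d/ or from /etc/sysctl.conf; check your distribution's documentation for the preferred location.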

Example using HEAP_DEBUG_LEVEL 3 restricted to sublist #2:

At a busy server time, look at current memory usage with this command:

ctstat -ml -u admin -p ADMIN_PASSWD -s FAIRCOMS

The output shows sublist #2 as "PI2TYP." The first numeric column after the name shows the total allocations in bytes:

PI2TYP 98304 7400 0.01%

Divide that value by 32 (the maximum size in this allocation range) to get the number of current allocations on this sublist, 98304/32 = 3072, and then double it: 6144.

You also need to include the MBATYP entry:

MBATYP 73388776- 73388776-

Divide its first numeric column by 8192 (the worst-case assumption for this list), rounding down, and double the result:

73388776/8192 = 8958, and 8958 * 2 = 17916

The sum of these values is twice the allocation count: 17916 + 6144 = 24060. Apply a safety multiplier, such as x10 (24060 x 10 = 240600). The result is the minimum value you need to set for the Linux kernel parameter vm.max_map_count.
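The arithmetic in this example can be reproduced with the short Python sketch below, which takes the PI2TYP and MBATYP lines from the ctstat -ml output. It assumes the byte total is the first numeric field after the list name and that the trailing "-" seen in the MBATYP sample can be ignored; the function names are hypothetical, while the divisors (32 and 8192) and the x10 safety factor are the ones used above.

```python
# Reproduce the worked estimate of vm.max_map_count from two `ctstat -ml`
# output lines. Assumes the first numeric field after the list name is the
# byte total and that a trailing "-" on it (as in the MBATYP sample) can be
# ignored. Integer division matches the rounded-down figures in the example.

def first_bytes(line: str) -> int:
    """Return the first numeric field after the list name."""
    return int(line.split()[1].rstrip("-"))


def estimate_max_map_count(pi2typ_line: str, mbatyp_line: str, safety: int = 10) -> int:
    pi2_allocs = first_bytes(pi2typ_line) // 32    # max size in range #2
    mba_allocs = first_bytes(mbatyp_line) // 8192  # worst-case assumption
    doubled = 2 * (pi2_allocs + mba_allocs)        # limit must be 2x allocations
    return doubled * safety


if __name__ == "__main__":
    pi2 = "PI2TYP 98304 7400 0.01%"
    mba = "MBATYP 73388776- 73388776-"
    print(estimate_max_map_count(pi2, mba))  # 240600
```

Integer division reproduces the rounded-down figures in the example; rounding each quotient up instead gives a marginally higher, more conservative result.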

After setting vm.max_map_count (Linux only), add the following server keywords to enable the debug suballocator on sublist #2 (PI2TYP) only:

HEAP_DEBUG_LEVEL 3

HEAP_DEBUG_EXCLUSION_LIST 1,3,4,5,6,7,8,9,10
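When only a few lists should use the Level 3 debug allocator, the exclusion list is every other list number from 1 to 10. The hypothetical helper below builds that keyword line; for sublist #2 it reproduces the HEAP_DEBUG_EXCLUSION_LIST value shown above.

```python
# Build the HEAP_DEBUG_EXCLUSION_LIST keyword that leaves only the chosen
# list numbers (1-10, from the size-range table) on the Level 3 debug allocator.

ALL_LISTS = range(1, 11)


def exclusion_list_keyword(debug_lists: set[int]) -> str:
    if not debug_lists or not debug_lists <= set(ALL_LISTS):
        raise ValueError("choose one or more list numbers from 1 to 10")
    excluded = [n for n in ALL_LISTS if n not in debug_lists]
    if not excluded:
        return ""  # debugging every size: omit the keyword (the default)
    return "HEAP_DEBUG_EXCLUSION_LIST " + ",".join(str(n) for n in excluded)


if __name__ == "__main__":
    print("HEAP_DEBUG_LEVEL 3")
    print(exclusion_list_keyword({2}))
```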

Stack Dump Message

When a Stack Dump is generated, the following message will be logged to CTSTATUS.FCS.

"A Heap Fault was detected and stack dump created"

If heap debugging is enabled, FairCom recommends routinely reviewing CTSTATUS.FCS to determine whether a Stack Dump has been created. If you see this message, please send the Stack Dump and the CTSTATUS.FCS message to FairCom Support for inspection.
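For routine reviews, a small script can search CTSTATUS.FCS for the message quoted above. This Python sketch assumes the message appears on a single line and that you pass the path to your server's CTSTATUS.FCS; the default "CTSTATUS.FCS" in the current directory is only a placeholder.

```python
# Scan CTSTATUS.FCS for the heap fault message quoted above, for use in a
# routine review (for example, from a scheduled job). Assumes the message
# appears on a single line; pass the path to your server's CTSTATUS.FCS.
import sys

MESSAGE = "A Heap Fault was detected and stack dump created"


def find_heap_faults(lines) -> list[tuple[int, str]]:
    """Return (line number, text) for each line containing the message."""
    return [(n, line.rstrip("\n")) for n, line in enumerate(lines, start=1)
            if MESSAGE in line]


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "CTSTATUS.FCS"
    try:
        with open(path, encoding="utf-8", errors="replace") as log:
            hits = find_heap_faults(log)
    except FileNotFoundError:
        print(f"{path} not found")
    else:
        for number, text in hits:
            print(f"{path}:{number}: {text}")
        print(f"{len(hits)} heap fault message(s) found")
```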
