8 Steps to a Fast Data Load

The following eight steps can be used to speed up the process of inserting data into your FairCom DB database:

1. Turn off transaction processing

Transaction processing control can be turned off during these steps. Provided the source data is preserved so that you can start over if a problem occurs, you do not need transaction processing control during the load.

If you will want transaction processing control later, be sure to create the data and index files with the TRNLOG file mode active. Once a file has been created with TRNLOG enabled, you can disable TRNLOG programmatically to speed up the load, as described in the FairCom DB Programmer Reference Guide in the topic titled Transaction Processing On/Off.

Alternatively, you can run the cttrnmod program, explained in the topic titled cttrnmod - Change Transaction Mode Utility.

2. Use the shared memory protocol

If at all possible, run the data load program on the same machine that hosts the FairCom DB data. This allows the Server to use the shared memory communication protocol, which is much faster than TCP/IP.

If you need to use TCP/IP, increase the number of threads to multiple threads per CPU core to compensate for the network latency.

3. Direct I/O (V11 and later only)

When using FairCom DB V11 or later, review the Direct I/O support, which can help when building and working with larger files. See Linux Direct I/O (UNBUFFERED I/O) Performance Support in the V11 Update Guide.

4. Multi-thread the inserts

The next way to boost performance is to use one of the non-relational FairCom DB APIs, such as the ISAM or the c-treeDB API.

If you can break the data coming into the program into multiple chunks, these APIs allow you to take advantage of multi-threading to do the inserts. A good rule of thumb is to use one or two threads for each virtual CPU core.
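The chunk-and-thread approach described above can be sketched as follows. This is a minimal illustration in Python, not FairCom code: `insert_chunk` is a hypothetical placeholder for whatever ISAM or c-treeDB calls your loader makes (each worker would normally use its own connection), and `threads_per_core` simply encodes the one-or-two-threads rule of thumb.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, n_chunks):
    """Split a list of records into n_chunks nearly equal contiguous slices."""
    n_chunks = max(1, min(n_chunks, len(records) or 1))
    size, extra = divmod(len(records), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks each take one additional record.
        end = start + size + (1 if i < extra else 0)
        chunks.append(records[start:end])
        start = end
    return chunks


def insert_chunk(chunk):
    """Hypothetical placeholder: a real loader would open its own
    connection here and add each record through the database API."""
    return len(chunk)


def parallel_load(records, threads_per_core=2, insert=insert_chunk):
    """Insert records using threads_per_core worker threads per CPU core."""
    workers = max(1, (os.cpu_count() or 1) * threads_per_core)
    chunks = split_into_chunks(records, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker handles one chunk; the total inserted count is returned.
        return sum(pool.map(insert, chunks))
```

Calling `parallel_load(records)` with a real `insert` function would give each worker one contiguous slice of the input. The right value for `threads_per_core` depends on the hardware and the communication protocol, so treat the default as a starting point for measurement.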

5. Disable indexes using CTOPEN_DATAONLY file mode

You can drop index support during the data load by opening the files with the CTOPEN_DATAONLY file mode. This gets the data into the data file as quickly as possible and avoids the time it takes to update your indexes on the fly.

6. Insert in batches

V10 and later support batch inserts. Batch inserts are quicker than individual adds because each batch call can make full use of the OS packet size and feed the maximum amount of data into the FairCom Server process.
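The grouping logic behind batch inserts can be sketched as below. This is an illustration only: `send_batch` is a hypothetical callback standing in for the actual batch-insert call of the API you use, and the default `batch_size` of 1000 is an arbitrary assumption rather than a FairCom recommendation.

```python
from itertools import islice


def batched(records, batch_size):
    """Yield successive lists of up to batch_size records from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(records)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def load_in_batches(records, send_batch, batch_size=1000):
    """Call send_batch once per batch; return (records sent, calls made)."""
    sent = calls = 0
    for batch in batched(records, batch_size):
        send_batch(batch)  # hypothetical: one batch-insert call to the server
        sent += len(batch)
        calls += 1
    return sent, calls
```

With this shape, 2,500 records and a batch size of 1,000 result in three calls to the server instead of 2,500 individual adds.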

7. Turn transaction processing back on

See the topics referenced in Step 1 for instructions on turning TRNLOG back on.

8. Rebuild to create indexes

Once you have all of the data loaded into the data files, do a rebuild to generate the indexes. This is the fastest way to build the indexes because you now have all of the data in the FairCom DB data files, so the indexes can be built from scratch with a known set of data. To generate your indexes, use the function call ctdbRebuildTable discussed in the c-treeDB Developer's Guide.

Or you can call the ctrbldif program discussed in the FairCom DB Programmer's Reference.

To improve the performance of an index rebuild through the Server, increase these two settings in your ctsrvr.cfg file:

MAX_HANDLES

SORT_MEMORY

The steps given above should help you complete the data load in much less time than a single-threaded program using ctdbWriteRecord() inserts. In one customer case where we used this process, the time to load 2.2 billion records, with several indexes, went from approximately 2 weeks to less than 2 days.
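To put the customer figures in perspective, the average throughput can be worked out with a few lines of arithmetic. The sketch below assumes "approximately 2 weeks" means 14 days and treats "less than 2 days" as exactly 2 days, so the second rate is a lower bound.

```python
SECONDS_PER_DAY = 86_400
RECORDS = 2_200_000_000  # 2.2 billion records, from the customer case


def records_per_second(records, days):
    """Average insert rate for a load that takes the given number of days."""
    return records / (days * SECONDS_PER_DAY)


before = records_per_second(RECORDS, 14)  # assumed: 2 weeks = 14 days
after = records_per_second(RECORDS, 2)    # assumed: exactly 2 days (upper bound on time)
speedup = after / before

print(f"{before:,.0f} -> {after:,.0f} records/s ({speedup:.0f}x)")
```

Under these assumptions the load rate rises from roughly 1,800 to at least 12,700 records per second, a speedup of about 7x or more.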
