Monitoring System Resource Usage

FairCom DB relies upon system resources such as the CPU, disk, memory and the network. Typically each system includes its own tools for monitoring system resources. The following sections provide an overview of system-specific monitoring tools for system resources.

In This Section

Monitoring CPU Usage

Monitoring Disk Usage

Monitoring Memory Usage

Monitoring Network Usage

Other System Monitoring Options

Monitoring CPU Usage

The system administrator should know the expected pattern of CPU use by FairCom DB during normal operation of the system. CPU metrics such as user time, system time, I/O wait time, idle time, and context switches for the server process should be tracked so that unexpected changes in CPU usage can be detected, analyzed, and corrected.

Solaris supports the following utilities for monitoring CPU usage. Other Unix systems support these and similar utilities:

mpstat reports per-processor statistics including counts for interrupts, context switches, spins on mutexes and reader/writer locks, system calls, and user/system/wait/idle times.
prstat reports active process statistics including process state, priority, number of lightweight processes, and CPU usage.
top displays the top processes on the system and periodically updates this information. Raw CPU percentage is used to rank the processes.
ps prints information about active processes.
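
Utilities of this kind generally report user, system, wait, and idle figures as percentages of an interval. The following Python sketch shows the underlying arithmetic, assuming two snapshots of cumulative per-state tick counters are available. The function name and the counter values are hypothetical and are not taken from any particular tool.

```python
from typing import Dict


def cpu_percentages(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, float]:
    """Convert two snapshots of cumulative CPU tick counters into percentages.

    Each snapshot maps a state name (for example "user" or "idle") to the
    number of ticks spent in that state since the counters started.
    """
    deltas = {state: after[state] - before[state] for state in before}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("snapshots must be taken at different times")
    return {state: 100.0 * ticks / total for state, ticks in deltas.items()}


# Hypothetical counter values, invented for illustration only.
earlier = {"user": 100, "system": 50, "iowait": 10, "idle": 840}
later = {"user": 160, "system": 70, "iowait": 20, "idle": 950}

usage = cpu_percentages(earlier, later)
print(usage)  # {'user': 30.0, 'system': 10.0, 'iowait': 5.0, 'idle': 55.0}
```

Tracking these percentages over time gives the administrator a baseline against which a sudden rise in, for example, I/O wait time stands out.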

The Windows Performance Monitor utility (perfmon) can be used to monitor CPU usage. The Processor performance object maintains counters for each CPU. The available counters include the following (descriptions taken from the Performance Monitor’s explanatory text):

% Processor Time is the percentage of elapsed time that the processor spends executing a non-Idle thread. It is calculated by measuring the time that the idle thread is active in the sample interval and subtracting that time from the interval duration. (Each processor has an idle thread that consumes cycles when no other threads are ready to run.) This counter is the primary indicator of processor activity, and displays the average percentage of busy time observed during the sample interval. Equivalently, it is 100% minus the percentage of time that the processor is idle.
% Privileged Time is the percentage of elapsed time that the process threads spent executing code in privileged mode. When a Windows system service is called, the service will often run in privileged mode to gain access to system-private data. Such data is protected from access by threads executing in user mode. Calls to the system can be explicit or implicit, such as page faults or interrupts. Unlike some early operating systems, Windows uses process boundaries for subsystem protection in addition to the traditional protection of user and privileged modes. Some work done by Windows on behalf of the application might appear in other subsystem processes in addition to the privileged time in the process.
% User Time is the percentage of elapsed time the processor spends in the user mode. User mode is a restricted processing mode designed for applications, environment subsystems, and integral subsystems. The alternative, privileged mode, is designed for operating system components and allows direct access to hardware and all memory. The operating system switches application threads to privileged mode to access operating system services. This counter displays the average busy time as a percentage of the sample time.
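
These counters can also be sampled from the command line. The Python sketch below builds an argument list for the Windows typeperf utility, which is not mentioned above and is used here as an assumption about the available tooling. The helper names, the _Total instance, and the interval and sample count are illustrative choices. The command is only constructed, not executed.

```python
from typing import List


def counter_path(obj: str, counter: str, instance: str = "") -> str:
    r"""Build a counter path of the form \Object(Instance)\Counter."""
    inst = f"({instance})" if instance else ""
    return f"\\{obj}{inst}\\{counter}"


def typeperf_command(paths: List[str], interval_s: int, samples: int) -> List[str]:
    """Return a typeperf argument list.

    -si sets the sample interval in seconds and -sc the number of samples.
    """
    return ["typeperf", *paths, "-si", str(interval_s), "-sc", str(samples)]


# The three Processor counters described above, for all CPUs combined.
CPU_COUNTERS = ["% Processor Time", "% Privileged Time", "% User Time"]
paths = [counter_path("Processor", name, "_Total") for name in CPU_COUNTERS]

# Illustrative choice: one sample every 5 seconds, 12 samples in total.
command = typeperf_command(paths, interval_s=5, samples=12)
print(command)
```

On a Windows system the resulting list could be passed to subprocess.run(command, capture_output=True, text=True) to capture the sampled values.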

Monitoring Disk Usage

The system administrator should know the expected pattern of disk use by FairCom DB during normal operation of the system. Expected data and index file sizes and disk I/O should be tracked so that unexpected changes in disk usage can be detected, analyzed, and corrected.

Solaris supports the following utilities for monitoring disk usage. Other Unix systems support these and similar utilities:

vmstat reports virtual memory statistics regarding process, virtual memory, disk, trap, and CPU activity.
iostat iteratively reports terminal, disk, and tape I/O activity, as well as CPU utilization.

The Windows Performance Monitor utility (perfmon) can be used to monitor system disk usage. The PhysicalDisk performance object maintains counters for each physical disk. The available counters include the following (descriptions taken from the Performance Monitor’s explanatory text):

% Disk Read Time is the percentage of elapsed time that the selected disk drive was busy servicing read requests.
% Disk Write Time is the percentage of elapsed time that the selected disk drive was busy servicing write requests.
% Idle Time reports the percentage of time during the sample interval that the disk was idle.
Avg. Disk Queue Length is the average number of both read and write requests that were queued for the selected disk during the sample interval.

The LogicalDisk performance object maintains counters for each logical disk. The available counters include the following:

Free Megabytes displays the unallocated space on the disk drive, in megabytes. One megabyte is equal to 1,048,576 bytes.
Current Disk Queue Length is the number of requests outstanding on the disk at the time the performance data is collected. It also includes requests in service at the time of the collection. This is an instantaneous snapshot, not an average over the time interval. Multi-spindle disk devices can have multiple requests that are active at one time, but other concurrent requests are awaiting service. This counter might reflect a transitory high or low queue length, but if there is a sustained load on the disk drive, it is likely that this will be consistently high. Requests experience delays proportional to the length of this queue minus the number of spindles on the disks. For good performance, this difference should average less than two.
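
The queue-length guideline above can be checked with simple arithmetic. The following Python sketch averages the difference between sampled queue lengths and the spindle count and compares it with the limit of two. The function names and the sample values are hypothetical.

```python
from statistics import mean
from typing import Sequence


def disk_queue_excess(queue_samples: Sequence[int], spindles: int) -> float:
    """Average of (queue length - spindle count) over the samples."""
    if not queue_samples:
        raise ValueError("at least one sample is required")
    return float(mean(q - spindles for q in queue_samples))


def queue_is_healthy(queue_samples: Sequence[int], spindles: int, limit: float = 2.0) -> bool:
    """True when the average difference stays below the guideline limit."""
    return disk_queue_excess(queue_samples, spindles) < limit


# Hypothetical Current Disk Queue Length samples for a two-spindle device.
busy = [3, 5, 2, 6, 4]
quiet = [2, 3, 1, 4, 2]

print(disk_queue_excess(busy, spindles=2))   # 2.0
print(queue_is_healthy(busy, spindles=2))    # False
print(queue_is_healthy(quiet, spindles=2))   # True
```

Because the counter is an instantaneous snapshot, several samples spread over the period of interest are needed before the average is meaningful.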

The Cache performance object maintains counters for the file system cache. The available counters include the following:

Data Flushes/sec is the rate at which the file system cache has flushed its contents to disk as the result of a request to flush or to satisfy a write-through file write request. More than one page can be transferred on each flush operation.
Lazy Write Flushes/sec is the rate at which the Lazy Writer thread has written to disk. Lazy Writing is the process of updating the disk after the page has been changed in memory, so that the application that changed the file does not have to wait for the disk write to be complete before proceeding. More than one page can be transferred by each write operation.

In addition to system tools available for monitoring disk usage, FairCom DB supports options to limit disk usage. The server’s Disk Full feature offers three levels of control over disk full checks:

The DISK_FULL_LIMIT keyword provides checking on all files.
The DISK_FULL_VOLUME keyword sets a volume-specific limit that overrides the system-wide limit.
The file-specific check (set by creating a file using an Xtd8 create function with the dskful member of the XCREblk structure set to the desired file size limit in bytes) overrides the system-wide and volume-specific checks.

If extending the size of the file would leave less than the specified threshold, then the write operation causing the file extension fails, returning SAVL_ERR (583).
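
The precedence among the three levels can be modeled as follows. This Python sketch is an illustration of the rules described above, not FairCom DB's implementation. It assumes that the system-wide and volume-specific thresholds express the free space that must remain after the extension, and it uses 0 to stand for success. All function names and values are hypothetical.

```python
from typing import Optional, Tuple

SAVL_ERR = 583  # error code named in the text


def applicable_check(
    system_limit: Optional[int],
    volume_limit: Optional[int],
    file_limit: Optional[int],
) -> Optional[Tuple[str, int]]:
    """Return the most specific configured check as (level, limit), or None.

    A file-specific limit overrides a volume-specific one, which in turn
    overrides the system-wide limit.
    """
    for level, limit in (("file", file_limit), ("volume", volume_limit), ("system", system_limit)):
        if limit is not None:
            return level, limit
    return None


def extension_result(free_bytes: int, extend_bytes: int, threshold: int) -> int:
    """Return 0 if the extension is allowed, SAVL_ERR otherwise.

    ASSUMPTION: the threshold is the free space that must remain on the
    volume once the file has been extended.
    """
    remaining = free_bytes - extend_bytes
    return SAVL_ERR if remaining < threshold else 0


# Hypothetical settings: a system-wide limit plus a volume-specific override.
print(applicable_check(1_000_000, 500_000, None))       # ('volume', 500000)
print(extension_result(2_000_000, 1_600_000, 500_000))  # 583
print(extension_result(2_000_000, 1_000_000, 500_000))  # 0
```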

Monitoring Memory Usage

The system administrator should know the expected memory use by FairCom DB during normal operation of the system. Memory use should be tracked so that unexpected changes in memory usage can be detected, analyzed, and corrected.

Solaris supports the following utility for monitoring memory usage. Other Unix systems support similar utilities:

vmstat reports virtual memory statistics regarding process, virtual memory, disk, trap, and CPU activity.

The Windows Performance Monitor utility (perfmon) can be used to monitor system memory usage. The Memory performance object maintains counters for memory usage. The available counters include the following (descriptions taken from the Performance Monitor’s explanatory text):

Available MBytes is the amount of physical memory available to processes running on the computer, in megabytes, rather than bytes as reported in Memory\Available Bytes. It is calculated by adding the amount of space on the Zeroed, Free, and Standby memory lists. Free memory is ready for use; Zeroed memory consists of pages filled with zeros to prevent later processes from seeing data used by a previous process; Standby memory is memory removed from a process’ working set (its physical memory) en route to disk, but still available to be recalled. This counter displays the last observed value only; it is not an average.
Pages/sec is the rate at which pages are read from or written to disk to resolve hard page faults. This counter is a primary indicator of the kinds of faults that cause system-wide delays. It is the sum of Memory\Pages Input/sec and Memory\Pages Output/sec. It is counted in numbers of pages, so it can be compared to other counts of pages, such as Memory\Page Faults/sec, without conversion. It includes pages retrieved to satisfy faults in the file system cache (usually requested by applications) and non-cached mapped memory files.
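
Because Available MBytes reports only the last observed value, spotting an unexpected change means comparing new samples against values recorded during normal operation. The Python sketch below shows one simple way to do that, flagging a sample that lies more than a chosen number of standard deviations from the baseline mean. The approach, the tolerance of three, and the sample values are illustrative assumptions rather than FairCom recommendations.

```python
from statistics import mean, stdev
from typing import Sequence


def is_unexpected(baseline: Sequence[float], sample: float, tolerance: float = 3.0) -> bool:
    """True if sample lies more than `tolerance` standard deviations from the baseline mean."""
    if len(baseline) < 2:
        raise ValueError("at least two baseline samples are required")
    center = mean(baseline)
    spread = stdev(baseline)
    if spread == 0:
        # A perfectly flat baseline: any different value counts as a change.
        return sample != center
    return abs(sample - center) > tolerance * spread


# Hypothetical Available MBytes readings taken during normal operation.
normal_readings = [2048, 2100, 1990, 2060, 2030]

print(is_unexpected(normal_readings, 2050))  # False
print(is_unexpected(normal_readings, 1200))  # True
```

The same comparison applies to any of the counters in this section, provided the baseline was collected under a representative workload.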

In addition to the available system monitoring tools, FairCom DB supports monitoring server memory usage using the FairCom DB SystemConfiguration() API function. See the section "Monitoring FairCom DB Using SystemConfiguration API" for more details.

Monitoring Network Usage

The system administrator should know the expected network use by FairCom DB during normal operation of the system. Network use should be tracked so that unexpected changes in network usage can be detected, analyzed, and corrected.

Solaris supports the following utility for monitoring network usage. Other Unix systems support similar utilities:

netstat displays the contents of network-related data structures in various formats, depending on the specified options.

The Windows Performance Monitor utility (perfmon) can be used to monitor network usage. The Network Interface performance object maintains counters for network usage. The available counters include the following (descriptions taken from the Performance Monitor’s explanatory text):

Bytes Received/sec is the rate at which bytes are received over each network adapter, including framing characters. Network Interface\Bytes Received/sec is a subset of Network Interface\Bytes Total/sec.
Bytes Sent/sec is the rate at which bytes are sent over each network adapter, including framing characters. Network Interface\Bytes Sent/sec is a subset of Network Interface\Bytes Total/sec.
Output Queue Length is the length of the output packet queue (in packets). If this is longer than two, there are delays and the bottleneck should be found and eliminated, if possible. Since the requests are queued by the Network Driver Interface Specification (NDIS) in this implementation, this will always be 0.
Packets Outbound Errors is the number of outbound packets that could not be transmitted because of errors.
Packets Received Errors is the number of inbound packets that contained errors preventing them from being deliverable to a higher-layer protocol.
Packets Received/sec is the rate at which packets are received on the network interface.
Packets Sent/sec is the rate at which packets are sent on the network interface.
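
The byte-rate counters become easier to interpret when expressed as a share of the link capacity. The Python sketch below adds the received and sent rates, as Bytes Total/sec does, and divides by the link bandwidth in bits per second. The bandwidth figure is an assumption (Windows exposes it through the Network Interface\Current Bandwidth counter, which is not listed above), and the function name and sample values are hypothetical.

```python
def link_utilization_percent(
    bytes_received_per_s: float,
    bytes_sent_per_s: float,
    bandwidth_bits_per_s: float,
) -> float:
    """Combined receive and send traffic as a percentage of link bandwidth."""
    if bandwidth_bits_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    total_bytes = bytes_received_per_s + bytes_sent_per_s
    # Counters are in bytes per second, bandwidth in bits per second.
    return 100.0 * (total_bytes * 8) / bandwidth_bits_per_s


# Hypothetical readings on a 100 Mbit/s adapter.
utilization = link_utilization_percent(500_000, 750_000, 100_000_000)
print(utilization)  # 10.0
```

On a full-duplex link each direction has the full bandwidth available, so the combined figure is best treated as a rough indicator rather than a hard limit.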

Other System Monitoring Options

In addition to the utilities described in the sections above, each of which targets a specific resource, other system utilities can be used to monitor a variety of system resources.

On Solaris and Unix systems, the sar utility is useful for monitoring system activity, including system buffer transfer activity, cache hit ratios, system calls, and device activity.

The Windows Performance Monitor includes performance objects such as the Object, Process, Server, System, and Thread objects that can be used to monitor system resource usage in specific ways. Also, the Windows Task Manager can be used to monitor system resource usage.