Document revision date: 30 March 2001
The CPU is the central resource in your system and it is the most costly to augment. Good CPU performance is vital to that of the system as a whole, because the CPU performs the two most basic system functions: it allocates and initiates the demand for all the other resources, and it provides instruction execution service to user processes.
This chapter discusses the following topics:
Only one process can execute on a CPU at a time, so the CPU resource must be shared sequentially. Because several processes can be ready to use the CPU at any given time, the system maintains a queue of processes waiting for the CPU.
These processes are in the compute (COM) or compute outswapped (COMO) scheduling states.
The system allocates the CPU resource for a period of time known as a quantum to each process that is not waiting for other resources.
During its quantum, a process can execute until any of the following events occur:
A good measure of the CPU response is the average number of processes in the COM and COMO states over time---that is, the average length of the compute queue.
If the number of processes in the compute queue is close to zero, unblocked processes will rarely need to wait for the CPU.
Several factors affect how long any given process must wait to be granted its quantum of CPU time:
The worst-case scenario involves a large compute queue of compute-bound processes. Each compute-bound process can retain the CPU for the entire quantum period.
Assuming no interrupt time and a default quantum of 200 milliseconds, a group of five compute-bound processes of the same priority (one in CUR state and the others in COM state) acquires the CPU once every second.
As the number of such processes increases, there is a proportional increase in the waiting time.
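The arithmetic behind this example can be sketched in Python. The function names are illustrative only, and the model makes the same assumptions as the text: no interrupt time, equal priorities, and every process consuming its full quantum.

```python
def cpu_cycle_time_ms(num_processes: int, quantum_ms: int = 200) -> int:
    """Time for the CPU to serve every compute-bound process once."""
    if num_processes < 1:
        raise ValueError("need at least one process")
    return num_processes * quantum_ms


def wait_between_turns_ms(num_processes: int, quantum_ms: int = 200) -> int:
    """Time one process spends in COM while the others use their quanta."""
    if num_processes < 1:
        raise ValueError("need at least one process")
    return (num_processes - 1) * quantum_ms


# Five processes with a 200 ms quantum: each gets the CPU once per second.
print(cpu_cycle_time_ms(5))       # 1000
print(wait_between_turns_ms(5))   # 800
```

Doubling the number of processes to ten doubles the cycle time to 2000 milliseconds, which is the proportional increase described above.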
If the processes are not compute bound, they can relinquish the CPU before having consumed their quantum period, thus reducing waiting time for the CPU.
Because of MONITOR's sampling nature, the utility rarely detects processes that remain only briefly in the COM state. Thus, if MONITOR shows COM processes, you can assume they are the compute-bound type.
The best way to determine a reasonable length for the compute queue at your site is to note its length during periods when all the system resources are performing adequately and when users perceive response time to be satisfactory.
Then, watch for deviations from this value and try to develop a sense for acceptable ranges.
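One possible way to track such a baseline is sketched below in Python. It assumes you have already collected compute queue lengths (COM plus COMO) during periods of satisfactory response. The function names and the tolerance of two standard deviations are assumptions chosen for illustration, not values recommended by this manual.

```python
from statistics import mean, stdev


def queue_baseline(samples):
    """Return (mean, standard deviation) of compute queue lengths
    recorded while response time was satisfactory."""
    return mean(samples), stdev(samples)


def is_deviation(current, baseline_mean, baseline_sd, tolerance=2.0):
    """Flag a queue length that differs from the baseline by more than
    `tolerance` standard deviations (tolerance is a site-specific choice)."""
    return abs(current - baseline_mean) > tolerance * baseline_sd


good_period = [1.0, 2.0, 3.0]          # hypothetical sample data
avg, sd = queue_baseline(good_period)
print(is_deviation(5.0, avg, sd))      # True
print(is_deviation(3.5, avg, sd))      # False
```

Adjust the tolerance until the flagged values match what users at your site actually perceive as degraded response.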
To estimate available CPU capacity, observe the average amount of idle time and the average number of processes in the various scheduling wait states.
While idle time is a measure of the percentage of unused CPU time, the wait states indicate the reasons that the CPU was idle and might point to utilization problems with other resources.
Before using idle time to estimate growth potential or as an aid to balancing the CPU resource among processes in an OpenVMS Cluster, ensure that the other resources are not overcommitted, thereby causing the CPU to be underutilized.
Whenever a process enters a scheduling wait state---a state other than CUR (process currently using the CPU) and COM---it is said to be blocked from using the CPU.
Most times, a process enters a wait state as part of the normal synchronization that takes place between the CPU and the other resources.
But certain wait states can indicate problems with those other resources that could block viable processes from using the CPU.
MONITOR data on the scheduling wait states provides clues about potential problems with the memory and disk I/O resources.
There are two types of scheduling wait states---voluntary and involuntary. Processes enter voluntary wait states directly; they are placed in involuntary wait states by the system.
Processes in the local event flag wait (LEF) state are said to be voluntarily blocked from using the CPU; that is, they are temporarily requesting to wait before continuing with CPU service. Since the LEF state can indicate conditions ranging from normal waiting for terminal command input to waiting for I/O completion or locks, you can obtain no useful information about potentially harmful blockage simply by observing the number of processes in that state. You can usually assume, though, that most of them are waiting for terminal command input (at the DCL prompt).
Some processes might enter the LEF state because they are awaiting I/O completion on a disk or other peripheral device. If the I/O subsystem is not overloaded, this type of waiting is temporary and inconsequential. If, on the other hand, the I/O resource, particularly disk I/O, is approaching capacity, it could be causing the CPU to be seriously underutilized.
Long disk response times are the clue that certain processes are in the LEF state because they are experiencing long delays in acquiring disk service. If your system exhibits unusually long disk response times, refer to Section 7.2.1 and try to correct that problem before attempting to improve CPU responsiveness.
Other processes in the LEF state might be waiting for a lock to be granted. This situation can arise in environments where extensive file sharing is the norm---particularly in OpenVMS Clusters. Check the ENQs Forced to Wait Rate. (This is the rate of $ENQ lock requests forced to wait before the lock was granted.) Since the statistic gives no indication of the duration of lock waits, it does not provide direct information about lock waiting. A value significantly larger than your system's normal value, however, can indicate that users will start to notice delays.
On large SMP systems, it might improve performance to give one CPU all lock manager work. If you have a high CPU count and a large amount of time spent synchronizing multiple CPUs, consider implementing a dedicated lock manager as described in Section 13.2.
If you suspect... | Then... |
---|---|
The lock waiting is caused by file sharing | Attempt to reduce the level of sharing. |
The lock waiting results from user or third-party application locks | Attempt to influence the redesign of such applications. |
A high amount of locking activity in an SMP environment | Assign a CPU to perform dedicated lock management. |
Processes can also enter the LEF state or the other voluntary wait states (common event flag wait [CEF], hibernate [HIB], and suspended [SUSP]) when system services are used to synchronize applications. Such processes have temporarily abdicated use of the CPU; they do not indicate problems with other resources.
Involuntary wait states are not requested by processes but are invoked by the system to achieve process synchronization in certain circumstances:
The presence of processes in the MWAIT state indicates that there might be a shortage of a systemwide resource (usually page or swapping file capacity) and that the shortage is blocking these processes from the CPU.
If you see processes in this state, do the following:
$ MONITOR /INPUT=SYS$MONITOR:file-spec /VIEWING_TIME=1 PROCESSES
The most common types of resource waits are those signifying depletion of the page and swapping files as shown in the following table:
State | Description |
---|---|
RWSWP | Indicates a swapping file of deficient size. |
RWMPB, RWMPE, RWPGF | Indicates a paging file that is too small. |
RWAST | Indicates that the process is waiting for a resource whose availability will be signaled by delivery of an asynchronous system trap (AST). In most instances, either an I/O operation is outstanding (incomplete), or a process quota has been exhausted. |
You can determine paging and swapping file sizes and the amount of available space they contain by entering the SHOW MEMORY/FILES/FULL command.
The AUTOGEN feedback report provides detailed information about paging and swapping file use. AUTOGEN uses the data in the feedback report to resize or to recommend resizing the paging and swapping files.
The surest way to determine whether a CPU limitation could be degrading performance is to check for a state queue with the MONITOR STATES command. See Figure A-16. If any processes appear to be in the COM or COMO state, a CPU limitation may be at work. However, if no processes are in the COM or COMO state, you need not investigate the CPU limitation any further.
If processes are in the COM or COMO state, they are being denied access to the CPU. One or more of the following conditions is occurring:
If you suspect the system is performing suboptimally because processes are blocked by a process running at higher priority, do the following:
If you find that this condition exists, your option is to adjust the process priorities. See Section 13.3 for a discussion of how to change the process priorities assigned in the UAF, define priorities in the login command procedure, or change the priorities of processes while they execute.
Once you rule out the possibility of preemption by higher priority processes, you need to determine if there is a serious problem with time slicing between processes at the same priority. Using the list of top CPU users, compare the priorities and assess how many processes are operating at the same one. Refer to Section 13.3, if you conclude that the priorities are inappropriate.
However, if you decide that the priorities are correct and will not benefit from such adjustments, you are confronted with a situation that will not respond to any form of system tuning. Again, the only appropriate solution here is to adjust the work load to decrease the demand or add CPU capacity (see Section 13.7).
If you discover that blocking is not due to contention with other processes at the same or higher priorities, you need to find out if there is too much activity in interrupt state. In other words, is the rate of interrupts so excessive that it is preventing processes from using the CPU?
You can determine how much time is spent in interrupt state from the MONITOR MODES display. A percentage of time in interrupt state less than 10 percent is moderate; 20 percent or more is excessive. (The higher the percentage, the more effort you should dedicate to solving this resource drain.)
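The thresholds above can be expressed as a small Python helper. The text does not name the range between 10 and 20 percent, so the label "borderline" used here is an assumption, as is the function name.

```python
def classify_interrupt_time(percent: float) -> str:
    """Classify the interrupt-state percentage from MONITOR MODES."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    if percent < 10:
        return "moderate"
    if percent >= 20:
        return "excessive"
    # The 10-20 percent range is not named in the text; label is assumed.
    return "borderline"


print(classify_interrupt_time(7.5))    # moderate
print(classify_interrupt_time(23.0))   # excessive
```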
If the interrupt time is excessive, you need to explore which devices cause significant numbers of interrupts on your system and how you might reduce the interrupt rate.
The decisions you make will depend on the source of heavy interrupts. Perhaps they are due to communications devices or special hardware used in real-time applications. Whatever the source, you need to find ways to reduce the number of interrupts so that the CPU can handle work from other processes. Otherwise, the solution may require you to adjust the work load or acquire CPU capacity (see Section 13.7).
Once you have either ruled out or resolved a CPU limitation, you need to determine which other resource limitation produces the block. Your next check should be for the amount of idle time. See Figure A-17. Use the MONITOR MODES command. If there is any idle time, another resource is the problem and you may be able to tune for a solution. If you reexamine the MONITOR STATES display, you will likely observe a number of processes in the COMO state. You can conclude that this condition reflects a memory limitation, not a CPU limitation. Follow the procedures described in Chapter 7 to find the cause of the blockage, and then take the corrective action recommended in Chapter 10.
If the MONITOR MODES display indicates that there is no idle time, your CPU is 100 percent busy. You will find that processes are in the COM state on the MONITOR STATES display. You must answer one more question. Is the CPU being used for real work or for nonessential operating system functions? If there is operating system overhead, you may be able to reduce it.
Analyze the MONITOR MODES display carefully. If your system exhibits excessive kernel mode activity, it is possible that the operating system is incurring overhead in the areas of memory management, I/O handling, or scheduling. Investigate the memory limitation and I/O limitation (Chapters 7 and 8), if you have not already done so.
Once you rule out the possibility of improving memory management or I/O handling, the problem of excessive kernel mode activity might be due to scheduling overhead. However, you can do practically nothing to tune the scheduling function. There is only one case that might respond to tuning. The clock-based rescheduling that can occur at quantum end is costlier than the typical rescheduling that is event driven by process state. Explore whether the value of the system parameter QUANTUM is too low and can be increased to bring about a performance improvement by reducing the frequency of this clock-based rescheduling (see Section 13.4). If not, your only other recourse is to adjust the work load or acquire CPU capacity (see Section 13.7).
If the MONITOR MODES display indicates that a great deal of time is spent in executive mode, it is possible that RMS is being misused. If you suspect this problem, proceed to the steps described in Section 8.3.3 for RMS-induced I/O limitations, making any changes that seem indicated. You should also consult the Guide to OpenVMS File Applications.
If at this point in your investigation the MONITOR MODES display indicates that most of the time is spent in supervisor mode or user mode, you are confronted with a situation where the CPU is performing real work and the demand exceeds the capacity. You must either make adjustments in the work load to reduce demand (by more efficient coding of applications, for example) or you must add CPU capacity (see Section 13.7).
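The investigation described in this section can be summarized as a rough decision function. The Python sketch below is illustrative only: the function name and return strings are hypothetical, the priority and time-slicing checks are left out because they require per-process data, and "excessive" kernel or executive time is approximated by picking whichever mode consumes the most CPU time, which is a simplifying assumption rather than a rule from this manual.

```python
def diagnose_cpu(com_como_count, modes, interrupt_excessive_pct=20.0):
    """Suggest a next step from MONITOR STATES and MONITOR MODES figures.

    `modes` maps 'interrupt', 'kernel', 'executive', 'supervisor',
    'user', and 'idle' to percentages of CPU time.
    """
    if com_como_count == 0:
        return "no CPU limitation"
    if modes["interrupt"] >= interrupt_excessive_pct:
        return "excessive interrupt time: reduce the interrupt rate"
    if modes["idle"] > 0:
        return "idle time present: investigate another resource, such as memory"
    # CPU is 100 percent busy. Assumed heuristic: see which mode dominates.
    candidates = [
        ("kernel", modes["kernel"]),
        ("executive", modes["executive"]),
        ("real work", modes["supervisor"] + modes["user"]),
    ]
    busiest = max(candidates, key=lambda item: item[1])[0]
    if busiest == "kernel":
        return "kernel overhead: check memory management, I/O, then QUANTUM"
    if busiest == "executive":
        return "executive overhead: check for RMS misuse"
    return "real work exceeds capacity: reduce demand or add CPU capacity"


sample = {"interrupt": 5, "kernel": 15, "executive": 10,
          "supervisor": 5, "user": 65, "idle": 0}
print(diagnose_cpu(4, sample))
```

Treat the result as a pointer to the relevant paragraph above, not as a substitute for examining the MONITOR displays yourself.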
Use the following MONITOR commands to obtain the appropriate statistic:
Command | Statistic |
---|---|
Compute Queue | |
STATES | Number of processes in compute (COM) and compute outswapped (COMO) scheduling states |
Estimating CPU Capacity | |
STATES | All items |
MODES | Idle time |
Voluntary Wait States | |
STATES | Number of processes in local event flag wait (LEF), common event flag wait (CEF), hibernate (HIB), and suspended (SUSP) states |
LOCK | ENQs Forced to Wait Rate |
MODES | MP synchronization |
Involuntary Wait States | |
STATES | Number of processes in miscellaneous resource wait (MWAIT) state |
PROCESSES | Types of resource waits (RWxxx) |
Reducing CPU Consumption | |
MODES | All items |
Interrupt State | |
IO | Direct I/O Rate, Buffered I/O Rate, Page Read I/O Rate, Page Write I/O Rate |
DLOCK | All items |
SCS | All items |
MP Synchronization Mode | |
MODES | MP Synchronization |
IO | Direct I/O Rate, Buffered I/O Rate |
DLOCK | All items |
PAGE | All items |
DISK | Operation Rate |
Kernel Mode | |
MODES | Kernel mode |
IO | Page Fault Rate, Inswap Rate, Logical Name Translation Rate |
LOCK | New ENQ Rate, Converted ENQ Rate, DEQ Rate |
FCB | All items |
PAGE | Demand Zero Fault Rate, Global Valid Fault Rate, Page Read I/O |
DECNET | Sum of packet rates |
CPU Load Balancing | |
MODES | Time spent by processors in each mode |
See Table B-1 for a summary of MONITOR data items.