Performance Instrumenting Device Drivers | |
Copyright© 1998 Mark Russinovich | |
Last
Updated February 24, 1998
|
|
Introduction | Many
device drivers work with high-performance devices such as disk drives,
data acquisition cards, and network adapters. Its often the case that
developers are charged with the task of making their driver squeeze as
much performance from a device as possible, within the constraints of
being a well-behaved kernel-mode citizen. In competitive product areas a
positive difference of a few percentage points of benchmark performance
can make for boosted sales through good reviews. Unfortunately, the only
support for accurate performance measurements is KeQueryPerformanceCounter,
which on many HALs offers less than stellar resolution. For example, on
uniprocessor x86's with the standard NT HAL the resolution of the KeQueryPerformanceCounter
source is 1 microsecond. This resolution is unacceptably low for most
device driver functions since they would be lost in the noise. To address the issue of obtaining ultra-accurate device driver profiles, I've developed a set of macros and support routines that rely on the Pentium and higher processor's built-in cycle counter. By definition the cycle counter runs at the rate of the CPU system clock, and is therefore extremely fine grained (e.g. on a 200MHz machine the resolution is about 5 nanoseconds). Defining a new counter and instrumenting a driver, while a manual process, is quite easy, so a typical driver can be instrumented within minutes. When the performance summary function is invoked, you'll not only see how many times a counter entry/exit pair was tripped, but how many total cycles, total microseconds, and average cycles and microseconds were spent in each entry/exit pair you've defined. The rest of this page will describe how to instrument your own drivers, and I include source code to an instrumented version of NTRegmon as an example you can refer to and play with. |
Defining and Placing a Counter | The
first step in using the performance macros is to determine which
functions or code segments you want to instrument. Next you must define
a unique macro identifier to represent the measured code. To insure that
these macro names have little chance of colliding with existing macro
definitions, I've defined the standard of using "PERF_"
appended with the name of an instrumented function, or an artificial
name describing a code segment. For example, if I was instrumenting a
StartIo routine I would define PERF_STARTIO as the name of its
performance counter. Next, you place the calls to start and stop the counter you are defining at the appropriate places in your code. Place the macro PERF_ENTRY(COUNTERNAME) at the function or code segment's entry point, and PERF_EXIT(COUNTERNAME) at all exit points if the code will be running at DISPATCH_LEVEL or lower. If the function is an ISR, or is called from an ISR, use PERF_ISR_ENTRY() and PERF_ISR_EXIT() instead. Note that a PERF_ENTRY must be placed at the start of a C-logical block if you are using C, since the macro declares temporary variables. As a simple example, if you were instrumenting your StartIo routine you would place the counters like this: VOID Startio(IN PDEVICE_OBJECT Device, IN PIRP Irp) { Local variables; PERF_ENTRY( PERF_STARTIO ); .... // Driver Entry code PERF_EXIT( PERF_STARTIO ); } The next step is to add the macro definition to the PERF.H include file that I've provided. Simply add it to the list of your defined counters, giving each one a unique ID number. Then add a textual description for the counter to the CounterNames array in PERF.C. The description's index in the array must correspond to the ID that you assigned the counter. Finally, add a call to InitPerfCounters() in your DriverEntry, and calls to DumpPerfCounters() whereever you want to see the state of the counters. A call to DumpPerfCounters would typically be in a DeviceIoControl where a user-mode program is triggering a dump of performance information, in an unload routine, or in response to IRP_MJ_SHUTDOWN. That's all there is to it. Add PERF.C (a file I supply) to your driver's SOURCES file, and include PERF.H in any modules that make use of counters. Note that by default the function DumpPerfCounters uses a form of DbgPrint that will only print output when building a Checked version of your driver. When you want the counters removed for a release of the driver simply edit the define "#define PERFORMANCE 1" to "#define PERFORMANCE 0" in PERF.H and rebuild. |
Interpreting the Counters | What
you'll see when DumpPerfCounters will look something like the
following, which is taken from the instrumented version of NTRegmon
that can be downloaded below:============================================================================== Function Count Length Average ============================================================================== MatchWithPattern 9789 49500 5 (uS) 9868614 1008 (CY) ------------------------------------------------------------------------------ UpdateStore 3263 555448 170 (uS) 110737077 33937 (CY) ------------------------------------------------------------------------------ ... ----------------------------------------------------------------------------- RegmonDeviceControl 59 15645 265 (uS) 3119075 52865 (CY) ------------------------------------------------------------------------------ Cycles Per Second : 199365268 Cycles in PERF_XXXX : 208 NT Performance counter frequency: 199300000Each counter section includes the descriptive name supplied in CounterNames, the number of times the entry counter of that name was executed, and the total time spent in the entry/exit pair in both microseconds (uS) and cycles (CY). In addition you'll see the average time spent in each execution of the entry exit pair (simply the total time divided by the count). At the bottom of the output you'll see the number of cycles per second InitPerfCounters measured for the system. If this does not match your processor's clock rate than something is amiss. During the calibration phase, InitPerfCounters also determines the approximate overhead of an individual entry/exit pair; the output statistics are adjusted to remove this calculated overhead, which is shown on the "Cycles in PERF_XXX" line of the output. Finally, just for kicks the resolution of the NT performance counter (KeQueryPerformanceCounter) is shown. |
The Sample | I've instrumented NTRegmon as a demonstration of the counters and their use. Defining a total of 15 counters and placing them in the code took about 5 minutes. When you exit the NTRegmon GUI (run it from \gui\release) the driver will unload and dump the counter statistics via DbgPrint. Use DebugMon, Windbg, or SoftICE to view the results. |
Download Instrumented NTRegmon (188KB) Download Perf.c and Perf.h alone (2KB) |