Bounds Checking for C

Richard Jones and Paul Kelly, Imperial College, July 1995

We're very excited about this: we can check every time a program uses a pointer or array and ensure that only valid references are allowed. This isn't new: what's new is that checked code can interwork with unchecked modules, libraries and system calls. We're still working on some rough edges and on improving the performance.

This is a short overview; for a full report (and the code), see here.

C is unusual among programming languages in providing the programmer with the full power of pointers. Languages in the Pascal/Algol family have arrays and pointers, with the restriction that arithmetic on pointers is disallowed. Languages like BCPL allow arbitrary operations on pointers, but lack types and so require clumsy scaling by object sizes.

An advantage of the Pascal/Algol approach is that array references can be checked at run-time fairly efficiently, in fact so efficiently that there is a good case for bounds-checking in production code. Bounds checking is easy for arrays because the array subscript syntax specifies both the address calculation and the array within which the resulting pointer should point.

With pointers in C, a pointer can be used in a context divorced from the name of the storage region for which it is valid.

Approaches to bounds checking

One response to this analysis is to discard C, since this lack of efficient checkability is responsible for many software failures.

A second approach is to extend the language to make checking easier. There are various proposals for doing this, and it is an opportunity to add other features such as assertion checking.

A third more-or-less workable scheme is to modify the representation of pointers to include three items: the pointer itself, and the lower and upper bounds of the object to which it is supposed to point. Experience with this has shown the benefits of bounds checking (e.g. see the bcc and rtcc compilers cited below), but there are difficulties:

How we solved the problem

Our technique provides full checking without changing the representation of pointers. We therefore avoid most of the problems noted above. Some efficiency problems remain, but bounds checking need not be used in all of the files which make up a program, so trusted, performance-critical code can run at full speed.

The key idea is this:

Implementation

We made some small modifications to the C front-end of gcc, the Gnu C compiler, to add code to check pointer arithmetic and use, and to maintain a table of known allocated storage regions.

We went to some trouble to ensure that gcc's optimiser could handle the added code, and employed modest inlining for efficiency.

The table of known allocated storage regions has to handle insertions, deletions and range lookups extremely fast, but since programs display a high degree of locality the access pattern is highly skewed. For these reasons a splay tree was used, in which objects are migrated to the root when accessed.

Performance

Availability

The software is distributed free under GNU copyleft, in the form of a patch to the gcc 2.7.0 source distribution ( here).