regexp(3)
NAME
advance, advance_r, compile, compile_r, step, step_r - Regular expression
compile and match routines
SYNOPSIS
#define INIT declarations
#define GETC() getc code
#define PEEKC() peek code
#define UNGETC(c) ungetc code
#define RETURN(ptr) return code
#define ERROR(val) error code
#include <regexp.h>
char *compile(
char *instring,
char *expbuf,
const char *endbuf,
int eof );
int step(
const char *string,
const char *expbuf );
int advance(
const char *string,
const char *expbuf );
extern char *loc1, *loc2, *locs;
The following functions do not conform to current standards and are
supported only for backward compatibility:
char *compile_r(
char *instring,
char *expbuf,
char *endbuf,
int eof,
struct regexp_data *regexp_data );
int advance_r(
char *string,
char *expbuf,
struct regexp_data *regexp_data );
int step_r(
char *string,
char *expbuf,
struct regexp_data *regexp_data );
STANDARDS
Interfaces documented on this reference page conform to industry standards
as follows:
advance(), compile(), step(): XSH4.2
Refer to the standards(5) reference page for more information about
industry standards and associated tags.
PARAMETERS
c Specifies the character (byte) that the UNGETC() macro pushes back onto
the regular expression pattern. This character is returned by the next
call to the GETC() and PEEKC() macros.
ptr Specifies a pointer to the character following the last character of
the compiled regular expression.
val Specifies an error value.
instring
Specifies a string to be passed to the compile() function.
The instring parameter is never used explicitly by the compile()
function, but you can use it in your macros. For example, you may want
to pass the string containing a pattern as the instring parameter to
the compile() function and use the INIT() macro to set a pointer to the
beginning of this string. When your macros do not use instring, call
the compile() function with a value of ((char *) 0) for this parameter.
expbuf
Points to a character array where the compiled regular expression is
stored.
endbuf
Points to the location that immediately follows the character array
where the compiled regular expression is stored. When the compiled
expression cannot be contained in (endbuf-expbuf) number of bytes, a
call to the ERROR(_BIGREGEXP) macro is made (see the ERRORS section).
eof Specifies the character that marks the end of the regular expression.
For example, in ed this character is usually a / (slash).
string
Points to the null-terminated string of characters to be searched for a
match.
regexp_data
Points to the regexp_data structure used by the compile_r(), step_r(),
and advance_r() functions.
DESCRIPTION
The compile(), advance(), and step() functions are used for general-purpose
regular expression matching.
The compile() function takes a simple regular expression as input and
produces a compiled expression that can be used with the step() and
advance() functions.
The following six macros, used in the compile() function, must be defined
before the #include <regexp.h> statement in programs. The GETC(), PEEKC(),
and UNGETC() macros operate on the regular expression provided as input for
the compile() function.
INIT()
The INIT() macro is used for dependent declarations and
initializations. In the regexp.h header file this macro is located
right after the compile() function declarations and opening { (left
brace). Your INIT() declarations must end with a ; (semicolon).
The INIT() macro is frequently used to set a register variable to point
to the beginning of the regular expression, so that this pointer can be
used in declarations for GETC(), PEEKC(), and UNGETC(). Alternatively,
you can use INIT() to declare external variables that GETC(), PEEKC(),
and UNGETC() need.
GETC()
The GETC() macro returns the value of the next character (byte) in the
regular-expression pattern. Successive calls to GETC() return
successive characters of the regular expression.
PEEKC()
The PEEKC() macro returns the next character (byte) in the regular
expression. Immediate subsequent calls to this macro return the same
byte, which is also the next character returned by the GETC() macro.
UNGETC(c)
The UNGETC() macro causes the c parameter to be returned by the next
call to the GETC() and PEEKC() macros. No more than one character of
pushback is ever needed because this character is guaranteed to be the
last character read by the GETC() macro. The value of the UNGETC()
macro is always ignored.
RETURN(ptr)
The RETURN() macro is used for normal exit of the compile() function.
The value of the ptr parameter is a pointer to the character following
the last character of the compiled regular expression. This is useful
in programs that manage memory allocation.
ERROR(val)
The ERROR() macro is the abnormal return from the compile() function. A
call to this macro should never return. In this macro, val is an error
number, which is described in the ERRORS section of this reference page.
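The relationship among GETC(), PEEKC(), and UNGETC() can be sketched without the header itself. The following is a minimal illustration only, assuming the pattern is read from a plain character pointer; the function name pattern_length() is hypothetical and is not part of the regexp interface.

```c
#include <stddef.h>

/* Stand-ins for the macros a caller defines before including <regexp.h>.
 * They read from a local pointer named sp (an assumption of this sketch). */
#define GETC()     (*sp++)   /* return the next byte and consume it */
#define PEEKC()    (*sp)     /* return the next byte without consuming it */
#define UNGETC(c)  (--sp)    /* push back the byte just read; value ignored */

/* Hypothetical helper: counts the pattern bytes that precede the delimiter
 * eof (or the terminating null byte), using one byte of pushback. */
static size_t pattern_length(const char *instring, int eof)
{
    const char *sp = instring;   /* the role INIT usually plays */
    size_t n = 0;
    int c;

    for (;;) {
        c = GETC();
        if (c == eof || c == '\0') {
            UNGETC(c);           /* leave the delimiter unread */
            break;
        }
        n++;
    }
    /* After the pushback, PEEKC() sees the delimiter again. */
    return (PEEKC() == c) ? n : (size_t)-1;
}
```

For example, pattern_length("ab*c/rest", '/') counts the four bytes of ab*c and leaves the pointer positioned at the slash.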
The step() function finds the first substring of the string parameter that
matches the compiled expression pointed to by the expbuf parameter. When
there is no match, the step() function returns a value of 0 (zero). When
there is a match, the step() function returns a nonzero value and sets two
global character pointers: loc1, which points to the first character of the
substring that matches the pattern, and loc2, which points to the character
immediately following the substring that matches the pattern. When the
regular expression matches the entire string, loc1 points to the first
character of the string parameter and loc2 points to the null character at
the end of the string.
The step() function uses the integer variable circf, which is set by the
compile() function when the regular expression begins with a ^
(circumflex). When this variable is set, the step() function only tries to
match the regular expression to the beginning of the string. When you
compile more than one regular expression before executing the first one,
save the value of circf for each compiled expression and set circf to the
saved value before each call to step().
The advance() function tests whether an initial substring of the string
parameter matches the expression pointed to by the expbuf parameter. The
step() function calls advance() repeatedly: it increments a pointer through
the characters of the string parameter and calls advance() at each position
until advance() returns a nonzero value, which indicates a match, or until
the end of the string is reached. To constrain the match unconditionally to
the beginning of the string, call the advance() function directly instead
of calling step().
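The division of labor between step() and advance() can be sketched with a toy matcher. This is an illustration under stated assumptions, not the library implementation: the pattern is a plain literal rather than a compiled expression, and toy_advance(), toy_step(), toy_loc1, and toy_loc2 are hypothetical names that stand in for advance(), step(), loc1, and loc2.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the loc1 and loc2 globals. */
static const char *toy_loc1, *toy_loc2;

/* Anchored test, in the role of advance(): does string begin with lit? */
static int toy_advance(const char *string, const char *lit)
{
    size_t n = strlen(lit);

    if (strncmp(string, lit, n) != 0)
        return 0;
    toy_loc2 = string + n;       /* character following the match */
    return 1;
}

/* Search, in the role of step(): try each starting position in turn.
 * A nonzero anchored argument plays the role of circf, so that only the
 * beginning of the string is tried. */
static int toy_step(const char *string, const char *lit, int anchored)
{
    const char *p;

    for (p = string; ; p++) {
        if (toy_advance(p, lit)) {
            toy_loc1 = p;        /* first character of the match */
            return 1;
        }
        if (anchored || *p == '\0')
            return 0;
    }
}
```

With the string "hello world" and the literal "world", toy_step() succeeds at offset 6 when anchored is 0 and fails when anchored is nonzero, which mirrors the effect of a leading ^ (circumflex).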
When the advance() function encounters an * (asterisk) or a \{\} sequence
in the regular expression, it advances its pointer to the string to be
matched as far as possible and recursively calls itself, trying to match
the remainder of the regular expression. As long as there is no match, the
advance() function backs up along the string until the function finds a
match or reaches the point in the string where the initial match with the *
or \{\} sequence occurred.
It is sometimes desirable to stop this backing up before the initial
pointer position in the string is reached. When the pointer position in the
string becomes equal to the locs global character pointer during the
backing-up process, the advance() function breaks out of the loop that
backs up and returns the value 0 (zero).
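The greedy advance and the backing-up behavior, including the early exit controlled by locs, can be sketched as follows. This is a simplified illustration, not the library code: it handles only a single character followed by * and a literal remainder, and the names toy_star() and stop (in the role of locs) are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Matches c* followed by the literal rest at the start of string.
 * stop plays the role of locs: when the backing-up pointer reaches it,
 * the function gives up and returns 0. Pass NULL for no early stop. */
static int toy_star(const char *string, int c, const char *rest,
                    const char *stop)
{
    const char *p = string;
    size_t n = strlen(rest);

    while (*p != '\0' && *p == c)    /* advance as far as possible */
        p++;

    for (;;) {                       /* back up one position at a time */
        if (stop != NULL && p == stop)
            return 0;                /* reached the locs position */
        if (strncmp(p, rest, n) == 0)
            return 1;                /* remainder matches here */
        if (p == string)
            return 0;                /* backed up to the starting point */
        p--;
    }
}
```

Against the string "aaab", the pattern a* followed by "ab" first consumes all three a characters, fails on "b", and then succeeds after backing up one position. Passing a stop pointer equal to that position makes the same call return 0.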
The compile_r(), step_r(), and advance_r() functions are the reentrant
versions of the compile(), step(), and advance() functions. They are
supported in order to maintain backward compatibility with operating system
versions prior to Tru64 UNIX Version 4.0.
The regexp.h header file defines the regexp_data structure.
NOTES
This interface has been deprecated in favor of the regcomp() interface
specified by the POSIX and X/Open standards and may be retired. If
possible, you should migrate code that uses the regexp routines to the
regcomp() and regexec() interfaces (see regcomp(3)).
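As a rough guide to migration, a compile() and step() pair corresponds to a regcomp() and regexec() pair, with the match offsets rm_so and rm_eo taking the place of loc1 and loc2. The following is a minimal sketch under that assumption; the helper name find_match() is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Searches line for the basic regular expression pattern. On a match,
 * stores the start and end offsets of the match and returns 1. Returns 0
 * when there is no match and -1 when the pattern does not compile. */
static int find_match(const char *pattern, const char *line,
                      size_t *start, size_t *end)
{
    regex_t re;
    regmatch_t m[1];
    int rc;

    if (regcomp(&re, pattern, 0) != 0)   /* cflags 0: basic regular expressions */
        return -1;
    rc = regexec(&re, line, 1, m, 0);
    regfree(&re);
    if (rc != 0)
        return 0;
    *start = (size_t)m[0].rm_so;         /* offset of first matched byte */
    *end = (size_t)m[0].rm_eo;           /* offset following the match */
    return 1;
}
```

Unlike the regexp routines, this interface keeps all state in the regex_t object, so no global variables such as circf need to be saved between calls.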
The regexp interface is provided to support System V applications.
Traditional BSD applications use different functions for regular expression
handling. See the re_comp(3) and re_exec(3) reference pages.
The advance(), compile(), and step() functions are scheduled to be
withdrawn from a future version of the X/Open CAE Specification.
RETURN VALUES
Upon successful completion, the compile() function calls the RETURN()
macro. Upon failure, this function calls the ERROR() macro.
Whenever a successful match occurs, the step() and advance() functions
return a nonzero value. Upon failure, these functions return a value of 0
(zero).
[Tru64 UNIX] The compile_r(), step_r(), and advance_r() functions return
the same values as their non-reentrant counterparts.
ERRORS
If any of the following conditions occurs, the compile() or compile_r()
functions call the ERROR() macro with an error value as its argument:
[11]
The range endpoint is too large.
[16]
A bad number was received.
[25]
The number in \digit is out of range.
[36]
There is an illegal or missing delimiter.
[41]
There is no remembered search string.
[42]
The use of a pair of \( and \) is unbalanced.
[43]
There are too many \( and \) pairs (exceeds the maximum value set for
_NBRA in regexp.h, usually 9).
[44]
More than two numbers are given in the \{ and \} pair.
[45]
A } character was expected after a \.
[46]
The first number exceeds the second in the \{ and \} pair.
[49]
There is a [ ] pair imbalance.
[50]
There is a regular expression overflow.
[99]
[Tru64 UNIX] There was an unknown error.
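An ERROR() definition typically passes val to a routine that reports the problem and does not return. One possible lookup for the values listed above is sketched here; the function name regexp_strerror() is hypothetical and the message wording is paraphrased from this reference page.

```c
/* Maps a value passed to ERROR() to a short description.
 * Values not in the ERRORS list are reported as unknown. */
static const char *regexp_strerror(int val)
{
    switch (val) {
    case 11: return "range endpoint too large";
    case 16: return "bad number";
    case 25: return "\\digit out of range";
    case 36: return "illegal or missing delimiter";
    case 41: return "no remembered search string";
    case 42: return "\\( \\) imbalance";
    case 43: return "too many \\( \\) pairs";
    case 44: return "more than two numbers in \\{ \\}";
    case 45: return "} expected after \\";
    case 46: return "first number exceeds second in \\{ \\}";
    case 49: return "[ ] imbalance";
    case 50: return "regular expression overflow";
    case 99: /* [Tru64 UNIX] unknown error */
    default: return "unknown error";
    }
}
```

A caller-defined error routine could print this string and then exit or perform a nonlocal jump, because a call to ERROR() must not return to compile().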
EXAMPLES
The following is an example of the regular expression macros and calls from
the grep command:
#define INIT register char *sp=instring;
#define GETC() (*sp++)
#define PEEKC() (*sp)
#define UNGETC(c) (--sp)
#define RETURN(c) return;
#define ERROR(c) regerr()
#include <regexp.h>
. . .
compile (patstr, expbuf, &expbuf[ESIZE], '\0');
. . .
if (step (linebuf, expbuf))
    succeed();
. . .
SEE ALSO
Functions: ctype(3), fnmatch(3), glob(3), regcomp(3), re_comp(3)
Commands: ed(1), sed(1), grep(1)
Standards: standards(5)