11.1 - Wildcard Patterns
11.2 - Regular Expressions
11.3 - Examples
11.4 - Expression Substitution
Matching of strings is a pervasive and important function within the server. Two types are supported: wildcard and regular expression. Wildcard matching is generally much less expensive (in CPU cycles and time) than regular expression matching and so should always be used unless the match explicitly requires otherwise. WASD attempts to improve the efficiency of both by performing a preliminary pass, using a very low-cost comparison, to make simple matches and eliminate obvious mismatches. This pass either matches, fails to match, or encounters a pattern matching meta-character, which causes the server to undertake full pattern matching.
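The idea of the preliminary pass can be illustrated with a minimal Python sketch. This is not WASD's code; the function name, the three-way return value and the case-insensitive comparison are assumptions made for illustration, and only the wildcard meta-characters are considered.

```python
# Hypothetical illustration only; names and return values are not WASD's.
META_CHARS = set("*%")  # the wildcard meta-characters described in 11.1


def quick_match(target, pattern):
    """Cheap character-by-character comparison.

    Returns True or False when the comparison alone decides the outcome,
    or None when a meta-character means full pattern matching is needed.
    """
    for i, pch in enumerate(pattern):
        if pch in META_CHARS:
            return None   # hand over to the full pattern matcher
        if i >= len(target) or target[i].lower() != pch.lower():
            return False  # obvious mismatch, no further work needed
    # Pattern exhausted without meta-characters: lengths must agree.
    return len(target) == len(pattern)


print(quick_match("/index.html", "/index.html"))  # True
print(quick_match("/other/doc", "/ht_root/*"))    # False
print(quick_match("/ht_root/doc", "/ht_root/*"))  # None
```

In this sketch only the third call would go on to the more expensive matching routine.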
To assist with the refinement of string matching patterns the Server Administration facility (18 - Server Administration) has a report item named "Match". This report allows the input of target and match strings and allows direct access to the server's wildcard and regular expression matching routines. Successful matches show the matching elements and a substitution field (11.4 - Expression Substitution) allows resultant strings to be assessed.
To determine what string match processing is occurring during request
processing in the running server use the match item available
from the Server Administration WATCH Report (19 - WATCH Facility).
11.1 - Wildcard Patterns
Wildcard patterns are simple, low-cost mechanisms for matching a string to a template. They are designed to be used in path and authorization mapping to compare a request path to the root (left-hand side) or a template expression.
Wildcard matching uses the '*' and '%' symbols to match any zero or more,
or any one character respectively. The '*' wildcard can either be greedy or
non-greedy depending on the context (and for historical reasons). It can also
be forced to be greedy by using two consecutive asterisks ('**'). By default it is not
greedy when matching request paths for mapping or authentication, and is greedy
at other times (matching strings within conditional testing, etc.).
Greedy and Non-Greedy
Non-greedy matching attempts to match an asterisk wildcard up until the first character that is not the same as the character immediately following the wildcard. It matches a minimum number of characters before failing. Greedy matching attempts to match all characters up until the first string that does not match what follows the asterisk.
To illustrate; using the following string
non-greedy character matching compared to greedy character matching
the following non-greedy pattern
*non-greedy character*matching
does not match but the following greedy pattern
*non-greedy character**matching
does match. The non-greedy one failed as soon as it encountered the space
following the first "matching" string, while the greedy pattern
continued to match eventually encountering a string matching the string
following the greedy wildcard.
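The difference can be modelled with a short Python sketch. It is a simplified model that reproduces the example above, not the server's actual routine: it assumes a non-greedy '*' commits to the shortest span after which the next literal segment matches and never reconsiders, while a greedy '**' tries the longest span first and backs off. The function names and the case-insensitive comparison are assumptions.

```python
# Simplified model of greedy and non-greedy wildcards; not WASD's code.

def _segment_matches(t, ti, seg):
    """True if seg (literal characters and '%') matches t starting at ti."""
    if ti + len(seg) > len(t):
        return False
    return all(s == "%" or s == c for s, c in zip(seg, t[ti:ti + len(seg)]))


def _match(t, ti, p, pi):
    # Match the literal segment that runs up to the next '*'.
    star = p.find("*", pi)
    seg = p[pi:] if star < 0 else p[pi:star]
    if not _segment_matches(t, ti, seg):
        return False
    ti += len(seg)
    if star < 0:
        return ti == len(t)  # pattern exhausted: target must be too
    greedy = p.startswith("**", star)
    pi = star + (2 if greedy else 1)
    if pi == len(p):
        return True          # trailing wildcard matches whatever remains
    if greedy:
        # Greedy: try the longest span first, backing off until the rest fits.
        return any(_match(t, k, p, pi) for k in range(len(t), ti - 1, -1))
    # Non-greedy: commit to the shortest span after which the next literal
    # segment matches, then never reconsider that choice.
    nxt = p.find("*", pi)
    nseg = p[pi:] if nxt < 0 else p[pi:nxt]
    for k in range(ti, len(t) + 1):
        if _segment_matches(t, k, nseg):
            return _match(t, k, p, pi)
    return False


def wild_match(target, pattern):
    """Case-insensitive wildcard match (case-insensitivity is an assumption)."""
    return _match(target.lower(), 0, pattern.lower(), 0)


s = "non-greedy character matching compared to greedy character matching"
print(wild_match(s, "*non-greedy character*matching"))   # False
print(wild_match(s, "*non-greedy character**matching"))  # True
```

The non-greedy call stops at the first "matching", finds unmatched text after it and fails. The greedy call keeps looking and succeeds on the final "matching".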
11.2 - Regular Expressions
Regular expression matching is case insensitive (in line with other WASD behaviour) and uses the POSIX EGREP pattern syntax and capabilities. Regular expression matching offers significant but relatively expensive functionality. One of those expenses is expression compilation. WASD attempts to eliminate this by pre-compiling expressions during server startup whenever feasible. Regular expression matching must be enabled using the [RegEx] HTTPD$CONFIG directive; regular expressions are then differentiated from wildcard patterns by a leading "^" character.
A detailed tutorial on regular expression capabilities and usage is well beyond the scope of this document. Many such hard-copy and on-line documents are available. This summary is only to serve as a quick mnemonic. WASD regular expressions support the following set of operators.
The following operators are used to match one, or in conjunction with the repetition operators more, characters of the target string. These single and leading characters are reserved meta-characters and must be escaped using a leading backslash ("\") if required as a literal character in the matching pattern.
Repetition operators control the extent, or number, of whatever the matching operators match. These are also reserved meta-characters and must be escaped using a leading backslash if required as a literal character.
11.3 - Examples
The following provides a series of examples as they might occur in use for server configuration.
if (user-agent:Mozilla*Gecko*)
if (user-agent:^^Mozilla.*Gecko)
map /*/-/* /ht_root/runtime/*/*
map ^/(.+)/-/(.+) /ht_root/runtime/*/*
pass ^[^-_./a-z0-9]+ "403 Forbidden character in path!"
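The regular expressions in these examples can be tried outside the server with a rough Python sketch. Python's re module is not the POSIX EGREP engine WASD uses, so this is only an approximation that holds for simple patterns such as these. The helper name wasd_regex, the sample strings and the use of an unanchored search are assumptions.

```python
import re


def wasd_regex(flagged_pattern):
    """Drop the leading '^' that flags a regular expression and compile
    the remainder case-insensitively (hypothetical helper)."""
    if not flagged_pattern.startswith("^"):
        raise ValueError("pattern is not flagged as a regular expression")
    return re.compile(flagged_pattern[1:], re.IGNORECASE)


# The second '^' is an ordinary anchor once the flag is removed.
agent = wasd_regex("^^Mozilla.*Gecko")
print(bool(agent.search("Mozilla/5.0 (X11) Gecko/20100101")))  # True
print(bool(agent.search("Opera/9.80")))                        # False

# Two groups, available for substitution into the result string.
mapping = wasd_regex("^/(.+)/-/(.+)")
print(mapping.search("/wasd/-/icons/file.gif").groups())
# ('wasd', 'icons/file.gif')

# Matches any path containing a character outside the permitted set.
forbidden = wasd_regex("^[^-_./a-z0-9]+")
print(bool(forbidden.search("/docs/index.html")))  # False
print(bool(forbidden.search("/docs/a b.html")))    # True
```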
11.4 - Expression Substitution
Expression substitution is available during path mapping
(13 - Mapping Rules). Both wildcard (implicitly) and regular
expressions (using grouping operators) note the offsets of matched
portions of the strings. These are then used for wildcard and
specified wildcard substitution where result strings provide for
this (e.g. mapping 'pass' and 'redirect' rules). A maximum of nine such
wildcard substitutions are supported (one other, the zeroth, is the full
match).
Wildcard Substitution
With wildcard matching each asterisk wildcard contained in the pattern
(template string) has matching characters in the
target string noted and stored. Note that for the percentage
(single character) wildcard no such storage is provided. These characters are
available for substitution using corresponding wildcards present in the
result string. For instance, the target string
this is an example target string
would be matched by the pattern string
* is an example target *
as containing two matching wildcard strings
this
string
which could be substituted using the result string
* is an example result *
producing the resultant string
this is an example result string
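The example above can be reproduced with a small Python sketch. It is an illustration rather than the server's algorithm: it translates the wildcard pattern into a Python regular expression, stores what each '*' matched, and fills the '*' characters of the result string in order. The function name is hypothetical, and the sketch does not handle the '**' greedy form or the nine-substitution limit.

```python
import re


def wildcard_substitute(target, pattern, result):
    """Return the result string with each '*' replaced, in order, by what
    the corresponding '*' in the pattern matched, or None on no match."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append("(.*)")  # asterisk: matched text is stored
        elif ch == "%":
            parts.append(".")     # percent: one character, not stored
        else:
            parts.append(re.escape(ch))
    m = re.fullmatch("".join(parts), target, re.IGNORECASE | re.DOTALL)
    if m is None:
        return None
    stored = iter(m.groups())
    # Each result wildcard takes the next stored string; extras are empty.
    return re.sub(r"\*", lambda _: next(stored, ""), result)


print(wildcard_substitute("this is an example target string",
                          "* is an example target *",
                          "* is an example result *"))
# this is an example result string
```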
Regular Expression Substitution
With regular expression matching the groups of matching characters must be
explicitly specified using the grouping parentheses operator.
Hence with regular expression matching it is possible to match many characters
from the target string without retaining them for later substitution. Only if
that match is designated as a substitution source do the matching characters
become available for substitution via any result string. Using two possible
target strings as an example
this is an example target string
this is a contrived target string
would both be matched by the regular expression
^^([a-z]*) is [a-z ]* target ([a-z]*)$
which, though it contains three regular expressions in the pattern, has only
two with grouping parentheses, and so makes only their matching strings available
for substitution
this
string
which could be substituted using the result string
* is the final result *
producing the resultant string
this is the final result string
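The same example can be approximated in Python. As before, the re module stands in for the server's POSIX EGREP engine, and the function name is an assumption. Only the parenthesised groups are stored, so the text matched by the ungrouped middle expression is not available to the result string.

```python
import re


def regex_substitute(target, flagged_pattern, result):
    """Match using a '^'-flagged regular expression and fill the result's
    '*' characters, in order, from the grouped matches (hypothetical)."""
    regex = re.compile(flagged_pattern[1:], re.IGNORECASE)  # drop the flag
    m = regex.search(target)
    if m is None:
        return None
    # Groups that did not participate in the match become empty strings.
    stored = iter(g or "" for g in m.groups())
    return re.sub(r"\*", lambda _: next(stored, ""), result)


pattern = "^^([a-z]*) is [a-z ]* target ([a-z]*)$"
for target in ("this is an example target string",
               "this is a contrived target string"):
    print(regex_substitute(target, pattern, "* is the final result *"))
# this is the final result string
# this is the final result string
```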
Specified Substitution
By default the strings matched by wildcard or grouping operators are substituted in the same order in which they are matched. This order may be changed by specifying which wildcard string should be substituted where. Not all matched (and stored) strings need to be substituted. Some may be omitted and the contents effectively ignored.
The specified substitution syntax is a result wildcard followed by a
single-apostrophe (') and a single digit from zero to nine (0...9).
The zeroth element is the full matching string. Element one is the first
matching part of the expression, on through to the last. Specifying an element
that had no matching string substitutes an empty string (i.e. nothing is
added). Using the same target string as in the previous example
this is an example target string
and matched by the wildcard pattern string
* is an example target *
when substituted by the result string
*'2 is an example result
would produce the resultant string
string is an example result
with the string represented by the first wildcard effectively being
discarded.
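A minimal Python sketch of how a result string with specified substitution might be expanded is given below. The function name and the list layout (element zero holding the full match) are illustrative assumptions. The sketch also assumes that plain '*' wildcards keep consuming stored strings in order, independently of any specified ones; the text above does not say how the two forms interact.

```python
import re


def apply_result(stored, result):
    """stored[0] is the full match; stored[1:] are the strings matched by
    the wildcards or groups. Expands '*' and "*'N" in the result string."""
    sequential = iter(stored[1:])

    def fill(m):
        digit = m.group(1)
        if digit is None:
            return next(sequential, "")  # plain '*': next stored string
        index = int(digit)
        # An element with no matching string substitutes an empty string.
        return stored[index] if index < len(stored) else ""

    return re.sub(r"\*(?:'([0-9]))?", fill, result)


stored = ["this is an example target string", "this", "string"]
print(apply_result(stored, "*'2 is an example result"))
# string is an example result
print(apply_result(stored, "*'2 comes before *'1"))
# string comes before this
```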