6.1 - Site Organisation
6.2 - Server Instances
6.3 - Virtual Services
6.4 - Request Throttling
6.5 - Client Concurrency
6.6 - Content-Type Configuration
6.6.1 - Adding Content-Types
6.6.2 - MIME.TYPES
6.6.3 - Unknown Content-Types
6.6.4 - Explicitly Specifying Content-Type
6.7 - Language Variants
6.8 - Character Set Conversion
6.9 - Error Reporting
6.9.1 - Basic and Detailed
6.9.2 - Site Specific
6.10 - OPCOM Logging
6.11 - Access Logging
6.11.1 - Log Format
6.11.2 - Log Per-Period
6.11.3 - Log Per-Service
6.11.4 - Log Per-Instance
6.11.5 - Log Naming
6.11.6 - Access Tracking
6.11.7 - Access Alert
6.12 - Include File Directive
Server configuration concerns the fundamental behaviour of the server process. Requirements such as buffer and cache sizes, timeout values, scripting limits, content-types, icons, and the number and type of services offered are determined from configuration information.
By default, the system-table logical name HTTPD$CONFIG locates a common
configuration file, unless an individual configuration file is specified using
a job-table logical name. Simple editing of this file changes the
configuration. Comment lines may be included by prefixing them with the hash
"#" character. Configuration file directives are not case-sensitive. Any
changes to the configuration file can only be enabled by restarting the HTTPd
process using the following command on the server system.
$ HTTPD /DO=RESTART
A server's currently loaded configuration can be interrogated. See
18 - Server Administration for further information.
6.1 - Site Organisation
Here are a few "Mother's Truths" about site organisation. These are only basic and obvious suggestions (after a small step back from the sometimes initially overwhelming feeling of "what do I do now with this brand new toy?"). There are many general documents on Web site organisation and design that are applicable to all server environments. Above all, bring your own software system design experience to the Web-specific environment; it's not all that different from any other transaction-based, user-interactive environment.
It is recommended that the server distribution tree and any document and
other Web-specific data areas be kept separate and distinct. The former in
HT_ROOT:[000000], the latter perhaps in something like WEB:[000000]. This
logical device could be provided with the following DCL introduced into the
site or server startup procedures:
$ DEFINE /SYSTEM /TRANSLATION=CONCEALED WEB DSA811:[WEB.]
Note that logical device names like this need not appear in the
structure of the Web site. The root of the Web-accessible path can be
concealed using a final mapping rule similar to the following
pass /* /web/*
which simply defaults anything else to that physical area. Of
course if that anything else needs to exist then it must be located
in that physical area.
Mapping rules are the tools used to build a logical structure to a site from the physical area, perhaps multiple areas, used to house the associated files. The logical organisation of served data is largely hierarchical, organised under the Web-server path root, and is achieved via two mechanisms.
Physically distinct areas are used for good physical reasons (e.g. the
area can best be hosted on a task-local disk), for historical reasons (e.g.
the area existed before any Web environment existed) or for reasons of
convenience (e.g. let's put this where access controls already allow the
maintainers to manage it).
Guidelines
There are no good reasons for having site-specific documents integrated with the package directory structure. All site-served files should be located in an autonomous, dedicated area or areas. The only reasons to place script files into HT_ROOT:[CGI-BIN] or HT_ROOT:[architecture_BIN] are that the script is traditionally accessible via a /cgi-bin/ path, or that the site is a small and/or low-usage environment where this directory is conveniently available for the few extra scripts being made available.
For any significant site (sized as best suits your perception), or when a specific software system is being built or an existing one is being "Web-ified", design that software system as you would any other. That is, place the documentation in one directory area, executables and support procedures in their own, management files in another, data in yet another, etc. Then make those portions that are required to be accessible via the Web interface available through the logical associations afforded by the server's mapping rules (13 - Mapping Rules). Of course existing areas that are now to be made available via the Web can be mapped in the same way. This includes the active components - executable scripts. There is no reason (apart from historical) why the /cgi-bin/ path should be used to activate scripts associated with a dedicated software system. Use a specific and unique path for the scripts associated with each such system.
When making a directory structure available via the Web, care must be taken
that only the portions required to be accessed can be. Other areas should
not, or must not, be accessible. The server process can access only files
that are world-accessible, files it has been specifically granted access to
via VMS protection mechanisms (e.g. ACLs), or files that the individual
SYSUAF-authorized accessor can access and which have specifically been made
available via server authorization rules. Use the recommendations in
7.1 - Recommended Package Security as guidelines when designing your own
site's protections and permissions.
Document Root
A particular area of the file system may be specified as the root of a particular (virtual) site's documents. This is done using the HTTPD$MAP SET map=root=<string> mapping rule. After this rule is applied, all subsequent rules have the specified string prefixed to mapped strings before file-system resolution.
For example, the following HTTPD$MAP rule set
[[the.virtual.site:*]]
pass /*/-/* /ht_root/runtime/*/*
/ht_root/* /ht_root/*
set * map=root=/dka0/the_site
exec /cgi-bin/* /cgi-bin/*
pass /* /*
fail *
when applied to the following request URLs results in the described
mappings being applied.
http://the.virtual.site/doc/example.txt
access to the document represented by file
DKA0:[THE_SITE.DOC]EXAMPLE.TXT
With the request for a directory icon using
http://the.virtual.site/-/httpd/file.gif
access to the image represented by file
HT_ROOT:[RUNTIME.HTTPD]FILE.GIF
And a request for a script using
http://the.virtual.site/cgi-bin/example.php
activation of the script represented by the file
DKA0:[THE_SITE.CGI-BIN]EXAMPLE.PHP
Care must be taken with the sequence of mapping rules, providing access to
non-site resources before actually setting the document root, which then
ties every subsequent resource to that root.
6.2 - Server Instances
The term instance is used by WASD to describe an autonomous server
process. WASD supports multiple servers running on a single system, alone
or in combination with multiple servers running across a cluster. When
multiple instances are configured on a single system they cooperate to
distribute the request load between themselves and share certain essential
resources such as accounting and authorization information.
WARNING
Versions earlier than Compaq TCP/IP Services v5.3 and some TCPware v5.n (at least) have a problem with socket listen queuing that can cause services to "hang" (should this happen just restart the server). Ensure you have the requisite version/ECO/patch installed before activating multiple instances on production systems!
The approach WASD has used in providing multiple instance serving may be compared in many ways to VMS clustering.
A cluster is often described as a loosely-coupled, distributed operating environment where autonomous processors can join, process and leave (even fail) independently, participating in a single management domain and communicating with one another for the purposes of resource sharing and high availability.
Similarly WASD instances run in autonomous, detached processes (across one
or more systems in a cluster) using a common configuration and management
interface, aware of the presence and activity of other instances (via the
Distributed Lock Manager and shared memory), sharing processing load and
providing rolling restart and automatic "fail-through" as required.
Load Sharing
On a multi-CPU system there are performance advantages to having processing available for scheduling on each. WASD employs AST (I/O) based processing and was not originally designed to support VMS kernel threading. Benchmarking has shown this to be quite fast and efficient even when compared to a kernel-threaded server (OSU) across 2 CPUs. The advantage of multiple CPUs for a single multi-threaded server also diminishes where a site frequently activates scripts for processing. These of course (potentially) require a CPU each for processing. Where a system has many CPUs (and to a lesser extent with only two and few script activations) WASD's single-process, AST-driven design would scale more poorly. Running multiple WASD instances addresses this.
Of course load sharing is not the only advantage to multiple
instances ...
Restart
When multiple WASD instances are executing on a node and a restart is
initiated only one process shuts down at a time. Others remain available for
requests until the one restarting is again fully ready to process them itself,
at which point the next commences restart. This has been termed a
rolling restart. Such behaviour allows server reconfiguration on a
busy site without even a small loss of availability.
Fail-Through
When multiple instances are executing on a node and one of these exits for some reason (resource exhaustion, bugcheck, etc.) the other(s) will continue to process requests. Of course requests in-progress by the particular instance at the time of instance failure are disconnected (this contrasts with the rolling restart behaviour described above). If the former process has actually exited (in contrast to just the image) a new server process will automatically be created after a few seconds.
The term fail-through is used rather than failover
because one server does not commence processing as another ceases. All servers
are constantly active with those remaining immediately and automatically taking
all requests in the absence of any one (or more) of them.
Considerations
Of course "there is no such thing as a free lunch" and supporting multiple instances is no exception to this rule. To coordinate activity between and access to shared resources, multiple instances use low-level mutexes and the VMS Distributed Lock Manager (DLM). This does add some system overhead and a little latency to request processing, however as the benchmarks indicate (21 - Server Performance) increases in overall request throughput on a multi-CPU system easily offset these costs. On single CPU systems the advantages of rolling restart and fail-through need to be assessed against the small cost on a per-site basis. It is to be expected many low activity sites will not require multiple instances to be active at all.
When managing multiple instances on a single node it is important to
remember that the processes receive requests in round-robin distribution,
and that this needs to be taken into account when debugging scripts, using
the Server Administration page, the likes of WATCH, etc.
Configuration
If not explicitly configured only one instance is created. The
configuration directive [InstanceMax] allows multiple instances to be
specified (8 - Server Configuration Directives). When this is set to an
integer that many instances are created and maintained. If set to
"CPU" then one instance per system CPU is created. If set to
"CPU-integer" then one instance for all but one CPU is
created, etc. The current limit on instances is eight, although this is
somewhat arbitrary. As with all requests, Server Administration page access
is automatically shared between instances. There are occasions when consistent
access to a single instance is desirable. This is provided via an
admin service (9 - Service Configuration).
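As an illustration (the values here are assumptions, not taken from this document), [InstanceMax] settings in HTTPD$CONFIG might look like one of the following:

```
# create and maintain two instances
[InstanceMax] 2
# or: one instance per system CPU
#[InstanceMax] CPU
# or: one fewer instance than the number of CPUs
#[InstanceMax] CPU-1
```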
6.3 - Virtual Services
The WASD server is capable of concurrently supporting the same host name on different port numbers and a number of different host names (DNS aliased or multi-homed) using the same port number. This capability is generally known as a virtual server. There is no design limitation on the number of these services that WASD will concurrently support. Virtual services offer versatile and powerful multi-site capabilities using the one system and server. Service determination is based on the contents of the request's "Host:" header field. If none is present it defaults to the base service for the interface's IP address and port.
The same mechanism also effectively allows a single instance of the configuration files to support multiple server processes (using the /SERVICE qualifier), either on the one system or across multiple systems, as in a cluster (i.e. supports "virtual" and "real" servers.) See STARTUP_SERVER.COM for further information on startup support for these configurations.
WASD provides server process run-time parameters via the HTTPD$CONFIG
configuration file. These provide settings for logging, scripting, timeouts,
file content-type mappings, etc. The HTTPD$MSG file provides configurable
system messages.
[Service]
Using the [Service] HTTPD$CONFIG configuration parameter or the /SERVICE qualifier the server creates an HTTP service for each specified. If the host name is omitted it defaults to the local host name. If the port is omitted it defaults to 80. The first port specified in the service list becomes the "administration" port of the server, using the local host name, appearing in administration reports, menus, etc. This port is also that specified when sending control commands via the /DO= qualifier (5.5.2 - Server Command Line Control).
This rather contrived example shows a server configured to provide four
services over two host names.
[Service]
alpha.wasd.dsto.defence.gov.au
alpha.wasd.dsto.defence.gov.au:8080
beta.wasd.dsto.defence.gov.au
beta.wasd.dsto.defence.gov.au:8000
Note that both the HTTPD$SERVICE configuration file
(see 9 - Service Configuration) and the /SERVICE= command-line
qualifier (5.5 - HTTPd Command Line) override this directive.
HTTPD$SERVICE
If the logical name HTTPD$SERVICE is defined, the HTTPD$CONFIG [Service] directive is not used. This service configuration file is optional. For simple sites, those containing one or two services, the use of a separate service configuration file is probably not warranted. Once the number begins to grow this file offers an easier-to-manage interface for those services.
See 9 - Service Configuration for further detail.
[[virtual-server]]
The essential profile of a site is established by its mapped resources and any authorization controls, the HTTPD$MAP and HTTPD$AUTH configuration files respectively, and these two files support directives that allow configuration rules to be applied to all virtual services (i.e. a default), to a host name (all ports), or to a single specified service (host name and specific port).
To restrict rules to a specified server (virtual or real) add a line
containing the server host name, and optionally a port number, between
double-square brackets. All following rules will be applied only to that
service. If a port number is not present it applies to all ports for that
service name, otherwise only to the service using that port. To resume
applying rules to all services use a single asterisk instead of a host name. In
this way default (all service) and server-specific rules may be interleaved to
build a composite environment, server-specific yet with defaults. Note that
service-specific and service-common rules may be mixed in any order allowing
common rules to be shared. This descriptive example shows a file with one rule
per line.
# just an example
this rule applies to all services
so does this
and this one
[[alpha.wasd.dsto.defence.gov.au]]
this one however applies only to ALPHA, but to all ports
as indeed does this
[[beta.wasd.dsto.defence.gov.au:8000]]
now we switch to the BETA service, but only port 8000
another one only applying to BETA
and a third
[[*]]
now we have a couple default rules
that again apply to all servers
Both the mapping and authorization modules report if rules are provided for services that are not configured for the particular server process (i.e. not in the server's [Service] or /SERVICE parameter list). This provides feedback to the site administrator about any configuration problems that exist, but may also appear if a set of rules is shared between multiple processes on a system or cluster where processes deliver differing services. In this latter case the reports can be considered informational, but should be checked initially and then occasionally for misconfiguration.
NOTE
There is a difference when specifying virtual services during service creation and when using them to apply mapping, etc. When creating a service the scheme (or protocol, e.g. "http:", "https:") needs to be specified so the server can apply the correct protocol to connections accepted at that service. Once a service is created however, it becomes defined by the host-name and port supplied when created. Only one scheme (protocol) can be supported on any one host-name/port instance and so it becomes unnecessary to provide it with mapping rules, etc. The server will complain in instances where it is redundant.
If a service is not configured for the particular host address and port of a request one of two actions will be taken.
[ServiceNotFoundURL] //server.host.name/httpd/-/servicenotfound.html
pass /*/-/admin/*
pass /*/-/* /ht_root/runtime/*/*
exec /cgi-bin/* /cgi-bin/*
[[virtual1.host.name]]
/* /web/virtual1/*
/ /web/virtual1/
[[virtual2.host.name]]
/* /web/virtual2/*
/ /web/virtual2/
[[virtual3.host.name]]
/* /web/virtual3/*
/ /web/virtual3/
[[*]]
/* /web/servicenotfound.html
This applies to dotted-decimal addresses as well as alpha-numeric. Therefore if there is a requirement to connect via a numeric IP address such a service must have been configured.
Note also that the converse is possible. That is, it is possible to
configure a service to which the server can never respond, because it
does not have an interface using the IP address represented by the service
host.
6.4 - Request Throttling
Request "throttling" is a term adopted to describe controlling the number of requests that can be processing against any specified path at any one time. Requests in excess of this value are First-In-First-Out (FIFO) queued, up to an optional limit, waiting for a currently processing request to conclude allowing the next queued request to resume processing. This is primarily intended to limit concurrent resource-intensive script execution but could be applied to any resource path. Here's one dictionary description.
throttle
  n 1: a valve that regulates the supply of fuel to the engine
       [syn: accelerator, throttle valve]
  n 2: a pedal that controls the throttle valve; "he stepped on the gas"
       [syn: accelerator, accelerator pedal, gas pedal, gas, gun]
  v 1: place limits on; "restrict the use of this parking lot"
       [syn: restrict, restrain, trammel, limit, bound, confine]
  v 2: squeeze the throat of; "he tried to strangle his opponent"
       [syn: strangle, strangulate]
  v 3: reduce the air supply; of carburetors [syn: choke]
This is applied to a path (or paths) using the HTTPD$MAP mapping SET
THROTTLE= rule (13.4.5 - SET Rule). The general format is
set path throttle=n1[,n2,n3,n4,t/o1,t/o2]
set path throttle=from[,to,resume,busy,t/o-queue,t/o-busy]
where
  from (n1) is the number of concurrently processing requests at which queuing begins
  to (n2) is the number up to which requests are queued only
  resume (n3) is the number up to which queued requests are FIFOed back into processing
  busy (n4) is the number beyond which new requests are immediately returned with a 503 "busy" status
  t/o-queue is a timeout after which a queued request resumes processing (if still within the resume limit)
  t/o-busy is a timeout after which a queued request is dequeued with a 503 "busy" status
One way to read a throttle rule is "begin to throttle (queue) requests from the n1 value up to the n2 value, after which the queue is FIFOed up to the n3 value when it resumes queuing-only, up until the busy n4 value".
Each integer represents the number of concurrent requests against the throttle rule path. Parameters not required may be specified as zero or omitted in a comma-separated list. The schema of the rule requires that each successive parameter be larger than that preceding it. This basic consistency check is performed when the rule is loaded.
For any rule, the possible maximum number of requests that can be processed at any one time may be calculated by adding the n1 value to the difference of the n3 and n2 values (i.e. max = n1 + (n3 - n2)). The maximum concurrently queued is the difference between the n4 value and the maximum concurrently processed.
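The arithmetic above can be sketched as a small helper (a hypothetical illustration, not part of WASD):

```python
def throttle_capacity(n1, n2, n3, n4):
    """Return (max_processed, max_queued) for a throttle=n1,n2,n3,n4 rule.

    max_processed = n1 + (n3 - n2) when both n2 and n3 are set, else n1.
    max_queued = n4 - max_processed when n4 is set, else None
    (None meaning requests are queued to server capacity).
    """
    max_processed = n1 + (n3 - n2) if n2 and n3 else n1
    max_queued = n4 - max_processed if n4 else None
    return max_processed, max_queued

# e.g. throttle=10,20,30,40 allows at most 20 processing and 20 queued
print(throttle_capacity(10, 20, 30, 40))  # (20, 20)
```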
A comprehensive throttle statistics report is available from the Server
Administration page (18 - Server Administration).
Examples
Requests up to 10 are concurrently processed. When 10 is reached further requests are queued to server capacity.
Concurrent requests up to 10 are processed immediately. From 11 to 20 requests are queued. After 20 all requests are queued but also result in a request FIFOing off the queue to be processed (queue length is static, number being processed increases to server capacity).
Concurrent requests up to 15 are immediately processed. Requests 16 through to 30 are queued, while 31 to 40 requests result in the new requests being queued and waiting requests being FIFOed into processing. Concurrent requests from 41 onwards are again queued, in this scenario to server capacity.
Concurrent requests up to 10 are immediately processed. Requests 11 through to 20 will be queued. Concurrent requests from 21 to 30 are queued too, but at the same time waiting requests are FIFOed from the queue (resulting in 10 (n1) + 10 (n3-n2) = 20 being processed). From 31 onwards requests are just queued. Up to 40 concurrent requests may be against the path before all new requests are immediately returned with a 503 "busy" status. With this scenario no more than 20 can be concurrently processed with 20 concurrently queued.
Concurrent requests up to 10 are processed. When 10 is reached requests are queued up to request 30. When request 31 arrives it is immediately given a 503 "busy" status.
This is basically the same as scenario 4) but with a resume-on-timeout of two minutes. If there are currently 15 (or 22 or 28) requests (n1 exceeded, n3 still within limit) the queued requests will begin processing on timeout. Should there be 32 processing (n3 has reached limit) the request will continue to sit in the queue. The timeout would not be reset.
This is basically the same as scenario 3) but with a busy-on-timeout of three minutes. When the timeout expires the request is immediately dequeued with a 503 "busy" status.
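Based solely on the scenario descriptions above, the corresponding rules might look like the following sketch. The paths are hypothetical, and the timeout syntax shown (hh:mm:ss) is an assumption rather than confirmed by this document:

```
set /scenario1/* throttle=10
set /scenario2/* throttle=10,20
set /scenario3/* throttle=15,30,40
set /scenario4/* throttle=10,20,30,40
set /scenario5/* throttle=10,,,30
set /scenario6/* throttle=10,20,30,40,00:02:00
set /scenario7/* throttle=15,30,40,,,00:03:00
```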
Throttling is applied using mapping rules. The set of these rules may be changed within an executing server using the map reload functionality. This means the number and/or contents of throttle rules may change during server execution. The throttle functionality needs to be independent of the mapping functionality (requests are processed independently of mapping rules once the rules have been applied). After a mapping reload the contents of the throttle data structures may be at variance with the constraints under which currently executing requests began processing.
This should have little deleterious effect. The worst case is mis-applied
constraints on the execution limits of changed request paths, and slightly
confusing data in the Throttle Report. This quickly passes as requests being
processed under the previous throttle constraints conclude and an entirely new
collection of requests, created under the constraints of the currently loaded
rules, is processed.
6.5 - Client Concurrency
The "client_connect_gt:" mapping conditional (12 - Conditional Configuration) attempts to allow some measurement of the number of requests a particular client currently has being processed. Using this decision criterion appropriate request mapping for controlling the additional requests can be undertaken. It is not intended to provide fine-grained control over activities, rather just to prevent a single client using an unreasonable proportion of the resources.
For example, if the number of requests from one particular client looks
like it has got out of control (at the client end) then it becomes possible to
queue (throttle) or reject further requests. In HTTPD$MAP
if (client_connect_gt:15) set * throttle=15
if (client_connect_gt:15) pass * "503 Exceeding your concurrency limit!"
While not completely foolproof it does offer some measure of control over
gross client concurrency abuse or error.
6.6 - Content-Type Configuration
HTTP uses an implementation of the MIME (Multi-purpose
Internet Mail Extensions) specification for identifying the type of data
returned in a response. A MIME content-type consists of a plain text string
describing the data as a type and slash-separated
subtype, as illustrated in the following examples:
text/html
text/plain
image/gif
image/jpeg
application/octet-stream
The content-type is returned to the client as part of the HTTP response,
the client then using this information to correctly process and present the
data contained in that response.
6.6.1 - Adding Content-Types
In common with most HTTP servers, WASD uses a file's suffix (extension or type, e.g. ".HTML", ".TXT", ".GIF") to identify the data type within the file. The [AddType] directive is used during configuration to bind a file type to a MIME content-type. To make the server recognise and return specific content-types these directives map file types to content-types.
With the VMS file system there is no effective file characteristic or algorithm for identifying a file's content without an exhaustive examination of the data contained therein ... a very expensive process (and probably still inconclusive in many cases), hence the reliance on the file type.
NOTE
When adding a totally new content-type to the configuration be sure also to bind an icon to that type using the [AddIcon] directive (see below). If this is not done the default icon specified by [AddDefaultIcon] is displayed. If that is not defined then a directory listing shows "[?]" in place of an icon.
Mappings using [AddType] look like these.
[AddType]
.html text/html HyperText Markup Language
.txt text/plain plain text
.gif image/gif image (GIF)
.hlb text/x-script /Conan VMS Help library
.decw$book text/x-script /HyperReader Bookreader book
* internal/x-unknown application/octet-stream
6.6.2 - MIME.TYPES
To allow the server to share content-type definitions with other MIME-aware applications, and for WASD scripts to be able to perform their own mapping on a shared understanding of MIME content it is possible to move the file suffix to content-type mapping from a collection of [AddType]s in HTTPD$CONFIG to an external file. This file is usually named MIME.TYPES and is specified in HTTPD$CONFIG using the [AddMimeTypesFile] directive.
Mappings using MIME.TYPES look like these.
# MIME type Extension
application/msword doc
application/octet-stream bin dms lha lzh exe class
application/oda oda
application/pdf pdf
application/postscript ai eps ps
application/rtf rtf
A leading content-type is mapped to single or multiple file suffixes. A general MIME.TYPES file commonly has content-types listed with no corresponding file suffix. These are ignored by WASD. Where a file suffix is repeated during configuration the latter version completely supersedes the former (with the Server Administration page showing an italicised and struck-through content-type to help identify duplicates).
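The suffix-to-content-type mapping and the supersede behaviour just described can be illustrated with a minimal sketch (a hypothetical helper, not WASD code):

```python
def parse_mime_types(text):
    """Build a file-suffix -> content-type map from MIME.TYPES-style text.

    Blank lines and comments (including "#!" WASD extensions) are skipped,
    a content-type listed with no suffixes is ignored, and a suffix that
    appears again later completely supersedes the earlier mapping.
    """
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        content_type, *suffixes = line.split()
        for suffix in suffixes:
            mapping[suffix.lower()] = content_type
    return mapping

example = """\
# MIME type                     Extension
application/msword              doc
application/octet-stream        bin exe
application/oda
text/plain                      txt doc
"""
print(parse_mime_types(example)["doc"])  # text/plain (supersedes msword)
```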
To allow the configuration information used by the server to generate directory listings with additional detail, WASD-specific extensions to the standard MIME.TYPES format are provided. These are "hidden" in comment structures so as not to interfere with non-WASD application use. All begin with a hash then an exclamation character ("#!") then another reserved character indicating the purpose of the extension. Existing comments are unaffected provided the second character is anything but an exclamation mark!
These directives are placed following the MIME-type entry
they apply to. An example of the contents of a MIME.TYPES file with various
WASD extensions.
# MIME type Extension
application/msword doc
#! MS Word document
#![DOC] /httpd/-/doc.gif
application/octet-stream bin dms lha lzh exe class
#! binary content
#![BIN] /httpd/-/binary.gif
application/oda oda
application/pdf pdf
application/postscript ai eps ps
#! Adobe PostScript
#![PS.] /httpd/-/postscript.gif
#!+A
application/rtf rtf
#! Rich Text Format
#![RTF] /httpd/-/rtf.gif
application/x-script bks decw$bookshelf
#! DEC Bookshelf
#!/cgi-bin/hypershelf
application/x-script bkb decw$book
#![BKR] /httpd/-/script.gif
#! DEC Book
#!/cgi-bin/hyperreader
Other reserved characters have been specified for development purposes but are not (perhaps currently) employed by the HTTP server.
6.6.3 - Unknown Content-Types
If a file type is not recognised (i.e. no [AddType] or [AddMimeTypesFile]
mapping corresponding to the file type) then by default WASD identifies its
data as application/octet-stream (i.e. essentially binary data).
Most browsers respond to this content-type with a download dialog, allowing
the data to be saved as a file. Most commonly these unknown types manifest
themselves when authors use "interesting" file names to indicate their
purpose. Here are some examples the author has encountered:
README.VMS
README.1ST
READ-ME.FIRST
BUILD.INSTRUCTIONS
MANUAL.PT1 (.PT2, ...)
If the site administrator would prefer another default content-type,
perhaps "text/plain" so that any unidentified files default to plain
text, then this may be configured by specifying that content-type as the
description of the catch-all file type entry. Examples (use one
of):
[AddType]
* internal/x-unknown
* internal/x-unknown application/octet-stream
* internal/x-unknown text/plain
* internal/x-unknown something/else-entirely
It is the author's opinion that unidentified file types should remain as
binary downloads rather than "text" documents (which they more often are
not), but the option is there if it's wanted.
6.6.4 - Explicitly Specifying Content-Type
When accessing files it is possible to explicitly specify the identifying content-type to be returned to the browser in the HTTP response header. Of course this does not change the actual content of the file, just the header content-type! This is primarily provided to allow access to plain-text documents that have obscure, non-"standard" or non-configured file extensions.
It could also be used for other purposes, "forcing" the browser to accept a particular file as a particular content-type. This can be useful if the extension is not configured (as mentioned above) or in the case where the file contains data of a known content-type but with an extension conflicting with an already configured extension specifying data of a different content-type.
Enter the file path into the browser's URL specification field ("Location:",
"Address:"). Then, for plain-text, append the following query string:
?httpd=content&type=text/plain
For another content-type substitute it appropriately.
For example, to retrieve a text file in binary (why I can't imagine :^) use
?httpd=content&type=application/octet-stream
This is an example:
file.unknown file.unknown?httpd=content&type=text/plain
It is possible to "force" the content-type for all files in a particular
directory. Enter the path to the directory and then add
?httpd=index&type=text/plain
(or what-ever type is desired). Links to files in the listing will contain the appropriate "?httpd=content&type=..." appended as a query string.
This is an example:
*.* *.*?httpd=index&type=text/plain
6.7 - Language Variants
Language-specific variants of a document may be configured to be served automatically and transparently. This is organized as a basic file name with language-specific variants indicated by an additional "tag", one of the ISO language abbreviations used by the "Accept-Language:" request header field, e.g. en for English, fr for French, de for German, ru for Russian, etc.
Two variants of the basic file specification are possible; file name (the
default) and file type. Hence if the basic file name is EXAMPLE.HTML then
specifically German, English, French and Russian language versions in the
directory would be either
EXAMPLE.HTML
EXAMPLE_DE.HTML
EXAMPLE_EN.HTML
EXAMPLE_FR.HTML
EXAMPLE_RU.HTML
or
EXAMPLE.HTML
EXAMPLE.HTML_DE
EXAMPLE.HTML_EN
EXAMPLE.HTML_FR
EXAMPLE.HTML_RU
A path must be explicitly SET using the accept=lang mapping
rule as containing language variants. As searching for variants is a
relatively expensive operation the rule(s) applying this functionality should
be carefully crafted. The accept=lang rule accepts an optional
default language representing the contents of the basic, untagged files. This
provides an opportunity to more efficiently handle requests with a language
first preference matching that of the default. In this case no variant search
is undertaken, the basic file is simply served. The following example sets a
path to contain files with a default language of French and possibly containing
other language variants.
set /web/doc/* accept=lang=(default=fr)
In this case the behaviour would be as follows. With the default language set to "fr" a request's "Accept-Language:" field is initially processed to check if the first preference is for "fr". If it is then there is no need for further accept language processing and the basic file is returned as the response. If not then the directory is searched for other files matching the EXAMPLE_*.HTML specification. All files matching this wildcard have the "*" portion (e.g. "EN", "FR", "DE", "RU") added to a list of variants. When the search is complete this list is compared to the request's "Accept-Language:" list. The first one to be matched has the contents of the corresponding file returned. If none are matched the default version would be returned.
This example of the behaviour is based on the contents of the directory
described above. A request that specifies
Accept-Language: fr,de,en
will have EXAMPLE.HTML returned (without having searched for any other
variants). For a request specifying
Accept-Language: ru,en
then the EXAMPLE_RU.HTML file is returned, and if no
"Accept-Language:" is supplied with the request EXAMPLE.HTML would be
returned. One or other file is always returned, with the default, non-language
file always the fallback source of data. If it does not exist and no other
language variant is selected the request returns a 404 file-not-found error.
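The selection behaviour just described can be sketched in Python. This is a simplified model only: the directory is simulated with a list and the function name is illustrative, not part of the server.

```python
def select_variant(accept_languages, default_lang, files,
                   base="EXAMPLE", ext=".HTML"):
    """Choose the file served for a path SET with accept=lang=(default=...)."""
    # First preference matches the configured default: no variant search,
    # the basic file is simply served.
    if accept_languages and accept_languages[0] == default_lang:
        return base + ext
    # Search the directory for EXAMPLE_*.HTML variants and collect the
    # "*" portions (the language tags) into a list of variants.
    variants = {}
    for name in files:
        if name.startswith(base + "_") and name.endswith(ext):
            tag = name[len(base) + 1:len(name) - len(ext)].lower()
            variants[tag] = name
    # The first "Accept-Language:" entry with a matching variant wins.
    for lang in accept_languages:
        if lang in variants:
            return variants[lang]
    # Otherwise fall back to the basic, untagged file.
    return base + ext

directory = ["EXAMPLE.HTML", "EXAMPLE_DE.HTML", "EXAMPLE_EN.HTML",
             "EXAMPLE_FR.HTML", "EXAMPLE_RU.HTML"]
```

With the directory above, "fr,de,en" returns EXAMPLE.HTML without any search, while "ru,en" returns EXAMPLE_RU.HTML, matching the worked example in the text.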
Content-Type
When using the accept=lang=(variant=type) form of the rule
(i.e. the variant is placed on the file type rather than the default file name)
each possible file extension must also have its content-type made known
to the server. Using the example above the variants would need to be
configured in a similar way to the following.
[AddType]
.HTML "text/html; charset=ISO-8859-1" HyperText Markup Language
.HTML_DE "text/html; charset=ISO-8859-1" HTML (German)
.HTML_EN "text/html; charset=ISO-8859-1" HTML (English)
.HTML_FR "text/html; charset=ISO-8859-1" HTML (French)
.HTML_RU "text/html; charset=koi8-r" HTML (Russian)
Non-Text Content
Normally only files with a content-type of "text/.." are subject to
variant searching. If the rule path includes a file type then those files
matching the rule are also variant-searched. In this way images, audio files,
etc., may also have language-specific versions supplied transparently. The
following illustrates this usage
set /web/doc/*.jpg accept=lang=(default=fr)
set /web/doc/*.wav accept=lang=(default=fr)
6.8 - Character Set Conversion
The default character set sent in the response header for text documents (plain and HTML) is set using the [CharsetDefault] directive and/or the SET charset mapping rule. English language sites should specify ISO-8859-1, other Latin alphabet sites, ISO-8859-2, 3, etc. Cyrillic sites might wish to specify ISO-8859-5 or KOI8-R, and so on.
Document and CGI script output may be dynamically converted from one
character set to another using the standard VMS NCS conversion library. The
[CharsetConvert] directive provides the server with character set aliases
(sets that are for all practical purposes the same) and specifies which NCS
conversion function may be used to convert one character set into another.
document-charset accept-charset[,accept-charset..] [NCS-function-name[=factor]]
When this directive is configured the server compares each text response's character set (if any) to each of the directive's doc-charset strings. If one matches it then compares each of the accept-charsets (if multiple) to the request's "Accept-Charset:" list of accepted character sets.
At least one doc-charset and one accept-charset must be present. If only these two are present (i.e. no NCS-conversion-function) it indicates that the two character sets are aliases (i.e. the same set of characters, different name) and no conversion is necessary.
If an NCS-conversion-function is supplied it indicates that the document doc-charset can be converted to the request "Accept-Charset:" preference of the accept-charset using the NCS conversion function name specified.
A factor parameter can be appended to the conversion function. Some conversion functions require more than one output byte to represent one input byte for some characters. The 'factor' is an integer between 1 and 4 indicating how much more buffer space may be required for the converted string. It works by allocating that many times more output buffer space than is occupied by the input buffer. If not specified it defaults to 1, or an output buffer the same size as the input buffer.
Multiple comma-separated accept-charsets may be included as the
second component for either of the above behaviours, with each being matched
individually. Wildcard "*" and "%" may be used in the
doc-charset and accept-charset strings.
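The buffer arithmetic behind the factor is easily illustrated. KOI8-R encodes each Cyrillic character in a single byte while UTF-8 needs two, so a KOI8-R to UTF-8 conversion can require up to twice the input buffer space, which is why the example below specifies koi8r_to_utf8=2. The conversion itself is done by VMS NCS; Python merely demonstrates the byte counts:

```python
# Six Cyrillic characters: one byte each in KOI8-R, two each in UTF-8,
# so a conversion output buffer may need factor=2 times the input size.
text = "Привет"
koi8 = text.encode("koi8-r")
utf8 = text.encode("utf-8")
print(len(koi8), len(utf8))   # 6 12
```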
[CharsetConvert]
windows-1251 windows-1251,cp-1251
windows-1251 koi8-r windows1251_to_koi8r
koi8-r koi8-r,koi8
koi8-r windows-1251,cp-1251 koi8r_to_windows1251
koi8-r utf-8 koi8r_to_utf8=2
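The matching performed against these directives might be sketched as follows. This is a hedged approximation: the tuple structure, and the treatment of "%" as a single-character wildcard and "*" as a multi-character one (VMS convention), are assumptions rather than the server's actual implementation.

```python
from fnmatch import fnmatch

# The example [CharsetConvert] lines above as (doc-charset, accept-charsets,
# NCS-function) tuples; this structure is illustrative, not the server's.
rules = [
    ("windows-1251", ["windows-1251", "cp-1251"], None),
    ("windows-1251", ["koi8-r"], "windows1251_to_koi8r"),
    ("koi8-r", ["koi8-r", "koi8"], None),
    ("koi8-r", ["windows-1251", "cp-1251"], "koi8r_to_windows1251"),
    ("koi8-r", ["utf-8"], "koi8r_to_utf8"),
]

def conversion_for(doc_charset, accept_charsets):
    """Return (target charset, NCS function or None for an alias), or None
    when no rule applies."""
    for doc, accepts, func in rules:
        # Does the response's charset match this rule's doc-charset?
        if not fnmatch(doc_charset.lower(), doc.replace("%", "?")):
            continue
        # First "Accept-Charset:" preference matching an accept-charset wins.
        for accept in accepts:
            for wanted in accept_charsets:
                if fnmatch(wanted.lower(), accept.replace("%", "?")):
                    return wanted, func
    return None
```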
6.9 - Error Reporting
By default the server provides its own internal error reporting facility.
These reports may be configured as basic or detailed on
a per-path basis, as well as determining the basic "look-and-feel". For
more demanding requirements the [ErrorReportPath] configuration directive
allows a redirection path to be specified for error reporting, permitting the
site administrator to tailor both the nature and format of the information
provided. A Server Side Include document, CGI script or even standard HTML
file(s) may be specified. Generally an SSI document would be recommended for
its simplicity yet versatility.
6.9.1 - Basic and Detailed
Internally generated error reports are the most efficient. These can be delivered with two levels of error information. The default is more detailed.
ERROR 404 - The requested resource could not be found.
Document not found ... /ht_root/index.html
(document, bookmark, or reference requires revision)
Additional information: 1xx, 2xx, 3xx, 4xx, 5xx, Help
WASD/7.0.0 Server at wasd.dsto.defence.gov.au Port 80
There is also the more basic.
ERROR 404 - The requested resource could not be found.
Additional information: 1xx, 2xx, 3xx, 4xx, 5xx, Help
WASD/7.0.0 Server at wasd.dsto.defence.gov.au Port 80
These can be set per-server using the [ReportBasicOnly] configuration directive, or on a per-path basis in the HTTPD$MAP configuration file. The basic report is intended for environments where traditionally a minimum of information is provided to the user community, both to reduce leakage of site configuration information and because a general user population may only need or want to know that a document was either found or not found. The detailed report provides far more specific information on the nature of the event and so may be more appropriate for a more technical group of users. Either way it is relatively simple to provide one as the default and the other for specific audiences. Note that the detailed report also includes, in page <META> information, the code module and line references for reported errors.
To default to a basic report for all but selected resource paths introduce
the following to the top of the HTTPD$MAP configuration file.
# default is basic reports
set /* report=basic
set /internal-documents/* report=detailed
set /other/path/* report=detailed
To provide the converse, default to a detailed report for all but selected
paths use the following.
# default is detailed reports
set /web/* report=basic
Other Customization
The additional reference information included in the report may be disabled using the appropriate HTTPD$MSG [status] message item. Emptying this message results in an error report similar to the following.
ERROR 404 - The requested resource could not be found.
WASD/7.0.0 Server at wasd.dsto.defence.gov.au Port 80
The server signature may be disabled using the HTTPD$CONFIG [ServerSignature] configuration directive. This results in a minimal error report.
ERROR 404 - The requested resource could not be found.
A simple approach to providing a site-specific "look-and-feel" to
server reports is to customize the [ServerReportBodyTag] HTTPD$CONFIG
configuration directive. Using this directive report page background colour,
background image, text and link colours, etc., may be specified for all
reports. It is also possible to more significantly change the report format
and contents (within some constraints), without resorting to the site-specific
mechanisms referred to below, by changing the contents of the appropriate
HTTPD$MSG [status] item. This should be undertaken with care.
6.9.2 - Site Specific
Customized error reports can be generated for all, or selected, HTTP status
codes associated with errors reported by the server using the HTTPD$CONFIG
[ErrorReportPath] and HTTPD$SERVER [ServiceErrorReportPath] configuration
directives. To explicitly handle all error reports specify the path to the
error reporting mechanism (see description below) as in the following example.
[ErrorReportPath] /httpd/-/reporterror.shtml
To handle only selected error reports add the HTTP status codes following
the report path. In this example only 403 and 404 errors are explicitly
handled, the rest remain server-generated. This is particularly useful for
static error documents.
[ErrorReportPath] /httpd/-/reporterror.shtml 403 404
Site-specific error reporting works by internal redirection. When an error is reported the original request is concluded and the request reconstructed using the error report path before internally being reprocessed. For SSI and CGI script handlers error information becomes available via a specially-built query string, and from that as CGI variables in the error report context. One implication is the original request path and query string are no longer available. All error information must be obtained from the error information in the new query string.
It is suggested that with any use of this facility the reporting
document(s) be located somewhere local, probably HT_ROOT:[RUNTIME.HTTPD], and then
enabled by placing the appropriate path into the [ErrorReportPath]
configuration directive.
[ErrorReportPath] /httpd/-/reporterror.shtml
Note that virtual services can subsequently have this path mapped to other
documents (or even scripts) so that some or all services may have custom error
reports. For instance the following arrangement provides each host (service)
with a customized error report.
# HTTPD$CONFIG
[ErrorReportPath] /errorreport.shtml
# HTTPD$MAP
[[alpha.wasd.dsto.gov.au]]
pass /errorreport.shtml /httpd/-/alphareport.shtml
[[beta.wasd.dsto.gov.au]]
pass /errorreport.shtml /httpd/-/betareport.shtml
[[gamma.wasd.dsto.gov.au]]
pass /errorreport.shtml /httpd/-/gammareport.shtml
Using Static HTML Documents
Static HTML documents are a good choice for site-specific error messages.
They are very low overhead and are easily customizable. One per possible
response error status code is required. When the error report path includes
"!UL", the response status code is substituted into the file path as a
three-digit number. A file for each possible or configured code must then be
provided, in this example for 403 (authorization failure), 404 (resource not
found) and 502 (bad gateway/script).
[ErrorReportPath] /httpd/-/reporterror!UL.html 403 404 502
This mapping will generate paths such as the following, and requires the
three files specified to respond to those errors.
/httpd/-/reporterror403.html
/httpd/-/reporterror404.html
/httpd/-/reporterror502.html
Using an SSI Document
SSI documents provide the versatility of dynamic report generation, but they do take time and CPU to process, and this may be a significant consideration on busy sites.
Three example SSI error report documents are provided.
The following SSI variables are available specifically for generating error reports. The <!--#printenv --> statement near the top of the file may be uncommented to view all SSI and CGI variables available.
It is also possible to report using a script.
The same error information is available via corresponding CGI variables.
The source code
HT_ROOT:[SRC.MISC]REPORTERROR.C
provides such an implementation example.
6.10 - OPCOM Logging
Significant server events may be optionally displayed via a selected operator's console and recorded in the operator log. Various categories of these events may be selectively enabled via HTTPD$CONFIG directives (8 - Server Configuration Directives).
Some significant server events are always logged to OPCOM if any one of the
above categories is enabled.
6.11 - Access Logging
WASD provides a versatile access log, allowing data to be collected in Web-standard common and combined formats, as well as allowing customization of the log record format. It is also possible to specify a log period. If this is done log files are automatically changed according to the period specified.
Where multiple access log files are generated with per-instance, per-period and/or per-service logging (see below) these can be merged into single files for administrative or archival purposes using the CALOGS utility (23.6 - CALogs).
The Quick-and-Dirty LOG STATisticS utility (23.10 - QDLogStats) can be used to provide elementary ad hoc log analysis from the command-line or CGI interface.
Exclude requests from specified hosts using the [LogExcludeHosts]
configuration parameter.
6.11.1 - Log Format
The configuration parameter [LogFormat] and the server qualifier /FORMAT specifies one of three pre-defined formats, or a user-definable format. Most log analysis tools can process the three pre-defined formats. There is a small performance impost when using the user-defined format, as the log entry must be specially formatted for each request.
The user-defined format allows customised log formats to be specified using a selection of commonly required data. The specification must begin with a character that is used as a substitute when a particular field is empty (use "\0" for no substitute, as in the "windows log format" example below).
Two different "escape" characters introduce the parameters: "!" introduces a two-letter field directive (for example !CN, !AU, !TC, !RQ, !RS and !BY in the common format below), and "\" introduces a character escape such as \q for a double quote or \t for a tab.
Any other character is directly inserted into the log entry.
"PA" and "RQ"
The "PA" and "RQ" have distinct roles. In general the "RQ" (request) directive will always be used as this is the full request string; script component (if any), path string and query string component (if any). The "PA" directive is merely the path string after any script and query string components have been removed.
-!CN - !AU [!TC] \q!RQ\q !RS !BY
-!CN - !AU [!TC] \q!RQ\q !RS !BY \q!RF\q \q!UA\q
\0!TC\t!CA\t!SN\t!AR\t!AU\t!ME\t!PA\t!RQ\t!EM\t!UA\t!RS\t!BB\t
-!CN - !AU [!TC] \q!RQ\q !RS !BY !ES
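As an illustration of the mechanism (not the server's code), a user-defined format might be expanded as below. The field meanings are inferred from the common-format examples above, e.g. !CN client name, !AU authenticated user, !TC time, !RQ request, !RS status and !BY bytes.

```python
def format_entry(spec, fields):
    """Expand a user-defined [LogFormat] specification.  The first
    character of the spec is the substitute for empty fields ("\\0"
    meaning no substitute); "!XX" introduces a two-letter field
    directive; "\\q" and "\\t" are quote and tab escapes."""
    if spec.startswith("\\0"):
        empty, rest = "", spec[2:]
    else:
        empty, rest = spec[0], spec[1:]
    out, i = [], 0
    while i < len(rest):
        c = rest[i]
        if c == "!":                                  # field directive
            value = fields.get(rest[i + 1:i + 3], "")
            out.append(value if value else empty)
            i += 3
        elif c == "\\":                               # character escape
            out.append({"q": '"', "t": "\t"}.get(rest[i + 1], rest[i + 1]))
            i += 2
        else:                                         # literal character
            out.append(c)
            i += 1
    return "".join(out)
```

Expanding the common-format example with an empty authenticated-user field produces a common-log style record with "-" substituted in that position.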
6.11.2 - Log Per-Period
The access log file may have a period specified against it, producing an automatic generation of log files based on that period. This allows logs to be systematically named, ordered and kept to a manageable size. The period specified can be daily, weekly (on a nominated day) or monthly.
The log file changes on the first request after midnight of the new period. When using a weekly period the new log file comes into effect on the first request following midnight on the specified day.
When using a periodic log file, the file name specified by HTTPD$LOG or the configuration parameter [LogFile] is partially ignored, only partially because its directory component is used to locate the generated file. The generated periodic log file name comprises the host name or address, the port, the date of the start of the period and the string "_ACCESS", as in
HT_LOGS:WASD_80_19971013_ACCESS.LOG
For the daily period the date represents the request date. For the weekly
period it is the date of the previous (or current) day specified. That is, if
the request occurs on the Wednesday of a weekly period specified as Monday,
the log date shows the previous Monday's. For the monthly period it uses the
first day of the month.
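These date rules can be sketched as follows. This is a model of the behaviour described, not the server's code; the weekday numbering follows Python's convention.

```python
import datetime

def log_period_date(request_date, period, weekly_day=0):
    """Date placed in a periodic log file name (a model of the rules
    above).  period is "daily", "weekly" or "monthly"; weekly_day uses
    Python's convention of 0=Monday .. 6=Sunday."""
    if period == "daily":
        return request_date                       # the request's own date
    if period == "weekly":
        # Previous (or current) occurrence of the nominated weekday.
        back = (request_date.weekday() - weekly_day) % 7
        return request_date - datetime.timedelta(days=back)
    if period == "monthly":
        return request_date.replace(day=1)        # first day of the month
    raise ValueError("unknown period: " + period)
```

For a request on Wednesday 14 June 2023 with a Monday weekly period, the stamped date is Monday 12 June; a monthly period stamps 1 June.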
6.11.3 - Log Per-Service
By default a single access log file is created for each HTTP server
process. Using the [LogPerService] configuration directive a log file for each
service provided by the HTTPd is generated (6.3 - Virtual Services).
The [LogNaming] format can be any of "NAME" (default) which names the log file
using the first period-delimited component of the IP host name, "HOST" which
uses as much of the IP host name as can be accommodated within the maximum 39
character filename limitation (of ODS-2), or "ADDRESS" which uses the full IP
host address in the name. Both HOST and ADDRESS have hyphens substituted for
periods in the string. If these are specified then by default the service port
follows the host name component. This may be suppressed using the
[LogPerServiceHostOnly] directive, allowing at least three extra characters in the
name, and combining entries for all ports associated with the host name
(for example, a standard HTTP service on port 80 and an SSL service on port 443
would have entries in the one file).
6.11.4 - Log Per-Instance
To reduce physical disk activity, and thereby significantly improve performance, the RMS characteristics of the logging stream are set to buffer records for as long as possible and only write to disk when buffer space is exhausted (a periodic flush ensures records from times of low activity are written to disk). However when multiple server processes (multiple instances on a single node, a single instance on each of multiple clustered nodes, or a combination of the two) have the same log file open for write, this buffering and deferred write-to-disk is disabled by RMS, which insists that all records be flushed to disk for correct serialization and coherency.
This introduces measurable latency and a potentially significant bottleneck to high-demand processing. Note that it only becomes a real issue under load. Sites with a low load should not experience any impact.
Sites that may be affected by this issue can revert to the original
buffered log stream by enabling the [LogPerInstance] configuration directive.
This ensures that each log stream has only one writer by creating a unique log
file for each instance process executing on the node and/or cluster. It does
this by appending the node and process name to the file type. This would
change the log name from something like
HT_LOGS:131-185-250-202_80_ACCESS.LOG
to, in the case of a two-instance single node,
HT_LOGS:131-185-250-202_80_ACCESS.LOG_KLAATU_HTTPD-80
HT_LOGS:131-185-250-202_80_ACCESS.LOG_KLAATU_HTTPE-80
Of course the number and naming of log files is beginning to become a little
intimidating at this stage! To assist with managing this seeming plethora of
access log files there is the CALOGS utility (23.6 - CALogs), which allows
multiple log files to be merged whilst keeping the records in timestamp order.
6.11.5 - Log Naming
When per-period or per-service logging is enabled the access log file has a specific name generated. Part of this name is the host's name or IP address. By default the host name is used; however if the host IP address is specified, the literal address is used, with hyphens substituted for the periods. Accepted values for the [LogNaming] configuration directive are NAME (the default), HOST and ADDRESS (see 6.11.3 - Log Per-Service).
Examples of generated per-service (non-per-period) log names:
HT_LOGS:131-185-250-202_80_ACCESS.LOG
HT_LOGS:WASD-DSTO-DEFENCE-GOV-AU_80_ACCESS.LOG
HT_LOGS:WASD_80_ACCESS.LOG
Examples of generated per-period (with/without per-service) log names:
HT_LOGS:131-185-250-202_80_19971013_ACCESS.LOG
HT_LOGS:WASD-DSTO-DEFENCE-GO_80_19971013_ACCESS.LOG
HT_LOGS:WASD_80_19971013_ACCESS.LOG
Examples of generated per-instance (per-service and per-period) log names:
HT_LOGS:131-185-250-202_80_ACCESS.LOG_KLAATU_HTTPD-80
HT_LOGS:WASD-DSTO-DEFENCE-GOV-AU_80_ACCESS.LOG_KLAATU_HTTPD-80
HT_LOGS:WASD_80_ACCESS.LOG_KLAATU_HTTPD-80
HT_LOGS:131-185-250-202_80_19971013_ACCESS.LOG_KLAATU_HTTPD-80
HT_LOGS:WASD-DSTO-DEFENCE-GO_80_19971013_ACCESS.LOG_KLAATU_HTTPD-80
HT_LOGS:WASD_80_19971013_ACCESS.LOG_KLAATU_HTTPD-80
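The naming rules can be pulled together in a short sketch. The dynamic truncation of the host part is inferred from the examples above, where the generated name (excluding the .LOG type) must fit within ODS-2's 39-character limit.

```python
def log_file_name(naming, host_name, host_address, port, date=None):
    """File name for a per-service (and optionally per-period) access log.
    A sketch inferred from the examples above: ODS-2 limits the name
    (excluding the .LOG type) to 39 characters, so the host part is
    truncated to whatever room the port/date components leave."""
    rest = "_%d%s_ACCESS" % (port, ("_" + date) if date else "")
    if naming == "ADDRESS":
        host = host_address.replace(".", "-")
    elif naming == "HOST":
        host = host_name.upper().replace(".", "-")
    else:  # default "NAME": first period-delimited component of the host name
        host = host_name.upper().split(".")[0]
    return host[:39 - len(rest)] + rest + ".LOG"
```

Run against the host in the examples, HOST naming with a period date yields the truncated WASD-DSTO-DEFENCE-GO_80_19971013_ACCESS.LOG shown above.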
6.11.6 - Access Tracking
The term access tracking describes the ability to follow a single user's accesses through a particular site or group of related sites. This is accomplished by setting a unique cookie in a user's browser. This cookie is then sent with all requests to that site. The site detects the cookie's unique identifier, or token, and includes it in the access log, allowing the user's route through the site or sites to be reviewed. Note that a browser must have cookies enabled for this mechanism to operate.
WASD access tracking is controlled using the [Track...] directives. The tracking cookie uses an opaque, nineteen character string as the token (e.g. "ORoKJAOef8sAAAkuACc"). This token is spatially and temporally completely unique, generated the first time a user's browser accesses the site. This token is by default added to the server access log in the common format "remote-ID" location. It can also be placed into custom logs. From this identifier in the logs a session's progress may be easily tracked. Note that the token contains nothing related to the user's actual identity! It is merely a unique identifier that tags a single browser's access trail through a site.
The [Track] directive enables access tracking on a per-server basis. By default all non-proxy services will then have tracking enabled. Individual services may then be disabled (or enabled in the case of proxy services) using the per-service ";notrack" and ";track" parameters.
By default a session track token expires when the user closes the browser. To encourage the browser to keep this token between uses enable multi-session tracking using the [TrackMultiSession] directive. Note that browsers may dispose of any cookie whenever resources become scarce, and that users can also remove them.
Session tracking can be extended from the default of the local server
(virtual if applicable) to a group of servers within a local domain. This
means the same, initial identifier appears in the logs of all WASD servers in a
related group of hosts. Of course tracking must be enabled on all servers.
The host grouping is specified using the [TrackDomain] directive (this follows
the general rules governing cookie domain behaviour - see RFC2109). Most host
groupings require a minimum of three dots in the specification.
For example (note the leading dot)
.site.org.domain
which would match the following servers, "curly.site.org.domain",
"larry.site.org.domain", "moe.site.org.domain", etc. Sites in
top-level domains (e.g. "edu", "com", "org") need only
specify a minimum of two periods.
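Pulling the tracking directives together, an HTTPD$CONFIG fragment might look like the following. The enabled/disabled value syntax is an assumption here; consult the directive reference for the exact form.

```
# HTTPD$CONFIG (value syntax illustrative)
[Track] enabled
[TrackMultiSession] enabled
[TrackDomain] .site.org.domain
```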
6.11.7 - Access Alert
It is possible to mark a path as being of specific interest. When such a path is accessed by a request the server puts a message into the server process log and, perhaps of greater immediate utility, the increase in alert hits is detected by HTTPDMON, which (optionally) provides an audible alert allowing immediate attention. This is enabled on a per-path basis using the SET mapping rule. Variations on the basic rule allow some control over when the alert is generated.
The special case ALERT=integer allows a path to be alerted only if
the final response HTTP status matches the integer specified (e.g. 501,
404) or falls within the category specified (e.g. 599 for any 5nn status,
499 for any 4nn).
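As a sketch, HTTPD$MAP rules using this facility might look like the following. The paths are hypothetical, and the bare "alert" keyword form is assumed from the description of the basic rule.

```
# HTTPD$MAP (paths are hypothetical)
set /web/secure/* alert           # alert on any access to this path
set /cgi-bin/order* alert=599     # alert only on a 5nn response status
set /web/old-site/* alert=404     # alert only on 404 not-found
```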
6.12 - Include File Directive
WASD uses multiple configuration files for a server and its site, each one providing for a different functional aspect ... configuration, virtual services, path mapping, authorization, etc. Generally these configuration files are "flat", with all required directives included in a single file. This provides a simple and straightforward approach suitable for most sites and allows several aspects to be configured online via the Server Administration page.
It is also possible to build site configurations by including the contents of referenced files. This may provide a structure and flexibility not possible using the flat-file approach. All WASD configuration files allow the use of an [IncludeFile] directive. This takes a VMS file specification parameter. The file's contents are then loaded and processed as if part of the parent configuration file. These included files are allowed to be nested to a depth of two (i.e. the configuration file can include a file which may then include another file).
The following is an example used to build up the mapping rules for four
virtual services supported on the one server.
# HTTPD$MAP
[[alpha.site.com]]
[IncludeFile] HT_ROOT:[LOCAL]MAP_ALPHA_80.CONF
[[alpha.site.com:443]]
[IncludeFile] HT_ROOT:[LOCAL]MAP_ALPHA_443.CONF
[[beta.site.com]]
[IncludeFile] HT_ROOT:[LOCAL]MAP_BETA_80.CONF
[[beta.site.com:443]]
[IncludeFile] HT_ROOT:[LOCAL]MAP_BETA_443.CONF
[[*]]
[IncludeFile] HT_ROOT:[LOCAL]MAP_COMMON.CONF
NOTE
Such configurations cannot be managed using the Server Administration page interfaces. Files containing [IncludeFile] directives are noted during server startup, and if a Server Administration page configuration interface is accessed where this would be a problem, an explanatory message and warning is provided. A configuration can still be saved, but the resulting configuration will be a flat-file representation of the server configuration, not the original hierarchical one.