16.1 - HTTP Proxy Serving
16.1.1 - Enabling A Proxy Service
16.1.2 - Proxy Bind
16.1.3 - Proxy Chaining
16.1.4 - Controlling Proxy Serving
16.2 - Caching
16.2.1 - Cache Device
16.2.2 - Enabling Caching
16.2.3 - Cache Management
16.2.4 - Cache Invalidation
16.2.5 - Cache Retention
16.2.6 - Reporting and Maintenance
16.2.7 - PCACHE Utility
16.3 - CONNECT Serving
16.3.1 - Enabling CONNECT Serving
16.3.2 - Controlling CONNECT Serving
16.4 - FTP Proxy Serving
16.4.1 - FTP Query String Keywords
16.4.2 - "login" Keyword
16.5 - Gatewaying Using Proxy
16.5.1 - Reverse Proxy
16.5.2 - One-Shot Proxy
16.5.3 - DNS Wildcard Proxy
16.5.4 - Originating SSL
16.6 - Browser Proxy Configuration
A proxy server acts as an intermediary between Web clients and Web servers. It listens for requests from the clients and forwards these to remote servers. The proxy server then receives the responses from the servers and returns them to the clients. Why go to this trouble? There are several reasons, the most common being:
No additional software needs to be installed to provide proxy serving. The following steps provide a brief outline of proxy configuration.
When proxy processing is enabled and HTTPD$CONFIG directive
[ReportBasicOnly] is disabled it is necessary to make adjustments to the
contents of the HTTPD$MSG message configuration file [status] item beginning
"Additional Information". Each of the
"/httpd/-/statusnxx.html" links
<A HREF="/httpd/-/status1xx.html">1<I>xx</I></A>
<A HREF="/httpd/-/status2xx.html">2<I>xx</I></A>
<A HREF="/httpd/-/status3xx.html">3<I>xx</I></A>
<A HREF="/httpd/-/status4xx.html">4<I>xx</I></A>
<A HREF="/httpd/-/status5xx.html">5<I>xx</I></A>
<A HREF="/httpd/-/statushelp.html">Help</A>
should be changed to include a local host component
<A HREF="http://local.host.name/httpd/-/status1xx.html">1<I>xx</I></A>
<A HREF="http://local.host.name/httpd/-/status2xx.html">2<I>xx</I></A>
<A HREF="http://local.host.name/httpd/-/status3xx.html">3<I>xx</I></A>
<A HREF="http://local.host.name/httpd/-/status4xx.html">4<I>xx</I></A>
<A HREF="http://local.host.name/httpd/-/status5xx.html">5<I>xx</I></A>
<A HREF="http://local.host.name/httpd/-/statushelp.html">Help</A>
If this is not provided the links and any error report will be interpreted
by the browser as relative to the server the proxy was attempting to request
from and the error explanation will not be accessible.
16.1 - HTTP Proxy Serving
WASD provides a proxy service for the HTTP scheme (protocol).
Proxy serving generally relies on DNS resolution of the requested host name. DNS lookup can introduce significant latency to transactions. To help ameliorate this WASD incorporates a host name cache. To ensure cache consistency the contents are regularly flushed, after which host names must use DNS lookup again, refreshing the information in the cache. The period of this cache purge is controlled with the [ProxyHostCachePurgeHours] configuration parameter.
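The flush-and-refill behaviour described above can be illustrated with a minimal Python sketch. It is not WASD's implementation; the class name, the injectable resolver and clock, and the whole-cache flush on the first lookup after the period has elapsed are all assumptions made for illustration.

```python
import socket
import time


class HostNameCache:
    """Host name cache that is flushed entirely once a fixed period has elapsed."""

    def __init__(self, purge_hours, resolver=socket.gethostbyname, clock=time.monotonic):
        self.purge_seconds = purge_hours * 3600
        self.resolver = resolver      # performs the (slow) DNS lookup
        self.clock = clock            # injectable so the sketch can be tested
        self.entries = {}
        self.last_purge = clock()

    def lookup(self, host):
        now = self.clock()
        # Flush everything once the purge period has passed (assumed behaviour).
        if now - self.last_purge >= self.purge_seconds:
            self.entries.clear()
            self.last_purge = now
        # Only names missing from the cache cost a DNS lookup.
        if host not in self.entries:
            self.entries[host] = self.resolver(host)
        return self.entries[host]
```

After a flush the next request for each host name pays the DNS latency once, then is served from the cache until the following purge.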
When a request is made by a proxy server it is common for it to add a line
to the request header stating that it is a forwarded request and the agent
doing the forwarding. With WASD proxying this line would look something like
this:
Forwarded: by http://host.name.domain (HTTPd-WASD/8.4.0 OpenVMS/IA64 SSL)
It is enabled using the [ProxyForwarded] configuration parameter.
An additional, and perhaps more widely used facility, is the Squid
extension field to the proxied request header supplying the originating
client host name or IP address.
X-Forwarded-For: client.host.name
It is enabled using the [ProxyXForwardedFor] configuration parameter.
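As a rough illustration of the two header fields described above, the following Python sketch appends them to a list of request headers. The function name and its arguments are hypothetical; only the field names and the general shape of the values come from the examples above.

```python
def add_proxy_headers(headers, proxy_url, agent, client_host,
                      forwarded=True, x_forwarded_for=True):
    """Return a copy of the request headers with the proxy fields appended."""
    result = list(headers)
    if forwarded:
        # e.g. "Forwarded: by http://host.name.domain (HTTPd-WASD/8.4.0 ...)"
        result.append(("Forwarded", f"by {proxy_url} ({agent})"))
    if x_forwarded_for:
        # originating client host name or IP address
        result.append(("X-Forwarded-For", client_host))
    return result
```

The two boolean arguments stand in for the [ProxyForwarded] and [ProxyXForwardedFor] settings, each of which enables one field independently.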
16.1.1 - Enabling A Proxy Service
Proxy serving is enabled on a per-server basis using the [ProxyServing] configuration parameter.
WASD can configure services using the HTTPD$CONFIG [service] directive, the
HTTPD$SERVICE configuration file, or even the /SERVICE= qualifier.
HTTPD$CONFIG [Service]
The actual services providing the proxy serving (i.e. the host and port)
are specified on a per-service basis. This means it is possible to have proxy
and non-proxy services deployed on the one server (on different ports of
course). Proxying is enabled by appending the proxy keyword to the
particular service specification. The following example shows a non-proxy and
proxy service.
[Service]
http://alpha.wasd.dsto.defence.gov.au:80
http://alpha.wasd.dsto.defence.gov.au:8080;proxy
HTTPD$SERVICE
Proxy service configuration using the HTTPD$SERVICE configuration is
slightly simpler, with a specific configuration directive for each aspect
(9 - Service Configuration). This example illustrates
configuring the same services as used in the previous section.
[[http://alpha.wasd.dsto.defence.gov.au:80]]
[[http://alpha.wasd.dsto.defence.gov.au:8080]]
[ServiceProxy] enabled
Examples in the following sections all show configuration using the HTTPD$CONFIG
[Service] directive. When using the Server Administration facility interface
to the HTTPD$SERVICE configuration file, all relevant proxy directives are provided
for selection.
16.1.2 - Proxy Bind
Using the HTTPD$MAP SET proxy=bind=<IP-address> rule
it becomes possible to make the outgoing request appear to originate from a
particular source. The Network Interface must be able to bind to the specified
IP address (i.e. it cannot be an arbitrary address).
SET http://*.fred.com proxy=bind=131.185.250.1
16.1.3 - Proxy Chaining
Some sites may already be firewalled and have corporate proxy servers
providing Internet access. It is quite possible to use WASD proxying in this
environment, where the WASD server makes its proxied requests via the next
proxy server in the hierarchy. This is known as proxy chaining.
Using the chain keyword specify the host name of the next server
when enabling the proxy service, as in this example:
[Service]
http://alpha.wasd.dsto.defence.gov.au:8080;proxy;chain=next.proxy.host
Chaining may also be controlled on a virtual service or path basis using
the HTTPD$MAP SET proxy=chain=<host:port> rule.
SET http://*.com proxy=chain=next.proxy.host:8080
16.1.4 - Controlling Proxy Serving
Controlling both access-to and access-via proxy serving is possible.
Proxy Password
Access to the proxy service can be directly controlled through the use of WASD authorization. Proxy authorization is distinct from general access authorization. It uses specific proxy authorization fields provided by HTTP, and by this allows a proxied transaction to also supply transaction authorization for the remote server.
The following example shows a service specification using the
"pauth" parameter making the proxy service require authorization for
use.
[Service]
http://alpha.wasd.dsto.defence.gov.au:8080;proxy;pauth
In addition to the service being specified as requiring authorization it is
also necessary to configure the source of the authentication. This is done
using the HTTPD$AUTH configuration file. The following example shows all
requests for the proxy virtual service must be authorized (GET as well as
POST, etc.), although it is possible to restrict access to only read (GET),
preventing data being sent out via the server.
[[alpha.wasd.dsto.defence.gov.au:8080]]
["Proxy Access"=PROXY_ACCESS=id]
http://* read+write
Local Password
It is also possible to control proxy access via local authorization,
although this is less flexible because it removes the ability to then pass
authorization information to the remote service. In other respects it is set up
in the same way as proxy authorization, only using the "lauth"
parameter.
Access Filtering
Extensive control of how, by whom and what a proxy service is used for may be exercised using WASD general and conditional mapping (13 - Mapping Rules and 13.7 - Conditional Mapping) possibly in the context of a virtual service specification for the particular connect service host and port (13.6 - Virtual Servers). The following examples provide a small indication of how mapping could be used in a proxy service context.
1.
[[alpha.wasd.dsto.defence.gov.au:8080]]
pass http://*hacker*/* "403 Proxy access to this host is forbidden."
pass http://*
2.
[[alpha.wasd.dsto.defence.gov.au:8080]]
pass http://*.org/*
pass http://*.digital.com/*
pass http://* "403 Proxy access to this host is forbidden."
3.
[[alpha.wasd.dsto.defence.gov.au:8080]]
pass http://* "403 Restricted access." ![ho:131.185.250.* ho:131.185.200.10]
pass http://*
4.
[[alpha.wasd.dsto.defence.gov.au:8080]]
pass http://subscribe.sexy.com/* "403 POSTing not allowed." [me:POST]
pass http://*
5.
[[alpha.wasd.dsto.defence.gov.au:8080]]
redirect http://www.sexy.com/* http://www.disney.com/
pass http://*
6.
[[alpha.wasd.dsto.defence.gov.au:8080]]
pass http://*
pass /* "403 This is a proxy-only service."
7.
[[main.corporate.server.com:80]]
pass /sales/* http://sales.corporate.server.com/*
pass /shipping/* http://shipping.corporate.server.com/*
pass /support/* http://support.corporate.server.com/*
pass * "403 Nothing to access here!"
NOTE
To expedite proxy mapping it is recommended to have a final rule for the proxy virtual service that explicitly passes the request. This would most commonly be a permissive pass as in example 1, could quite easily be a restrictive pass as in example 2, or a combination as in example 6.
16.2 - Caching
Caching involves using the local file-system for storage of responses that can be reused when a request for the same URL is made. The WASD server does not have to be configured for caching; it will provide proxied access without any caching taking place.
When a proxied request is processed, and its characteristics would allow the response to be cached, a unique identifier generated from the URL is used to create a corresponding file name. The response header and any body are stored in this file. This may be the data of an HTML page, a graphic, etc.
When a proxied request is being processed, and its characteristics
would allow the request to be cached, the unique identifier generated allows
for a previously created cache file to be checked for. If it exists, and is
current enough, the response is returned from it, instead of from the remote
server. If it exists and is no longer current the request is re-made to the
remote server, and the response if still cacheable is re-cached, keeping the
contents current. If it does not exist the response is delivered from the
remote server.
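The lookup sequence described above can be sketched in Python as follows. This is only an illustration: the text does not say how the unique identifier is derived, so the SHA-256 digest, the file name shape and both function names are assumptions, and the returned strings are labels invented for this sketch.

```python
import hashlib


def cache_file_name(url):
    """Derive a file name from the URL (assumed: SHA-256 hex digest plus .HTC)."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest() + ".HTC"


def cache_action(cache_file_exists, is_current):
    """Decide how a cacheable proxied request is satisfied."""
    if not cache_file_exists:
        # nothing cached yet: deliver from the remote server
        return "fetch-from-remote"
    if is_current:
        # cached and current enough: no request to the remote server
        return "serve-from-cache"
    # cached but stale: re-make the request and re-cache if still cacheable
    return "refetch-and-recache"
```

The same URL always maps to the same file name, which is what allows a later request to find the file created by an earlier one.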
Not all responses can be cached!
The main criteria are for the response to be successful (200 status), general (i.e. one not in response to a specialized query or action), and not too volatile (i.e. the same page may be expected to be returned more than once, preferably over an extended period).
Proxied requests can only be cached if ...
Proxied responses will only be cached if ...
The [ProxyCacheFileKbytesMax] configuration parameter sets the maximum
size of a response that will be cached. If the size can be determined from
a "Content-Length:" response header field an oversized response is
proactively not cached. Otherwise, if during cache load the size of the file
increases beyond the specified limit, the load is aborted.
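The two points at which the size limit applies can be sketched in Python as below. The function names are hypothetical, the response body is modelled as an iterable of byte chunks, and treating one kbyte as 1024 bytes is an assumption.

```python
def exceeds_cache_limit(size_bytes, max_kbytes):
    """True if the size is beyond the limit (assumes 1 kbyte = 1024 bytes)."""
    return size_bytes > max_kbytes * 1024


def load_into_cache(chunks, max_kbytes, content_length=None):
    """Return the bytes that would be cached, or None if the response is not cached."""
    # Proactive check: a declared Content-Length over the limit is never loaded.
    if content_length is not None and exceeds_cache_limit(content_length, max_kbytes):
        return None
    cached = bytearray()
    for chunk in chunks:
        cached.extend(chunk)
        # Check during load: abort as soon as the growing file passes the limit.
        if exceeds_cache_limit(len(cached), max_kbytes):
            return None
    return bytes(cached)
```

In either case the response itself would still be delivered to the client; only the caching is skipped.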
Not all sites may benefit from cache!
As many transactions on today's Web contain query strings, etc., and therefore cannot be meaningfully cached, it should not be assumed the cost/benefit of having a proxy cache enabled is a foregone conclusion. Each site should monitor the proxy traffic reports and decide on a local policy.
The facilities described in 16.2.6 - Reporting and Maintenance allow a reasonably informed decision to be made. Items to be considered.
Last, but by no means least, understanding the characteristics of local
usage. For example, are there a small number of requests generating lots of
non-cacheable traffic? For instance, a few users accessing streaming content.
16.2.1 - Cache Device
Selection of a disk device for supporting the proxy cache should not be made without careful consideration, doubly so if significant traffic is experienced. Here are some common-sense suggestions.
Initially the directory will need to be created. This can be done manually
as described below, or if using the supplied server startup procedures
(STARTUP.COM) it is checked for and, if it does not exist,
is automatically created during startup. The directory must be owned by the
HTTP$SERVER account and have full read+write+execute+delete access. It is
suggested that it be named [HT_CACHE]. It may be created manually using the following
command.
$ CREATE /DIR /OWN=HTTP$SERVER /PROT=(O:RWED,G,W) device:[HT_CACHE]
It is a relatively simple matter to relocate the cache at any stage.
Simply create the required directory in the new location, modify the startup
procedures to reflect this, shut the server down completely then restart it
using the procedures (not a /DO=RESTART!). The contents of
the previous location could be transferred to the new using the BACKUP utility
if desired.
HT_CACHE_ROOT Logical
It is required to define the logical name HT_CACHE_ROOT if any proxy
services are specified in the server configuration. The server will not start
unless it is correctly defined. The logical should be a
concealed device logical specifying the top level directory of the
cache tree. The following example shows how to define such a logical name.
$ DEFINE /SYSTEM /EXEC /TRANSLATION=CONCEALED HT_CACHE_ROOT device:[HT_CACHE.]
If the example startup procedure is in use then it is quite
straight-forward to have the logical created during server startup
(STARTUP.COM).
16.2.2 - Enabling Caching
Caching may be enabled on a per-service basis. This means it is possible to
have a caching proxy service and a non-caching service active on the one
server. Caching is enabled by appending the cache keyword to the
particular service specification. The following example shows a non-proxy and
a caching proxy service.
[Service]
http://alpha.wasd.dsto.defence.gov.au:80
http://alpha.wasd.dsto.defence.gov.au:8080;proxy;cache
Proxy caching may be selectively disabled for a particular site,
sites or paths within sites using the set nocache mapping rule.
This rule, used to disable caching for local requests, also disables proxy file
caching for that subset of requests. This example shows a couple of
variations.
[[alpha.wasd.dsto.defence.gov.au:8080]]
# disable caching for local site's servers that respond fairly quickly
set http://*.local.domain/* nocache
# disable caching of log files
set http://*.log nocache
pass http://*
NOTE
It is also recommended to place the cache directory under some authorization control to prevent casual browsing and access of the cache contents. Something local, similar in intention to
[VMS]
/ht_cache_root/* ~webadmin,131.185.250.*,r+w ;
16.2.3 - Cache Management
As the proxy cache is implemented using the local file system, management of the cache implies controlling the number of, and exactly which files remain in cache. Essentially then, management means when and which to delete. The [ProxyReportLog] configuration parameter enables the server process log reporting of cache management activities.
Cache file deletion has three variants.
This ensures files that have not been accessed within specified limits are periodically and regularly deleted. The [ProxyCacheRoutineHourOfDay] configuration parameter controls this activity.
The ROUTINE form occurs once per day at the specified hour. The cache files are scanned looking for those that exceed the configuration parameter for maximum period since last access, which are then deleted (the largest number of [ProxyCachePurgeList], as described below).
Setting the [ProxyCacheRoutineHourOfDay] configuration parameter to 24 enables background purging.
In this mode the server continuously scans through the cache files in the same manner as for ROUTINE purging. The difference is that it is not all done in a single burst once a day, pushing disk activity to its maximum. The background purge regulates the period between each file access, pacing the scan so that the entire cache is passed through once a day. It adjusts this pace according to the size of the cache.
Reactive purging is a remedial action, used when cache device usage is reaching its configuration limit and files need to be deleted to free up space. The following parameters control this behaviour.
The cache device space usage is checked at the specified interval.
If the device reaches the specified percentage used a cache purge is initiated, deleting files until the specified reduction in the total space in use on the disk is attained.
The cache files are scanned using the [ProxyCachePurgeList] parameter described below, working from the greatest to least number of hours in the steps provided. At each scan files not accessed within that period are deleted. After every few files deleted the device free space is checked to see whether it has reached the lower purge percentage limit, at which point the scan terminates.
This parameter has as its input a series of comma-separated integers
representing a series of hours since files were last accessed. In this
way the cache can be progressively reduced until percentage usage targets are
realized. Such a parameter would be specified as follows,
[ProxyCachePurgeList] 168,48,24,8,0
meaning the purge would first delete files not accessed in the last week,
then not for the last two days, then the last twenty-four hours, then eight,
then finally all files. The largest of the specified periods (in this case
168) is also used as the limit for the ROUTINE scan and file delete.
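The progressive reduction driven by [ProxyCachePurgeList] can be sketched in Python as follows. This is a simplified model rather than the server's code: the function name, the dictionary of files and the usage arithmetic are assumptions, and usage is checked before every deletion instead of after every few files.

```python
def reactive_purge(files, purge_list, used, capacity, target_percent):
    """Delete least-recently-accessed files until usage falls to the target.

    files maps a name to (hours since last access, size); used and capacity
    are in the same units as size. Returns (deleted names, space still used).
    """
    remaining = dict(files)
    deleted = []
    # Work from the greatest to the least number of hours.
    for hours in sorted(purge_list, reverse=True):
        for name, (idle_hours, size) in list(remaining.items()):
            if used * 100 / capacity <= target_percent:
                return deleted, used      # target reached, purge stops
            if idle_hours >= hours:       # not accessed within this period
                del remaining[name]
                deleted.append(name)
                used -= size
    return deleted, used
```

With the list 168,48,24,8,0 a file idle for 200 hours goes in the first pass and one idle for 50 hours in the second. A recently used file survives unless the final pass, with 0 hours, is reached.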
Once the target reduction percentage is reached the purge stops. During the purge operation further cache files are not created. Even when cache files cannot be created for any reason proxy serving still continues transparently to the clients.
NOTE
Cache files can be manually deleted at any time (from the command line) without disturbing the proxy-caching server and without rebuilding any databases. When deleting, the /BEFORE=date/time qualifier can be used, with /CREATED being the document's last-modified date, /REVISED being the last time it was loaded, and /EXPIRED the last time the file was accessed (used to supply a request). Be aware that on an active server it is quite possible some files may be locked at time of attempted deletion.
If [ProxyCacheRoutineHourOfDay] is empty or non-numeric the automatic, once-a-day routine purge of the cache by the server is disabled and it is expected to be performed via some other mechanism, such as a periodic batch job. This allows routine purging more or less frequently than is provided-for by server configuration, and/or the purge activity being performed by a process or cluster node other than that of the HTTPd server (reducing server and/or node impact of this highly I/O intensive activity). Progress and other messages are provided via SYS$OUTPUT, and if configured in the [Opcom...] directives to the operator log and designated operator terminal as well. If a process already has the cache locked the initiated activity aborts.
The following example shows a routine purge being performed from the
command-line. This form uses the hours from [ProxyCachePurgeList].
$ HTTPD /PROXY=PURGE=ROUTINE
A variant on this allows the maximum age to be explicitly specified.
$ HTTPD /PROXY=PURGE=ROUTINE=168
Reactive purging and statistic scans may also be initiated from the command
line. For a reactive purge the first number can be the device usage
percentage (indicated by the trailing "%"), if not the configuration
limit is used.
$ HTTPD /PROXY=PURGE=REACTIVE=80%,168,48,24,8,0
$ HTTPD /PROXY=CACHE=STATISTICS
Any in-progress scan of the cache (i.e. reactive or routine purges, or a
statistics scan) can be halted from the command line (and online Server
Administration facility).
$ HTTPD /PROXY=STOP=SCAN
16.2.4 - Cache Invalidation
For the purposes of this document, cache invalidation is defined as the determination of when a cache file's data is no longer valid and needs to be reloaded.
The method used for cache validation is deliberately quite simple in algorithm and implementation. In this first attempt at a proxy server the overriding criteria have been efficiency, simplicity of implementation, and reliability. Wishing to avoid complicated revalidation using behind-the-scenes HEAD requests the basic approach has been to just invalidate the cache item upon expiry of a period related to its "Last-Modified:" age or upon a no-cache request, both described further below.
The revision count (automatically updated by VMS) tracks the
absolute number of accesses since the file was created (actually a maximum of
65535, or an unsigned short, but that should be enough for informational
purposes).
16.2.5 - Cache Retention
The [ProxyCacheReloadList] configuration parameter is used to control when a file being accessed is reloaded from source.
This parameter supplies a series of integers representing the hours after
which an access to a cache file causes the file to be invalidated and reloaded
from its source during the proxied request. Each number in
the series represents the lower boundary of the range between it and the next
number of hours. A file with a last-loaded age falling within a range is
reloaded at the lower boundary of that particular range. The following example
[ProxyCacheReloadList] 1,2,4,8,12,24,48,96,168
would result in a file 1.5 hours old being reloaded every hour, 3.25 hours
old every 2 hours, 7 hours old every 4 hours, etc. Here "old" means since
last (or of course first) loaded. Files not reloaded since the final integer,
in this example 168 (one week), are always reloaded.
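The range lookup described above can be written as a short Python sketch. The function name is hypothetical, and returning None for a file younger than the first entry is an assumption, since the text does not describe that case.

```python
import bisect


def reload_boundary(age_hours, reload_list):
    """Return the lower boundary (hours) of the range containing age_hours.

    Returns None when the age is below the first entry in the list.
    """
    hours = sorted(reload_list)
    # Index of the first list entry strictly greater than the age.
    index = bisect.bisect_right(hours, age_hours)
    return hours[index - 1] if index else None
```

Using the example list, reload_boundary(1.5, [1,2,4,8,12,24,48,96,168]) gives 1, an age of 3.25 gives 2, and an age of 7 gives 4, matching the figures in the text. Any age of 168 hours or more falls in the final range.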
16.2.6 - Reporting and Maintenance
The HTTPDMON utility allows real-time monitoring of proxy serving activity (23.8 - HTTPd Monitor).
Proxy reports and some administrative control may be exercised from the online Server Administration facility (18 - Server Administration). The information reported includes:
The following actions can be initiated from this menu. Note that three of these relate to proxy file cache and so may take varying periods to complete, depending on the number of files. If the cache is particularly large the scan/purge may take some considerable time.
Also available from the Server Administration facility is a dialog allowing the proxy characteristics of the running server to be adjusted on an ad hoc basis. This only affects the executing server. To make changes to permanent configuration the HTTPD$CONFIG configuration file must be changed.
This dialog can be used to modify the device free space percentages
according to recent changes in device usage, alter the reload or purge hour
list characteristics, etc. After making these changes a routine or reactive
purge will automatically be initiated to reduce the space in use by the proxy
cache if implied by the new settings.
16.2.7 - PCACHE Utility
It is often useful to be able to list the contents of the proxy cache
directory or the characteristics or contents of a particular cache file.
Cache files have a specific internal format and so require a tool capable of
dealing with this. The
HT_ROOT:[SRC.UTILS]PCACHE.C
program provides a versatile command-line utility as well as CGI(plus) script,
making cache file information accessible from a browser. It also allows cache
files to be selected by wildcard filtering on the basis of the contents of the
associated URL or response header. For detailed information on the various
command-line options and CGI query-string options see the description at the
start of the source code file.
Command-Line Use
Make the HT_EXE:PCACHE.EXE executable a foreign verb. It is then possible to
To make the PCACHE script available to the server ensure the following line
exists in the HTTPD$CONFIG configuration file in the [AddType] section.
.HTC application/x-script /cgiplus-bin/pcache WASD proxy cache file
The following rule needs to be in the HTTPD$MAP configuration file.
pass /ht_cache_root/*
NOTE
It is also recommended to place the utility and the cache directory under some authorization control to prevent casual browsing and access of the cache contents. Something local, similar in intention to
[VMS]
/pcache/* ~webadmin,131.185.250.*,r+w ;
/ht_cache_root/* ~webadmin,131.185.250.*,r+w ;
Once available the following is then possible.
If the configuration changes described above have been made the following link will return such an index.
/ht_cache_root/
PCACHE
NOTE
Cache directory trees have the potential to become heavily populated, so the use of the script to generate listings of the cache contents could return extremely large listing documents.
16.3 - CONNECT Serving
The connect service provides firewall proxying for any connection-oriented TCP/IP access. Essentially it provides the ability to tunnel any other protocol via a Web proxy server. In the context of Web services it is most commonly used to provide firewall-transparent access for Secure Sockets Layer (SSL) transactions.
The WASD CONNECT service implements the de facto standard HTTP
CONNECT method, described in a number of Internet Drafts.
16.3.1 - Enabling CONNECT Serving
As with proxy serving in general, CONNECT serving may be enabled on a
per-service basis using the HTTPD$CONFIG [service] directive, the HTTPD$SERVICE
configuration file, or even the /SERVICE= qualifier.
HTTPD$CONFIG [Service]
The actual services providing the CONNECT access (i.e. the host and port)
are specified on a per-service basis. This means it is possible to have
CONNECT and non-CONNECT services deployed on the one server, as part of a
general proxy service or standalone. CONNECT proxying is enabled by appending
the connect keyword to the particular service specification. The
following example shows non-proxy and proxy services, with and without
additional connect processing enabled.
[Service]
http://alpha.wasd.dsto.defence.gov.au:80
http://alpha.wasd.dsto.defence.gov.au:8080;proxy
http://alpha.wasd.dsto.defence.gov.au:8081;connect
http://alpha.wasd.dsto.defence.gov.au:8082;proxy;connect
HTTPD$SERVICE
Proxy service configuration using the HTTPD$SERVICE configuration is
slightly simpler, with a specific configuration directive for each aspect
(9 - Service Configuration). This example illustrates
configuring the same services as used in the previous section.
[[http://alpha.wasd.dsto.defence.gov.au:80]]
[[http://alpha.wasd.dsto.defence.gov.au:8080]]
[ServiceProxy] enabled
[[http://alpha.wasd.dsto.defence.gov.au:8081]]
[ServiceProxySSL] enabled
[[http://alpha.wasd.dsto.defence.gov.au:8082]]
[ServiceProxy] enabled
[ServiceProxySSL] enabled
16.3.2 - Controlling CONNECT Serving
The connect service poses a significant security dilemma when in use in a firewalled environment. Once a CONNECT service connection has been accepted and established it essentially acts as a relay to whatever data is passed through it. Therefore any transaction whatsoever can occur via the connect service, which in many environments may be considered undesirable.
In the context of the Web, and the use of the connect service for
proxying SSL transactions, it may be advisable to restrict possible
connections to the well-known SSL port, 443. This may be done using
conditional mapping rules, as in the following example:
[[alpha.wasd.dsto.defence.gov.au:8080]]
pass *:443 [me:connect]
pass * "403 CONNECT only allowed to port 443." [me:connect]
All of the comments on the use of general and conditional mapping made in
16.1.4 - Controlling Proxy Serving can also be applied to the connect service.
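The effect of restricting CONNECT to port 443 can be illustrated with a small Python sketch. It models only the decision, not WASD's rule processing; the function name and the host:port string format of the target are assumptions.

```python
def connect_target_allowed(target, allowed_ports=(443,)):
    """True if a CONNECT target of the form host:port names a permitted port."""
    host, sep, port = target.rpartition(":")
    # Reject anything without a host, a separator and a numeric port.
    if not host or not sep or not port.isdigit():
        return False
    return int(port) in allowed_ports
```

A request for which this returns False corresponds to the second rule in the example, which responds with a 403 status.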
16.4 - FTP Proxy Serving
WASD provides a proxy service for the FTP scheme (protocol). This provides the facility to list directories on the remote FTP server, download and upload files.
The (probable) file system of the FTP server host is determined by examining the results of an FTP PWD command. If it returns a current working directory specification containing a "/" then it is assumed to be Unix(-like), if ":[" then VMS, if a "\" then DOS. (Some DOS-based FTP servers respond with a Unix-like "/" so a second level of file-system determination is undertaken with the first entry of the actual listing.) Anything else is unknown and reported as such.
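The first-level test on the PWD result can be sketched in Python as below. The function name and returned labels are hypothetical, the argument is taken to be just the directory specification from the reply, and the second-level check against the listing is not modelled.

```python
def guess_file_system(working_directory):
    """Guess the remote file system from an FTP PWD directory specification."""
    if "/" in working_directory:
        return "Unix"        # may still be a DOS server replying Unix-style
    if ":[" in working_directory:
        return "VMS"
    if "\\" in working_directory:
        return "DOS"
    return "unknown"
```

The tests are applied in the order given in the text, so a specification containing a "/" is classed as Unix(-like) before the other patterns are considered.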
Note that the content-type of the transfer is determined by the way the proxy server interprets the FTP request path's "file" extension. This may or may not correspond with what the remote system might consider the file type to be. The default content-type for unknown file types is "application/octet-stream" (binary). When using the alt query string parameters then for any file in a listing the icon provides an alternate content-type. If the file link provides a text document then the icon will provide a binary file. If the link returns a binary file then the icon will return a file with a plain-text content-type.
In addition to content-type the FTP mode in which the file transfer occurs can be determined by either of two conditions. If the content-type is "text/.." then the transfer mode will be ASCII (i.e. record carriage-control adjusted between systems). If not text then the file is transferred in Image mode (i.e. a binary, opaque octet-stream). For any given content-type this default behaviour may be adjusted using the [AddType] directive (8.2 - Alphabetic Listing), or the "#!+" MIME.TYPES directive (6.6.2 - MIME.TYPES).
The following rules are required in HTTPD$MAP for mapping FTP proxy. They are preferably made
against the virtual service providing the FTP proxy. The service must explicitly
make the icon path used available, or it must be available to the proxy
service in some other part of the mappings. The general requirement for
error message URLs also applies to FTP proxying
(Error Messages).
[[proxy.host.name:8080]]
pass http://* http://*
pass ftp://* ftp://*
pass /*/-/* /ht_root/runtime/*/*
16.4.1 - FTP Query String Keywords
Keywords added to an FTP request query string allow the basic FTP action to be somewhat tailored. These case-insensitive keywords can be in the form of query keys or query form fields and values. This allows considerable flexibility in how they are supplied, allowing easy use from a browser URL field or for inclusion as form fields.
16.4.2 - "login" Keyword
The usual mechanism for supplying the username and password for access to a non-anonymous proxied FTP server area is to place it as part of the request line (i.e. "ftp://username:password@the.host.name/path/"). This has the obvious disadvantage that it's there for all and sundry to see.
The "login" query string is provided to work around the more obvious
of these issues, having the authentication credentials as part of the request
URL. When this string is placed in the request query string the FTP proxy
requests the browser to prompt for authentication (i.e. returns a 401 status).
When request header authentication data is present it uses this as the remote
FTP server username and password. Hence the remote username and password never
need to appear in plain-text on screen or in server logs.
16.5 - Gatewaying Using Proxy
WASD is fully capable of mapping non-proxy into proxy requests, with various limitations on effectiveness considering the nature of what is being performed.
Gatewaying between request schemes (protocols)
and also gatewaying between IP versions
All can be useful for various reasons. One example might be where a script
is required to obtain a resource from a secure server via SSL. The script can
either be made SSL-aware, sometimes a not insignificant undertaking, or it can
use standard HTTP to the proxy and have that access the required server via
SSL. Another example might be accessing an internal HTTP resource from an
external browser securely, with SSL being used from the browser to the proxy
server, which then accesses the internal HTTP resource on its behalf.
Request Redirect
The basic mechanism allowing this gatewaying is "internal"
redirection. The redirect mapping rule
(13.4.2 - REDIRECT Rule) either returns the new URL to the
originating client (requiring it to reinitiate the request) or begins
reprocessing the request internally (transparently to the client). It is this
latter function that is obviously used for gatewaying.
16.5.1 - Reverse Proxy
The use of WASD proxy serving as a firewall component assumes two configured network interfaces on the system, one of which is connected to the internal network, the other to the external network. (Firewalling could also be accomplished using a single network interface with a router blocking external access to all but the server system.) Outgoing (internal to external) proxying is the most common configuration; however, a proxy server can also be used to provide controlled external access to selected internal resources. This is sometimes known as reverse proxy and is a specific example of WASD's general non-proxy to proxy request redirection capability (16.5 - Gatewaying Using Proxy).
In this configuration the proxy server is contacted by an external browser
with a standard HTTP request. Proxy server rules map this request onto a
proxy-request format result. For example:
redirect /sales/* /http://sales.server.com/*?
Note that the trailing question-mark is required to propagate any query string (13.4.2 - REDIRECT Rule).
The server recognises the result format and performs a proxy request to a
system on the internal network. Note that the mappings required could become
quite complex, but it is possible. See example 7 in
16.1.4 - Controlling Proxy Serving.
Redirection Location Field
If a reverse proxied server returns a redirection response (302)
containing a "Location: url" field with the host component
the same as the reverse-proxied-to server, it can be rewritten to instead
contain the proxy server host. If these do not match the rewrite does not occur. Using
the redirection example above, the SET mapping rule
proxy=reverse=location specifies the path that will be prefixed to
the path component in the location field URL. Usually this would be the same
path used to map the reverse proxy redirect (in this example
"/sales/"), though could be any string (presumably detected and
processed by some other part of the mapping).
set /sales/* proxy=reverse=location=/sales/
redirect /sales/* /http://sales.server.com/*?
This could be simplified a little by using a postfix SET rule along with
the original redirect.
redirect /sales/* /http://sales.server.com/*? proxy=reverse=location=/sales/
If the proxy=reverse=location=<string> ends in an
asterisk the entire 302 location field URL is appended (rather than just the
path) resulting in something along the lines of
Location: http://proxy.server.com/sales/http://sales.server.com/path/
which once redirected by the client can be subsequently tested for and
some action made by the proxy server according to the content (just a bell or
whistle ;^).
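The rewrite described in this section can be sketched in Python as follows. This is an illustration under stated assumptions, not the server's code: the function name rewrite_location and its parameters are hypothetical, the proxy's own scheme and host are passed in as proxy_base, and it is assumed that a duplicated slash between the prefix and the path is avoided.

```python
from urllib.parse import urlsplit


def rewrite_location(location, proxied_host, proxy_base, prefix):
    """Rewrite a 302 "Location:" URL returned by a reverse-proxied server.

    location     -- URL from the proxied server's response
    proxied_host -- host the reverse proxy redirect maps to
    proxy_base   -- scheme and host of the proxy, e.g. "http://proxy.server.com"
    prefix       -- the proxy=reverse=location=<string> value
    """
    parts = urlsplit(location)
    if parts.netloc.lower() != proxied_host.lower():
        # host components do not match, so no rewrite occurs
        return location
    if prefix.endswith("*"):
        # trailing asterisk: append the entire original URL
        return proxy_base + prefix[:-1] + location
    # otherwise prefix only the path component (query string retained)
    path = parts.path.lstrip("/") if prefix.endswith("/") else parts.path
    rewritten = proxy_base + prefix + path
    if parts.query:
        rewritten += "?" + parts.query
    return rewritten
```

With the "/sales/" example, a location of "http://sales.server.com/path/" would become "http://proxy.server.com/sales/path/", which the existing redirect rule then maps back to the internal server on the client's next request.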
Authorization Verification
WASD can authorize reverse proxy requests locally (perhaps from the SYSUAF) and rewrite that username into the proxied request's "Authorization: ..." field. The proxied-to server can then verify that the request originated from the proxy server and extract and use that username as authenticated.
This functionality is described in the
HT_ROOT:[SRC.HTTPD]PROXYVERIFY.C
module.
16.5.2 - One-Shot Proxy
This looks a little like reverse proxy, providing access to a non-local resource via a standard (non-proxy) request. The difference is that it allows the client to determine which remote resource is accessed. This works quite effectively for non-HTML resources (e.g. image, binary files, etc.) but non-self-referential links in HTML documents will generally be inaccessible to the client. This can provide scripts with access to protocols they do not support, as with HTTP to FTP, HTTP to HTTP-over-SSL, etc.
Mappings appropriate to the protocols to be supported must be made against
the proxy service. Of course mapping rules may also be used to control who
may connect and to what.
[[the.proxy.service:port]]
# support "one-shot" non-proxy to proxy redirect
redirect /http://* http://*
redirect /https://* https://*
redirect /ftp://* ftp://*
# OK to process these (already, or now) proxy format requests
pass http://* http://*
pass https://* https://*
pass ftp://* ftp://*
The client may then provide the desired URL as the path of the request to
the proxy service. Notice that the scheme provided in the desired URL can be
any supported by the service and its mappings.
http://the.proxy.service:port/http://the.remote.host/path
http://the.proxy.service:port/https://the.remote.host/path
http://the.proxy.service:port/ftp://the.remote.host/pub/
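As a rough illustration of the one-shot convention, the Python sketch below builds such a request URL on the client side and mimics the effect of the redirect and pass rules on the request path. The function names and the SUPPORTED_SCHEMES tuple are hypothetical and simply mirror the three example mappings above; real behaviour is determined by the mapping rules themselves.

```python
# schemes mirrored from the example redirect/pass rules (assumption)
SUPPORTED_SCHEMES = ("http://", "https://", "ftp://")


def one_shot_url(proxy_service, target_url):
    """Build the URL a client would request from the one-shot proxy service."""
    return "http://" + proxy_service + "/" + target_url


def one_shot_target(request_path):
    """Return the proxy-format URL a request path maps to, or None.

    A leading "/" followed by a supported scheme is stripped (the redirect
    rules); a path already in proxy format is accepted as-is (the pass rules).
    """
    candidate = request_path[1:] if request_path.startswith("/") else request_path
    for scheme in SUPPORTED_SCHEMES:
        if candidate.startswith(scheme):
            return candidate
    return None
```

A script limited to plain HTTP could, for example, request one_shot_url("the.proxy.service:8080", "ftp://the.remote.host/pub/") and leave the FTP exchange to the proxy.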
16.5.3 - DNS Wildcard Proxy
This relies on being able to manipulate host records in the DNS or local name resolution database. If a "*.the.proxy.host" DNS (CNAME) record is resolved it allows any host name ending in ".the.proxy.host" to be resolved to the corresponding IP address. Similarly (at least with the Compaq TCP/IP Services) the local host database allows an alias like "another.host.name.proxy.host.name" for the proxy host name. Both of these would allow a browser to access "another.host.name.proxy.host.name" and have it resolved to the proxy service. The request "Host:" field would contain "another.host.name.proxy.host.name".
Using this approach a fully functioning proxy may be implemented for the
browser without actually configuring it for proxy access, where returned HTML
documents contain links that are always correct with reference to the host used
to request them. This allows the client an ad hoc proxy for
selected requests. For a wildcard (CNAME) record the browser user may enter
any host name prepended to the proxy service host name and port and have the
request proxied to that host name. Entering the following URL into the browser
location field
http://the.host.name.the.proxy.service:8080/path
would result in a standard HTTP proxy request for "/path" being
made to "the.host.name:80". With the URL
https://the.host.name.the.proxy.service:8443/path
an SSL proxy request would be made. Note that normally the well-known port would be
used to connect to (80 for http: and 443 for https:). If the final,
period-separated component of the wildcard host name is all digits it is
interpreted as a specific port to connect to. The example
http://the.host.name.8001.the.proxy.service:8080/path
would connect to "the.host.name:8001", and
https://the.host.name.8443.the.proxy.service:8443/path
to "the.host.name:8443".
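The host name interpretation just described can be sketched in Python. This is a minimal illustration rather than the server's own logic; the function name wildcard_target, its parameters, and the DEFAULT_PORTS table are assumptions made for the example.

```python
# well-known ports used when no all-digit component is present
DEFAULT_PORTS = {"http": 80, "https": 443}


def wildcard_target(host_field, proxy_service, scheme="http"):
    """Derive (target_host, target_port) from a wildcard "Host:" field value.

    host_field    -- e.g. "the.host.name.8001.the.proxy.service:8080"
    proxy_service -- the proxy service host name, e.g. "the.proxy.service"
    Returns None if the name is not a wildcard name for this proxy service.
    """
    # discard the proxy service's own port, if present
    name = host_field.lower().partition(":")[0]
    suffix = "." + proxy_service.lower()
    if not name.endswith(suffix):
        return None
    target = name[: -len(suffix)]
    if not target:
        return None
    head, dot, last = target.rpartition(".")
    if dot and last.isascii() and last.isdigit():
        # final period-separated component is all digits: a specific port
        return head, int(last)
    return target, DEFAULT_PORTS[scheme]
```

For the examples above, "the.host.name.the.proxy.service:8080" yields the host "the.host.name" on port 80, while "the.host.name.8001.the.proxy.service:8080" yields the same host on port 8001.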
NOTE
It has been observed that some browsers insist that an all-digit host name element is a port number despite it being prefixed by a period not a colon. These browsers then attempt to contact the host/port directly. This obviously precludes using an all-digit element to indicate a target port number with these browsers.
This wildcard DNS entry approach is a more fully functional analogue to
common proxy behaviour but is slightly less flexible in providing gatewaying
between protocols and does require more care in configuration. It also relies
on the contents of the request "Host:" field to provide mapping
information (which generally is not a problem with modern browsers). The
mappings must be performed in two parts, the first to handle the wildcard DNS
entry, the second is the fairly standard rule(s) providing access for proxy
processing.
[[the.proxy.service:port1]]
if (host:*.the.proxy.service:port1)
   redirect * /http://*
else
   pass http://* http://*
endif
The obvious difference between this and one-shot proxy is the desired host
name is provided as part of the URL host, not part of the request path. This
allows the browser to correctly resolve HTML links etc. It is less flexible
because a different proxy service needs to be provided for each protocol
mapping. Therefore, to allow HTTP to HTTP-over-SSL proxy gatewaying another
service and mapping would be required.
[[the.proxy.service:port2]]
if (host:*.the.proxy.service:port2)
   redirect * /https://*
else
   pass https://* https://*
endif
16.5.4 - Originating SSL
This proxy function allows standard HTTP clients to connect to Secure Sockets Layer (17 - Secure Sockets Layer) services. This is very different to the CONNECT service (16.3 - CONNECT Serving), allowing scripts and standard character-cell browsers supporting only HTTP to access secure services.
Standard username/password authentication is supported (as are all other
standard HTTP request/response interactions). The use of X.509 client
certificates (17.3.7 - Authorization Using X.509 Certification) to establish outgoing identity
is not currently supported.
Enabling SSL
Unlike HTTP and FTP proxy, originating SSL requires the service to be specifically configured using the [ServiceClientSSL] directive.
There are a number of Secure Sockets Layer related service parameters that should also be considered (9 - Service Configuration). Although most have workable defaults, unless [ServiceProxyClientSSLverifyCA] and [ServiceProxyClientSSLverifyCAfile] are specifically set the outgoing connection will be established without any checking of the remote server's certificate. This means the host's secure service could be considered unworthy of trust as its credentials have not been established.
As with other proxy serving, HTTP-to-SSL gatewaying may be enabled on a
per-service basis using the HTTPD$CONFIG [service] directive, the HTTPD$SERVICE
configuration file, or even the /SERVICE= qualifier, although not all options
are available unless using HTTPD$SERVICE.
HTTPD$CONFIG [Service]
The actual services providing the SSL gateway (i.e. the host and
port) are specified on a per-service basis, enabled by appending the
pclientssl keyword to the particular service specification. The
following example shows such a service.
[Service]
http://alpha.wasd.dsto.defence.gov.au:8080;proxy;pclientssl
HTTPD$SERVICE
When proxy service configuration is done using the HTTPD$SERVICE
configuration file (9 - Service Configuration) it is performed
with specific directives. This example illustrates configuring
the same service as used in the previous section.
[[http://alpha.wasd.dsto.defence.gov.au:8080]]
[ServiceProxy] enabled
[ServiceClientSSL] enabled
16.6 - Browser Proxy Configuration
The browser needs to be configured to access URLs via the proxy server. This is done using two basic approaches, manual and automatic.
Most browsers allow the configuration for access via a proxy server. This
commonly consists of an entry for each of the common Web protocol schemes
("http:", "ftp:", "gopher:", etc.). Supply the configured
WASD proxy service host name and port for the HTTP scheme. This is currently
the only one available. This would be similar to the following example:
http: www.wasd.dsto.defence.gov.au 8080
To exclude local hosts, and other servers that do not require proxy access,
there is usually a field that allows a list of hosts and/or domain names for
which the browser should not use proxy access. This might be something like:
wasd.dsto.defence.gov.au,dsto.defence.gov.au,defence.gov.au
At least Netscape Navigator/Communicator and Microsoft Internet Explorer
(4.n and following) provide the facility to download a small
JavaScript function for establishing proxy policy. Information on this
function and its deployment may be found at
http://home.netscape.com/eng/mozilla/2.0/relnotes/demo/proxy-live.html
The following is a very simple proxy configuration JavaScript function.
This specifies that all URL host names that aren't fully qualified, or that are
in the "defence.gov.au" domain, will be connected to directly, with all
others being accessed via the specified proxy server.
function FindProxyForURL(url,host)
{
   if (isPlainHostName(host) ||
       dnsDomainIs(host, ".defence.gov.au"))
      return "DIRECT";
   else
      return "PROXY www.wasd.dsto.defence.gov.au:8080; DIRECT";
}
This JavaScript is contained in a file with a specific, associated MIME file type, "application/x-ns-proxy-autoconfig". For WASD it is recommended the file be placed in HT_ROOT:[LOCAL] and have a file extension of .PAC (which follows Netscape naming convention).
The following HTTPD$CONFIG directive would map the file extension to the
required MIME type:
[AddType]
.PAC application/x-ns-proxy-autoconfig - proxy autoconfig
This file is commonly made the default document available from the proxy
service. The following example shows the HTTPD$MAP rules required to do this:
[[www.wasd.dsto.defence.gov.au:8080]]
pass http://* http://*
pass / /ht_root/local/proxy.pac
pass *
All that remains is to provide the browser with the location from which
to load this automatic proxy configuration file. In the case of the
above set-up this would be:
http://www.wasd.dsto.defence.gov.au:8080/
A template for a proxy auto-configuration file may be found at HT_ROOT:[EXAMPLE]PROXY_AUTOCONFIG.TXT