Chapter 5 - Using HTMLDOC on a Web Server

This chapter describes how to interface HTMLDOC to your web server using CGI scripts and programs.

Note:

The free version of HTMLDOC for Windows does not support use from a web server.

The Basics

HTMLDOC can be used in a variety of ways to generate formatted reports on a web server. The most common way is to combine HTMLDOC with a CGI script or program and send the output to the HTTP client.

To make this work the CGI script or program must send the appropriate HTTP attributes, the required empty line to signify the beginning of the document, and then execute the HTMLDOC program to generate the HTML, PostScript, or PDF file as needed.

Another way to generate PDF files from your reports is to use HTMLDOC as a "portal" application. When used as a portal, HTMLDOC automatically retrieves the named document or report from your server and passes a PDF version to the web browser. See the next sections for more information.

WARNING:

Passing information directly from the web browser to HTMLDOC can potentially expose your system to security risks. Always be sure to "sanitize" any input from the web browser so that filenames, URLs, and options passed to HTMLDOC are not acted on by the shell program.

Calling HTMLDOC from a Shell Script

Shell scripts are probably the easiest to work with, but are normally limited to GET type requests. Here is a script called topdf that acts as a portal, converting the named file to PDF:

#!/bin/sh
#
# Sample "portal" script to convert the named HTML file to PDF on-the-fly.
#
# Usage: http://www.domain.com/path/topdf/path/filename.html
#

#
# The "options" variable contains any options you want to pass to HTMLDOC.
#

options="-t pdf --webpage --header ... --footer ..."

#
# Tell the browser to expect a PDF file...
#

echo "Content-Type: application/pdf"
echo ""

#
# Run HTMLDOC to generate the PDF file...
#

htmldoc $options http://${SERVER_NAME}:${SERVER_PORT}$PATH_INFO

Users of this CGI would reference the URL "http://www.domain.com/topdf.cgi/index.html" to generate a PDF file of the site's home page.

The options variable in the script can be set to use any supported command-line option for HTMLDOC; for a complete list see Chapter 8 - Command-Line Reference.

Calling HTMLDOC from Perl

Perl scripts offer the ability to generate more complex reports, pull data from databases, etc. The easiest way to interface Perl scripts with HTMLDOC is to write a report to a temporary file and then execute HTMLDOC to generate the PDF file.

Here is a simple Perl subroutine that can be used to write a PDF report to the HTTP client:

sub topdf(filename);

sub topdf {
    # Get the filename argument...
    my $filename = shift;

    # Make stdout unbuffered...
    select(STDOUT); $| = 1;

    # Write the content type to the client...
    print "Content-Type: application/pdf\n\n";

    # Run HTMLDOC to provide the PDF file to the user...
    system "htmldoc -t pdf --quiet --webpage $filename";
}

Calling HTMLDOC from PHP

PHP is quickly becoming the most popular server-side scripting language available. PHP provides a passthru() function that can be used to run HTMLDOC. This combined with the header() function can be used to provide on-the-fly reports in PDF format.

Here is a simple PHP function that can be used to convert a HTML report to PDF and send it to the HTTP client:

function topdf($filename, $options = "") {
    # Write the content type to the client...
    header("Content-Type: application/pdf");
    flush();

    # Run HTMLDOC to provide the PDF file to the user...
    passthru("htmldoc -t pdf --quiet --jpeg --webpage $options '$filename'");
}

The function accepts a filename and an optional "options" string for specifying the header, footer, fonts, etc.

To prevent malicious users from passing in unauthorized characters into this function, the following function can be used to verify that the URL/filename does not contain any characters that might be interpreted by the shell:

function bad_url($url) {
    // See if the URL starts with http: or https:...
    if (strncmp($url, "http://", 7) != 0 &&
	strncmp($url, "https://", 8) != 0) {
        return 1;
    }

    // Check for bad characters in the URL...
    $len = strlen($url);
    for ($i = 0; $i < $len; $i ++) {
        if (!strchr("~_*()/:%?+-&@;=,$.", $url[$i]) &&
	    !ctype_alnum($url[$i])) {
	    return 1;
	}
    }

    return 0;
}

Another method is to use the escapeshellarg() function provided with PHP 4.0.3 and higher to generate a quoted shell argument for HTMLDOC.

To make a "portal" script, add the following code to complete the example:

global $SERVER_NAME;
global $SERVER_PORT;
global $PATH_INFO;
global $QUERY_STRING;

if ($QUERY_STRING != "") {
    $url = "http://${SERVER_NAME}:${SERVER_PORT}${PATH_INFO}?${QUERY_STRING}";
} else {
    $url = "http://${SERVER_NAME}:${SERVER_PORT}$PATH_INFO";
}

if (bad_url($url)) {
  print("<HTML><HEAD><TITLE>Bad URL</TITLE></HEAD>\n"
       ."<BODY><H1>Bad URL</H1>\n",
       ."<P>The URL <B><TT>$url</TT></B> is bad.</P>\n"
       ."</BODY></HTML>\n");
} else {
  topdf($url);
}

Calling HTMLDOC from C

C programs offer the best flexibility and easily supports on-the-fly report generation without the need for temporary files.

Here are some simple C functions that can be used to generate a PDF report to the HTTP client from a temporary file or pipe:

#include <stdio.h>
#include <stdlib.h>


/* topdf() - convert a HTML file to PDF */
FILE *topdf(const char *filename)	/* HTML file to convert */
{
  char	command[1024];			/* Command to execute */


  puts("Content-Type: application/pdf\n");

  sprintf(command, "htmldoc -t pdf --webpage %s", filename);

  return (popen(command, "w"));
}


/* topdf2() - pipe HTML output to HTMLDOC for conversion to PDF */
FILE *topdf2(void)
{
  puts("Content-Type: application/pdf\n");
  return (popen("htmldoc -t pdf --webpage -", "w"));
}

Calling HTMLDOC from Java

Java programs are a portable way to add PDF support to your web server. Here is a class called htmldoc that acts as a portal, converting the named file to PDF. It can also be called by your Java servlets to process an HTML file and send the result to the client in PDF format:

class htmldoc
{
  // Convert named file to PDF on stdout...
  public static int topdf(String filename)// I - Name of file to convert
  {
    String		command;	// Command string
    Process		process;	// Process for HTMLDOC
    Runtime		runtime;	// Local runtime object
    java.io.InputStream	input;		// Output from HTMLDOC
    byte		buffer [];	// Buffer for output data
    int			bytes;		// Number of bytes


    // First tell the client that we will be sending PDF...
    System.out.print("Content-type: application/pdf\n\n");

    // Construct the command string
    command = "htmldoc --quiet --jpeg --webpage -t pdf --left 36 " +
              "--header .t. --footer .1. " + filename;

    // Run the process and wait for it to complete...
    runtime = Runtime.getRuntime();

    try
    {
      // Create a new HTMLDOC process...
      process = runtime.exec(command);

      // Get stdout from the process and a buffer for the data...
      input  = process.getInputStream();
      buffer = new byte[8192];

      // Read output from HTMLDOC until we have it all...
      while ((bytes = input.read(buffer)) > 0)
        System.out.write(buffer, 0, bytes);

      // Return the exit status from HTMLDOC...
      return (process.waitFor());
    }
    catch (Exception e)
    {
      // An error occurred - send it to stderr for the web server...
      System.err.print(e.toString() + " caught while running:\n\n");
      System.err.print("    " + command + "\n");
      return (1);
    }
  }

  // Main entry for htmldoc class
  public static void main(String[] args)// I - Command-line args
  {
    String	server_name,		// SERVER_NAME env var
		server_port,		// SERVER_PORT env var
		path_info,		// PATH_INFO env var
		query_string,		// QUERY_STRING env var
		filename;		// File to convert


    if ((server_name = System.getProperty("SERVER_NAME")) != null &&
        (server_port = System.getProperty("SERVER_PORT")) != null &&
	(path_info = System.getProperty("PATH_INFO")) != null)
    {
      // Construct a URL for the resource specified...
      filename = "http://" + server_name + ":" + server_port + path_info;

      if ((query_string = System.getProperty("QUERY_STRING")) != null)
      {
        filename = filename + "?" + query_string;
      }
    }
    else if (args.length == 1)
    {
      // Pull the filename from the command-line...
      filename = args[0];
    }
    else
    {
      // Error - no args or env variables!
      System.err.print("Usage: htmldoc.class filename\n");
      return;
    }

    // Convert the file to PDF and send to the web client...
    topdf(filename);
  }
}