Common data structures for CGI-like connectors.
This library tries to minimize the use of unsafe practices. It cannot be bulletproof, however, and you should read about security.
REMARK: Web applications frequently need hard-to-predict random numbers. The previous version of this library included some facilities for that (in the Netcgi_jserv module). They have been dropped in favor of Cryptokit.
The name of the argument.
The value of the argument, after all transfer encodings have been removed. If the value is stored in a file, the file will be loaded.
Raises Argument.Oversized if the argument was discarded.
Raises Failure if the object has been finalized.
Opens the contents of the value as an input channel. This works for all kinds of arguments, regardless of their #store and #representation.
Raises Argument.Oversized if the argument was discarded.
Raises Failure if the object has been finalized.
Tells whether the argument is stored in memory (as a string) or as a file (the argument of `File being the filename).
Returns the content type of the header and its parameters as a pair (hdr, params). When the header is missing, the result is ("text/plain", []). Below you will find access methods for frequently used parameters.
The charset parameter of the content type of the header, or "" when there is no such parameter, or no header.
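The two documented defaults (("text/plain", []) for a missing header, "" for a missing charset) can be illustrated with a small stdlib-only model. In this sketch, content_type_or_default and charset_of are hypothetical helpers, not Netcgi functions. Parameters are modeled as plain (name, value) strings, and case-insensitive parameter names are an assumption.

```ocaml
(* Hypothetical model: a parsed content type is an optional pair
   (hdr, params); parameters are plain (name, value) strings here. *)
let content_type_or_default
    (parsed : (string * (string * string) list) option) =
  match parsed with
  | Some ct -> ct
  | None -> ("text/plain", [])  (* documented result when the header is missing *)

(* Mirrors the documented charset accessor: "" when there is no such
   parameter or no header. Case-insensitive names are an assumption. *)
let charset_of parsed =
  let _, params = content_type_or_default parsed in
  match
    List.find_opt (fun (n, _) -> String.lowercase_ascii n = "charset") params
  with
  | Some (_, v) -> v
  | None -> ""

let () =
  print_endline (charset_of (Some ("text/html", [ ("charset", "utf-8") ])));  (* utf-8 *)
  print_endline (fst (content_type_or_default None))                          (* text/plain *)
```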
The filename parameter found in the header of file uploads. When present, Some name is returned, and None otherwise. (This is not to be confused with the possible local file holding the data.)
The representation of the argument.
- `Simple: the value of the CGI argument is an unstructured string value.
- `MIME: The argument has a MIME header in addition to the value. The MIME message is read-only.
Arguments stored in temp files must be deleted when the argument is no longer used. You can call finalize to delete such files. The method does not have any effect when store = `Memory. The method never raises any exceptions. If the file no longer exists (e.g. because it was moved away) or if there are any problems deleting the file, the error will be ignored.
The finalize method is not registered in the garbage collector. You can do that, but it is usually better to call this method manually.
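The "never raises, ignore deletion errors" contract and the advice to call finalize manually can be sketched as follows. The store type, finalize and with_arg below are hypothetical stand-ins for the argument object's #store and #finalize methods. They are a model, not Netcgi's implementation. Fun.protect (OCaml 4.08+) makes sure finalize runs even if processing raises.

```ocaml
(* Hypothetical model of where an argument lives, mirroring the documented
   `Memory and `File filename cases. Not Netcgi's own type. *)
type store = [ `Memory | `File of string ]

(* Sketch of a finalize that never raises: a missing file, or any other
   error while deleting it, is silently ignored. *)
let finalize (s : store) =
  match s with
  | `Memory -> ()  (* no effect for in-memory arguments *)
  | `File name -> (try Sys.remove name with Sys_error _ -> ())

(* Call finalize explicitly, even if processing the argument raises. *)
let with_arg (s : store) (process : store -> 'a) : 'a =
  Fun.protect ~finally:(fun () -> finalize s) (fun () -> process s)

let () =
  let tmp = Filename.temp_file "netcgi" ".arg" in
  with_arg (`File tmp) (fun _ -> print_endline "processing upload");
  finalize (`File tmp);  (* second call: file already gone, error ignored *)
  Printf.printf "still exists: %b\n" (Sys.file_exists tmp)  (* still exists: false *)
```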
Represents a key-value pair of data passed to the script (including file uploads).
Operations on arguments and lists thereof.
simple_arg name value creates an unstructured CGI argument called name with contents value.
mime_arg ?name msg creates a MIME-structured CGI argument called name with contents msg. You can create msg by either Netmime.memory_mime_message or Netmime.file_mime_message.
msg header, or is "" if this field is not found.
true by default.
It is easy to manipulate lists of arguments with the List module. For example, List.filter (fun a -> a#name <> n) args will remove from args all occurrences of the argument with name n. The following functions are helpers for operations specific to arguments.
set new_args args creates a list of arguments from args, deleting the arguments whose names appear in new_args and adding the new_args arguments.
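The behaviour of set, together with the List.filter idiom mentioned above, can be modeled on a simplified argument type. The arg record and this set function are hypothetical stand-ins, because Netcgi's arguments are objects with a #name method. Placing new_args first in the result is an assumption, since the text does not specify the order.

```ocaml
(* Hypothetical stand-in for a CGI argument: just a name and a value. *)
type arg = { name : string; value : string }

(* Drop every argument of args whose name occurs in new_args, then add
   new_args. The result order (new_args first) is an assumption. *)
let set (new_args : arg list) (args : arg list) : arg list =
  let replaced a = List.exists (fun n -> n.name = a.name) new_args in
  new_args @ List.filter (fun a -> not (replaced a)) args

let () =
  let args =
    [ { name = "q"; value = "old" }; { name = "page"; value = "2" };
      { name = "q"; value = "older" } ] in
  (* Both "q" entries are replaced by the single new one. *)
  let args' = set [ { name = "q"; value = "new" } ] args in
  List.iter (fun a -> Printf.printf "%s=%s\n" a.name a.value) args';
  (* The List.filter idiom from the text: remove every "page" argument. *)
  let without_page = List.filter (fun a -> a.name <> "page") args' in
  Printf.printf "%d left\n" (List.length without_page)
```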
Old deprecated writable argument type.
Old deprecated simple argument class.
Old deprecated MIME argument class.
Functions to manipulate cookies.
You should know that, besides the name and value attributes, user agents will send at most the path, domain and port, and usually will not send them at all.
For interoperability, cookies are set using version 0 (by Netscape) unless version 1 (RFC 2965 and the older RFC 2109) fields are set. While version 0 is well supported by browsers, RFC 2109 requires a recent browser and RFC 2965 is usually not supported. You do not have to worry, however: cookies are always sent in a way that older browsers understand (albeit not all attributes, of course), so your application is ready for the time when RFC 2965 becomes the norm.
make ?expires ?domain ?path ?secure name value creates a new cookie with name name holding value.
Default: false.
Default: "".
Default: "".
port c returns the ports to which the cookie may be returned, or [] if not set.
The expiration time of the cookie, in seconds. None means that the cookie will be discarded when the browser exits. This information is not returned by the browser.
Tells whether the cookie is secure. This information is not returned by the browser.
Returns the comment associated with the cookie, or "" if it does not exist. This information is not returned by the browser.
Returns the comment URL associated with the cookie, or "" if it does not exist. This information is not returned by the browser.
set_max_age c (Some t) sets the lifetime of the cookie c to t seconds. If t <= 0, the cookie should be discarded immediately. set_max_age c None tells the cookie to be discarded when the user agent exits. (Although the name is borrowed from version 1 of the specification, it works transparently with version 0.)
Cookies are bound to a certain domain, i.e. the browser sends them only when web pages of the domain are requested:
- None: the domain is the hostname of the server.
- Some domain: the domain is domain.
Cookies are also bound to certain path prefixes, i.e. the browser sends them only when web pages at the path or below are requested:
- None: the path is script name + path_info.
- Some p: the path is p. With Some "/" you can disable the path restriction completely.
Cookies are also bound to the type of the web server:
- set_secure false means servers without SSL.
- set_secure true means servers with activated SSL ("https").
set_comment c s sets the comment of the cookie c to s, which must be UTF-8 encoded (RFC 2279). Because cookies can store personal information, the comment should describe how the cookie will be used so the client can decide whether to allow the cookie or not. To cancel a comment, set it to "".
Cookie version 1 (RFC 2109).
set_comment_url c url is the same as Netcgi.Cookie.set_comment, except that the cookie comment is available on the page pointed to by url. To cancel, set it to "".
Cookie version 1 (RFC 2965).
set_ports c (Some p) says that the cookie c must only be returned if the server request comes from one of the listed ports. If p = [], the cookie will only be sent to the request-port it was received from. set_ports c None says that the cookie may be sent to any port.
Cookie version 1 (RFC 2965).
Converts a Netscape cookie to the new representation.
Converts to a Netscape cookie (with information loss).
- tmp_directory : string
  The directory where temporary files are created. This should be an absolute path name.
- tmp_prefix : string
  The name prefix for temporary files. This must be a non-empty string. It must not contain '/'.
- permitted_http_methods : http_method list
  The list of accepted HTTP methods.
- permitted_input_content_types : string list
  The list of accepted content types in requests. Content type parameters (like "charset") are ignored. If the list is empty, all content types are allowed.
- input_content_length_limit : int
  The maximum size of the request, in bytes.
- max_arguments : int
  The maximum number of CGI arguments.
- workarounds : [ `MSIE_Content_type_bug | `Backslash_bug | `Work_around_MSIE_Content_type_bug | `Work_around_backslash_bug ] list
  The list of enabled workarounds. `Work_around_MSIE_Content_type_bug and `Work_around_backslash_bug are deprecated versions of, respectively, `MSIE_Content_type_bug and `Backslash_bug.
- default_exn_handler : bool
  Whether to catch exceptions raised by the script and display an error page. This keeps the connector running even if your program has bugs in some of its components. However, it also prevents a stack trace from being printed; if you want one, turn this option off.
The default configuration is:
- tmp_directory: Netsys_tmp.tmp_directory()
- tmp_prefix: "netcgi"
- permitted_http_methods: `GET, `HEAD, `POST.
- permitted_input_content_types: "multipart/form-data", "application/x-www-form-urlencoded".
- input_content_length_limit: maxint (i.e., no limit).
- max_arguments: 10000 (for security reasons).
- workarounds: all of them.
- default_exn_handler: true.
To create a custom configuration, it is recommended to use this syntax:
let custom_config = { default_config with tmp_prefix = "my_prefix" }
(This syntax is also robust with respect to the possible addition of new config fields.)
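The functional-update syntax can be tried out without Ocamlnet on a cut-down record. The config type below is a hypothetical stand-in that keeps only four of the fields listed above, with their documented defaults. Real code would use Netcgi.config and Netcgi.default_config instead.

```ocaml
(* Hypothetical, cut-down stand-in for Netcgi.config with four of the
   fields described above and their documented defaults. *)
type config = {
  tmp_prefix : string;
  input_content_length_limit : int;
  max_arguments : int;
  default_exn_handler : bool;
}

let default_config = {
  tmp_prefix = "netcgi";
  input_content_length_limit = max_int;  (* i.e., no limit *)
  max_arguments = 10000;
  default_exn_handler = true;
}

(* Functional update: the listed fields change; all others are copied
   from default_config, so new fields added later need no edits here. *)
let custom_config =
  { default_config with tmp_prefix = "my_prefix"; input_content_length_limit = 1_048_576 }

let () =
  Printf.printf "%s %d %d %b\n" custom_config.tmp_prefix
    custom_config.input_content_length_limit custom_config.max_arguments
    custom_config.default_exn_handler
```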
The following properties are standardised by CGI. The methods return "" (or None in the case of the port number) when the property is not available.
We recommend using the method Netcgi.cgi.request_method, which is more type-safe and informative.
Returns a (possibly non-standard) CGI environment property. If the property is not set, Not_found is raised unless the default argument is passed. The default argument determines the result of the function in this case.
The method takes the case-sensitive name and returns the value of the property. Usually, these properties have uppercase names.
For example, cgi_gateway_interface returns the same as cgi_property ~default:"" "GATEWAY_INTERFACE".
You cannot access the fields coming from the HTTP header. Use the method input_header_field instead.
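The lookup-with-default rule, including case sensitivity and the cgi_gateway_interface equivalence, can be modeled as a minimal sketch. It assumes the environment is a plain association list. cgi_property here is a hypothetical stand-alone function that takes env as an explicit parameter, whereas the real method takes only ?default and the name.

```ocaml
(* Hypothetical model: the CGI environment as an association list keyed
   by case-sensitive property names. The real method takes only ?default
   and the name; env is an explicit parameter here. *)
let cgi_property ?default (env : (string * string) list) name =
  match List.assoc_opt name env, default with
  | Some v, _ -> v
  | None, Some d -> d
  | None, None -> raise Not_found

(* Same as the documented cgi_gateway_interface. *)
let cgi_gateway_interface env = cgi_property ~default:"" env "GATEWAY_INTERFACE"

let () =
  let env = [ ("GATEWAY_INTERFACE", "CGI/1.1"); ("REQUEST_METHOD", "GET") ] in
  print_endline (cgi_gateway_interface env);                 (* CGI/1.1 *)
  print_endline (cgi_property ~default:"unset" env "HTTPS"); (* unset *)
  match cgi_property env "request_method" with
  | v -> print_endline v
  | exception Not_found -> print_endline "Not_found (names are case-sensitive)"
```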
Return all properties as an associative list.
A well-known extension is the HTTPS property. It indicates whether a secure connection (SSL/TLS) is used. This method interprets this property and returns true if the connection is secure. It fails if there is an HTTPS property with an unknown value.
#input_header_field ?default f returns the value of a field f of the HTTP request header. The field name f is case-insensitive; if the name is a compound name, the parts are separated by "-", e.g. "content-length". If there are several fields with the same name, only the first field will be returned.
Raises Not_found if the field does not exist, unless the default argument is passed. The default argument is the result of the function in this case.
Returns the values of all fields with the passed name of the request header.
Returns the input header as (name,value) pairs. The names may consist of lowercase or uppercase letters.
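The case-insensitive lookup, the "first field wins" rule and the all-values variant can be sketched on a header modeled as (name, value) pairs in arrival order. input_header_field and input_header_fields below are hypothetical stand-alone functions, not the Netcgi methods.

```ocaml
(* Hypothetical model of a request header: (name, value) pairs in the
   order received, names in whatever case the client used. *)
let input_header_fields (header : (string * string) list) f =
  let f = String.lowercase_ascii f in
  List.filter_map
    (fun (n, v) -> if String.lowercase_ascii n = f then Some v else None)
    header

(* First matching field wins; Not_found unless ?default is given. *)
let input_header_field ?default header f =
  match input_header_fields header f, default with
  | v :: _, _ -> v
  | [], Some d -> d
  | [], None -> raise Not_found

let () =
  let header =
    [ ("Content-Length", "42"); ("Accept", "text/html"); ("accept", "text/plain") ] in
  print_endline (input_header_field header "content-length");             (* 42 *)
  print_endline (input_header_field header "ACCEPT");                     (* text/html *)
  print_endline (String.concat ", " (input_header_fields header "accept")); (* text/html, text/plain *)
  print_endline (input_header_field ~default:"-" header "user-agent")      (* - *)
```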
This is a convenience method that returns the "User-agent" field of the HTTP request header.
Returns the "Content-length" request header field.
Raises Not_found if it is not set.
Returns the "Content-type" request header field as a plain string, or "" if it is not set.
Returns the parsed "Content-type" request header field.
Raises Not_found if it is not set.
See also Mimestring.scan_mime_type_ep.
Returns the value of a field of the response header. If the field does not exist, Not_found will be raised unless the default argument is passed. The default argument determines the result of the function in this case.
If there are several fields with the same name, only the first field will be returned.
The anonymous string is the name of the field. The name is case-insensitive, and it does not matter whether it consists of lowercase or uppercase letters. If the name is a compound name, the parts are separated by "-", e.g. "content-length".
Returns the values of all fields with the passed name of the response header.
Returns the output header as (name,value) pairs. The names may consist of lowercase or uppercase letters.
Sets the value of a field of the response header. The previous value, if any, is overwritten. If there have been multiple values, all values will be removed and replaced by the single new value.
Sets multiple values of a field of the response header. Any previous values are removed and replaced by the new values.
Sets the complete response header at once.
Sets the response status. This is by definition the same as setting the Status output header field.
This method encodes and sends the output header to the output channel. Note that if the output_channel is `Transactional (as opposed to `Direct), no output will actually take place before you issue #commit_work(); thus a #rollback_work() will also roll back the headers, as expected.
#log_error msg appends msg to the webserver log.
The environment of a request consists of the information available besides the data sent by the user (as key-value pairs).
Determines how a URL part is generated:
- `Env: Take the value from the environment.
- `This v: Use this value v. It must already be URL-encoded.
- `None: Do not include this part in the URL.
Determines how the query part of URLs is generated:
- `Env: The query string of the current request.
- `This l: The query string is created from the specified argument list l.
- `None: The query string is omitted.
- `Args: deprecated, use `This (left for backward compatibility).
This is only a small subset of the HTTP 1.1 cache control features, but they are usually sufficient, and they work for HTTP/1.0 as well. The directives mean:
- `No_cache: Caches are disabled. The following headers are sent: Cache-control: no-cache, Pragma: no-cache, Expires: (now - 1 second). Note that many versions of Internet Explorer have problems processing non-cached content when TLS/SSL is used to transfer the file. Use `Max_age in such cases (see http://support.microsoft.com/kb/316431).
- `Max_age n: Caches are allowed to store a copy of the response for n seconds. After that, the response must be revalidated. The following headers are sent: Cache-control: max-age n, Cache-control: must-revalidate, Expires: (now + n seconds).
- `Unspecified: No cache control header is added to the response.
Note: Pragma and Expires headers are sent, too. These fields are not interpreted by HTTP/1.1 clients because Cache-control has higher precedence.
#argument name returns the value of the argument named name. If the argument appears several times, only one of its instances is used.
Raises Not_found if no such argument exists.
#argument_value returns the value of the argument as a string. If the argument does not exist, the default is returned. Default: "".
#argument_exists returns false if the named parameter is missing and true otherwise.
#multiple_argument name returns all the values of the argument named name.
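The four argument accessors can be modeled on a plain association list of decoded arguments. These functions are hypothetical stand-ins for the methods of the same names. The text only says that "one of" the instances is used when an argument repeats, so picking the first one is this model's assumption.

```ocaml
(* Hypothetical model: decoded CGI arguments as (name, value) pairs. *)
type args = (string * string) list

(* #argument: one instance (this model picks the first); Not_found if absent. *)
let argument (a : args) name = List.assoc name a

(* #argument_value: the value as a string, or the default ("" by default). *)
let argument_value ?(default = "") (a : args) name =
  try List.assoc name a with Not_found -> default

(* #argument_exists *)
let argument_exists (a : args) name = List.mem_assoc name a

(* #multiple_argument: every value with that name, in order. *)
let multiple_argument (a : args) name =
  List.filter_map (fun (n, v) -> if n = name then Some v else None) a

let () =
  let a = [ ("tag", "ocaml"); ("page", "2"); ("tag", "cgi") ] in
  print_endline (argument a "tag");                              (* ocaml *)
  print_endline (argument_value ~default:"1" a "size");          (* 1 *)
  Printf.printf "%b\n" (argument_exists a "page");               (* true *)
  print_endline (String.concat "," (multiple_argument a "tag"))  (* ocaml,cgi *)
```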
The environment object. This object is the "outer layer" of the activation object that connects it with real I/O channels.
The HTTP method used to make the request.
This method calls #finalize for every CGI argument (including the possible one of PUT) to ensure that all files are deleted. It also executes all functions registered with #at_exit. It does not close the in/out channels, however.
This method is not registered in the garbage collector, and it is a bad idea to do so. However, all connectors offered in Netcgi automatically call #finalize at the end of the request cycle (even when it is terminated by an uncaught exception, provided #config.default_exn_handler is true), so you do not have to worry much about calling it yourself.
Returns the URL of the current CGI-like script. (Note that it may differ from the actual URL that requested the script if, for example, rewriting rules were specified in the web server configuration.)
Default: `Env.
Default: `Env.
Default: `Env.
#store being `Memory will be added. Default: `None, i.e. no query string.
Sets the header (removing any previous one). When the output channel supports transactions, it is possible to set the header (possibly several times) until the #out_channel is committed for the first time or #env#send_output_header() is called. When there is no support for transactions, the header must be set before the first byte of output is written.
If #set_header is called a second time, it will overwrite all the header fields.
`Ok status in this case.
Default: "text/html".
set_cookies.
Default: [].
Remember that the browser may not support more than 20 cookies per web server. You can query the cookies using env#cookies and env#cookie. If you set cookies, you should think about an appropriate cache setting. You may also want to add a P3P header (Platform for Privacy Preferences); otherwise your cookies may be discarded by some browsers.
Default: `Unspecified. It is strongly recommended to specify the caching behaviour! You are on the safe side with `No_cache, forcing every page to be regenerated. If your data do not change frequently, `Max_age n tells the caches to store the data for at most n seconds.
Default: "", i.e. no filename. Note: It is bad practice if the filename contains problematic characters (backslash, double quote, space) or the names of directories. It is recommended that you set content_type to "application/octet-stream" for this feature to work with most browsers and, if possible, set content_length, because that usually improves the download dialog.
ONXXX attributes containing scripts before the first <SCRIPT> element, because you cannot specify the script language for the ONXXX attributes otherwise. script_type must be a media type, e.g. "text/javascript". Default: no language is specified.
STYLE attributes containing style declarations before the first <STYLE> element, because you cannot specify the style language for the STYLE attributes otherwise. style_type must be a media type, e.g. "text/css". Default: no language is specified.
Default: [].
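The cache directives (`No_cache, `Max_age n, `Unspecified) and the header fields each one sends, as listed in the cache-control description, can be modeled with the standard library. cache_headers is a hypothetical function, not part of Netcgi. It writes max-age in the standard HTTP form max-age=n, and it returns Expires as a Unix timestamp instead of a formatted HTTP date.

```ocaml
(* Hypothetical model of the cache directives; not Netcgi code. *)
type cache_control = [ `No_cache | `Max_age of int | `Unspecified ]

(* Returns the header fields to send plus the Expires time as a Unix
   timestamp (turning it into an HTTP date is omitted here). *)
let cache_headers ~now (c : cache_control) =
  match c with
  | `No_cache ->
      ([ ("Cache-control", "no-cache"); ("Pragma", "no-cache") ], Some (now -. 1.))
  | `Max_age n ->
      ( [ ("Cache-control", "max-age=" ^ string_of_int n);
          ("Cache-control", "must-revalidate") ],
        Some (now +. float_of_int n) )
  | `Unspecified -> ([], None)

let () =
  let fields, expires = cache_headers ~now:1000. (`Max_age 60) in
  List.iter (fun (n, v) -> Printf.printf "%s: %s\n" n v) fields;
  match expires with
  | Some t -> Printf.printf "Expires at t = %.0f\n" t
  | None -> ()
```

With `Unspecified nothing is added, which is why the text recommends choosing a directive explicitly.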
Sets the header such that a redirection to the specified URL is performed. If the URL begins with "http:", the redirection directive is passed back to the client, and the client will repeat the request for the new location (with a GET method). If the URL begins with "/", the server performs the redirection, and it is invisible to the client.
The output channel to which the generated content is intended to be written. The header is not stored in this channel, so #pos_out returns the size of the DATA in bytes (useful to set Content-Length). Note that HEAD requests must not send back a message body, so in this case all data sent to this channel is discarded. This allows your scripts to work unmodified for GET, POST and HEAD requests.
The output channel may have transactional semantics, and because of this, it is a trans_out_obj_channel. Implementations are free to support transactions or not.
After all data have been written, the method #commit_work() must be called, even if there is no support for transactions.
Simple Example:
cgi # out_channel # output_string "Hello world!\n";
cgi # out_channel # commit_work()
Example for an error handler and a transaction buffer: If an error happens, it is possible to roll the channel back, and to write the error message.
try
  cgi # set_header ... ();
  cgi # out_channel # output_string "Hello World!"; ...
  cgi # out_channel # commit_work();
with err ->
  cgi # out_channel # rollback_work();
  cgi # set_header ... ();
  cgi # out_channel # output_string "Software error!"; ...
  cgi # out_channel # commit_work();
#at_exit f registers the function f to be executed when #finalize is called (which is done automatically when the request finishes). The functions are executed in the reverse order in which they were registered.
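The reverse execution order falls out naturally if handlers are pushed onto the front of a list. The request record, add_at_exit and finalize below are a hypothetical model of the #at_exit / #finalize pair. Clearing the list so that a second finalize does not rerun handlers is a choice of this sketch, not something the text specifies.

```ocaml
(* Hypothetical model of a request's exit handlers (not Netcgi code). *)
type request = { mutable exit_handlers : (unit -> unit) list }

(* Like #at_exit: push the handler onto the front of the list. *)
let add_at_exit (r : request) f = r.exit_handlers <- f :: r.exit_handlers

(* Like #finalize: run the handlers, most recently registered first. *)
let finalize (r : request) =
  let hs = r.exit_handlers in
  r.exit_handlers <- [];  (* so a second finalize does not rerun them *)
  List.iter (fun f -> f ()) hs

let () =
  let r = { exit_handlers = [] } in
  add_at_exit r (fun () -> print_endline "close database");
  add_at_exit r (fun () -> print_endline "flush log");
  finalize r  (* prints "flush log", then "close database" *)
```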
Object symbolizing a CGI-like request/response cycle.
This is the minimal set of services a connector must provide. Additional methods may be defined for specific connectors.
The output type determines how generated data is buffered.
- `Direct sep: Data written to the output channel of the activation object is not collected in a transaction buffer, but directly sent to the browser (the normal I/O buffering is still active, however, so call #flush to ensure that data is really sent). The method #commit_work of the output channel is the same as #flush. The method #rollback_work causes the string sep to be sent, meant as a separator between the already generated output and the now following error message.
- `Transactional f: A transactional channel tc is created from the real output channel ch by calling f cfg ch (here, cfg is the CGI configuration). The channel tc is propagated as the output channel of the activation object. This means that the methods commit_work and rollback_work are implemented by tc, and the intended behaviour is that data is buffered in a special transaction buffer until commit_work is called. This invocation forces the buffered data to be sent to the browser. If, however, rollback_work is called, the buffer is cleared.
Two important examples for `Transactional are:
let buffered _ ch = new Netchannels.buffered_trans_channel ch in
`Transactional buffered
`Transactional (fun _ ch -> new Netchannels.tempfile_trans_channel ch)
The output_type implementing transactions with a RAM-based buffer.
The output_type implementing transactions with a tempfile-based buffer.
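The commit/rollback contract of a `Transactional channel can be illustrated with a stand-alone model of a RAM-based buffer. The trans_buffer class is hypothetical and is not Netchannels.buffered_trans_channel. A Buffer.t called sink stands in for the real output channel, so the example runs without Ocamlnet. The usage mirrors the error-handler pattern shown earlier: partial output is rolled back and replaced by an error message.

```ocaml
(* Stand-alone model of a RAM-buffered transactional channel, in the
   spirit of `Transactional with a buffered channel. This is not
   Netchannels code: "sink" is a Buffer standing in for the real channel. *)
class trans_buffer (sink : Buffer.t) =
  object
    val pending = Buffer.create 256

    method output_string s = Buffer.add_string pending s

    (* Only now does the buffered data reach the real channel. *)
    method commit_work () =
      Buffer.add_buffer sink pending;
      Buffer.clear pending

    (* Discard everything written since the last commit. *)
    method rollback_work () = Buffer.clear pending
  end

let () =
  let sink = Buffer.create 256 in
  let ch = new trans_buffer sink in
  (try
     ch#output_string "<p>Partial page";
     failwith "database down"
   with Failure _ ->
     ch#rollback_work ();
     ch#output_string "Software error!");
  ch#commit_work ();
  print_endline (Buffer.contents sink)  (* Software error! *)
```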
This is the type of functions arg_store such that arg_store env name header tells whether to `Discard the argument or to store it in a `File or in `Memory. The parameters passed to arg_store are as follows:
- env is the CGI environment. Thus, for example, you can have different policies for different cgi_path_info.
- name is the name of the argument.
- header is the MIME header of the argument (if any).
Any exception raised by arg_store will be treated as if it returned `Discard. Note that `File will be treated like `Memory except for `POST "multipart/form-data" and `PUT queries.
`Automatic means to store the argument in a file if the header contains a file name, and otherwise in memory (strictly speaking, `Automatic is not necessary since arg_store can check the header, but it is provided for your convenience).
`Memory_max (resp. `File_max, resp. `Automatic_max) is the same as `Memory (resp. `File, resp. `Automatic) except that the parameter indicates the maximum size in kB of the argument value. If the size is bigger, the Netcgi.cgi_argument methods #value and #open_value_rd will raise Netcgi.Argument.Oversized.
Remark: this allows for fine-grained size constraints, while the Netcgi.config.input_content_length_limit option is a limit on the size of the entire request.
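A per-path, per-argument storage policy could look like the sketch below. It is hypothetical and simplified. The real arg_store receives env, the argument name and a Netmime header. Here path_info stands in for env#cgi_path_info, and an optional upload filename stands in for inspecting the header. The kB limits are floats, which is this sketch's assumption about the `*_max payload type.

```ocaml
(* Hypothetical storage policy in the spirit of arg_store (simplified
   inputs, not the Netcgi signature). Limits are in kB. *)
let my_arg_store ~path_info ~(filename : string option) name =
  match path_info, name, filename with
  | "/upload", _, Some _ -> `File_max 10240.  (* uploads: temp file, at most 10240 kB *)
  | "/upload", _, None -> `Memory_max 64.     (* small form fields next to the upload *)
  | _, "comment", _ -> `Memory_max 4.
  | _, _, _ -> `Discard                        (* ignore everything else *)

let () =
  let show = function
    | `File_max k -> Printf.sprintf "File_max %.0f" k
    | `Memory_max k -> Printf.sprintf "Memory_max %.0f" k
    | `Discard -> "Discard"
  in
  print_endline (show (my_arg_store ~path_info:"/upload" ~filename:(Some "a.png") "file"));
  print_endline (show (my_arg_store ~path_info:"/search" ~filename:None "q"))
```

Values above the limit would then surface as Netcgi.Argument.Oversized when the argument is read, as described above.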
A function of type exn_handler allows you to define a custom handler for uncaught exceptions raised by the unit -> unit parameter. A typical example of exn_handler is as follows:
let exn_handler env f =
  try f ()
  with
  | Exn1 -> (* generate error page *)
      env#set_output_header_fields [...];
      env#send_output_header();
      env#out_channel#output_string "...";
      env#out_channel#close_out()
  | ...
Directive how to go on with the current connection:
- `Conn_close: Just shut down and close descriptor.
- `Conn_close_linger: Linger, shut down, and close descriptor.
- `Conn_keep_alive: Check for another request on the same connection.
- `Conn_error e: Shut down and close descriptor, and handle the exception e.
Specific connectors can be found in separate modules. For example:
A typical use is as follows:
open Netcgi

let main (cgi : cgi) =
  let arg = cgi#argument_value "name" in
  ...
  cgi#out_channel#commit_work ()

let () =
  let buffered _ ch = new Netchannels.buffered_trans_channel ch in
  Netcgi_cgi.run ~output_type:(`Transactional buffered) main