This module mainly defines the http_protocol class, which implements the
exchange of messages with an HTTP client. The request messages are represented
as a sequence of req_token values. The response is encapsulated in a separate
http_response class. The contents of the response are represented as a sequence
of resp_token values.
These are the serious protocol violations after which the daemon stops all further processing.
Note that `Timeout refers to a timeout in the middle of a request.
`Broken_pipe_ignore is the "harmless" version of `Broken_pipe
(see config_suppress_broken_pipe).
Long messages are fatal because it is suspected that they are
denial-of-service attacks. The kernel generates `Message_too_long only for
long headers, not for long bodies.
Fatal server errors can happen when exceptions are not properly handled. As a last resort, the HTTP daemon closes the connection without notifying the client.
A bad request is a violation where the current request cannot be decoded, and it is not possible to accept further requests over the current connection.
Convert error to a string, for logging
Returns the best response code for the error
A data_chunk is a substring of a string. The substring is described by
the triple (s, pos, len) where s is the container, pos is the
position where the substring begins, and len is its length.
= (code, phrase)
The resp_token represents a textual part of the response to send:
`Resp_info_line is an informational status line (code=100..199). There can
  be several informational lines, and they can be accompanied by their own
  headers. Such lines are only sent to HTTP/1.1 clients.
`Resp_status_line is the final status line to send (code >= 200).
`Resp_header is the whole response header to send.
`Resp_body is the next part of the response body to send.
`Resp_trailer is the whole response trailer to send (currently ignored).
`Resp_action is special because it does not directly represent a token
  to send. The argument is a function which is called when the token is
  the next token on the active event queue. The function is also called when
  the event queue is dropped because of an error (the state of the
  response object indicates this). The function must not raise exceptions
  except Unix_error, and it must not block.
The response state:
`Inhibited = it is not yet allowed to start the response
`Queued = the response waits on the queue for activation
`Active = the response is currently being transmitted
`Processed = the response has been completely sent
`Error = an error occurred during the transmission of this response
`Dropped = an earlier response forced the connection to close, and
  this response is dequeued
Tokens generated by http_response:
`Resp_wire_data are data tokens.
`Resp_end indicates the end of the response.
See config
Represents the action of sending the response
This class has an internal queue of response tokens that are not yet
processed. One can easily add new tokens to the end of the queue (send).
The class is responsible for determining the transfer encoding:
Currently, the TE
request header is not taken into account. The trailer
is always empty.
The following headers are set (or removed) by this class:
Transfer-Encoding
Trailer
Date
Connection
Upgrade
Server (it is appended to this field)
Responses for HEAD requests have the special behaviour that the body is silently dropped. The calculation of header fields is not affected by this. This means that HEAD can be easily implemented by doing the same as for GET.
Responses for other requests that must not include a body must set
Content-Length
to 0.
These methods can be called by the content provider:
The bidirectional phase starts after "100 Continue" has been sent to the client, and stops when the response body begins. The bidirectional phase is special for the calculation of timeout values (input determines the timeout although the response has started).
Returns whether the send queue is empty. When the state is `Inhibited, this
method fakes an empty queue.
Returns whether the connection should be closed after this response.
This flag should be evaluated when the `Resp_end
front token has been
reached.
Returns the selected transfer encoding. This is valid after the header
has been passed to this object with send.
The first token of the queue, represented as data_chunk. Raises
Send_queue_empty when there is currently no front token, or the state
is `Inhibited.
If there is a front token, it will never have length 0.
Note that Unix_error exceptions can be raised when `Resp_action
tokens are processed.
The function will be called either when set_state changes the state,
or when the send queue becomes empty. Note that the callback must never
fail; it is called in situations that make it hard to recover from errors.
Accumulated size of the response body
These methods must only be called by the HTTP protocol processor:
Tell this object that n bytes of the front token were actually
sent using Unix.write. If this means that the whole front token
has been sent, the next token is pulled from the queue and is made
the new front token. Otherwise, the data chunk representing the
front token is modified such that the position is advanced by
n, and the length is reduced by n.
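The bookkeeping described above can be illustrated with a small stand-alone model. This is only a sketch: the type and the function names chunk_contents and advance_chunk are hypothetical and are not part of this module; the real method also pulls the next token from the queue.

```ocaml
(* Hypothetical stand-alone model of a data chunk:
   (container, position, length). Not the module's own definition. *)
type data_chunk = string * int * int

(* The bytes that the chunk denotes. *)
let chunk_contents ((s, pos, len) : data_chunk) : string =
  String.sub s pos len

(* Model of advancing the front token by [n] written bytes.
   Returns [None] when the whole chunk has been sent (the caller would
   then pull the next token from the queue), otherwise the remaining
   chunk with the position advanced and the length reduced by [n]. *)
let advance_chunk ((s, pos, len) : data_chunk) (n : int) : data_chunk option =
  if n < 0 || n > len then invalid_arg "advance_chunk"
  else if n = len then None
  else Some (s, pos + n, len - n)
```

Returning an option makes the "front token fully sent" case explicit, which also mirrors the guarantee that a front token never has length 0.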
Encapsulation of the HTTP response for a single request
Exported for debugging and testing only
Sends the string argument as response body, together with the given status and the header (optional). Response header fields are set as follows:
Content-Length is set to the length of the string.
Content-Type is set to "text/html" unless given by the header.
If the header object is passed in, these modifications are done
directly in this object as side effect.
Sends the contents of a file as response body, together with the given status and the header (optional). The descriptor must be a file descriptor (that cannot block). The int64 number is the length of the body. Response header fields are set as follows:
Content-Length is set to the length of the body (the int64 number).
Content-Type is set to "text/html" unless given by the header.
Note that Content-Range is not set automatically, even if the file is only
partially transferred.
If the header object is passed in, these modifications are done directly in this object as side effect.
The function does not send the file immediately. Instead, it sets up the http_response
object so that the next chunk of the file is added whenever the send queue becomes
empty. The file will be closed when the transfer is done.
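The header rules for the string variant described above can be sketched as follows. This is an illustrative model only: the association-list header type and the name static_response_header are assumptions made for this example, and the real function modifies a header object in place instead of returning a new list.

```ocaml
(* Hypothetical model: a header is a list of (name, value) pairs. *)
type header = (string * string) list

let same_name a b =
  String.lowercase_ascii a = String.lowercase_ascii b

let has_field name (h : header) =
  List.exists (fun (n, _) -> same_name n name) h

let remove_field name (h : header) =
  List.filter (fun (n, _) -> not (same_name n name)) h

(* Content-Length is always set from the body; Content-Type defaults
   to "text/html" only when the caller did not supply one. *)
let static_response_header ?(header = []) (body : string) : header =
  let h = remove_field "Content-Length" header in
  let h = ("Content-Length", string_of_int (String.length body)) :: h in
  if has_field "Content-Type" h then h
  else ("Content-Type", "text/html") :: h
```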
The method (including the URI), and the HTTP version
A req_token represents a textual part of the received request:
`Req_header is the full received header. Together with the header,
  the corresponding http_response object is returned, which must
  be used to transmit the response.
`Req_expect_100_continue is generated when the client expects that the
  server sends a "100 Continue" response (or a final status code) now.
  One should add `Resp_info_line resp_100_continue to the send queue
  if the header is acceptable, or otherwise generate an error response. In any
  case, the rest of the request must be read until `Req_end.
`Req_body is a part of the request body. The transfer-coding, if any,
  is already decoded.
`Req_trailer is the received trailer.
`Req_end indicates the end of the request (the next request may begin
  immediately).
`Eof indicates the end of the stream.
`Bad_request_error indicates that the request violated the HTTP protocol
  in a serious way and cannot be decoded. It is required to send a
  "400 Bad Request" response. The following token will be `Eof.
`Fatal_error indicates that the connection crashed.
  The following token will be `Eof.
`Timeout means that nothing has been received for a certain amount
  of time, and the protocol is in a state in which the next request can begin.
  The following token will be `Eof.
Note that it is always allowed to send tokens to the client. The protocol
implementation takes care that the response is transmitted at the right point
in time.
Maximum size of the request line. Longer lines are immediately answered with a "Request URI too long" response. Suggestion: 32768.
Maximum size of the header, including the request line. Longer headers
are treated as an attack, and cause the fatal error `Message_too_long.
Suggestion: 65536.
Maximum size of the trailer
Limits the length of the pipeline (= unanswered requests). A value of 0 disables pipelining. A value of n allows another request to be received even though there are already n unanswered requests.
Limits the size of the pipeline in bytes. If the buffered bytes in the input queue exceed this value, the receiver temporarily stops reading more data. The value 0 has the effect that even the read-ahead of data of the current request is disabled. The value (-1) disables the receiver completely (not recommended).
Whether to set the Server header:
`Ignore: The kernel does not touch the Server header.
`Ocamlnet: Announce this web server as "Ocamlnet/<version>"
`Ocamlnet_and s: Announce this web server as s and append
  the Ocamlnet string.
`As s: Announce this web server as s
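The four announcement modes can be modelled as a function from the setting to an optional Server value. This is a sketch under stated assumptions: the function names, the version parameter, and the single space used as separator in the `Ocamlnet_and case are illustrative choices, not taken from the kernel.

```ocaml
(* Hypothetical model of the announcement setting. *)
type announce =
  [ `Ignore | `Ocamlnet | `Ocamlnet_and of string | `As of string ]

(* The "Ocamlnet/<version>" string; [version] is supplied by the caller. *)
let ocamlnet_id ~version = "Ocamlnet/" ^ version

(* [None] means: leave the Server header untouched.
   The separator in the `Ocamlnet_and case is an assumption. *)
let announced_server ~version (a : announce) : string option =
  match a with
  | `Ignore -> None
  | `Ocamlnet -> Some (ocamlnet_id ~version)
  | `Ocamlnet_and s -> Some (s ^ " " ^ ocamlnet_id ~version)
  | `As s -> Some s
```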
Whether to suppress `Broken_pipe
errors. Instead
`Broken_pipe_ignore
is reported.
Configuration values for the HTTP kernel
Default config:
config_max_reqline_length = 32768
config_max_header_length = 65536
config_max_trailer_length = 32768
config_limit_pipeline_length = 5
config_limit_pipeline_size = 65536
config_announce_server = `Ocamlnet
config_suppress_broken_pipe = false
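For illustration, the defaults listed above can be written down as a plain record. This is only a stand-alone model: the record type and the helper without_pipelining are hypothetical, and the module itself exposes the configuration as an object rather than a record.

```ocaml
(* Hypothetical record model of the configuration values. *)
type announce =
  [ `Ignore | `Ocamlnet | `Ocamlnet_and of string | `As of string ]

type config = {
  config_max_reqline_length : int;
  config_max_header_length : int;
  config_max_trailer_length : int;
  config_limit_pipeline_length : int;
  config_limit_pipeline_size : int;
  config_announce_server : announce;
  config_suppress_broken_pipe : bool;
}

(* The default values as documented above. *)
let default_config = {
  config_max_reqline_length = 32768;
  config_max_header_length = 65536;
  config_max_trailer_length = 32768;
  config_limit_pipeline_length = 5;
  config_limit_pipeline_size = 65536;
  config_announce_server = `Ocamlnet;
  config_suppress_broken_pipe = false;
}

(* Example override: a pipeline length of 0 disables pipelining. *)
let without_pipelining c = { c with config_limit_pipeline_length = 0 }
```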
Modifies the passed config object as specified by the optional arguments
Exchange of HTTP messages
In fd
one must pass the already connected socket. It must be in non-
blocking mode.
How to use this class: Basically, one invokes cycle
until the whole
message exchange on fd
is processed. cycle
receives data from the
socket and sends data to the socket. There are two internal queues:
The receive queue stores parts of received requests as req_token
.
One can take values from the front of this queue by calling receive
.
The response queue stores http_response
objects. Each of the objects
corresponds to a request that was received before. This queue is handled
fully automatically, but one can watch its length to see whether all responses
are actually transmitted over the wire.
The basic algorithm to process messages is:
let rec next_token () =
  if proto # recv_queue_len = 0 then (
    proto # cycle ();
    next_token()
  )
  else
    proto # receive()
in
let cur_token = ref (next_token()) in
while !cur_token <> `Eof do
  (* Process first token of next request: *)
  match !cur_token with
  | `Req_header(req_line, header, resp) ->
      (* Depending on [req_line], read further tokens until [`Req_end] *)
      ...
      (* Switch to the first token of the next message: *)
      cur_token := next_token()
  | `Timeout -> ...
  | `Bad_request_error(e,resp) ->
      (* Generate 400 error, send it to [resp] *)
      ...
      (* Switch to the first token of the next message: *)
      cur_token := next_token()
  | `Fatal_error e -> failwith "Crash"
  | _ -> assert false
done;
while proto # resp_queue_len > 0 do
  proto # cycle ();
done;
proto # shutdown()
See the file tests/easy_daemon.ml
for a complete implementation of this.
As one can see, it is essential to watch the lengths of the queues in order
to figure out what has happened during cycle
.
When the body of the request is empty, `Req_body tokens are omitted.
Note that for requests like GET that always have an empty body, it is
still possible that an erroneous client sends a body, and that `Req_body
tokens arrive. One must accept and ignore these tokens.
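Accepting and ignoring unexpected body tokens amounts to draining the token stream until the end of the request. The sketch below shows this on a simplified, hypothetical token type; the constructors only mirror a few of the real tokens, and the names of_list and skip_to_req_end are invented for this example.

```ocaml
(* Simplified, hypothetical token type for illustration only. *)
type token =
  | Req_body of string
  | Req_trailer
  | Req_end
  | Eof

(* Turn a list into a token source; yields [Eof] once exhausted. *)
let of_list (l : token list) : unit -> token =
  let q = ref l in
  fun () ->
    match !q with
    | [] -> Eof
    | t :: rest -> q := rest; t

(* Read tokens until [Req_end], ignoring any body data.
   Returns [Ok n] with the number of ignored body bytes, or [Error n]
   if the stream ended before the request was complete. *)
let rec skip_to_req_end (next : unit -> token) (ignored : int) =
  match next () with
  | Req_body s -> skip_to_req_end next (ignored + String.length s)
  | Req_trailer -> skip_to_req_end next ignored
  | Req_end -> Ok ignored
  | Eof -> Error ignored
```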
Error handling: For serious errors, the connection is immediately aborted.
In this case, receive
returns a `Fatal_error
token. Note that the
queued responses cannot be sent! An example of this is `Broken_pipe
.
There is a large class of non-serious errors, esp. format errors
in the header and body. It is typical of these errors that one cannot determine
the end of the request properly. For this reason, the daemon stops reading
further data from the request, but the response queue is still delivered.
For these errors, receive
returns a `Bad_request_error
token.
This token contains an http_response object that must be filled with a
400 error response.
Looks at the file descriptor. If there is data to read from the descriptor,
and there is free space in the input buffer, additional data is read into
the buffer. The method also tries to interpret the new data as req_token
values and, if possible, appends new req_token values to the receive queue.
If the response queue has objects, there is actually data to send, and the socket allows sending, the method tries to send as much data as possible.
The option block (default: 0) can be set to wait until data
can be exchanged with the socket. This avoids busy waiting. The number
is the duration in seconds to wait until the connection times out
(0 means not to wait at all, -1 means to wait infinitely). When a timeout
happens, there is nothing to send, and the last request was fully
processed (i.e. waiting_for_next_message is true), receive will simply
return `Timeout. Otherwise, the fatal error `Timeout is generated.
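The two rules in this paragraph, the meaning of the block value and the outcome of a timeout, can be modelled as pure functions. This is a sketch with invented names (wait_of_block, classify_timeout); treating every negative value like -1 is an assumption made here, not something the text states.

```ocaml
(* How long to wait, derived from the [block] value in seconds. *)
type wait = No_wait | Wait_forever | Wait_for of float

(* 0 means no waiting. The text only names -1 for infinite waiting;
   treating all negative values the same way is an assumption. *)
let wait_of_block (block : float) : wait =
  if block = 0.0 then No_wait
  else if block < 0.0 then Wait_forever
  else Wait_for block

(* What a timeout turns into. *)
type timeout_outcome = Timeout_token | Fatal_timeout

(* A harmless timeout token is reported only when nothing is left to
   send and the engine is waiting for the next message; otherwise the
   timeout is fatal. *)
let classify_timeout ~nothing_to_send ~waiting_for_next_message =
  if nothing_to_send && waiting_for_next_message then Timeout_token
  else Fatal_timeout
```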
Returns the first req_token
from the receive queue. Raises
Recv_queue_empty
when the queue is empty (= has no new data)
Peeks the first token, but leaves it in the queue.
Raises Recv_queue_empty
when the queue is empty.
Returns the length of the receive queue (number of tokens)
Returns the length of the internal response queue (number of http_response
objects that have not yet been fully processed)
Returns the number of unanswered requests = number of received `Req_end
tokens minus number of responses in state `Processed. Note that pipeline_len
can become -1 when bad requests are answered.
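The formula is simple enough to state as code, which also makes the -1 case visible. The function below is a hypothetical stand-alone model with invented parameter names; the real method reads these counters from the protocol object.

```ocaml
(* Model of the pipeline length: received end-of-request tokens minus
   responses that have been completely sent. *)
let pipeline_len ~req_end_received ~responses_processed =
  req_end_received - responses_processed

(* A bad request never produces an end-of-request token, but its 400
   response still counts as processed, hence a possible value of -1. *)
let after_bad_request = pipeline_len ~req_end_received:0 ~responses_processed:1
```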
Returns the (estimated) size of the input queue in bytes
Whether the kernel is currently waiting for the beginning of a new
arriving HTTP request. This is false
while the request is being
received.
Suggests the calculation of a timeout value for input:
`Normal: The normal timeout value applies
`Next_message: The timeout value applies while waiting for the next message
`None: The connection is output-driven, no input timeout value
Shuts the socket down. Note: the descriptor is not closed.
Process a timeout condition as cycle
does
Stops the transmission of data. The receive queue is cleared and filled
with the two tokens `Fatal_error
and `Eof
.
The response queue is cleared. The cycle
method will return immediately without doing anything.
Returns true
iff the protocol engine is interested in new data from the
socket. Returns false
after EOF and after errors.
Returns true
iff the protocol engine has data to output to the socket
Returns true
when a lingering close operation is needed to reliably shut
down the socket. In many cases, this expensive operation is not necessary.
See the class lingering_close
below.
For testing: returns a list of tokens indicating into which cases the program ran.
The core event loop of the HTTP daemon
Closes a file descriptor using the "lingering close" algorithm
Usage:
while lc # lingering do lc # cycle ~block:true () done
Reads data from the file descriptor until EOF or until a fixed timeout
is over. Finally, the descriptor is closed. If block is set, the method
blocks until data is available. (Default: false)
Whether the socket is still lingering
Closes a file descriptor using the "lingering close" algorithm.
The optional preclose
function is called just before Unix.close
.