This module mainly defines the http_protocol class, which implements the
exchange of messages with an HTTP client. The request messages are represented
as a sequence of req_token values. The response is encapsulated in a separate
http_response class. The contents of the response are represented as a sequence
of resp_token values.
These are the serious protocol violations after which the daemon stops any further processing.
Note that `Timeout refers to a timeout in the middle of a request.
`Broken_pipe_ignore is the "harmless" version of `Broken_pipe
(see config_suppress_broken_pipe).
Long messages are fatal because it is suspected that they are denial
of service attacks. The kernel generates `Message_too_long only for
long headers, not for long bodies.
Fatal server errors can happen when exceptions are not properly handled. As a last resort, the HTTP daemon closes the connection without notifying the client.
A bad request is a violation where the current request cannot be decoded, and it is not possible to accept further requests over the current connection.
Convert error to a string, for logging
Returns the best response code for the error
A data_chunk is a substring of a string. The substring is described by
the triple (s, pos, len) where s is the container, pos is the
position where the substring begins, and len its length.
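The triple can be sketched as follows. This is only an illustration: the container type string and the helper name chunk_contents are assumptions made for the example, not definitions taken from this module.

```ocaml
(* Sketch only: a chunk as (container, start position, length).
   The container type [string] is an assumption of this example. *)
type data_chunk = string * int * int

(* Hypothetical helper: copy out the text that the chunk describes. *)
let chunk_contents ((s, pos, len) : data_chunk) : string =
  String.sub s pos len
```

For instance, the chunk ("hello world", 6, 5) describes the substring "world".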
= (code, phrase)
The resp_token represents a textual part of the response to send:
- `Resp_info_line is an informational status line (code=100..199). There can
  be several informational lines, and they can be accompanied with their own
  headers. Such lines are only sent to HTTP/1.1 clients.
- `Resp_status_line is the final status line to send (code >= 200)
- `Resp_header is the whole response header to send
- `Resp_body is the next part of the response body to send.
- `Resp_trailer is the whole response trailer to send (currently ignored)
- `Resp_action is special because it does not directly represent a token
  to send. The argument is a function which is called when the token is
  the next token on the active event queue. The function is also called when
  the event queue is dropped because of an error (the state of the
  response object indicates this). The function must not raise exceptions
  except Unix_error, and it must not block.
The response state:
- `Inhibited = it is not yet allowed to start the response
- `Queued = the response waits on the queue for activation
- `Active = the response is currently being transmitted
- `Processed = the response has been completely sent
- `Error = an error occurred during the transmission of this response
- `Dropped = an earlier response forced to close the connection, and
  this response is dequeued
Tokens generated by http_response:
- `Resp_wire_data are data tokens.
- `Resp_end indicates the end of the response.
See config
Represents the action of sending the response
This class has an internal
queue of response tokens that are not yet processed. One can easily add
new tokens to the end of the queue (send).
The class is responsible for determining the transfer encoding:
Currently, the TE request header is not taken into account. The trailer
is always empty.
The following headers are set (or removed) by this class:
- Transfer-Encoding
- Trailer
- Date
- Connection
- Upgrade
- Server (it is appended to this field)
Responses for HEAD requests have the special behaviour that the body is silently dropped. The calculation of header fields is not affected by this. This means that HEAD can be easily implemented by doing the same as for GET.
Responses for other requests that must not include a body must set
Content-Length to 0.
These methods can be called by the content provider:
The bidirectional phase starts after "100 Continue" has been sent to the client, and stops when the response body begins. The bidirectional phase is special for the calculation of timeout values (input determines the timeout although the response has started).
Return whether the send queue is empty. When the state is `Inhibited, this
method fakes an empty queue.
Returns whether the connection should be closed after this response.
This flag should be evaluated when the `Resp_end front token has been
reached.
Returns the selected transfer encoding. This is valid after the header
has been passed to this object with send.
The first token of the queue, represented as data_chunk. Raises
Send_queue_empty when there is currently no front token, or the state
is `Inhibited.
If there is a front token, it will never have length 0.
Note that Unix_error exceptions can be raised when `Resp_action
tokens are processed.
The function will be called when either set_state changes the state,
or when the send queue becomes empty. Note that the callback must never
fail, because it is called in situations that make it hard to recover from errors.
Accumulated size of the response body
These methods must only be called by the HTTP protocol processor:
Tell this object that n bytes of the front token were actually
sent using Unix.write. If this means that the whole front token
has been sent, the next token is pulled from the queue and is made
the new front token. Otherwise, the data chunk representing the
front token is modified such that the position is advanced by
n, and the length is reduced by n.
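The bookkeeping described above can be sketched with a small pure function. This is only an illustration under assumptions: the name advance_chunk, the string container, and the use of an option result are hypothetical and are not the class's real interface.

```ocaml
(* Sketch only: a chunk as (container, start position, length). *)
type data_chunk = string * int * int

(* Hypothetical stand-in for the bookkeeping after n bytes were written.
   [None] means the front token was sent completely, so the next token
   would have to be pulled from the queue. Otherwise the position moves
   forward by n and the length shrinks by n. *)
let advance_chunk ((s, pos, len) : data_chunk) (n : int) : data_chunk option =
  if n < 0 || n > len then invalid_arg "advance_chunk"
  else if n = len then None
  else Some (s, pos + n, len - n)
```

Returning None for a fully sent chunk matches the rule that a front token never has length 0.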
Encapsulation of the HTTP response for a single request
Exported for debugging and testing only
Sends the string argument as response body, together with the given status and the header (optional). Response header fields are set as follows:
- Content-Length is set to the length of the string.
- Content-Type is set to "text/html" unless given by the header.
If the header object is passed in, these modifications are done
directly in this object as side effect.
Sends the contents of a file as response body, together with the given status and the header (optional). The descriptor must be a file descriptor (that cannot block). The int64 number is the length of the body. Response header fields are set as follows:
- Content-Length is set to the length of the string.
- Content-Type is set to "text/html" unless given by the header.
Note that Content-Range is not set automatically, even if the file is only
partially transferred.
If the header object is passed in, these modifications are done directly in this object as side effect.
The function does not send the file immediately, but rather sets up the
http_response object so that the next chunk of the file is added whenever the
send queue becomes empty. The file will be closed when the transfer is done.
The method (including the URI), and the HTTP version
A req_token represents a textual part of the received request:
- `Req_header is the full received header. Together with the header,
  the corresponding http_response object is returned which must
  be used to transmit the response.
- `Req_expect_100_continue is generated when the client expects that the
  server sends a "100 Continue" response (or a final status code) now.
  One should add `Resp_info_line resp_100_continue to the send queue
  if the header is acceptable, or otherwise generate an error response. In any
  case, the rest of the request must be read until `Req_end.
- `Req_body is a part of the request body. The transfer-coding, if any,
  is already decoded.
- `Req_trailer is the received trailer
- `Req_end indicates the end of the request (the next request may begin
  immediately).
- `Eof indicates the end of the stream
- `Bad_request_error indicates that the request violated the HTTP protocol
  in a serious way and cannot be decoded. It is required to send a
  "400 Bad Request" response. The following token will be `Eof.
- `Fatal_error indicates that the connection crashed.
  The following token will be `Eof.
- `Timeout means that nothing has been received for a certain amount
  of time, and the protocol is in a state that the next request can begin.
  The following token will be `Eof.
Note that it is always allowed to send tokens to the client. The protocol
implementation takes care that the response is transmitted at the right point
in time.
Maximum size of the request line. Longer lines are immediately answered with a "Request URI too long" response. Suggestion: 32768.
Maximum size of the header, including the request line. Longer headers
are treated as an attack, and cause the fatal error `Message_too_long.
Suggestion: 65536.
Maximum size of the trailer
Limits the length of the pipeline (= unanswered requests). A value of 0 disables pipelining. A value of n allows another request to be received even though there are already n unanswered requests.
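One way to read this limit is as a simple comparison. The sketch below reflects that reading only; the function name may_receive_another and its labelled arguments are hypothetical, and the actual check in the kernel may differ in detail.

```ocaml
(* Sketch of one reading of the pipeline length limit: a further request
   may be received as long as the number of unanswered requests does not
   exceed the limit. With a limit of 0, a second request is only read
   after the first one has been answered, i.e. there is no pipelining. *)
let may_receive_another ~(limit : int) ~(unanswered : int) : bool =
  unanswered <= limit
```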
Limits the size of the pipeline in bytes. If the buffered bytes in the input queue exceed this value, the receiver temporarily stops reading more data. The value 0 has the effect that even the read-ahead of data of the current request is disabled. The value (-1) disables the receiver completely (not recommended).
Whether to set the Server header:
- `Ignore: The kernel does not touch the Server header.
- `Ocamlnet: Announce this web server as "Ocamlnet/<version>"
- `Ocamlnet_and s: Announce this web server as s and append
  the Ocamlnet string.
- `As s: Announce this web server as s
Whether to suppress `Broken_pipe errors. Instead
`Broken_pipe_ignore is reported.
Configuration values for the HTTP kernel
Default config:
- config_max_reqline_length = 32768
- config_max_header_length = 65536
- config_max_trailer_length = 32768
- config_limit_pipeline_length = 5
- config_limit_pipeline_size = 65536
- config_announce_server = `Ocamlnet
- config_suppress_broken_pipe = false
Modifies the passed config object as specified by the optional arguments
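For illustration, the default values listed above can be written down as a standalone object. This is a sketch, not the module's own definition: the names announce and default_config_sketch are hypothetical, and the method types are inferred from the values shown.

```ocaml
(* Hypothetical type collecting the Server header options named above. *)
type announce =
  [ `Ignore | `Ocamlnet | `Ocamlnet_and of string | `As of string ]

(* Sketch of a configuration object carrying the documented defaults. *)
let default_config_sketch = object
  method config_max_reqline_length = 32768
  method config_max_header_length = 65536
  method config_max_trailer_length = 32768
  method config_limit_pipeline_length = 5
  method config_limit_pipeline_size = 65536
  method config_announce_server : announce = `Ocamlnet
  method config_suppress_broken_pipe = false
end
```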
Exchange of HTTP messages
In fd one must pass the already connected socket. It must be in
non-blocking mode.
How to use this class: Basically, one invokes cycle until the whole
message exchange on fd is processed. cycle receives data from the
socket and sends data to the socket. There are two internal queues:
The receive queue stores parts of received requests as req_token.
One can take values from the front of this queue by calling receive.
The response queue stores http_response objects. Each of the objects
corresponds to a request that was received before. This queue is handled
fully automatically, but one can watch its length to see whether all responses
are actually transmitted over the wire.
The basic algorithm to process messages is:
  (* Poll until the receive queue holds a token, then return it *)
  let rec next_token () =
    if proto # recv_queue_len = 0 then (
      proto # cycle ();
      next_token()
    )
    else
      proto # receive()
  in

  let cur_token = ref (next_token()) in
  while !cur_token <> `Eof do
    (* Process first token of next request: *)
    match !cur_token with
     | `Req_header(req_line, header, resp) ->
          (* Depending on [req_line], read further tokens until [`Req_end] *)
          ...
          (* Switch to the first token of the next message: *)
          cur_token := next_token()
     | `Timeout -> ...
     | `Bad_request_error(e,resp) ->
          (* Generate 400 error, send it to [resp] *)
          ...
          (* Switch to the first token of the next message: *)
          cur_token := next_token()
     | `Fatal_error e -> failwith "Crash"
     | _ -> assert false
  done;

  (* Flush the responses that are still queued, then shut down *)
  while proto # resp_queue_len > 0 do
    proto # cycle ();
  done;
  proto # shutdown()
See the file tests/easy_daemon.ml for a complete implementation of this.
As one can see, it is essential to watch the lengths of the queues in order
to figure out what has happened during cycle.
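The polling pattern can be tried out without a socket by using a mock object. The sketch below is self-contained and purely illustrative: the class mock_proto, the reduced token type, and collect_request_lines are hypothetical stand-ins, and only the method names cycle, receive, and recv_queue_len mirror the description above.

```ocaml
(* Mock tokens: a tiny subset, with payloads reduced to strings. *)
type token = Req_header of string | Req_end | Eof

(* Hypothetical stand-in for the protocol object: each [cycle] moves at
   most one token from the simulated wire into the receive queue, and
   appends [Eof] once the wire is exhausted. *)
class mock_proto (wire : token list) = object
  val mutable pending = wire
  val recv_queue : token Queue.t = Queue.create ()
  method recv_queue_len = Queue.length recv_queue
  method cycle () =
    match pending with
    | [] -> Queue.add Eof recv_queue
    | t :: rest -> pending <- rest; Queue.add t recv_queue
  method receive () = Queue.take recv_queue
end

(* Same shape as the basic algorithm: cycle until a token is available. *)
let rec next_token proto =
  if proto#recv_queue_len = 0 then begin
    proto#cycle ();
    next_token proto
  end else
    proto#receive ()

(* Collect the request lines of all requests seen before [Eof]. *)
let collect_request_lines (wire : token list) : string list =
  let proto = new mock_proto wire in
  let rec loop acc =
    match next_token proto with
    | Eof -> List.rev acc
    | Req_header line -> loop (line :: acc)
    | Req_end -> loop acc
  in
  loop []
```

The mock delivers one token per cycle, so the caller can only learn that something arrived by checking the queue length, which is the point made above.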
When the body of the request is empty, `Req_body tokens are omitted.
Note that for requests like GET that always have an empty body, it is
still possible that an erroneous client sends a body, and that `Req_body
tokens arrive. One must accept and ignore these tokens.
Error handling: For serious errors, the connection is immediately aborted.
In this case, receive returns a `Fatal_error token. Note that the
queued responses cannot be sent! An example of this is `Broken_pipe.
There is a large class of non-serious errors, esp. format errors
in the header and body. It is typical of these errors that one cannot determine
the end of the request properly. For this reason, the daemon stops reading
further data from the request, but the response queue is still delivered.
For these errors, receive returns a `Bad_request_error token.
This token contains a http_response object that must be filled with a
400 error response.
Looks at the file descriptor. If there is data to read from the descriptor,
and there is free space in the input buffer, additional data is read into
the buffer. The method also tries to interpret the new data as req_tokens,
and if possible, new req_tokens are appended to the receive queue.
If the response queue has objects, there is actually data to send, and the socket allows sending, the method tries to send as much data as possible.
The option block (default: 0) can be set to wait until data
can be exchanged with the socket. This avoids busy waiting. The number
is the duration in seconds to wait until the connection times out
(0 means not to wait at all, -1 means to wait infinitely). When a timeout
happens, and there is nothing to send, and the last request was fully
processed, receive will simply return `Timeout (i.e. when
waiting_for_next_message is true). Otherwise, the
fatal error `Timeout is generated.
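The two outcomes of a timeout can be summarized as a small decision function. This is a sketch of the rule as stated above, with hypothetical names (timeout_outcome, classify_timeout, and the labelled arguments); it does not reproduce the kernel's actual code.

```ocaml
(* The two possible outcomes described above. *)
type timeout_outcome =
  | Timeout_token   (* [receive] simply returns the `Timeout token *)
  | Fatal_timeout   (* the fatal error `Timeout is generated *)

(* Sketch of the rule: a timeout is harmless only if there is nothing to
   send and the kernel is waiting for the next message, i.e. the last
   request was fully processed. *)
let classify_timeout ~(nothing_to_send : bool)
                     ~(waiting_for_next_message : bool) : timeout_outcome =
  if nothing_to_send && waiting_for_next_message
  then Timeout_token
  else Fatal_timeout
```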
Returns the first req_token from the receive queue. Raises
Recv_queue_empty when the queue is empty (= has no new data)
Peeks the first token, but leaves it in the queue.
Raises Recv_queue_empty when the queue is empty.
Returns the length of the receive queue (number of tokens)
Returns the length of the internal response queue (number of http_response
objects that have not yet been fully processed)
Returns the number of unanswered requests = number of received `Req_end tokens
minus number of responses in state `Processed. Note that pipeline_len
can become -1 when bad requests are answered.
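The arithmetic, including the -1 case, can be illustrated as follows. The function name pipeline_len_sketch and its labelled arguments are hypothetical; the real method takes no arguments and reads the counters from the protocol object.

```ocaml
(* Sketch of the stated formula: received `Req_end tokens minus
   responses in state `Processed. *)
let pipeline_len_sketch ~(req_end_tokens : int)
                        ~(processed_responses : int) : int =
  req_end_tokens - processed_responses
```

A bad request never yields a `Req_end token, yet its 400 response is still processed, so with no other requests the result is 0 - 1 = -1.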
Returns the (estimated) size of the input queue in bytes
Whether the kernel is currently waiting for the beginning of a new
arriving HTTP request. This is false while the request is being
received.
Suggests the calculation of a timeout value for input:
- `Normal: The normal timeout value applies
- `Next_message: The timeout value applies while waiting for the next message
- `None: The connection is output-driven, no input timeout value
Shuts the socket down. Note: the descriptor is not closed.
Process a timeout condition as cycle does
Stops the transmission of data. The receive queue is cleared and filled
with the two tokens `Fatal_error and `Eof.
The response queue is cleared. The cycle
method will return immediately without doing anything.
Returns true iff the protocol engine is interested in new data from the
socket. Returns false after EOF and after errors.
Returns true iff the protocol engine has data to output to the socket
Returns true when a lingering close operation is needed to reliably shut
down the socket. In many cases, this expensive operation is not necessary.
See the class lingering_close below.
For testing: returns a list of tokens indicating into which cases the program ran.
The core event loop of the HTTP daemon
Closes a file descriptor using the "lingering close" algorithm
Usage:
while lc # lingering do lc # cycle ~block:true () done
Reads data from the file descriptor until EOF or until a fixed timeout
is over. Finally, the descriptor is closed. If block is set, the method
blocks until data is available. (Default: false)
Whether the socket is still lingering
Closes a file descriptor using the "lingering close" algorithm.
The optional preclose function is called just before Unix.close.