module Netmime

: sig

Netmime contains high-level classes and functions to process mail and MIME messages.

Contents

The tutorial has been moved to Netmime_tut.

#

Types

#
type store = [
| `Memory
| `File of string
]

Specifies where to store the body of a mail message. `Memory means the body is kept in memory; `File name means it is stored in the file called name. The body is stored in decoded form (i.e. without transfer encoding).

#
exception Immutable of string

Raised on an attempt to modify a read-only value. The string denotes the function or method where the incident happened.

MIME headers and bodies are defined in two steps. First, the class type describing read access is defined (mime_header_ro and mime_body_ro); after that, the full class type including write access is defined (mime_header and mime_body). Because the full class types inherit from the read-only ones, they are subtypes of them.

The idea is that you can write functions that take an ro value as input to indicate that they do not modify the value. For example:

   let number_of_fields (h:#mime_header_ro) =
     List.length (h#fields)

This function accepts both mime_header and mime_header_ro values as input, but the typing ensures that the function cannot mutate anything.

There is another way to ensure that a header or body is not modified. The read-only flag can be set when creating the object; once it is set, every attempt to modify the value raises the exception Immutable. Of course, such attempts are only detected at run time.

The advantage of the read-only flag is that it also works when the code contains a mutation that depends on a condition, and you can guarantee that this condition is never true. Furthermore, typing is much simpler (getting subtyping correct can be annoying).
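The run-time check can be sketched as follows. This is a minimal example rather than a definitive recipe: it assumes that the library providing Netmime is linked and that the optional ro argument is passed as a plain boolean (~ro:true), as it is for read_mime_header. The field values are made up.

```ocaml
(* Sketch: a header created with the read-only flag rejects writes at run time.
   Assumption: ro is passed as a plain boolean (~ro:true). *)
let h = new Netmime.basic_mime_header ~ro:true ["Subject", "Hello"]

(* The call type-checks because h is a full mime_header,
   but it raises Immutable when executed. *)
let rejected =
  try h#update_field "subject" "Changed"; false
  with Netmime.Immutable _ -> true

let () =
  Printf.printf "read-only: %b, write rejected: %b, subject: %s\n"
    h#ro rejected (h#field "subject")
```

With the typed approach the call to update_field would not compile at all; here it compiles and fails only when executed.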

#
class type mime_header_ro =
#
method fields : (string * string) list
#
method field : string -> string
#
method multiple_field : string -> string list

The current fields of the header. fields returns the complete header. field name returns the value of the field, or raises Not_found. multiple_field name returns all fields with the same name.

Note that field names are case-insensitive; field "content-length", and field "CONTENT-LENGTH" will return the same field. However, the method fields returns the original field names, without adjustment of the case.

The order of the fields is preserved.
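The lookup rules above can be illustrated with a short sketch. It assumes that the library providing Netmime is linked; the header contents are invented for the example.

```ocaml
(* A header in which one field name occurs twice. *)
let h =
  new Netmime.basic_mime_header
    [ "Received", "from a"; "Subject", "Hi"; "Received", "from b" ]

(* Lookup ignores the case of the field name. *)
let subject = h#field "SUBJECT"

(* All values stored under one name. *)
let received = h#multiple_field "received"

(* fields keeps the original spelling and the original order. *)
let names = List.map fst h#fields

let () =
  Printf.printf "%s; %d Received fields; names: %s\n"
    subject (List.length received) (String.concat ", " names)
```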

Access methods for frequent standard fields.

These methods will raise Not_found if the fields are not present.

#
method content_length : unit -> int

Returns the Content-length field as an integer.

#
method content_type : unit -> string * (string * Mimestring.s_param) list

Returns the Content-type as parsed value. The left value of the pair is the main type, and the right value is the list of parameters. For example, for the field value "text/plain; charset=utf-8" this method returns ("text/plain", ["charset", p]) where p is an opaque value with Mimestring.param_value p = "utf-8".

#
method content_disposition : unit -> string * (string * Mimestring.s_param) list

Returns the Content-disposition field as parsed value. The left value is the main disposition, and the right value is the list of parameters. For example, for the field value "attachment; filename=xy.dat" this method returns ("attachment", ["filename", p]) where p is an opaque value with Mimestring.param_value p = "xy.dat".
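As a sketch of how the parsed values are typically consumed (assuming the library providing Netmime and Mimestring is linked), the two field values from the descriptions above can be taken apart like this. Both accessors raise Not_found if the field is absent, and List.assoc raises Not_found if the parameter is absent.

```ocaml
let h =
  new Netmime.basic_mime_header
    [ "Content-type", "text/plain; charset=utf-8";
      "Content-disposition", "attachment; filename=xy.dat" ]

(* Main type plus parameter list; the parameter values are opaque. *)
let (media_type, ct_params) = h#content_type ()
let charset = Mimestring.param_value (List.assoc "charset" ct_params)

(* Same shape for the disposition. *)
let (disposition, cd_params) = h#content_disposition ()
let filename = Mimestring.param_value (List.assoc "filename" cd_params)

let () =
  Printf.printf "%s (charset %s), %s as %s\n"
    media_type charset disposition filename
```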

#
method content_transfer_encoding : unit -> string

Returns the Content-transfer-encoding field as a string.

This is the read-only version of a MIME header. There are only methods to read the header fields.

#
class type mime_header =
inherit mime_header_ro
#
method ro : bool

whether the header is read-only or not

#
method set_fields : (string * string) list -> unit
#
method update_field : string -> string -> unit
#
method update_multiple_field : string -> string list -> unit
#
method delete_field : string -> unit

These methods modify the fields of the header. If the header is read-only, the exception Immutable will be raised.

set_fields replaces the current fields with a new list of (name,value) pairs. update_field name value replaces all fields of the passed name with the single setting (name,value), or adds this setting to the list. update_multiple_field name values replaces all fields of the passed name with the list of values, or adds this list. Finally, delete_field name deletes all fields of the passed name. Nothing happens if there is no such field.

Both update_field and update_multiple_field first replace existing values by the new ones without changing the order of the fields in the header. Additional values are inserted after the last existing value, or at the end of the header.
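The ordering rules can be checked with a small sketch. It assumes that the library providing Netmime is linked; the addresses and the X-Note field are invented. Only the values are compared, because the text above does not say which spelling of the field name is kept after an update.

```ocaml
let h =
  new Netmime.basic_mime_header
    [ "To", "a@example.org"; "Subject", "old"; "Cc", "c@example.org" ]

(* Current field values, in header order. *)
let values () = List.map snd h#fields

(* Replacing an existing field keeps its position. *)
let after_update = (h#update_field "subject" "new"; values ())

(* A field that did not exist before is added at the end. *)
let after_add = (h#update_field "X-Note" "added"; values ())

(* Deleting removes every field of that name. *)
let after_delete = (h#delete_field "cc"; values ())

(* Deleting a missing field is a no-op. *)
let () = h#delete_field "cc"
```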

A MIME header with both read and write methods. It is still possible, however, to set the read-only flag to make this kind of header immutable, too.

#
class type mime_body_ro =
#
method value : string

The value method returns the _decoded_ body, i.e. transfer encodings are removed before the value is passed back.

When the body is stored in an external file, this method reads the complete file into memory.

#
method store : store

Where the body is actually stored.

#
method open_value_rd : unit -> Netchannels.in_obj_channel

Opens the value for reading. This works independently of where the body is stored. For example, to read the body line by line:

       let ch = body # open_value_rd () in
       try
         while true do
           let line = ch # input_line() in
           ... (* do something *)
         done;
         assert false; (* never reached *)
       with
         End_of_file ->
           ch # close_in()

Like value, this method returns the body in decoded form. It is economical with resources and takes only as much memory as needed for the channel operations.

#
method finalize : unit -> unit

After the body has been finalized, it cannot be accessed any longer. External resources (files) are deallocated, if they are seen as temporary.

This is the read-only version of a MIME body. There are only methods to read the body contents.

The value of the body can be returned either as string, or as object channel. Both ways are possible independently of where the value is stored, in-memory, or as external file.

#
class type mime_body =
inherit mime_body_ro
#
method ro : bool

whether this body is read-only or not

#
method set_value : string -> unit

Sets the value. If the value is immutable, the exception Immutable will be raised.

The passed string must be in decoded form. When the body is stored in an external file, the file is overwritten.

#
method open_value_wr : unit -> Netchannels.out_obj_channel

Opens the value for writing. The current value is overwritten. If the value is immutable, the exception Immutable will be raised.

For example, to copy the file f into the value:

       let ch = body # open_value_wr() in
       let f_ch = new Netchannels.input_file f in
       ch # output_channel f_ch;
       f_ch # close_in();
       ch # close_out();

A MIME body with both read and write methods. It is still possible, however, to set the read-only flag to make this kind of body immutable, too.

The value of the body can be set either by a string, or by writing to an object channel. Both ways are possible independently of where the value is stored, in-memory, or as external file.

One can consider the pair (mime_header, mime_body) as a simple MIME message with one header and one body. Of course, this simple representation does not support multi-part messages (attachments). For that reason, the type complex_mime_message was introduced: the body can be further structured as a sequence of parts that are complex messages themselves.

For example, a mail message with an attachment is usually represented as

   (mail_header, `Parts [ (main_header, `Body main_body);
                          (att_header, `Body att_body) ] )

Here, mail_header is the real header of the mail message. main_header is the header of the main message, usually only containing the content type of main_body, the body of the main message. The attachment also has its own att_header, again usually only containing the content type, and the data of the attachment can be found in att_body.

Nowadays, mails often have an even more complicated structure, with `Parts containing nested `Parts. As complex_mime_message is recursive, any kind of nesting can be easily represented.

#
type complex_mime_message = mime_header * complex_mime_body
#
type complex_mime_body = [
| `Body of mime_body
| `Parts of complex_mime_message list
]
#
type complex_mime_message_ro = mime_header_ro * complex_mime_body_ro
#
type complex_mime_body_ro = [
| `Body of mime_body_ro
| `Parts of complex_mime_message_ro list
]

The read-only view of a complex_mime_message

Note: `Parts [], i.e. `Parts together with an empty list, is considered illegal. Such a value cannot be transformed into printable text.
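Because the type is recursive, functions over messages are usually recursive as well. The following sketch (assuming the library providing Netmime is linked; text_part and leaf_bodies are hypothetical helper names, not part of the module) builds a mail with one attachment and collects all leaf bodies.

```ocaml
open Netmime

(* Hypothetical helper: a leaf part carrying plain text. *)
let text_part s : complex_mime_message =
  (new basic_mime_header ["Content-type", "text/plain"],
   `Body (new memory_mime_body s))

(* A mail consisting of a main text and one attachment. *)
let mail : complex_mime_message =
  (new basic_mime_header
     ["Subject", "Report"; "Content-type", "multipart/mixed"],
   `Parts [ text_part "main text"; text_part "attachment data" ])

(* Hypothetical helper: collect the bodies of all leaves, depth-first. *)
let rec leaf_bodies ((_, body) : complex_mime_message) : mime_body list =
  match body with
  | `Body b -> [b]
  | `Parts parts -> List.concat (List.map leaf_bodies parts)

let () =
  List.iter (fun b -> print_endline b#value) (leaf_bodies mail)
```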

#
type mime_message = mime_header * [
| `Body of mime_body
]

Simple MIME message, in a form that is compatible with complex ones.

#
type mime_message_ro = mime_header_ro * [
| `Body of mime_body_ro
]

Read-only variant of simple messages

#

Classes

#
class basic_mime_header : ?ro:bool -> (string * string) list -> mime_header

An implementation of mime_header.

The argument is the list of (name,value) pairs of the header.

Example: Create a MIME header with only the field "Content-type":

let h = new basic_mime_header ["content-type", "text/plain"]

Example: Set the field "Subject":

h # update_field "subject" "The value of this field"

This mime_header implementation is based on a combination of a Map data structure and a doubly linked list. The efficiency of the operations (n=number of fields; m=average number of values per field; n*m=total number of values):

  • new, set_fields: O(m * n * log n), but the construction of the dictionary is deferred until the first real access
  • field: O(log n)
  • multiple_field: O(log n + m)
  • fields: O(n * m)
  • update_field, update_multiple_field: O(log n + m)
  • delete_field: O(n + m)
ro whether the header is read-only (default: false)
#
class memory_mime_body : ?ro:bool -> string -> mime_body

An implementation of mime_body where the value is stored in-memory.

The argument is the initial (decoded) value of the body. The method store returns `Memory.

Example: To create a body from a string, call

new memory_mime_body "The value as string"
ro whether the body is read-only (default: false)
#
class file_mime_body : ?ro:bool -> ?fin:bool -> string -> mime_body

An implementation of mime_body where the value is stored in an external file.

The argument is the name of the file containing the (decoded) value. The method store returns `File filename. The method value loads the contents of the file and returns them as string.

Example: To create a body from the file "f", call

new file_mime_body "f"
ro whether the body is read-only (default: false)
fin whether to delete the file when the finalize method is called (default: false)
#

Parsing MIME messages

#
val read_mime_header : ?unfold:bool -> ?strip:bool -> ?ro:bool -> Netstream.in_obj_stream -> mime_header

Decodes the MIME header that begins at the current position of the netstream, and returns the header as a basic_mime_header object. After returning, the stream is advanced to the byte following the empty line terminating the header.

Example: To read the header at the beginning of the file "f", use:

     let ch = new Netchannels.input_channel (open_in "f") in
     let stream = new Netstream.input_stream ch in
     let h = read_mime_header stream in
     ...
     stream#close_in();    (* no need to close ch *)

Note that although the stream position after parsing is exactly known, the position of ch cannot be predicted.

unfold whether linefeeds are replaced by spaces in the values of the header fields (Note: defaults to false here in contrast to Mimestring.scan_header!)
strip whether whitespace at the beginning and at the end of the header fields is stripped
ro whether the returned header is read-only (default: false)

Hint: To write the header h into the channel ch, use

Mimestring.write_header ch h#fields

Link: Mimestring.write_header

#
type multipart_style = [
| `None
| `Flat
| `Deep
]

How to parse multipart messages:

  • `None: Do not handle multipart messages specially. Multipart bodies are not further decoded, and returned as `Body b where b is the transfer-encoded text representation.
  • `Flat: If the top-level message is a multipart message, the parts are separated and returned as list. If the parts are again multipart messages, these inner multipart messages are not further decoded and returned as `Body b.
  • `Deep: Multipart messages are recursively decoded and returned as tree structure.

This value determines how far the complex_mime_message structure is created for a parsed MIME message. `None means that no parts are decoded, and messages always have only a simple `Body b, even if b is in reality a multi-part body. With `Flat, the top-level multi-part bodies are decoded (if found), and messages can have a structured `Parts [_, `Body b1; _, `Body b2; ...] body. Finally, `Deep allows inner multi-part bodies to be decoded recursively, and messages can have an arbitrarily complex form.

#
val decode_mime_body : mime_header_ro -> Netchannels.out_obj_channel -> Netchannels.out_obj_channel

let ch' = decode_mime_body hdr ch: The encoded MIME body written to ch' is decoded according to the value of the Content-transfer-encoding header field in hdr, and the decoded data is passed on to ch.

Handles 7bit, 8bit, binary, quoted-printable, base64.

Example: The file "f" contains base64-encoded data, and is to be decoded and to be stored in "g":

     let ch_f = new Netchannels.input_channel (open_in "f") in
     let ch_g = new Netchannels.output_channel (open_out "g") in
     let hdr = new basic_mime_header ["content-transfer-encoding", "base64" ] in
     let ch = decode_mime_body hdr ch_g in
     ch # output_channel ch_f;
     ch # close_out();
     ch_g # close_out();
     ch_f # close_in();

Note: This function is internally used by read_mime_message to decode bodies. There is usually no need to call it directly.

#
val storage : ?ro:bool -> ?fin:bool -> store -> mime_body * Netchannels.out_obj_channel

Creates a new storage facility for a mime body according to store. This function can be used to build the storage_style argument of the function read_mime_message (below). For example, this is useful to store large attachments in external files, as in:

     let storage_style hdr = 
       let filename = hdr ... (* extract from hdr *) in
       storage (`File filename)
ro whether the returned mime_bodies are read-only or not. Note that it is always possible to write into the body using the returned out_obj_channel regardless of the value of ~ro. Default: false
fin whether to finalize bodies stored in files. Default: false
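The interplay between ~ro and the returned channel can be sketched as follows (assuming the library providing Netmime is linked; the payload string is arbitrary). The body is filled through the channel and becomes visible once the channel is closed, while direct modification of the body is rejected.

```ocaml
(* An in-memory body that is read-only for its users. *)
let (body, ch) = Netmime.storage ~ro:true `Memory

let () =
  (* Data arrives through the channel; the ro flag does not affect it. *)
  ch#output_string "payload";
  ch#close_out ()

(* The decoded contents written through the channel. *)
let stored = body#value

(* Direct modification of a read-only body raises Immutable. *)
let set_value_rejected =
  try body#set_value "other"; false
  with Netmime.Immutable _ -> true

let () =
  Printf.printf "stored: %s, set_value rejected: %b\n"
    stored set_value_rejected
```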
#
val read_mime_message : ?unfold:bool -> ?strip:bool -> ?ro:bool -> ?multipart_style:multipart_style -> ?storage_style:(mime_header -> mime_body * Netchannels.out_obj_channel) -> Netstream.in_obj_stream -> complex_mime_message

Decodes the MIME message that begins at the current position of the passed netstream. It is expected that the message continues until EOF of the netstream.

Multipart messages are decoded as specified by multipart_style (see above).

Message bodies with content-transfer-encodings of 7bit, 8bit, binary, base64, and quoted-printable can be processed. The bodies are stored without content-transfer-encoding (i.e. in decoded form), but the content-transfer-encoding header field is not removed from the header.

The storage_style function determines where every message body is stored. The corresponding header of the body is passed to the function as argument; the result of the function is a pair of a new mime_body and an out_obj_channel writing into this body. You can create such a pair by calling storage (above).

By default, the storage_style is storage ?ro `Memory for every header. Here, the designator `Memory means that the body will be stored in an OCaml string. The designator `File fn would mean that the body will be stored in the file fn. The file would be created if it did not yet exist, and it would be overwritten if it did already exist.

Note that the storage_style function is called for every non-multipart body part.

Large message bodies (> maximum string length) are supported if the bodies are stored in files. The memory consumption is optimized for this case, and usually only a small constant amount of memory is needed.

Example:

Parse the MIME message stored in the file f:

     let m = read_mime_message
               (new Netstream.input_stream
                  (new Netchannels.input_channel (open_in f)))
unfold whether linefeeds are replaced by spaces in the values of the header fields (Note: defaults to false here in contrast to Mimestring.scan_header!)
strip whether whitespace at the beginning and at the end of the header fields is stripped
ro Whether the created MIME headers are read-only or not. Furthermore, the default storage_style uses this parameter for the MIME bodies, too. However, the MIME bodies may have a different read-only flag in general.
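The effect of multipart_style can be seen by parsing the same text twice. This is a minimal sketch, assuming the library providing Netmime, Netstream and Netchannels is linked; the message text, the boundary "sep" and the helper names parse and describe are invented for the example.

```ocaml
(* A small multipart message with two text parts, using CR/LF line ends. *)
let text =
  String.concat "\r\n"
    [ "Content-type: multipart/mixed; boundary=\"sep\"";
      "";
      "--sep";
      "Content-type: text/plain";
      "";
      "first";
      "--sep";
      "Content-type: text/plain";
      "";
      "second";
      "--sep--";
      "" ]

(* Hypothetical helper: parse the text from memory with the given style. *)
let parse style =
  let ch = new Netchannels.input_string text in
  let stream = new Netstream.input_stream ch in
  let msg = Netmime.read_mime_message ~multipart_style:style stream in
  stream#close_in ();
  msg

(* Hypothetical helper: summarize the top-level shape of a message. *)
let describe (_, body) =
  match body with
  | `Body _ -> "single body"
  | `Parts parts -> Printf.sprintf "%d parts" (List.length parts)

let () =
  print_endline (describe (parse `None));
  print_endline (describe (parse `Flat))
```

With `None the multipart body is kept as one undecoded `Body; with `Flat the two parts are separated.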
#

Printing MIME Messages

#
val encode_mime_body : ?crlf:bool -> mime_header_ro -> Netchannels.out_obj_channel -> Netchannels.out_obj_channel

let ch' = encode_mime_body hdr ch: The unencoded MIME body written to ch' is encoded according to the value of the Content-transfer-encoding header field in hdr, and the encoded data is passed on to ch.

Handles 7bit, 8bit, binary, quoted-printable, base64.

For an example, see decode_mime_body which works in a similar way but performs decoding instead of encoding.

crlf if set (this is by default the case) CR/LF will be used for end-of-line (eol) termination, if not set LF will be used. For 7bit, 8bit and binary encoding the existing eol delimiters are not rewritten, so this option has only an effect for quoted-printable and base64.
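A round trip through both filters can be sketched in memory. The example assumes that the library providing Netmime and Netchannels is linked; run is a hypothetical helper, and the header is coerced explicitly to mime_header_ro so that the call does not depend on how the argument type is declared.

```ocaml
(* Only the transfer encoding matters to the two filters. *)
let hdr =
  (new Netmime.basic_mime_header ["Content-transfer-encoding", "base64"]
   :> Netmime.mime_header_ro)

(* Hypothetical helper: push [input] through the filter built by
   [make_filter] and collect everything that reaches the destination. *)
let run make_filter input =
  let buf = Buffer.create 64 in
  let dest = new Netchannels.output_buffer buf in
  let ch : Netchannels.out_obj_channel = make_filter hdr dest in
  ch#output_string input;
  ch#close_out ();          (* flushes the remaining data to dest *)
  dest#close_out ();
  Buffer.contents buf

let encoded = run (fun h ch -> Netmime.encode_mime_body h ch) "Hello"
let decoded = run (fun h ch -> Netmime.decode_mime_body h ch) encoded

let () =
  Printf.printf "encoded: %s, decoded: %s\n" (String.trim encoded) decoded
```

The encoded text is trimmed before printing because the encoder may append an end-of-line sequence.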
#
val write_mime_message : ?wr_header:bool -> ?wr_body:bool -> ?nr:int -> ?ret_boundary:string Pervasives.ref -> ?crlf:bool -> Netchannels.out_obj_channel -> complex_mime_message -> unit

Writes the MIME message to the output channel. The content-transfer-encoding of the leaves is respected, and their bodies are encoded accordingly. The content-transfer-encoding of multipart messages is always "fixed", i.e. set to "7bit", "8bit", or "binary" depending on the contents.

The function fails if multipart messages do not have a multipart content type field (i.e. the content type does not begin with "multipart"). If only the boundary parameter is missing, a good boundary parameter is added to the content type. "Good" means here that it is impossible that the boundary string occurs in the message body if the content-transfer-encoding is quoted-printable or base64, and that such an occurrence is very unlikely if the body is not encoded. If the whole content type field is missing, a "multipart/mixed" type with a boundary parameter is added to the printed header.

Note that already existing boundaries are used, no matter whether they are of good quality or not.

No other header fields are added, deleted or modified. The mentioned modifications are _not_ written back to the passed MIME message but only added to the generated message text.

In some cases the boundary may not work (this applies both to an existing boundary and to an added one). The written MIME message is then malformed and cannot be parsed. To ensure a correct MIME message, it is recommended to parse the written text and to compare the structure of the message trees. It is, however, very unlikely that a problem arises.

Note that if the passed message is a simple message like (_,`Body _), and if no content-transfer-encoding is set, the written message might not end with a linefeed character.

wr_header If true, the outermost header is written. Inner headers of the message parts are written unless ~wr_body=false.
wr_body If true, the body of the whole message is written; if false, no body is written at all.
nr This argument sets the counter that is included in generated boundaries to a certain minimum value.
ret_boundary if passed, the boundary of the outermost multipart message is written to this reference. (Internally used.)
crlf if set (this is by default the case) CR/LF will be used for end-of-line (eol) termination, if not set LF will be used. The eol separator is used for the header, the multipart framing, and for bodies encoded as quoted-printable or base64. Other eol separators are left untouched.
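Writing a message into a buffer and parsing it back can be sketched as follows. The example assumes that the library providing Netmime, Netstream and Netchannels is linked, and it uses a simple single-part message with an invented body; a real check of a multipart message would compare the whole message tree.

```ocaml
(* A simple message whose body is written verbatim (7bit). *)
let message : Netmime.complex_mime_message =
  (new Netmime.basic_mime_header
     [ "Content-type", "text/plain"; "Content-transfer-encoding", "7bit" ],
   `Body (new Netmime.memory_mime_body "Hello\r\n"))

(* Print header and body into a string. *)
let printed =
  let buf = Buffer.create 128 in
  let ch = new Netchannels.output_buffer buf in
  Netmime.write_mime_message ch message;
  ch#close_out ();
  Buffer.contents buf

(* Parse the printed text again. *)
let reparsed =
  let stream =
    new Netstream.input_stream (new Netchannels.input_string printed) in
  let msg = Netmime.read_mime_message stream in
  stream#close_in ();
  msg

let reparsed_body =
  match reparsed with
  | (_, `Body b) -> b#value
  | (_, `Parts _) -> failwith "unexpected multipart message"

let () = print_string reparsed_body
```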
end