Base64, Quoted Printable, URL encoding, HTML escaping
Base64 encoding as described in RFC 2045
Computes the "base 64" encoding of the given string argument. Note that the result is a string that contains only the characters a-z, A-Z, 0-9, +, /, =, and optionally spaces, CR and LF characters.
If pos and/or len are passed, only the substring starting at pos (default: 0) with length len (default: rest of the string) is encoded.
The result is divided up into lines not longer than linelength (without counting the line separator); default: do not divide lines. If linelength is smaller than 4, no line division is performed. If linelength is not divisible by 4, the produced lines are a bit shorter than linelength.
If crlf is set (default: false), the lines are ended by CRLF; otherwise they are ended by LF only. (You need the crlf option to produce correct MIME messages.)
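The line-splitting rules above can be illustrated with a small standard-library-only sketch. This is not the library's implementation: `b64_encode` and `alphabet` are hypothetical names, and the sketch assumes that every output line, including the last one, is terminated by the line separator.

```ocaml
(* Illustrative sketch only: a plain-stdlib Base64 encoder with optional
   line splitting. [b64_encode] is a hypothetical name, not the library's. *)
let alphabet =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

let b64_encode ?(linelength = 0) ?(crlf = false) (s : string) : string =
  let n = String.length s in
  let raw = Buffer.create ((n + 2) / 3 * 4) in
  let byte i = if i < n then Char.code s.[i] else 0 in
  let i = ref 0 in
  while !i < n do
    (* Pack up to three input bytes into one 24-bit group. *)
    let g = (byte !i lsl 16) lor (byte (!i + 1) lsl 8) lor byte (!i + 2) in
    let rest = n - !i in
    Buffer.add_char raw alphabet.[(g lsr 18) land 63];
    Buffer.add_char raw alphabet.[(g lsr 12) land 63];
    (* Missing input bytes are signalled by '=' padding. *)
    Buffer.add_char raw (if rest > 1 then alphabet.[(g lsr 6) land 63] else '=');
    Buffer.add_char raw (if rest > 2 then alphabet.[g land 63] else '=');
    i := !i + 3
  done;
  let raw = Buffer.contents raw in
  if linelength < 4 then raw (* too short for one group: no line division *)
  else begin
    (* Round down to a multiple of 4 so that no 4-character group is split. *)
    let width = linelength / 4 * 4 in
    let eol = if crlf then "\r\n" else "\n" in
    let len = String.length raw in
    let out = Buffer.create (len + 16) in
    let pos = ref 0 in
    while !pos < len do
      let k = min width (len - !pos) in
      Buffer.add_string out (String.sub raw !pos k);
      Buffer.add_string out eol; (* assumption: the last line is terminated too *)
      pos := !pos + k
    done;
    Buffer.contents out
  end
```

With `~linelength:5` the effective line width in this sketch is 4, which corresponds to the "a bit shorter than linelength" case described above.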
Decodes the given string argument.
If pos and/or len are passed, only the substring starting at pos (default: 0) with length len (default: rest of the string) is decoded.
If url_variant (default: true) is set, the function also accepts the characters '-' and '.' as produced by url_encode.
If accept_spaces (default: false) is set, the function ignores white space contained in the string to decode (otherwise the function fails if it finds white space). Furthermore, the character '>' is considered as "space", too (so you don't have trouble with mbox mailboxes that accidentally quote "From").
This pipe encodes the data written into the pipe. linelength and crlf work as in encode.
This pipe decodes the data written into the pipe. url_variant and accept_spaces work as in decode.
This module implements the "Quoted Printable" encoding as described in RFC 2045.
This implementation assumes that the encoded string has a text MIME type. On input both CR/LF and LF are accepted as end-of-line (eol) terminators, but the output normalizes the eol delimiter as the crlf argument specifies. Note that this implies that, if crlf is set, the output uses CR/LF as line separator, as MIME prescribes.
Encodes the string and returns it.
Since OcamlNet 0.98, soft line breaks are added to the output to ensure that all output lines have a length <= 76 bytes.
Note unsafe characters:
As recommended by RFC 2045, the characters !#$@[]^`|{}~ and the double quotes are additionally represented as hex tokens. Furthermore, the letter 'F' is considered as unsafe if it occurs at the beginning of the line, so the encoded text never contains the word "From" at the beginning of a line.
If pos and/or len are passed, only the substring starting at pos (default: 0) with length len (default: rest of the string) is encoded.
If crlf is set (the default), the output text uses CR/LF as line separator. Otherwise only LF is used.
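The treatment of unsafe characters described above can be sketched for a single line of text as follows. This is a simplified illustration using only the standard library, not the library's code: `qp_encode_line` and `extra_unsafe` are hypothetical names, and the sketch omits soft line breaks, eol normalization, and the special handling of trailing white space that RFC 2045 requires.

```ocaml
(* Illustrative sketch: quoted-printable encoding of one line of text,
   without soft line breaks. [qp_encode_line] is a hypothetical helper. *)

(* The additional characters that the text above lists as unsafe. *)
let extra_unsafe = "!#$@[]^`|{}~\""

let qp_encode_line (line : string) : string =
  let buf = Buffer.create (String.length line * 3) in
  String.iteri
    (fun i c ->
      let code = Char.code c in
      let unsafe =
        c = '='                             (* the escape character itself *)
        || code < 32 || code > 126          (* control and non-ASCII bytes *)
        || String.contains extra_unsafe c   (* additionally protected characters *)
        || (i = 0 && c = 'F')               (* avoid "From" at the line start *)
      in
      (* An unsafe byte becomes a hex token "=XX" with uppercase digits. *)
      if unsafe then Buffer.add_string buf (Printf.sprintf "=%02X" code)
      else Buffer.add_char buf c)
    line;
  Buffer.contents buf
```

For instance, in this sketch a line starting with "From" comes out starting with "=46rom", so a mail transport has no reason to quote it.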
Decodes the string and returns it.
Most format errors cause an Invalid_argument exception.
If pos and/or len are passed, only the substring starting at pos (default: 0) with length len (default: rest of the string) is decoded.
This pipe encodes the data written into the pipe.
The "Q" encoding as described by RFC 2047.
Note: All characters except alphanumeric characters are protected by hex tokens. In particular, spaces are represented as "=20", not as "_".
Encoding/Decoding within URLs:
The following two functions perform the '%'-substitution for characters that may otherwise be interpreted as metacharacters.
According to: RFC 1738, RFC 1630
Option plus: This option has been added because there are some implementations that do not map ' ' to '+', for example JavaScript's escape function. The default is true because this is the RFC-compliant definition.
Option plus: Whether '+' is converted to space. The default is true. If false, '+' is returned as it is.
The optional arguments pos and len may restrict the processing to the corresponding substring.
Option plus: Whether spaces are converted to '+'. The default is true. If false, spaces are converted to "%20", and only %xx sequences are produced.
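The effect of the plus option in both directions can be illustrated with a minimal standard-library sketch. The names `url_encode`, `url_decode`, `is_unreserved` and `hex_value` are hypothetical, and the set of characters left unencoded here (letters, digits, '-', '_', '.') is an assumption made for the example, not a statement about the library.

```ocaml
(* Illustrative sketch of '%'-substitution; not the library's implementation. *)

(* Assumption: only these characters are passed through unchanged. *)
let is_unreserved c =
  match c with
  | 'A' .. 'Z' | 'a' .. 'z' | '0' .. '9' | '-' | '_' | '.' -> true
  | _ -> false

let url_encode ?(plus = true) (s : string) : string =
  let buf = Buffer.create (String.length s * 3) in
  String.iter
    (fun c ->
      if is_unreserved c then Buffer.add_char buf c
      else if c = ' ' && plus then Buffer.add_char buf '+'
      else Buffer.add_string buf (Printf.sprintf "%%%02X" (Char.code c)))
    s;
  Buffer.contents buf

let hex_value c =
  match c with
  | '0' .. '9' -> Char.code c - Char.code '0'
  | 'a' .. 'f' -> Char.code c - Char.code 'a' + 10
  | 'A' .. 'F' -> Char.code c - Char.code 'A' + 10
  | _ -> failwith "url_decode: bad hex digit"

let url_decode ?(plus = true) (s : string) : string =
  let n = String.length s in
  let buf = Buffer.create n in
  let rec go i =
    if i < n then begin
      match s.[i] with
      | '%' when i + 2 < n ->
          (* Two hex digits follow: rebuild the original byte. *)
          let v = (hex_value s.[i + 1] * 16) + hex_value s.[i + 2] in
          Buffer.add_char buf (Char.chr v);
          go (i + 3)
      | '%' -> failwith "url_decode: truncated %xx sequence"
      | '+' when plus -> Buffer.add_char buf ' '; go (i + 1)
      | c -> Buffer.add_char buf c; go (i + 1)
    end
  in
  go 0;
  Buffer.contents buf
```

With `~plus:false`, the encoder in this sketch produces only %xx sequences and the decoder returns '+' unchanged, mirroring the two option descriptions above.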
URL-encoded parameters:
The following two functions create and analyze URL-encoded parameters.
Format: name1=val1&name2=val2&...
The argument is a list of (name,value) pairs. The result is the single URL-encoded parameter string.
The argument is the URL-encoded parameter string. The result is the corresponding list of (name,value) pairs. Note: Whitespace within the parameter string is ignored. If there is a format error, the function fails.
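As a rough illustration of the name1=val1&name2=val2 format, the following standard-library sketch builds and splits such a string. The helper names `enc`, `mk_params` and `split_params` are hypothetical; the splitter in this sketch leaves names and values in their encoded form and does not skip white space, so it is narrower than the behaviour described above.

```ocaml
(* Illustrative sketch only; helper names are hypothetical. *)

(* Minimal percent-encoder: letters, digits, '-', '_', '.' pass through,
   space becomes '+', everything else becomes %XX (an assumed safe set). *)
let enc (s : string) : string =
  let buf = Buffer.create (String.length s * 3) in
  String.iter
    (fun c ->
      match c with
      | 'A' .. 'Z' | 'a' .. 'z' | '0' .. '9' | '-' | '_' | '.' ->
          Buffer.add_char buf c
      | ' ' -> Buffer.add_char buf '+'
      | _ -> Buffer.add_string buf (Printf.sprintf "%%%02X" (Char.code c)))
    s;
  Buffer.contents buf

(* Join encoded name=value items with '&'. *)
let mk_params (pairs : (string * string) list) : string =
  String.concat "&" (List.map (fun (n, v) -> enc n ^ "=" ^ enc v) pairs)

(* Split at '&', then at the first '=' of each item; fail on a format error.
   Names and values are returned still encoded. *)
let split_params (s : string) : (string * string) list =
  String.split_on_char '&' s
  |> List.filter (fun item -> item <> "")
  |> List.map (fun item ->
         match String.index_opt item '=' with
         | Some i ->
             ( String.sub item 0 i,
               String.sub item (i + 1) (String.length item - i - 1) )
         | None -> failwith "split_params: missing '='")
```

Encoding each name and value before joining is what keeps a literal '&' or '=' inside a value from being mistaken for a separator.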
Encodes characters that need protection by converting them to entity references. E.g. "<" is converted to "&lt;".
As the entities may be named, there is a dependency on the character set.
Legacy functions:
The following functions have a more general interface and should be preferred in new programs.
The string contains '<', '>', '"', '&' and the control characters 0-8, 11-12, 14-31, 127.
The input string, which is encoded as in_enc, is recoded to out_enc, and the following characters are encoded as HTML entities (&name; or &#num;): the characters contained in unsafe_chars, and the characters that cannot be represented in out_enc. By default (out_enc=`Enc_usascii), only ASCII characters can be represented, and thus all code points >= 128 are encoded as HTML entities. If you pass out_enc=`Enc_utf8, all characters can be represented.
For example, the string "(a<b) & (c>d)" is encoded as "(a&lt;b) &amp; (c&gt;d)".
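The example above can be reproduced with a small standard-library sketch. It is a simplification, not the library's encoder: `html_escape` is a hypothetical name, the input is assumed to be ISO-8859-1 (so that every byte is one code point), the output is assumed to be US-ASCII, and named entities are used only for the four characters shown.

```ocaml
(* Illustrative sketch only: escape a Latin-1 string for ASCII HTML output.
   [html_escape] is a hypothetical helper, not part of the library. *)
let html_escape (s : string) : string =
  let buf = Buffer.create (String.length s + 16) in
  String.iter
    (fun c ->
      match c with
      | '<' -> Buffer.add_string buf "&lt;"
      | '>' -> Buffer.add_string buf "&gt;"
      | '&' -> Buffer.add_string buf "&amp;"
      | '"' -> Buffer.add_string buf "&quot;"
      | c when Char.code c >= 128 ->
          (* Not representable in US-ASCII: emit a numeric entity.
             Assumes one byte equals one code point (ISO-8859-1 input). *)
          Buffer.add_string buf (Printf.sprintf "&#%d;" (Char.code c))
      | c -> Buffer.add_char buf c)
    s;
  Buffer.contents buf
```

The ampersand has to be handled in the same pass as the other characters; escaping it in a later pass would re-escape the entities that were just produced.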
It is required that out_enc is an ASCII-compatible encoding.
The option prefer_name selects whether named entities (e.g. &lt;) or numeric entities (e.g. &#60;) are preferred.
The efficiency of the function can be improved when the same encoding is applied to several strings. Create a specialized encoding function by passing all arguments up to the unit argument, and apply this function several times. For example:
let my_enc = encode ~in_enc:`Enc_utf8 () in
let s1' = my_enc s1 in
let s2' = my_enc s2 in ...
The input string is recoded from in_enc to out_enc, and HTML entities (&name; or &#num;) are resolved. The input encoding in_enc must be ASCII-compatible.
By default, the function knows all entities defined for HTML 4 (this can be changed using entity_base, see below). If other entities occur, the function lookup is called and the name of the entity is passed as input string to the function. It is expected that lookup returns the value of the entity, and that this value is already encoded as out_enc.
By default, lookup raises a Failure exception.
If a character cannot be represented in the output encoding, the function subst is called. subst must return a substitute string for the character.
By default, subst raises a Failure exception.
The option entity_base determines which set of entities is considered as the known entities that can be decoded without help from the lookup function: `Html selects all entities defined for HTML 4, `Xml selects only &lt;, &gt;, &amp;, &quot;, and &apos;, and `Empty selects the empty set (i.e. lookup is always called).
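The interplay between a fixed set of known entities and the lookup fallback can be sketched as follows. This is a standard-library illustration of the idea with the five XML entities as the known set, not the library's decoder: `resolve_entities`, `xml_entities` and `default_lookup` are hypothetical names, and the sketch handles neither numeric entities (&#num;) nor any recoding between character sets.

```ocaml
(* Illustrative sketch only: resolve named entities against a small known
   set, delegating unknown names to a caller-supplied [lookup] function. *)

(* The known set, mirroring the five entities listed for `Xml above. *)
let xml_entities =
  [ ("lt", "<"); ("gt", ">"); ("amp", "&"); ("quot", "\""); ("apos", "'") ]

(* Mirrors the documented default: an unknown entity raises Failure. *)
let default_lookup (name : string) : string =
  failwith ("unknown entity: " ^ name)

let resolve_entities ?(lookup = default_lookup) (s : string) : string =
  let n = String.length s in
  let buf = Buffer.create n in
  let rec go i =
    if i < n then begin
      if s.[i] = '&' then begin
        match String.index_from_opt s i ';' with
        | Some j ->
            (* The entity name sits between '&' and ';'. *)
            let name = String.sub s (i + 1) (j - i - 1) in
            let value =
              match List.assoc_opt name xml_entities with
              | Some v -> v
              | None -> lookup name
            in
            Buffer.add_string buf value;
            go (j + 1)
        | None -> failwith "resolve_entities: unterminated entity"
      end
      else begin
        Buffer.add_char buf s.[i];
        go (i + 1)
      end
    end
  in
  go 0;
  Buffer.contents buf
```

Replacing `xml_entities` with an empty list would give the behaviour described for `Empty, where every entity name is passed to lookup.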