Non-blocking streaming JSON codec.
Version 0.9.1 - Daniel Bünzli <daniel.buenzli at erratique.ch>
The type for JSON lexemes. `As and `Ae
start and end arrays and `Os and `Oe start
and end objects.
`Name is for the member names of objects.
A well-formed sequence of lexemes belongs to the language of
the json grammar:

    json   = object / array
    object = `Os *member `Oe
    member = (`Name s) value
    array  = `As *value `Ae
    value  = `Null / `Bool b / `Float f / `String s / object / array

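The grammar can be checked mechanically over a list of lexemes. The following is a minimal sketch, not part of Jsonm: the lexeme type is redeclared locally (assumed from the description above) and the name well_formed is hypothetical.

```ocaml
(* Local copy of the lexeme type, assumed from the description above. *)
type lexeme =
  [ `Null | `Bool of bool | `String of string | `Float of float
  | `Name of string | `As | `Ae | `Os | `Oe ]

(* [well_formed ls] is [true] iff [ls] is exactly one JSON text according
   to the grammar. The stack records the open containers: [`A] is an
   array, [`O] an object waiting for a name or its end, [`V] an object
   waiting for the value of the name just read. Hypothetical helper. *)
let well_formed (ls : lexeme list) : bool =
  let rec go stack started ls = match stack, ls with
  | [], [] -> started
  | [], `As :: ls when not started -> go [`A] true ls
  | [], `Os :: ls when not started -> go [`O] true ls
  | [], _ :: _ -> false
  | _ :: _, [] -> false
  | `A :: up, `Ae :: ls -> go up true ls
  | `O :: up, `Oe :: ls -> go up true ls
  | `O :: up, `Name _ :: ls -> go (`V :: up) true ls
  | `A :: _, v :: ls -> push stack v ls
  | `V :: up, v :: ls -> push (`O :: up) v ls
  | _ -> false
  and push stack v ls = match v with
  | `Null | `Bool _ | `String _ | `Float _ -> go stack true ls
  | `As -> go (`A :: stack) true ls
  | `Os -> go (`O :: stack) true ls
  | _ -> false
  in
  go [] false ls
```

Note that under this grammar a lone scalar such as [`Null] is not a well-formed sequence: the top-level value must be an object or an array.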
A decoder returns only well-formed sequences of lexemes;
otherwise `Errors are returned. The UTF-8,
UTF-16, UTF-16LE and
UTF-16BE encoding schemes are supported. The strings of decoded
`Name and `String lexemes are however always UTF-8 encoded. In
these strings, characters originally escaped in the input are in
their unescaped representation.
An encoder accepts only well-formed sequences
of lexemes or
Invalid_argument is raised. Only the UTF-8
encoding scheme is supported. The strings of encoded `Name and
`String lexemes are assumed to be immutable and must be UTF-8
encoded; this is not checked by the module. In these strings,
the delimiter characters U+0022 and U+005C ('"', '\')
as well as the control characters
U+0000-U+001F are automatically
escaped by the encoders, as mandated by the standard.
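The escaping rule can be illustrated with a small standalone function. This is a sketch under assumptions, not Jsonm's implementation: the name escape_json_string is hypothetical, and control characters are uniformly written as \uXXXX escapes, whereas an encoder is free to use shorter forms such as \n.

```ocaml
(* [escape_json_string s] escapes the bytes of the UTF-8 string [s] that
   JSON requires to be escaped inside a string: U+0022, U+005C and the
   control characters U+0000-U+001F. All other bytes, including the
   bytes of multi-byte UTF-8 sequences, are copied unchanged.
   Hypothetical helper, for illustration only. *)
let escape_json_string (s : string) : string =
  let b = Buffer.create (String.length s + 2) in
  let add c = match c with
  | '"' -> Buffer.add_string b "\\\""
  | '\\' -> Buffer.add_string b "\\\\"
  | '\x00' .. '\x1F' ->
      Buffer.add_string b (Printf.sprintf "\\u%04X" (Char.code c))
  | c -> Buffer.add_char b c
  in
  String.iter add s;
  Buffer.contents b
```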
The type for decoding errors.
The type for Unicode encoding schemes.
The type for JSON decoders.
decode d is:
`Await if d has a `Manual source and awaits more input. The client must use Manual.src to provide it.
`Lexeme l if a lexeme l was decoded.
`End if the end of input was reached.
`Error e if a decoding error occurred. If the client is interested in a best-effort decoding it can still continue to decode after an error (see error recovery), although the resulting sequence of `Lexemes is undefined and may not be well-formed.
Note. Repeated invocation always eventually returns `End, even
in case of errors.
The type for JSON encoders.
encode e v is:
`Partial iff e has a `Manual destination and needs more output storage. The client must use Manual.dst to provide a new buffer and then call encode with `Await until `Ok is returned.
`Ok when the encoder is ready to encode a new `Lexeme or `End.
For `Manual destinations, encoding `End always returns `Partial;
the client should as usual use Manual.dst and continue with `Await
until `Ok is returned, at which point Manual.dst_rem e is guaranteed
to be the size of the last provided buffer (i.e. nothing was written).
Raises. Invalid_argument if a non well-formed
sequence of lexemes is encoded or if `Lexeme or `End is
encoded after a `Partial encode.
Manual input sources and output destinations.
Warning. Use only with
`Manual decoders and encoders.
Codec with comments and whitespace.
The uncut codec also processes whitespace and JavaScript
comments. The latter is non-standard JSON; fail on `Comment
decoding if you want to process whitespace but stick to the standard.
The uncut codec preserves as much of the original input as
possible. Perfect round-trip with
Jsonm is however impossible for
the following reasons:
The encoder automatically inserts name separators ':' and value separators ",". If you just reencode the sequence of decodes, whitespace and comments may (harmlessly, but significantly) commute with these separators.
Internally the encoder uses U+000A ('\n') for newlines.
`Float lexemes may be rewritten differently by the encoder.
The uncut data model is the same as the regular data model, except that before or after any lexeme you may decode/encode one or more:
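`White w, representing JSON whitespace w. On input the sequences CR (U+000D) and CRLF (<U+000D, U+000A>) are normalized to U+000A. The string w must be a sequence of U+0020, U+0009, U+000A or U+000D characters (' ', '\t', '\n', '\r').
`Comment (`S, c), representing a JavaScript single-line comment c. c is the comment's content without the starting // and the ending newline. The string c must not contain any newline.
`Comment (`M, c), representing a JavaScript multi-line comment c. c is the comment's content without the starting /* and the ending */. The string c must not contain the sequence */.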
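A client that builds `White and `Comment values itself may want to check these constraints before encoding. The sketch below does so with the standard library only. The type uncut is a local copy assumed from the description above, the names are hypothetical, and treating both U+000A and U+000D as newlines in single-line comments is an assumption.

```ocaml
(* Local copy of the uncut additions, assumed from the description above. *)
type uncut = [ `White of string | `Comment of [ `S | `M ] * string ]

(* [for_all_chars p s] is [true] iff [p] holds for every byte of [s]. *)
let for_all_chars p s =
  let rec go i = i >= String.length s || (p s.[i] && go (i + 1)) in
  go 0

(* [contains_sub s sub] is [true] iff [sub] occurs in [s] (naive scan). *)
let contains_sub s sub =
  let n = String.length s and m = String.length sub in
  let rec go i = i + m <= n && (String.sub s i m = sub || go (i + 1)) in
  go 0

(* [valid_uncut u] checks the constraints on [w] and [c] listed above. *)
let valid_uncut : uncut -> bool = function
| `White w ->
    for_all_chars (function ' ' | '\t' | '\n' | '\r' -> true | _ -> false) w
| `Comment (`S, c) ->
    for_all_chars (function '\n' | '\r' -> false | _ -> true) c
| `Comment (`M, c) -> not (contains_sub c "*/")
```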
Decoders parse valid JSON with the following limitations:
Strings returned by `String, `Name, `White and `Comment are limited by Sys.max_string_length. There is no built-in protection against the fact that the internal OCaml Buffer.t value may raise Failure on Jsonm.decode. This should however only be a problem on 32-bit platforms if your strings are greater than 16 MB.
Position tracking assumes that each decoded Unicode scalar value
has a column width of 1. The same assumption may not be made by
the display program (e.g. for emacs' compilation mode you need
to set compilation-error-screen-columns to nil).
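To make the column convention concrete, the sketch below counts the Unicode scalar values of a UTF-8 string by counting the bytes that are not continuation bytes. It assumes valid UTF-8 input, the name columns is hypothetical, and it is not a description of how Jsonm tracks positions internally.

```ocaml
(* [columns s] is the number of Unicode scalar values in the valid UTF-8
   string [s], i.e. its width if every scalar value takes one column.
   Continuation bytes have the bit pattern 10xxxxxx and are not counted.
   Hypothetical helper, for illustration only. *)
let columns (s : string) : int =
  let n = ref 0 in
  String.iter (fun c -> if Char.code c land 0xC0 <> 0x80 then incr n) s;
  !n
```

A display program that measures width in bytes or in terminal cells may disagree with this count for non-ASCII text.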
The newlines LF (U+000A), CR (U+000D), and CRLF are all normalized
to LF internally. This may have an impact in some corner
cases. For example the invalid escape sequence \<CR> in
a string will be reported as being `Illegal_escape (`Not_esc_uchar 0x000A).
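The normalization described above can be sketched as a standalone string function. This is only an illustration of the rule (CR and CRLF both become LF); the name normalize_newlines is hypothetical and the decoder applies the rule incrementally on its input rather than on whole strings.

```ocaml
(* [normalize_newlines s] is [s] with every CRLF pair and every lone CR
   replaced by a single LF. Hypothetical helper, for illustration only. *)
let normalize_newlines (s : string) : string =
  let n = String.length s in
  let b = Buffer.create n in
  let rec go i =
    if i < n then begin
      match s.[i] with
      | '\r' ->
          Buffer.add_char b '\n';
          (* Skip the LF of a CRLF pair so that it yields one newline. *)
          if i + 1 < n && s.[i + 1] = '\n' then go (i + 2) else go (i + 1)
      | c -> Buffer.add_char b c; go (i + 1)
    end
  in
  go 0;
  Buffer.contents b
```

This is why an error that involves a CR in the input is reported with the code point 0x000A.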
Encoders produce valid JSON provided the client ensures that the following holds.
`Float lexemes must not be Pervasives.nan, Pervasives.infinity or Pervasives.neg_infinity. They are encoded with the format string "%.16g"; this allows round-tripping all the integers that can be precisely represented in OCaml float values, i.e. the integers in the interval [-2^53; 2^53].
If the uncut codec is used, `White must be made of JSON whitespace and `Comment must never be encoded.
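The constraint on `Float lexemes can be explored with the standard library alone. The sketch below uses the "%.16g" format string named above; the function names are hypothetical and the code is not taken from Jsonm.

```ocaml
(* Format string used for `Float lexemes according to the text above. *)
let encode_float (f : float) : string = Printf.sprintf "%.16g" f

(* JSON has no syntax for nan and the infinities, so the client has to
   reject them before encoding. *)
let encodable (f : float) : bool = match classify_float f with
| FP_nan | FP_infinite -> false
| FP_normal | FP_subnormal | FP_zero -> true

(* [roundtrips f] is [true] iff reading back the encoded text gives [f]. *)
let roundtrips (f : float) : bool = float_of_string (encode_float f) = f
```

Sixteen significant digits are enough for every integer whose magnitude is at most 2^53 but not for every float: 0.1 +. 0.2 is written as 0.3, which reads back as a different value.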
After a decoding error, if best-effort decoding is performed, the following happens before continuing:
`Illegal_BOM, the initial BOM is skipped.
`Illegal_string_uchar, a Unicode replacement character (U+FFFD) is substituted for the illegal sequence.
`Expected r, input is discarded until a synchronizing lexeme that depends on r is found.
`Unclosed, the end of input is reached, further decodes will be `End.
The result of trip src dst has the JSON from src written on dst.

    let trip ?encoding ?minify
        (src : [`Channel of in_channel | `String of string])
        (dst : [`Channel of out_channel | `Buffer of Buffer.t])
      =
      (* Forward every decoded lexeme to the encoder until `End. *)
      let rec loop d e = match Jsonm.decode d with
      | `Lexeme _ as v -> ignore (Jsonm.encode e v); loop d e
      | `End -> ignore (Jsonm.encode e `End); `Ok
      | `Error err -> `Error (Jsonm.decoded_range d, err)
      | `Await -> assert false (* src is not `Manual *)
      in
      let d = Jsonm.decoder ?encoding src in
      let e = Jsonm.encoder ?minify dst in
      loop d e

trip_fd does the same but between Unix file descriptors, using `Manual sources and destinations.

    let trip_fd ?encoding ?minify
        (fdi : Unix.file_descr)
        (fdo : Unix.file_descr) =
      (* Encode v, flushing the buffer s to fd each time it is full. *)
      let rec encode fd s e v = match Jsonm.encode e v with `Ok -> ()
      | `Partial ->
          let rec unix_write fd s j l =
            let rec write fd s j l = try Unix.single_write fd s j l with
            | Unix.Unix_error (Unix.EINTR, _, _) -> write fd s j l
            in
            let wc = write fd s j l in
            if wc < l then unix_write fd s (j + wc) (l - wc) else ()
          in
          unix_write fd s 0 (String.length s - Jsonm.Manual.dst_rem e);
          Jsonm.Manual.dst e s 0 (String.length s);
          encode fd s e `Await
      in
      let rec loop fdi fdo ds es d e = match Jsonm.decode d with
      | `Lexeme _ as v -> encode fdo es e v; loop fdi fdo ds es d e
      | `End -> encode fdo es e `End; `Ok
      | `Error err -> `Error (Jsonm.decoded_range d, err)
      | `Await ->
          (* The decoder needs more input: refill ds from fdi. *)
          let rec unix_read fd s j l = try Unix.read fd s j l with
          | Unix.Unix_error (Unix.EINTR, _, _) -> unix_read fd s j l
          in
          let rc = unix_read fdi ds 0 (String.length ds) in
          Jsonm.Manual.src d ds 0 rc; loop fdi fdo ds es d e
      in
      let ds = String.create 65536 (* UNIX_BUFFER_SIZE in 4.0.0 *) in
      let es = String.create 65536 (* UNIX_BUFFER_SIZE in 4.0.0 *) in
      let d = Jsonm.decoder ?encoding `Manual in
      let e = Jsonm.encoder ?minify `Manual in
      Jsonm.Manual.dst e es 0 (String.length es);
      loop fdi fdo ds es d e

The result of memsel names src is the list of string values of
members of src that have their name in names. In this example,
decoding errors are silently ignored.

    let memsel ?encoding names
        (src : [`Channel of in_channel | `String of string])
      =
      let rec loop acc names d = match Jsonm.decode d with
      | `Lexeme (`Name n) when List.mem n names ->
          (* Keep the member's value only if it is a string. *)
          begin match Jsonm.decode d with
          | `Lexeme (`String s) -> loop (s :: acc) names d
          | _ -> loop acc names d
          end
      | `Lexeme _ | `Error _ -> loop acc names d
      | `End -> List.rev acc
      | `Await -> assert false (* src is not `Manual *)
      in
      loop [] names (Jsonm.decoder ?encoding src)

A generic OCaml representation of JSON text is the following one.

    type json =
      [ `Null | `Bool of bool | `Float of float | `String of string
      | `A of json list | `O of (string * json) list ]

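As an illustration of how such a representation can be consumed, the sketch below follows a path of member names through nested objects. The type is repeated to keep the example self-contained and the function find is hypothetical; it is not part of Jsonm.

```ocaml
type json =
  [ `Null | `Bool of bool | `Float of float | `String of string
  | `A of json list | `O of (string * json) list ]

(* [find path j] follows the member names of [path] through nested
   objects of [j]. It is [None] if a name is missing or if a value on
   the way is not an object. Hypothetical helper, for illustration. *)
let rec find (path : string list) (j : json) : json option =
  match path, j with
  | [], j -> Some j
  | n :: path, `O ms ->
      begin match List.assoc_opt n ms with
      | Some v -> find path v
      | None -> None
      end
  | _ :: _, _ -> None
```

If a member name is repeated in an object, List.assoc_opt returns the first binding in list order.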
The result of json_of_src src is the JSON text from src in this
representation. The function is tail recursive.

    exception Escape of ((int * int) * (int * int)) * Jsonm.error

    let json_of_src ?encoding
        (src : [`Channel of in_channel | `String of string])
      =
      (* Decode one lexeme, escaping with the error range on failure. *)
      let dec d = match Jsonm.decode d with
      | `Lexeme l -> l
      | `Error e -> raise (Escape (Jsonm.decoded_range d, e))
      | `End | `Await -> assert false
      in
      (* Continuation-passing style keeps the traversal tail recursive. *)
      let rec value v k d = match v with
      | `Os -> obj [] k d
      | `As -> arr [] k d
      | `Null | `Bool _ | `String _ | `Float _ as v -> k v d
      | _ -> assert false
      and arr vs k d = match dec d with
      | `Ae -> k (`A (List.rev vs)) d
      | v -> value v (fun v -> arr (v :: vs) k) d
      and obj ms k d = match dec d with
      | `Oe -> k (`O (List.rev ms)) d
      | `Name n -> value (dec d) (fun v -> obj ((n, v) :: ms) k) d
      | _ -> assert false
      in
      let d = Jsonm.decoder ?encoding src in
      try `JSON (value (dec d) (fun v _ -> v) d) with
      | Escape (r, e) -> `Error (r, e)

The result of json_to_dst dst json has the JSON text json written
on dst. The function is tail recursive.

    let json_to_dst ~minify
        (dst : [`Channel of out_channel | `Buffer of Buffer.t ])
        (json : json)
      =
      let enc e l = ignore (Jsonm.encode e (`Lexeme l)) in
      (* Continuation-passing style keeps the traversal tail recursive. *)
      let rec value v k e = match v with
      | `A vs -> arr vs k e
      | `O ms -> obj ms k e
      | `Null | `Bool _ | `Float _ | `String _ as v -> enc e v; k e
      and arr vs k e = enc e `As; arr_vs vs k e
      and arr_vs vs k e = match vs with
      | v :: vs' -> value v (arr_vs vs' k) e
      | [] -> enc e `Ae; k e
      and obj ms k e = enc e `Os; obj_ms ms k e
      and obj_ms ms k e = match ms with
      | (n, v) :: ms -> enc e (`Name n); value v (obj_ms ms k) e
      | [] -> enc e `Oe; k e
      in
      let e = Jsonm.encoder ~minify dst in
      let finish e = ignore (Jsonm.encode e `End) in
      (* A JSON text must be an array or an object at the top level. *)
      match json with
      | `A _ | `O _ as json -> value json finish e
      | _ -> invalid_arg "invalid json text"