Irmin public API.
Irmin
is a library to design and use persistent stores with
built-in snapshot, branching and reverting mechanisms. Irmin uses
concepts similar to Git but it exposes
them as a high-level library instead of a complex command-line
frontend. It features a bidirectional Git backend,
fully compatible with the usual Git tools and workflows.
Irmin is designed to use a large variety of backends. It is written in pure OCaml and does not depend on external C stubs; it is thus very portable and aims to run everywhere, from Linux to Xen unikernels.
Consult the basics and examples of use for a quick start.
Release %%VERSION%% - %%MAINTAINER%%
The version of the library.
Serializable data with reversible human-readable representations.
Tasks are used to keep track of the origin of reads and writes in the store. Every high-level operation is expected to have its own task, which is passed to every low-level call.
Get the task date.
The date is computed by the user when calling the create function. When available, Unix.gettimeofday () is a good value for such a date. On more esoteric platforms, any monotonic counter is a fine value as well. On the Git backend, the date will be translated into the commit Date field.
Get the task owner.
The owner identifies the entity (human, unikernel, process, thread, etc.) performing an operation. For the Git backend, this will be directly translated into the Author field.
Get the task unique identifier.
By default, it is freshly generated on each call to create. That identifier is useful for debugging purposes, for instance to relate debug lines to the tasks which cause them, and might appear in one line of the commit message for the Git backend.
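As an illustration of the three task fields described above, here is a minimal standalone sketch. The task record, tick, fresh_uid and create below are hypothetical stand-ins written for this example; they are not Irmin's actual types or functions. The sketch assumes a platform without a clock, so the date comes from a monotonic counter.

```ocaml
(* Hypothetical stand-in for a task; not Irmin's actual type. *)
type task = { date : int64; owner : string; uid : int64; message : string }

(* A monotonic counter, usable as a date source when no clock is available. *)
let tick =
  let c = ref 0L in
  fun () -> c := Int64.succ !c; !c

(* A separate counter producing a fresh identifier on each call. *)
let fresh_uid =
  let c = ref 0L in
  fun () -> c := Int64.succ !c; !c

(* The caller may supply the date; otherwise the counter is used. *)
let create ?(date = tick ()) ~owner message =
  { date; owner; uid = fresh_uid (); message }
```

Two tasks created in sequence get distinct identifiers and increasing dates, which is all a commit log needs to order them.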
Merge
provides functions to build custom 3-way merge operators
for various user-defined contents.
Type for merge results.
Exception which might be raised when merging.
Create a default merge function. This is a simple merge function which supports changes in one branch at a time:
- if t1=t2 then the result of the merge is `OK t1;
- if t1=old then the result of the merge is `OK t2;
- if t2=old then the result of the merge is `OK t1;
- otherwise the result is `Conflict.
The default string merge function. It does not do anything clever: it just compares the strings using the default merge function.
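The rules of the default merge function can be sketched as follows. This is a standalone illustration, not Irmin's implementation: the merge_result type and the default_merge name are assumptions made for this example, and the tags simply mirror the text above.

```ocaml
(* Local result type for this sketch; the tags mirror the prose above. *)
type 'a merge_result = [ `OK of 'a | `Conflict of string ]

(* 3-way merge accepting changes made in only one branch at a time.
   [eq] is the equality on values, [old] the common ancestor. *)
let default_merge ~eq ~old t1 t2 : 'a merge_result =
  if eq t1 t2 then `OK t1          (* both branches agree *)
  else if eq t1 old then `OK t2    (* only the second branch changed *)
  else if eq t2 old then `OK t1    (* only the first branch changed *)
  else `Conflict "default"         (* both branches changed differently *)
```

With strings, default_merge ~eq:String.equal gives the behaviour described for the default string merge: no attempt is made to combine two diverging edits.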
The type for counter values. It is expected that the only valid operations on counters are increment and decrement. The following merge functions ensure that the counter semantics are preserved, i.e. that the number of increments and decrements is preserved.
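One way to preserve the number of increments and decrements is to replay both branches' deltas on top of the common ancestor. The following is a minimal sketch on plain integers; merge_counter is a name chosen for this example, not an Irmin function.

```ocaml
(* 3-way merge for counters: apply the delta of each branch
   (relative to the common ancestor [old]) to [old]. *)
let merge_counter ~old t1 t2 = old + (t1 - old) + (t2 - old)
```

For instance, starting from 10, a branch that adds 3 and a branch that removes 2 merge to 11: every increment and decrement from both sides is kept.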
We consider that the only valid operations for maps and association lists are:
We thus assume that no operation on maps modifies the key names. So the following merge functions ensure that (i) new bindings are preserved, (ii) removed bindings stay removed and (iii) modified bindings are merged using the merge function of values.
Note: We only consider sets of bindings, instead of multisets. Application developers should take care of concurrent additions and removals of similar bindings themselves, by using the appropriate multisets.
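The three guarantees above can be sketched with the standard library's Map. This is an illustration under assumptions, not Irmin's code: merge_maps and merge_value are hypothetical names, merge_value returns None to signal a conflict, and values are compared with polymorphic equality.

```ocaml
module M = Map.Make (String)

(* 3-way merge of string-keyed maps. [merge_value] is a hypothetical
   3-way merge on values returning [None] on conflict. *)
let merge_maps ~merge_value ~old m1 m2 =
  let exception Conflict of string in
  let pick key v1 v2 =
    let o = M.find_opt key old in
    if v1 = v2 then v1            (* same binding on both sides *)
    else if v1 = o then v2        (* only the second branch touched [key] *)
    else if v2 = o then v1        (* only the first branch touched [key] *)
    else
      match o, v1, v2 with
      | Some o, Some a, Some b ->
        (* modified on both sides: defer to the merge on values *)
        (match merge_value ~old:o a b with
         | Some v -> Some v
         | None -> raise (Conflict key))
      | _ -> raise (Conflict key) (* e.g. removed on one side, modified on the other *)
  in
  try Ok (M.merge pick m1 m2) with Conflict key -> Error key
```

A binding added on one side is kept, a binding removed on one side and untouched on the other stays removed, and a binding changed on both sides goes through merge_value.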
Useful merge operators.
Use open Irmin.Merge.OP at the top of your file to use them.
The type for backend-specific configuration values.
Every backend has different configuration options, which are kept abstract to the user.
An Irmin store is automatically built from a number of lower-level stores, implementing fewer operations, such as append-only and read-write stores. These low-level stores are provided by various backends.
Read-only stores.
Type for stores.
Type for keys.
Type for values.
create config task is a function returning fresh store handles, with the configuration config and fresh tasks computed using task. config is provided by the backend and task is provided by the user. The operation might be blocking, depending on the backend.
Read-write stores.
watch t k is the stream of values associated with the key k. The stream returns a new value every time the binding is modified in t. It returns None if the binding is removed.
FIXME: add move
Branch-consistent stores.
There are two kinds of branch-consistent stores: the persistent and the temporary ones.
The persistent stores are associated to a branch name, or tag. The tag value is updated every time the store is updated, so every handle connected or which will be connected to the same tag will see the changes.
These stores can be created using the of_tag functions.
Type for branch names, or tags. Tags usually share a common global namespace and it is the user's responsibility to avoid name clashes.
The temporary stores do not use global branch names. Instead, the operations are relative to a given store revision: a head. Every operation updates the store as a normal persistent store, but the value of head is only kept in the local store handle and it is not persisted into the store: this means it cannot be easily shared by concurrent processes or loaded back in the future. In the Git terminology, these store handles are said to be detached heads.
Type for head values.
Return the head commit. This works for both persistent and temporary stores. In the case of a persistent store, this involves looking into the value associated with the branch tag, so this might block. In the case of a temporary store, it is a simple (non-blocking) look-up in the store handle's local state.
Type for store slices.
export t ~depth ~min ~max exports the store slice between min and max, using at most depth history depth (starting from the max).
If max is not specified, use the current heads. If min is not specified, use an unbounded past (but it can still be limited by depth).
depth is used to limit the depth of the commit history. None here means no limitation.
If full is set (default is true) the full graph, including the commits, nodes and contents, is exported, otherwise it is the commit history graph only.
Hashing functions.
Hash provides user-defined hash functions to digest serialized contents. Some backends might be parameterized by such a hash function, others might work with a fixed one (for instance, the Git format uses only SHA1).
An SHA1 implementation is available to pass to the backends.
Exception raised when parsing a human-readable representation of a hash.
Contents specifies how user-defined contents need to be serializable and mergeable.
The user needs to provide:
- a to_sexp function for debugging purposes (that might expose the internal state of abstract values);
- to_json and of_json functions, to be used by the REST interface;
- size_of, write and read functions, to serialize data on disk or to send it over the network;
- a merge function, to handle conflicts between multiple versions of the same contents.
Default contents for string, JSON and C-buffer-like values are provided.
Contents store.
merge t lifts the merge functions defined on contents values to contents keys. The merge function will: (i) read the values associated with the given keys, (ii) use the merge function defined on values and (iii) write the resulting values into the store to get the resulting key.
If any of these operations fails, return `Conflict.
User-defined tags. Tags are used to specify branch names in an Irmin store.
STORE
specifies the signature of tag stores.
A tag store is a key / value store, where keys are names created by users (and/or global names created by convention) and values are keys from the block store.
A typical Irmin application should have a very low number of keys in the tag store.
An Irmin store is a branch-consistent store where keys are lists of steps.
An example is a Git repository where keys are filenames, i.e. lists of '/'-separated strings. More complex examples are structured values, where steps might contain first-class field accessors and array offsets.
Irmin provides the following features:
Private
defines functions only useful for creating new
backends. If you are just using the library (and not developing a
new backend), you should not use this module.
Backend configuration.
A backend configuration is a set of keys mapping to typed values. Backends define their own keys.
A configuration converter transforms a string value to an OCaml value and vice-versa. There are a few built-in converters.
The type for configuration converter parsers.
The type for configuration keys whose lookup value is 'a.
key docs docv doc name conv default is a configuration key named name that maps to value default by default. conv is used to convert key values provided by end users.
docs is the title of a documentation section under which the key is documented. doc is a short documentation string for the key; this should be a single sentence or paragraph starting with a capital letter and ending with a dot. docv is a meta-variable for representing the values of the key (e.g. "BOOL" for a boolean).
Raises Invalid_argument if the key name is not made of a sequence of ASCII lowercase letters, digits, dashes or underscores. FIXME not implemented.
Warning: no two keys should share the same name as this may lead to difficulties in the UI.
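The naming rule for configuration keys can be checked with a few lines of plain OCaml. The functions valid_key_name and check_key_name below are hypothetical helpers written for this example (the documentation itself marks the check as not implemented), and rejecting the empty name is an assumption of this sketch.

```ocaml
(* [true] iff [name] is a non-empty sequence of ASCII lowercase
   letters, digits, dashes or underscores.
   Assumption: the empty name is rejected. *)
let valid_key_name name =
  let ok = function
    | 'a' .. 'z' | '0' .. '9' | '-' | '_' -> true
    | _ -> false
  in
  let rec check i =
    i >= String.length name || (ok name.[i] && check (i + 1))
  in
  name <> "" && check 0

(* Raise [Invalid_argument] on a malformed key name. *)
let check_key_name name =
  if not (valid_key_name name) then
    invalid_arg ("invalid key name: " ^ name)
```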
Watch
provides helpers to register event notifications on
read-write stores.
Node
provides functions to describe the graph-like structured
values.
The node blocks form a labeled directed acyclic graph, labeled by steps: a list of steps defines a unique path from one node to another.
Each node can point to user-defined contents values.
Graph
specifies the signature for node graphs. A node graph
is a deterministic DAG, labeled by steps.
The type for store handles.
The type of user-defined contents.
The type for node values.
The type of steps. A step is used to pass from one node to another. A list of steps forms a path.
closure t ~min ~max is the transitive closure c of t's nodes such that:
- There is a path in t from any nodes in min to nodes in c. If min is empty, that condition is always true.
- There is a path in t from any nodes in c to nodes in max. If max is empty, that condition is always false.
Note: Both min and max are subsets of c.
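To make the two path conditions concrete, here is a small sketch on a graph given as a list of directed edges between integer nodes. It is only one possible reading of the definition: the edge-list representation and the names reach and closure are assumptions of this example, not the Graph signature.

```ocaml
module S = Set.Make (Int)

(* Nodes reachable from [start] by repeatedly following [next];
   the starting nodes themselves are included. *)
let reach next start =
  let rec go seen = function
    | [] -> seen
    | n :: todo when S.mem n seen -> go seen todo
    | n :: todo -> go (S.add n seen) (next n @ todo)
  in
  go S.empty start

(* Nodes lying on a path from [min] to [max] in the graph [edges].
   An empty [min] puts no lower bound; an empty [max] yields nothing. *)
let closure edges ~min ~max =
  let succ n =
    List.filter_map (fun (a, b) -> if a = n then Some b else None) edges in
  let pred n =
    List.filter_map (fun (a, b) -> if b = n then Some a else None) edges in
  let to_max = reach pred max in
  if min = [] then to_max else S.inter (reach succ min) to_max
```

In this reading, min and max nodes belong to the result whenever they are themselves connected, which matches the note above for well-formed bounds.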
Commit values represent the store history.
Every commit contains a list of predecessor commits, and the collection of commits forms an acyclic directed graph.
Every commit can also contain an optional key, pointing to a node value. See the Node signature for more details on node values.
The signature for slices.
History
specifies the signature for commit history. The
history is represented as a partial-order of commits and basic
functions to search through that history are provided.
Every commit can point to an entry point in a node graph, where user-defined contents are stored.
The type for store handles.
The type for node values.
The type for commit values.
Signature for Irmin stores.
S_MAKER is the signature exposed by any backend providing S implementations. S is the type of steps (a key is a list of steps), C is the implementation of user-defined contents, T is the implementation of store tags and H is the implementation of store heads. It does not use any native synchronization primitives.
The basic API considers default Irmin implementations using:
Only the contents are provided by the user.
The type for default store.
These examples are in the examples
directory of the
distribution.
We want to define a mergeable debug log. We first define a log entry as a pair of a timestamp and a message, using the combinators exposed by mirage-tc:
module Entry = struct
  include Tc.Pair (Tc.Int)(Tc.String)
  let compare (x, _) (y, _) = Pervasives.compare x y
  let time = ref 0
  let create message = incr time; !time, message
end
A log file is a list of entries (one per line), sorted by decreasing timestamp. The 3-way merge operator for log files concatenates and sorts the new entries and prepends them to the common ancestor's ones.
module Log: Irmin.Contents.S with type t = Entry.t list = struct
  include Tc.List(Entry)

  (* Get the timestamp of the latest entry. *)
  let timestamp = function
    | [] -> 0
    | (timestamp, _ ) :: _ -> timestamp

  (* Compute the entries newer than the given timestamp. *)
  let newer_than timestamp entries =
    let rec aux acc = function
      | [] -> List.rev acc
      | (h, _) :: _ when h <= timestamp -> List.rev acc
      | h::t -> aux (h::acc) t
    in
    aux [] entries

  let merge ~old t1 t2 =
    let open Irmin.Merge.OP in
    let ts = timestamp old in
    let t1 = newer_than ts t1 in
    let t2 = newer_than ts t2 in
    let t3 = List.sort Entry.compare (List.rev_append t1 t2) in
    ok (List.rev_append t3 old)
end
Note: The serialization primitives provided by mirage-tc are not very efficient in this case as they parse the file every time. For real usage, you would write buffered versions of Log.read and Log.write.
To persist the log file on disk, we need to choose a backend. We
show here how to use the on-disk Git
backend on Unix.
(* Bring the Lwt operators ([>>=], [return], ...) in scope. *)
open Lwt
(* Bring [Irmin_unix.task] and [Irmin_unix.Irmin_git] in scope. *)
open Irmin_unix
(* Build an Irmin store containing log files. *)
let store = Irmin.basic (module Irmin_git.FS) (module Log)
(* Set-up the local configuration of the Git repository. *)
let config = Irmin_git.config ~root:"/tmp/irmin/test" ~bare:true ()
We can now define a toy example to use our mergeable log files.
(* Name of the log file. *)
let file = [ "local"; "debug" ]

(* Read the entire log file. *)
let read_file t =
  Irmin.read (t "Reading the log file") file >>= function
  | None -> return_nil
  | Some l -> return l

(* Persist a new entry in the log. *)
let log t fmt =
  Printf.ksprintf (fun message ->
      read_file t >>= fun logs ->
      let logs = Entry.create message :: logs in
      Irmin.update (t "Adding a new entry") file logs
    ) fmt

let () =
  Lwt_unix.run begin
    Irmin.create store config task >>= fun t ->
    log t "Adding a new log entry" >>= fun () ->
    Irmin.clone_force task (t "Cloning the store") "x" >>= fun x ->
    log x "Adding new stuff to x" >>= fun () ->
    log x "Adding more stuff to x" >>= fun () ->
    log x "More. Stuff. To x." >>= fun () ->
    log t "I can add stuff on t also" >>= fun () ->
    log t "Yes. On t!" >>= fun () ->
    Irmin.merge_exn "Merging x into t" x ~into:t >>= fun () ->
    return_unit
  end
The type for remote stores.
remote_uri s is the remote store located at the URI s. Use the optimized native synchronization protocol when available for the given backend.
remote_store t is the remote corresponding to the local store t. Synchronization is done by importing and exporting store slices, so this is usually much slower than native synchronization using remote_uri but it works for all backends.
Sync provides functions to synchronize an Irmin store with local and remote Irmin stores.
pull t ?depth r s is similar to fetch but it also updates t's current branch. s is the update strategy:
- `Merge uses S.merge_head. This strategy can return a conflict.
- `Update uses S.update_head.
push t ?depth r populates the remote store r with objects from the current store t, using t's current branch. If b is t's current branch, push also updates the head of b in r to be the same as in t.
Note: Git semantics is to update b only if the new head is more recent. This is not the case in Irmin.
View
provides an in-memory partial mirror of the store, with
lazy reads and delayed writes.
Views are like the staging area in Git: they are temporary non-persistent areas (they disappear if the host crashes), held in memory for efficiency, where reads are done lazily and writes are done only when needed on commit: if you modify a key twice, only the last change will be written to the store when you commit. Views also hold a list of operations, which are checked for conflicts on commits and are used to replay/rebase the view if needed. The most important feature of views is that they keep track of reads: i.e. you can have a conflict if a view reads a key which has been modified concurrently by someone else.
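The behaviour described above (lazy reads, delayed writes, conflicts on tracked reads) can be modelled in a few lines. This is a toy model under stated assumptions, not the View implementation: the store is a plain Hashtbl of strings, removals are not modelled, and the names view, read, write and commit are chosen for this example only.

```ocaml
(* Toy view over a mutable string table standing in for the store. *)
type view = {
  store : (string, string) Hashtbl.t;
  cache : (string, string option) Hashtbl.t;  (* values seen or written so far *)
  writes : (string, string) Hashtbl.t;        (* pending writes, last one wins *)
  mutable reads : (string * string option) list; (* what the store answered *)
}

let view store =
  { store; cache = Hashtbl.create 8; writes = Hashtbl.create 8; reads = [] }

(* Lazy read: hit the store only once per key and remember the answer. *)
let read v k =
  match Hashtbl.find_opt v.cache k with
  | Some r -> r
  | None ->
    let r = Hashtbl.find_opt v.store k in
    Hashtbl.replace v.cache k r;
    v.reads <- (k, r) :: v.reads;
    r

(* Delayed write: nothing reaches the store before [commit]. *)
let write v k x =
  Hashtbl.replace v.cache k (Some x);
  Hashtbl.replace v.writes k x

(* Commit fails if a key read through the view changed in the meantime. *)
let commit v =
  let clean =
    List.for_all (fun (k, r) -> Hashtbl.find_opt v.store k = r) v.reads in
  if clean then begin
    Hashtbl.iter (fun k x -> Hashtbl.replace v.store k x) v.writes;
    Ok ()
  end else Error "conflict"
```

Writing the same key twice leaves a single pending write, and a concurrent change to a key the view has read makes commit report a conflict without touching the store.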
rebase_path x t path v rebases the view v x on top of the contents of t x's sub-tree pointed by the path path. Rebasing means re-applying every action stored in the view, including the reads. Return Merge.Conflict if one of the actions cannot be applied cleanly. See merge_path for more details.
merge_path x t path v merges the view v x with the contents of t x's sub-tree pointed by the path path. Merging means applying the merge function for maps between the view's contents and t's sub-tree.
Action
provides information about operations performed on a
view.
Each view stores the list of actions that have already been performed on it. These actions are useful when the view needs to be rebased: write operations are replayed while read results are checked against the original run.
Signature for actions performed on a view.
Dot provides functions to export a store to the Graphviz `dot` format.
output_buffer t ?html ?depth ?full buf outputs the Graphviz representation of t in the buffer buf.
html (default is false) enables HTML labels.
depth is used to limit the depth of the commit history. None here means no limitation.
If full is set (it is not set by default) the full graph, including the commits, nodes and contents, is exported, otherwise it is the commit history graph only.
API to create new Irmin backends. A backend is an implementation exposing either a concrete implementation of S or a functor providing S once applied.
There are two ways to create a concrete Irmin.S implementation: