Managed Strings
Managed strings are used in the XDR context for constant strings that are stored either as an OCaml string or as memory (a bigarray of char).
A managed string ms is declared in the XDR file as follows:

typedef _managed string ms<>;
In the encoded XDR stream there is no difference between strings and managed strings, i.e. the wire representation is identical. Only the OCaml type to which the managed string is mapped differs: it is Xdr_mstring.mstring (described below).
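Since the wire format is that of an ordinary XDR string, a small sketch can make this concrete. The functions below are hypothetical helpers, not part of Xdr_mstring. They follow the standard XDR string layout from RFC 4506 (a 4-byte big-endian length, the bytes, then zero padding to a multiple of four) and encode once from an OCaml string and once from a char bigarray, showing that the source representation does not change the encoded bytes.

```ocaml
open Bigarray

(* The "memory" representation: a one-dimensional bigarray of chars. *)
type memory = (char, int8_unsigned_elt, c_layout) Array1.t

(* Write the 4-byte big-endian length prefix of an XDR string. *)
let put_length (b : Bytes.t) (n : int) =
  Bytes.set b 0 (Char.chr ((n lsr 24) land 0xff));
  Bytes.set b 1 (Char.chr ((n lsr 16) land 0xff));
  Bytes.set b 2 (Char.chr ((n lsr 8) land 0xff));
  Bytes.set b 3 (Char.chr (n land 0xff))

(* Length prefix, then the n bytes delivered by [get],
   then zero padding up to a multiple of 4. *)
let encode_with (n : int) (get : int -> char) : string =
  let pad = (4 - n mod 4) mod 4 in
  let b = Bytes.make (4 + n + pad) '\000' in
  put_length b n;
  for i = 0 to n - 1 do Bytes.set b (4 + i) (get i) done;
  Bytes.to_string b

(* Both sources produce the same wire bytes. *)
let encode_string (s : string) = encode_with (String.length s) (String.get s)
let encode_memory (m : memory) = encode_with (Array1.dim m) (fun i -> m.{i})
```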
In the RPC context the I/O backend often profits from a different string representation than the one the user of the RPC layer works with. Managed strings were invented to bridge this gap. Generally, the user determines how strings are represented (usually either as an OCaml string or as memory), and the I/O backend can request a transformation to a different representation when this leads to an improvement, i.e. when copy operations can be saved.
Only large managed strings (at least several kilobytes) speed up the program.
There are two cases: the encoding case and the decoding case.
In the encoding case the mstring object is created by the user and passed to the RPC library. This happens when a client prepares an argument for calling a remote procedure, or when the server sends a response back to the caller. In the decoding case the client analyzes the response from an RPC call, or the server looks at the arguments of an RPC invocation. The difference is that in the encoding case user code can directly create mstring objects by calling functions of this module, whereas in the decoding case the RPC library creates the mstring objects.
For simplicity, let us only look at this problem from the perspective of an RPC client.
Encoding. Imagine a client wants to call an RPC, and one of the arguments is a managed string. This means we ultimately need an mstring object that can be put into the argument list of the call.
This library specially supports two string representations: the normal OCaml string type, and Netsys_mem.memory, which is just a bigarray of chars. There are two factories (called fac below), one for each representation, and both can be used to create the mstring to pass to the RPC layer. Note that this layer can process the memory representation a bit better. So, if the original data value is a string, the factory for string should be used, and if it is a char bigarray, the factory for memory should be used.
Now, the mstring object is created by

let mstring = fac # create_from_string data pos len copy_flag

or by

let mstring = fac # create_from_memory data pos len copy_flag

Of course, if fac is the factory for strings, the create_from_string method works better, and if fac is for memory, the create_from_memory method works better. pos and len can select a substring of data.
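The effect of choosing the factory that matches the data can be sketched with a simplified model. Everything below (the mstring variant, the two factory objects, the helpers) is hypothetical and much simpler than the real classes; the copy_flag argument is left out to keep the focus on representation. In this model a factory keeps data of its own representation by reference and has to copy data of the other representation.

```ocaml
open Bigarray

type memory = (char, int8_unsigned_elt, c_layout) Array1.t

(* Hypothetical stand-in for an mstring: a slice (buffer, pos, len)
   of either an OCaml string or a char bigarray. *)
type mstring =
  | Str of string * int * int
  | Mem of memory * int * int

let check_range total pos len =
  if pos < 0 || len < 0 || pos > total - len then invalid_arg "mstring: bad range"

(* Changing the representation always costs one copy. *)
let memory_of_sub_string s pos len : memory =
  let m = Array1.create char c_layout len in
  for i = 0 to len - 1 do m.{i} <- s.[pos + i] done;
  m

let string_of_sub_memory (m : memory) pos len =
  String.init len (fun i -> m.{pos + i})

(* Hypothetical string factory: strings are kept, memory is converted. *)
let string_factory =
  object
    method create_from_string s pos len =
      check_range (String.length s) pos len;
      Str (s, pos, len)                                 (* no copy *)
    method create_from_memory (m : memory) pos len =
      check_range (Array1.dim m) pos len;
      Str (string_of_sub_memory m pos len, 0, len)      (* one copy *)
  end

(* Hypothetical memory factory: memory is kept, strings are converted. *)
let memory_factory =
  object
    method create_from_string s pos len =
      check_range (String.length s) pos len;
      Mem (memory_of_sub_string s pos len, 0, len)      (* one copy *)
    method create_from_memory (m : memory) pos len =
      check_range (Array1.dim m) pos len;
      Mem (m, pos, len)                                 (* no copy *)
  end

let contents = function
  | Str (s, pos, len) -> String.sub s pos len
  | Mem (m, pos, len) -> string_of_sub_memory m pos len
```

With the memory factory, create_from_memory only records the slice, whereas create_from_string has to allocate a bigarray and copy the selected chars.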
If copy_flag is false, the mstring object does not copy the data if possible, but just keeps a reference to data until it is accessed; otherwise, if copy_flag is true, a copy is made immediately. Of course, delaying the copy is better, but this requires that data is not modified until the RPC call is completed.
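A minimal sketch of the copy_flag semantics, assuming a hypothetical memory-backed mstring record (not the library's class): with the flag set, the selected slice is copied at once; without it, only a reference is stored, so a later change to data becomes visible through the mstring.

```ocaml
open Bigarray

type memory = (char, int8_unsigned_elt, c_layout) Array1.t

(* Hypothetical memory-backed mstring: the slice [pos, pos+len) of [buf]. *)
type mstring = { buf : memory; pos : int; len : int }

let create_from_memory (m : memory) pos len copy_flag =
  if pos < 0 || len < 0 || pos > Array1.dim m - len then
    invalid_arg "create_from_memory";
  if copy_flag then begin
    (* copy now: later changes to [m] are not seen by this mstring *)
    let c = Array1.create char c_layout len in
    Array1.blit (Array1.sub m pos len) c;
    { buf = c; pos = 0; len }
  end else
    (* keep a reference: [m] must stay unchanged until the call completes *)
    { buf = m; pos; len }

let contents ms = String.init ms.len (fun i -> ms.buf.{ms.pos + i})

let memory_of_string s : memory =
  let m = Array1.create char c_layout (String.length s) in
  String.iteri (fun i c -> m.{i} <- c) s;
  m
```

If data is modified after the mstring was created with copy_flag = false, the RPC layer would send the modified bytes; with copy_flag = true it sends the bytes as they were at creation time.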
Decoding. Now the call is done, and the client looks at the result. There is also an mstring object in the result. As noted above, this mstring object was already created by the RPC library (and currently this library prefers string-based objects if not told otherwise). The user code can now access this mstring object with the access methods of the mstring class (see below). As these methods are quite limited, it usually only makes sense to output the mstring contents to a file descriptor.
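Writing a received mstring out in bounded chunks needs only the length and a blit access method. The sketch below is hypothetical: mstring_view and output_mstring are invented names, and the write callback stands in for the actual sink, for example (output stdout), or (fun b p n -> ignore (Unix.write fd b p n)) when linking the unix library. The destination buffer is Bytes.t because strings are immutable in current OCaml versions.

```ocaml
(* Hypothetical minimal view of a received mstring: its length and a blit
   function with the argument order of "blit_to_string mpos s spos len". *)
type mstring_view = {
  length : int;
  blit : int -> Bytes.t -> int -> int -> unit;
}

(* Stream the contents through one bounded buffer to [write buf pos len]. *)
let output_mstring ?(chunk = 65536) write ms =
  let buf = Bytes.create (max 1 (min chunk ms.length)) in
  let rec loop mpos =
    if mpos < ms.length then begin
      let n = min (Bytes.length buf) (ms.length - mpos) in
      ms.blit mpos buf 0 n;   (* copy the next chunk out of the mstring *)
      write buf 0 n;          (* hand it to the sink *)
      loop (mpos + n)
    end
  in
  loop 0

(* Wrap an ordinary string as such a view, e.g. for testing. *)
let view_of_string s = {
  length = String.length s;
  blit = (fun mpos b bpos n -> Bytes.blit_string s mpos b bpos n);
}
```

Reusing a single buffer keeps memory use constant regardless of the size of the managed string.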
The user can request a different factory for managed strings. The function Rpc_client.set_mstring_factories can be used for this purpose. (Similar mechanisms exist for managed clients and for RPC servers.)
Potential. Before managed strings were introduced, a careful analysis was done of how many copy operations this technique can avoid. Example: the first N bytes of a file are taken as the argument of an RPC call. Instead of reading these bytes into a normal OCaml string, an optimal implementation now uses a memory buffer for this purpose. This leaves two copies on the way from the file into the socket: the first copy reads the data from the file into the memory value, and the second copy writes the data into the socket. Part of the optimization is that Unix.read and Unix.write do a completely avoidable copy of the data, which is prevented by switching to Netsys_mem.mem_read and Netsys_mem.mem_write, respectively. The latter two functions exploit an optimization that is only possible when the data is memory-typed.
The possible optimizations for the decoding side of the problem are slightly less impressive, but still worth doing.
The length of the managed string.

blit_to_string mpos s spos len: Copies the substring of the managed string from mpos to mpos+len-1 to the substring of s from spos to spos+len-1.

blit_to_memory mpos mem mempos len: Copies the substring of the managed string from mpos to mpos+len-1 to the substring of mem from mempos to mempos+len-1.
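The length and blit operations can be illustrated with a hypothetical memory-backed class. The method names mirror the descriptions above, but the class is an invented sketch, not the library's implementation. Positions are relative to the start of the managed string, and the string target is Bytes.t since OCaml strings are immutable.

```ocaml
open Bigarray

type memory = (char, int8_unsigned_elt, c_layout) Array1.t

(* Hypothetical managed string over [base], starting at index [off]
   and [len] chars long. *)
class mem_mstring (base : memory) (off : int) (len : int) =
  object
    initializer
      if off < 0 || len < 0 || off > Array1.dim base - len then
        invalid_arg "mem_mstring"

    method length = len

    (* chars mpos .. mpos+n-1 are copied to s at spos .. spos+n-1 *)
    method blit_to_string mpos (s : Bytes.t) spos n =
      if mpos < 0 || n < 0 || mpos > len - n then invalid_arg "blit_to_string";
      if spos < 0 || spos > Bytes.length s - n then invalid_arg "blit_to_string";
      for i = 0 to n - 1 do
        Bytes.set s (spos + i) base.{off + mpos + i}
      done

    (* chars mpos .. mpos+n-1 are copied to mem at mempos .. mempos+n-1;
       Array1.sub itself rejects a bad mempos/n for [mem] *)
    method blit_to_memory mpos (mem : memory) mempos n =
      if mpos < 0 || n < 0 || mpos > len - n then invalid_arg "blit_to_memory";
      Array1.blit (Array1.sub base (off + mpos) n) (Array1.sub mem mempos n)
  end
```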
Returns the contents as a string. It is undefined whether the returned string is a copy or the underlying buffer. The int is the position where the contents start.

Returns the contents as memory. It is undefined whether the returned memory is a copy or the underlying buffer. The int is the position where the contents start.

Whether as_memory or as_string is cheaper.

The object holding the string value.
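A consumer can combine the two accessors with the information about which one is cheaper. In this hypothetical sketch a string-backed class returns its own buffer from as_string and has to copy for as_memory; the method name preferred is invented for illustration. Note how the consumer adds the returned start position to every index.

```ocaml
open Bigarray

type memory = (char, int8_unsigned_elt, c_layout) Array1.t

(* Hypothetical string-backed managed string: the slice [off, off+len) of s. *)
class str_mstring (s : string) (off : int) (len : int) =
  object
    method length = len
    (* the underlying buffer and the start position: no copy *)
    method as_string : string * int = (s, off)
    (* a different representation: needs a copy, starting at 0 *)
    method as_memory : memory * int =
      let m = Array1.create char c_layout len in
      for i = 0 to len - 1 do m.{i} <- s.[off + i] done;
      (m, 0)
    method preferred : [ `String | `Memory ] = `String
  end

(* Example consumer: sums the char codes via the cheaper accessor. *)
let checksum ms =
  let len = ms#length in
  match ms#preferred with
  | `String ->
      let s, pos = ms#as_string in
      let acc = ref 0 in
      for i = 0 to len - 1 do acc := !acc + Char.code s.[pos + i] done;
      !acc
  | `Memory ->
      let m, pos = ms#as_memory in
      let acc = ref 0 in
      for i = 0 to len - 1 do acc := !acc + Char.code m.{pos + i} done;
      !acc
```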
create_from_string s pos len must_copy: Creates the mstring from the substring of s starting at pos with length len. If must_copy is true, the mstring object must create a copy. Otherwise it can just keep the string passed in.

create_from_memory m pos len must_copy: Creates the mstring from the substring of m starting at pos with length len. If must_copy is true, the mstring object must create a copy. Otherwise it can just keep the memory passed in.
The object creating new mstring objects.

Represents a string as an mstring (no copy).

Uses memory to represent mstrings. The memory bigarrays are allocated with Bigarray.Array1.create.

Represents memory as an mstring (no copy).

Uses memory to represent mstrings. The memory bigarrays are allocated with Netsys_mem.alloc_memory_pages if available, and with Bigarray.Array1.create if not.

Uses memory to represent mstrings. The memory bigarrays are obtained from the pool. The length of these mstrings is limited by the blocksize of the pool.
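Why a pool-based factory limits the mstring length can be seen from a toy pool of fixed-size blocks. This is a hypothetical sketch, not the Netsys_mem pool implementation: every mstring occupies one block, so requests longer than the block size are rejected, and string data has to be copied into the block.

```ocaml
open Bigarray

type memory = (char, int8_unsigned_elt, c_layout) Array1.t

(* Hypothetical pool handing out blocks of [blocksize] bytes. *)
type pool = { blocksize : int; mutable free : memory list }

let make_pool blocksize =
  if blocksize <= 0 then invalid_arg "make_pool";
  { blocksize; free = [] }

(* Reuse a released block if there is one, else allocate a new one. *)
let alloc p =
  match p.free with
  | m :: rest -> p.free <- rest; m
  | [] -> Array1.create char c_layout p.blocksize

let release p (m : memory) = p.free <- m :: p.free

(* A pool-backed mstring: one block, of which [len] chars are valid. *)
type mstring = { block : memory; len : int }

let create_from_string p s pos len =
  if pos < 0 || len < 0 || pos > String.length s - len then
    invalid_arg "create_from_string";
  if len > p.blocksize then
    invalid_arg "create_from_string: longer than the pool blocksize";
  let block = alloc p in
  (* the data is copied into the pool block *)
  for i = 0 to len - 1 do block.{i} <- s.[pos + i] done;
  { block; len }

let contents ms = String.init ms.len (fun i -> ms.block.{i})
```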
Concatenates the mstrings and returns them as a single string. The returned string may be shared with one of the mstrings passed in.

prefix_mstrings l n: Returns the first n chars of the concatenated mstrings l as a single string.

Blits the mstrings one after the other to the memory, so that they appear there concatenated.
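The list functions above can be sketched over a hypothetical minimal mstring record consisting of a length and a blit function; prefix_ms, concat_ms and blit_ms_to_memory are invented names. Unlike the description above, which allows the result to be shared with an input, this sketch always builds a fresh string.

```ocaml
open Bigarray

type memory = (char, int8_unsigned_elt, c_layout) Array1.t

(* Hypothetical minimal mstring: its length and a blit into bytes. *)
type mstring = { length : int; blit : int -> Bytes.t -> int -> int -> unit }

let of_string s =
  { length = String.length s;
    blit = (fun mpos b bpos n -> Bytes.blit_string s mpos b bpos n) }

let total_length l = List.fold_left (fun acc ms -> acc + ms.length) 0 l

(* The first [n] chars of the concatenation of [l]. *)
let prefix_ms l n =
  if n < 0 || n > total_length l then invalid_arg "prefix_ms";
  let b = Bytes.create n in
  let rec fill pos = function
    | ms :: rest when pos < n ->
        let k = min ms.length (n - pos) in
        ms.blit 0 b pos k;
        fill (pos + k) rest
    | _ -> ()
  in
  fill 0 l;
  Bytes.to_string b

let concat_ms l = prefix_ms l (total_length l)

(* Copy the mstrings one after the other into [mem], starting at index 0. *)
let blit_ms_to_memory l (mem : memory) =
  if total_length l > Array1.dim mem then invalid_arg "blit_ms_to_memory";
  let pos = ref 0 in
  List.iter
    (fun ms ->
      let tmp = Bytes.create ms.length in
      ms.blit 0 tmp 0 ms.length;
      Bytes.iteri (fun i c -> mem.{!pos + i} <- c) tmp;
      pos := !pos + ms.length)
    l
```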