Object-oriented I/O: Basic types and classes
The tutorial has been moved to [root:Netchannels_tut].
There are three levels of class types for channels:
- rec_in_channel and rec_out_channel: Primitive, but standardized level
- raw_in_channel and raw_out_channel: Unix level
- in_obj_channel and out_obj_channel: Application level
The "rec" level has been recently introduced to improve interoperability with other libraries (e.g. camomile). The idea is to standardize the real core methods of I/O, so they have the same meaning in all libraries. Read "Basic I/O class types" for more.
The "raw" level represents the level of Unix file descriptors.
The application level is what should be used in programs. In addition
to the "raw" level one can find a number of convenience methods,
e.g. input_line to read a line from the channel. The downside is that
these methods usually work only for blocking I/O.
One can lower the level by coercion, e.g. to turn an in_obj_channel
into a rec_in_channel, apply the function
(fun ch -> (ch : in_obj_channel :> rec_in_channel))
To raise the level, apply lift_in or lift_out, defined below.
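The relationship between the levels can be illustrated without the library. The following is a minimal sketch, assuming two simplified local class types that only borrow the names rec_in_channel and in_obj_channel (the real definitions have more methods) and a hypothetical class string_channel. It uses Bytes.t for the buffer argument, which is an assumption about the OCaml version in use. It shows that lowering the level is an ordinary subtype coercion.

```ocaml
(* Simplified local stand-ins for the Netchannels class types; the real
   definitions have more methods. *)
class type rec_in_channel = object
  method input : Bytes.t -> int -> int -> int
  method close_in : unit -> unit
end

class type in_obj_channel = object
  inherit rec_in_channel
  method pos_in : int
  method input_char : unit -> char
end

(* A hypothetical application-level channel reading from a string. *)
class string_channel (s : string) : in_obj_channel = object (self)
  val mutable pos = 0
  method input buf off len =
    if pos >= String.length s then raise End_of_file;
    let n = min len (String.length s - pos) in
    Bytes.blit_string s pos buf off n;
    pos <- pos + n;
    n
  method close_in () = ()
  method pos_in = pos
  method input_char () =
    let b = Bytes.create 1 in
    ignore (self#input b 0 1);
    Bytes.get b 0
end

(* Lowering the level: only the "rec" methods remain visible. *)
let lower (ch : in_obj_channel) = (ch : in_obj_channel :> rec_in_channel)

let low = lower (new string_channel "abc")

let first_read =
  let b = Bytes.create 2 in
  let n = low#input b 0 2 in
  Bytes.sub_string b 0 n
```

After the coercion, a call such as low#pos_in is rejected by the type checker, although the underlying object is unchanged.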
Interface changes: Since ocamlnet-0.98, the semantics of
the methods input and output have slightly changed. When the end
of the channel is reached, input now raises End_of_file. In previous
releases of ocamlnet, the value 0 was returned. When the channel cannot
process data, but is in non-blocking mode, both methods now return the
value 0. In previous releases of ocamlnet, the behaviour was not
defined.
Ocamlnet-3.0 changed the behavior of close_out. Errors are no longer
reported - instead, the exception is logged to [root:Netlog]. For stricter
error handling, it is suggested to call flush first. Also, close_in
and close_out no longer raise Closed_channel when the channel is
already closed. Read more about this in the section
Netchannels.rec_out_channel.close_error.
Closed_channel: Raised when channel operations are called when the channel is closed.
Buffer_underrun: Raised by input methods if the internal buffer of the channel does not hold enough data to read even one byte. This exception is only used by certain implementations of channel classes.
Command_failure: Raised by close_in or close_out if the channel is connected with
another process, and the execution of that process fails.
Description
This class type is defined in "Basic I/O class types" as a collaborative effort of several library creators.
Reads octets from the channel and puts them into the string. The
first int argument is the position of the substring, and the second
int argument is the length of the substring where the data are
stored. The method returns the number of octets actually read and
stored.
When the end of the channel is reached and there is no further octet
to read, the exception End_of_file will be raised. This has
been changed in ocamlnet-0.97! In previous releases the number 0
was returned at the end of the channel.
When the channel is non-blocking, and there are currently no bytes to read, the number 0 will be returned. This has been changed in ocamlnet-0.97! In previous releases this behaviour was undefined.
When the channel is closed, the exception Closed_channel will be
raised if an ocamlnet implementation is used. For implementations
of other libraries there is no standard for this case.
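A caller of input therefore has to distinguish three outcomes: a positive count, the value 0 on a non-blocking channel, and End_of_file. The following sketch shows one way to write such a read loop. The names read_all, toy_channel and the wait callback are hypothetical and only serve the illustration, and Bytes.t is assumed for the buffer argument.

```ocaml
(* Drains a channel following the documented protocol of input:
   n > 0       : n octets were read,
   0           : non-blocking channel with no data right now,
   End_of_file : no further data will arrive.
   [wait] is a hypothetical callback standing in for select/poll. *)
let read_all ~(wait : unit -> unit)
    (ch : < input : Bytes.t -> int -> int -> int; .. >) : string =
  let chunk = Bytes.create 4 in
  let acc = Buffer.create 16 in
  let rec loop () =
    match ch#input chunk 0 (Bytes.length chunk) with
    | 0 -> wait (); loop ()
    | n -> Buffer.add_subbytes acc chunk 0 n; loop ()
    | exception End_of_file -> Buffer.contents acc
  in
  loop ()

(* A toy channel: delivers the string, reporting "no data yet" once. *)
let toy_channel (s : string) = object
  val mutable pos = 0
  val mutable stalled = false
  method input buf off len =
    if pos >= String.length s then raise End_of_file
    else if not stalled then (stalled <- true; 0)
    else begin
      let n = min len (String.length s - pos) in
      Bytes.blit_string s pos buf off n;
      pos <- pos + n;
      n
    end
end

let waits = ref 0
let result = read_all ~wait:(fun () -> incr waits) (toy_channel "hello world")
```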
Closes the channel for input.
When the channel is already closed, this is a no-op.
Error policy: Exceptions are only raised in cases of serious corruption, e.g. if the underlying descriptor is invalid.
Recommended input class type for library interoperability.
Returns the current channel position. This position can be expected
to be consistent with the returned number of bytes of input, i.e.
when input returns n, the position is advanced by n.
As seek operations are outside the scope of Netchannels,
implementations may or may not take seek operations into account.
Basic Unix-level class type for input channels as used by ocamlnet. In addition to the recommended standard, ocamlnet always supports a position counter.
Description
This class type is defined in "Basic I/O class types" as a collaborative effort of several library creators.
Takes octets from the string and writes them into the channel. The
first int argument is the position of the substring, and the second
int argument is the length of the substring where the data can
be found. The method returns the number of octets actually written.
The implementation may choose to collect written octets in a buffer before they are actually delivered to the underlying resource.
When the channel is non-blocking, and there are currently no bytes to write, the number 0 will be returned. This has been changed in ocamlnet-0.97! In previous releases this behaviour was undefined.
When the channel is closed, the exception Closed_channel will be
raised if an ocamlnet implementation is used. For implementations
of other libraries there is no standard for this case.
If there is a write buffer, it will be flushed. Otherwise, nothing happens.
Flushes the buffer, if any, and closes the channel for output.
When the channel is already closed, this is a no-op.
The close_out method has actually two tasks: First, it writes out
all remaining data (like flush), and second, it releases OS
resources (e.g. closes file descriptors). There is the question
what has to happen when the write part fails - is the resource released
anyway or not?
We choose here a pragmatic approach under the assumption that
an OS error at close time is usually unrecoverable, and it is
more important to release the OS resource. Also, we
assume that the user is wise enough to call flush first if
it is essential to know write errors at close time. Under these
assumptions:
- The flush method fully reports any errors when writing out
  the remaining data.
- When flush raises an error exception, it should discard
  any data in the buffer. This is not obligatory, however,
  but considered good practice, and is subject to discussion.
- The close_out method usually does not report errors by
  raising exceptions, but only by logging them via [root:Netlog].
  The OS resource is released in any case. As before, this
  behavior is not obligatory, but considered good practice,
  and subject to discussion.
This ensures that the following code snippet reports all errors, but also releases OS resources:
       try 
         ch # flush();
         ch # close_out();
       with error -> 
          ch # close_out(); raise error
There are some cases where data can be written only when it is
known that the channel is closed. These data would not be written
by a preceding flush. In such cases:
- The preferred way is an additional method, e.g. write_eof, that marks the data as logically
  being complete, so a following flush can do the complete
  shutdown cycle of the channel.
- Otherwise, a double close_out releases the descriptor: the first close_out
  will report the error condition as exception, but discard
  all data in the channel. The second close_out finally
  releases the OS resource.
In any case, hard errors indicating bugs of the program logic (like invalid file descriptors) should always be reported immediately.
Recommended output class type for library interoperability.
Returns the current channel position. This position can be expected
to be consistent with the returned number of bytes of output, i.e.
when output returns n, the position is advanced by n.
As seek operations are outside the scope of Netchannels,
implementations may or may not take seek operations into account.
Basic Unix-level class type for output channels as used by ocamlnet. In addition to the recommended standard, ocamlnet always supports a position counter.
A channel supporting both input and output. The input and output aspects are strictly separated
Reads exactly as many octets from the channel as the second int
argument specifies. The octets are placed at the position denoted
by the first int argument into the string.
When the end of the channel is reached before the passed number of
octets are read, the exception End_of_file is raised.
Reads exactly one character from the channel, or raises End_of_file
Reads the next line from the channel. When the channel is already
at the end before input_line is called, the exception End_of_file
is raised.
Reads exactly one octet from the channel and returns its code,
or raises End_of_file
Further methods usually supported by ocamlnet channel implementations. These methods are only reasonable when the channel is of blocking type, i.e. waits for input when not enough data are available to perform an operation. Implementations may choose to fail when they detect the channel is non-blocking.
The application-level input channel supports raw and complemented methods
Writes exactly as many octets to the channel as the second int
argument specifies. The octets are taken from the string position
denoted by the first int argument.
Writes exactly one character
Writes exactly the passed string
Writes exactly one byte passed as integer code
Writes the contents of an in_obj_channel until the end of the
input channel is reached.
Further methods usually supported by ocamlnet channel implementations. These methods are only reasonable when the channel is of blocking type, i.e. waits for output readiness when the underlying resource currently cannot process enough data. Implementations may choose to fail when they detect the channel is non-blocking.
The application-level output channel supports raw and complemented methods
A channel supporting both input and output. The input and output aspects are strictly separated
Flushes the transaction buffer, and writes its contents to the underlying resource.
Empties the transaction buffer
A transactional output channel has a buffer for uncommitted data.
This means that all data written to this channel is collected in the
buffer until either commit_work or rollback_work is called.
When the channel is closed, the buffer may optionally be committed. This is implementation-defined.
The method flush does not have any effect on the transaction
buffer.
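The commit/rollback behaviour can be sketched in a few lines. This is an illustration only, assuming a hypothetical class trans_out that writes to a Buffer.t; it is not the library's implementation and supports only output_string.

```ocaml
(* Sketch of a transactional output channel writing to a Buffer.t.
   The class name and the use of Buffer.t as target are illustrative. *)
class trans_out (target : Buffer.t) = object
  val pending = Buffer.create 64
  (* Written data is only collected in the transaction buffer. *)
  method output_string s = Buffer.add_string pending s
  (* flush concerns the target only; a Buffer.t needs no flushing, and
     the transaction buffer is deliberately left untouched. *)
  method flush () = ()
  method commit_work () =
    Buffer.add_buffer target pending;
    Buffer.clear pending
  method rollback_work () = Buffer.clear pending
end

let target = Buffer.create 64
let ch = new trans_out target

let () =
  ch#output_string "kept";
  ch#flush ();
  ch#commit_work ();
  ch#output_string " dropped";
  ch#rollback_work ()

let committed = Buffer.contents target
```

Only the data written before commit_work reaches the target; the data written afterwards is discarded by rollback_work.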
Creates an input channel from an in_channel, which must be open.
The method pos_in reflects the real position in the channel as
returned by Pervasives.pos_in. This works for both seekable and
non-seekable channels.
The method close_in also closes the underlying in_channel.
The function onclose is called after the in_channel has been closed.
Runs the command with /bin/sh, and reads the data the command prints
to stdout.
The method pos_in returns the number of read octets.
When close_in is invoked, the subprocess is waited for. If the
process exits with code 0, the method returns normally. Otherwise,
the exception Command_failure is raised.
Creates an input channel from a (constant) string.
The method pos_in reflects the real position in the string, i.e.
a character read at position k can be found at s.[k] in the string
s.
Creates an input channel and a shutdown function for a netbuffer. This is a destructive implementation: Every time data is read, the octets are taken from the beginning of the netbuffer, and they are deleted from the netbuffer (recall that a netbuffer works like a queue of characters).
Conversely, the user of this class may add new data to the netbuffer at any time. When the shutdown function is called, the EOF condition is recorded, and no further data may be added afterwards.
If the netbuffer becomes empty, the input methods raise Buffer_underrun
when the EOF condition has not yet been set, and they raise
End_of_file when the EOF condition has been recorded.
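The difference between the two exceptions can be sketched as follows. This is a simplified model, assuming a char Queue.t as a stand-in for the netbuffer, a locally declared Buffer_underrun exception, and a hypothetical function make_queue_input that returns a plain input function instead of a channel object.

```ocaml
exception Buffer_underrun  (* local stand-in for the library exception *)

(* Returns a destructive reader and a shutdown function over a queue of
   characters. Queue.t stands in for the netbuffer. *)
let make_queue_input (q : char Queue.t) =
  let eof = ref false in
  let input buf off len =
    if Queue.is_empty q then
      (if !eof then raise End_of_file else raise Buffer_underrun)
    else begin
      let n = min len (Queue.length q) in
      for i = 0 to n - 1 do
        Bytes.set buf (off + i) (Queue.pop q)  (* data is consumed *)
      done;
      n
    end
  in
  let shutdown () = eof := true in
  (input, shutdown)

let q = Queue.create ()
let input, shutdown = make_queue_input q
let buf = Bytes.create 8

(* Empty and not yet shut down: more data may still arrive. *)
let underrun_before_data =
  try ignore (input buf 0 8); false with Buffer_underrun -> true

let () = String.iter (fun c -> Queue.push c q) "hi"
let n = input buf 0 8

(* Empty and shut down: the end of the data has been reached. *)
let () = shutdown ()
let eof_after_shutdown =
  try ignore (input buf 0 8); false with End_of_file -> true
```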
Creates a lexical buffer from an input channel. The input channel is not closed when the end is reached.
This function does not work for non-blocking channels.
Reads from the input channel until EOF and returns the characters as string. The input channel is not closed.
This function does not work for non-blocking channels.
Reads from the input channel until EOF and returns the lines as string list. The input channel is not closed.
This function does not work for non-blocking channels.
with_in_obj_channel ch f:
Computes f ch and closes ch. If an exception happens, the channel is
closed, too.
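The pattern behind this function can be sketched as below. The name with_in and the probe object are hypothetical, and the sketch makes no claim about how the library treats errors raised by close_in itself.

```ocaml
(* Sketch of the with_in_obj_channel pattern for any object that has a
   close_in method: close on normal return and on exception. *)
let with_in (ch : < close_in : unit -> unit; .. >) f =
  match f ch with
  | result -> ch#close_in (); result
  | exception e -> ch#close_in (); raise e

(* A probe object that records whether close_in was called. *)
let make_probe () = object
  val mutable closed = false
  method close_in () = closed <- true
  method is_closed = closed
end

let p1 = make_probe ()
let value = with_in p1 (fun _ -> 42)

let p2 = make_probe ()
let failed =
  try with_in p2 (fun _ -> failwith "boom") with Failure _ -> true
```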
Creates an output channel writing into an out_channel.
The method pos_out reflects the real position in the channel as
returned by Pervasives.pos_out. This works for both seekable and
non-seekable channels.
The method close_out also closes the underlying out_channel.
There is some implicit logic to either use close_out or close_out_noerr
depending on whether the immediately preceding operation already reported
an error.
onclose: This function is called when the close_out method is
invoked, just after the underlying out_channel has been closed.
Runs the command with /bin/sh, and data written to the channel is
piped to stdin of the command.
The method pos_out returns the number of written octets.
When close_out is invoked, the subprocess is waited for. If the
process exits with code 0, the method returns normally. Otherwise,
the exception Command_failure is raised. (The channel is closed
even if this exception is raised.)
onclose: This function is called when the close_out method is
invoked, just after the underlying descriptor has been closed.
This output channel writes the data into the passed buffer.
The method pos_out returns the number of written octets.
onclose: This function is called when the close_out method is
invoked, just after the underlying descriptor has been closed.
This output channel writes the data into the passed netbuffer.
The method pos_out returns the number of written octets.
onclose: This function is called when the close_out method is
invoked, just after the underlying descriptor has been closed.
This output channel discards all written data.
The method pos_out returns the number of discarded bytes.
onclose: This function is called when the close_out method is
invoked, just after the underlying descriptor has been closed.
with_out_obj_channel ch f:
Computes f ch and closes ch. If an exception happens, the channel is
closed, too.
Delegation classes just forward method calls to a parameter
object, i.e. when method m of the delegation class is called,
the definition of m is just to call the method with the same
name m of the parameter object. This is very useful in order
to redefine methods individually.
For example, to redefine the method pos_in of an in_obj_channel,
use
   class my_channel = object(self)
     inherit in_obj_channel_delegation ...
     method pos_in = ...
   end
As a special feature, the following delegation classes can suppress
the delegation of close_in or close_out, whichever applies.
Just pass close:false to get this effect, e.g.
   class input_channel_don't_close c =
     in_obj_channel_delegation ~close:false (new input_channel c)
      
This class does not close c : in_channel when the close_in
method is called.
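The mechanism can be reproduced in a self-contained way. The following sketch assumes a reduced local class type in_ch and hypothetical classes delegation and shifted_pos; it mirrors the idea of in_obj_channel_delegation with ~close:false, not its actual definition.

```ocaml
(* Reduced local stand-in for an input channel class type. *)
class type in_ch = object
  method input : Bytes.t -> int -> int -> int
  method close_in : unit -> unit
  method pos_in : int
end

(* Forwards every method to [ch]; with ~close:false, close_in is
   swallowed instead of forwarded. *)
class delegation ?(close = true) (ch : in_ch) = object
  method input buf off len = ch#input buf off len
  method close_in () = if close then ch#close_in ()
  method pos_in = ch#pos_in
end

(* Redefines a single method by inheriting from the delegation class. *)
class shifted_pos (offset : int) (ch : in_ch) = object
  inherit delegation ~close:false ch as super
  method! pos_in = offset + super#pos_in
end

(* A dummy base channel that records whether it was closed. *)
let closed = ref false
let base : in_ch = object
  method input (_ : Bytes.t) (_ : int) (_ : int) : int = raise End_of_file
  method close_in () = closed := true
  method pos_in = 10
end

let ch = new shifted_pos 5 base
let () = ch#close_in ()
let shifted = ch#pos_in
```

Here close_in on the wrapper leaves the base channel open, while pos_in is the only method whose behaviour differs from plain forwarding.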
The following classes and functions add missing methods to reach
a higher level in the hierarchy of channel class types. For most
uses, the lift_in and lift_out functions work best.
Turns a rec_in_channel or raw_in_channel, depending on the passed
variant, into a full in_obj_channel object. (This is a convenience
function, you can also use the classes below directly.) If you
want to define a class for the lifted object, use
     class lifted_ch ... =
       in_obj_channel_delegation (lift_in ...)
        
  input_line recognizes any of the passed strings as EOL
delimiters. When more than one delimiter matches, the longest
is taken. Defaults to ["\n"]. The default cannot be
changed when buffered=false (would raise Invalid_argument).
The delimiter strings must neither be empty, nor longer than
buffer_size.
  max_int, i.e. it is off.
  Turns a rec_out_channel or raw_out_channel, depending on the passed
variant, into a full out_obj_channel object. (This is a convenience
function, you can also use the classes below directly.) If you
want to define a class for the lifted object, use
     class lifted_ch ... =
       out_obj_channel_delegation (lift_out ...)
        
  max_int, i.e. it is off.
  As in raw_in_channel
As in raw_in_channel
As in raw_in_channel
This class implements the methods from compl_in_channel by calling
the methods of raw_in_channel. There is no additional buffering.
The performance of the method input_line is very bad (consider
overriding it, e.g. by enhanced_input_line as defined below).
This class implements pos_in and the methods from compl_in_channel
by calling the methods of rec_in_channel.
There is no additional buffering.
The performance of the method input_line is very bad (consider
overriding it, e.g. by enhanced_input_line as defined below).
The method pos_in is implemented by counting the number of octets
read by the input method.
pos_in.
Defaults to 0.
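How the missing methods can be derived from the two primitive ones is sketched below. The class type rec_in, the class lifted, its parameter start_pos_in and the helper rec_of_string are hypothetical names for this illustration, and Bytes.t is assumed for the buffer argument. As stated above, such methods are only reasonable for blocking channels: really_input would loop if input kept returning 0.

```ocaml
(* Minimal local stand-in for rec_in_channel (Bytes-based). *)
class type rec_in = object
  method input : Bytes.t -> int -> int -> int
  method close_in : unit -> unit
end

(* Adds pos_in, really_input, input_char and input_line on top of the
   two primitive methods. There is no buffering: input_line issues one
   input call per octet, which is why it is slow. *)
class lifted ?(start_pos_in = 0) (ch : rec_in) = object (self)
  val mutable pos = start_pos_in
  method input buf off len =
    let n = ch#input buf off len in
    pos <- pos + n;                 (* pos_in counts the octets read *)
    n
  method close_in () = ch#close_in ()
  method pos_in = pos
  method really_input buf off len =
    let rec go off len =
      if len > 0 then begin
        let n = self#input buf off len in
        go (off + n) (len - n)
      end
    in
    go off len
  method input_char () =
    let b = Bytes.create 1 in
    self#really_input b 0 1;
    Bytes.get b 0
  method input_line () =
    let line = Buffer.create 80 in
    let rec go () =
      match self#input_char () with
      | '\n' -> Buffer.contents line
      | c -> Buffer.add_char line c; go ()
      | exception End_of_file ->
          if Buffer.length line = 0 then raise End_of_file
          else Buffer.contents line
    in
    go ()
end

(* A primitive channel over a string, used to exercise the sketch. *)
let rec_of_string (s : string) : rec_in = object
  val mutable p = 0
  method input buf off len =
    if p >= String.length s then raise End_of_file;
    let n = min len (String.length s - p) in
    Bytes.blit_string s p buf off n;
    p <- p + n;
    n
  method close_in () = ()
end

let ch = new lifted (rec_of_string "one\ntwo")
let l1 = ch#input_line ()
let l2 = ch#input_line ()
let final_pos = ch#pos_in
```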
  As in raw_out_channel
As in raw_out_channel
As in raw_out_channel
As in raw_out_channel
This class implements the methods from compl_out_channel by calling
the methods of raw_out_channel. There is no additional buffering.
This class implements the methods from compl_out_channel by calling
the methods of raw_out_channel. There is no additional buffering.
This class implements pos_out and the methods from compl_out_channel
by calling the methods of rec_out_channel.
There is no additional buffering.
The method pos_out is implemented by counting the number of octets
written by the output method.
pos_out.
Defaults to 0.
  This type is for the method enhanced_input of enhanced_raw_in_channel.
- `Data n means that n bytes have been copied to the target string
- `Separator s means that no bytes have been copied, but that an
  end-of-line separator s has been found
An improved implementation of input_line that uses the buffer
Works similarly to input, but distinguishes between normal data
and end-of-line separators. The latter are returned as
`Separator s. When normal data is found, it is copied to the
string, and `Data n is returned to indicate that n bytes
were copied.
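The protocol can be made concrete with a small model. The sketch below works on a plain string with an explicit position instead of a buffered channel, so it ignores separators that are split across buffer boundaries. The functions separator_at and enhanced_input and the type name input_result are written for this illustration and are not the library's code.

```ocaml
(* Result type in the style of the `Data / `Separator variants. *)
type input_result = [ `Data of int | `Separator of string ]

(* Longest delimiter from [eol] that starts at position [pos] of [s]. *)
let separator_at eol s pos =
  List.fold_left
    (fun best d ->
       let dl = String.length d in
       let fits = pos + dl <= String.length s && String.sub s pos dl = d in
       match best with
       | Some b when String.length b >= dl -> best
       | _ -> if fits then Some d else best)
    None eol

(* Reads from [s] starting at [!pos]: either reports a separator, or
   copies data up to (not including) the next separator. *)
let enhanced_input ~eol s pos buf off len : input_result =
  if !pos >= String.length s then raise End_of_file;
  match separator_at eol s !pos with
  | Some d -> pos := !pos + String.length d; `Separator d
  | None ->
      let stop = ref !pos in
      while !stop < String.length s
            && !stop - !pos < len
            && separator_at eol s !stop = None do
        incr stop
      done;
      let n = !stop - !pos in
      Bytes.blit_string s !pos buf off n;
      pos := !stop;
      `Data n

(* Collects all results for a short input with a two-octet separator. *)
let results =
  let s = "ab\r\ncd" and pos = ref 0 and buf = Bytes.create 16 in
  let rec go acc =
    match enhanced_input ~eol:["\n"; "\r\n"] s pos buf 0 16 with
    | r -> go (r :: acc)
    | exception End_of_file -> List.rev acc
  in
  go []
```

A line reader built on top of this would append the `Data chunks to the current line and finish the line whenever a `Separator is returned.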
Defines private methods reading text line by line
This class adds a buffer to the underlying raw_in_channel.
As an additional feature, the method enhanced_input_line is a fast
version of input_line that profits from the buffer.
enhanced_input_line recognizes any of the passed strings as EOL
delimiters. When more than one delimiter matches, the longest
is taken. Defaults to ["\n"]. Note that input_line
always only recognizes "\n" as EOL character, this cannot
be changed.
The delimiter strings must neither be empty, nor longer than
buffer_size.
  max_int, i.e. it is off.
  This class adds a buffer to the underlying raw_out_channel.
max_int, i.e. it is off.
  Creates a raw_in_channel for the passed file descriptor, which must
be open for reading.
The pos_in method returns logical positions, i.e. it counts the number
of read octets. No attempt is made to determine the real file position.
The method close_in also closes the file descriptor.
This class also supports Win32 proxy descriptors referring to an input channel.
true.
  pos_in is initialized when
the channel is created, by default 0
  Creates a raw_out_channel for the passed file descriptor, which must
be open for writing.
The pos_out method returns logical positions, i.e. it counts the number
of written octets. No attempt is made to determine the real file position.
The method close_out also closes the file descriptor.
This class also supports Win32 proxy descriptors referring to an output channel.
true.
  pos_out is initialized when
the channel is created, by default 0
  Creates a raw_io_channel for the passed socket descriptor, which must
be open for reading and writing, and not yet shut down in either
direction. The raw_io_channel is used to represent a bidirectional
channel: close_out shuts the socket down for sending, close_in
shuts the socket down for reading, and when both directions are down,
the descriptor is closed.
The pos_in and pos_out methods return logical positions.
This class supports sockets and Win32 named pipes. Note, however, that for Win32 named pipes it is not possible to shut down only one direction of the bidirectional data channel.
pos_in is initialized when
the channel is created, by default 0
  pos_out is initialized when
the channel is created, by default 0
  Whether a close_out implies a commit or rollback operation
A transactional output channel with a transaction buffer implemented in memory
close_out, by default
`Commit
  Creates a temporary file in the directory tmp_directory with a name
prefix tmp_prefix and a unique suffix. The function returns
the triple (name, inch, outch) containing the file name,
the file opened as in_channel inch and as out_channel outch.
"netstring". This needs not to be
unique, but just descriptive.
  A transactional output channel with a transaction buffer implemented as temporary file
close_out, by default
`Commit
  make_temporary_file
  make_temporary_file
  Note that this has nothing to do with "pipes" on the Unix level. It is, however, the same idea: Connecting two I/O resources with an intermediate buffer.
A pipe has two internal buffers (realized by Netbuffer). The
output methods of the class write to the incoming buffer. When
new data are appended to the incoming buffer, the conversion function
conv is called; the arguments are the incoming buffer and the outgoing
buffer. The conversion function must convert the data available in the
incoming buffer and append the result to the outgoing buffer. Finally,
the input methods of the class return the data found in the outgoing
buffer.
The conversion function is called as follows:
conv incoming_buffer at_eof outgoing_buffer
The conversion function is allowed to do nothing if the incoming data are not complete enough to be converted. It is also allowed to convert only the beginning of the incoming buffer.
If the outgoing buffer is empty, the input methods will raise
Buffer_underrun.
If close_out is invoked, the end of the data stream will be recorded.
In this case, the conversion function is called with at_eof = true,
and it is expected that this function converts the whole data found
in the incoming buffer.
close_in implies close_out.
The conversion function may raise exceptions. The exceptions will
fall through to the caller of the input methods. (The output methods
and close_in, close_out never fail because of such exceptions.)
The default conversion function copies everything from the incoming buffer to the outgoing buffer without modification.
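The interplay of the two buffers and the conversion function can be sketched as follows. This is a simplified model, assuming Buffer.t in place of Netbuffer, a locally declared Buffer_underrun exception, a hypothetical method input_available, and an example conversion grouped_upper. Unlike the class described above, the sketch calls conv directly from the output methods and does not defer exceptions of conv to the input methods.

```ocaml
exception Buffer_underrun  (* local stand-in for the library exception *)

(* Default conversion: move everything from incoming to outgoing. *)
let copy_conv incoming _at_eof outgoing =
  Buffer.add_buffer outgoing incoming;
  Buffer.clear incoming

class pipe ?(conv = copy_conv) () = object
  val incoming = Buffer.create 64
  val outgoing = Buffer.create 64
  val mutable out_closed = false
  method output_string s =
    Buffer.add_string incoming s;
    conv incoming false outgoing
  method close_out () =
    if not out_closed then begin
      out_closed <- true;
      conv incoming true outgoing   (* last chance to convert the rest *)
    end
  (* Returns and removes everything currently available. *)
  method input_available () =
    if Buffer.length outgoing = 0 then
      (if out_closed then raise End_of_file else raise Buffer_underrun);
    let s = Buffer.contents outgoing in
    Buffer.clear outgoing;
    s
end

(* Example conversion: upper-cases complete groups of three octets and
   leaves an incomplete tail in the incoming buffer until at_eof. *)
let grouped_upper incoming at_eof outgoing =
  let len = Buffer.length incoming in
  let n = if at_eof then len else len - len mod 3 in
  Buffer.add_string outgoing
    (String.uppercase_ascii (Buffer.sub incoming 0 n));
  let rest = Buffer.sub incoming n (len - n) in
  Buffer.clear incoming;
  Buffer.add_string incoming rest

let p = new pipe ~conv:grouped_upper ()
let () = p#output_string "abcd"
let first = p#input_available ()        (* "ABC"; "d" is still pending *)
let underrun =
  try ignore (p#input_available ()); false with Buffer_underrun -> true
let () = p#close_out ()
let last = p#input_available ()         (* "D" *)
```

The example conversion converts only a prefix of the incoming buffer, which the description above explicitly permits; the remaining octet is processed when close_out signals at_eof.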
An output_filter filters the data written to it through the
io_obj_channel (usually a pipe), and writes the filtered data
to the passed out_obj_channel.
If the filter is closed, the io_obj_channel will be closed, too,
but not the destination out_obj_channel (so you can still append
further data).
An input_filter filters the data read from it through the
io_obj_channel (usually a pipe) after the data have been
retrieved from the passed in_obj_channel.
An input_filter object never generates Buffer_underrun exceptions.
However, if the passed in_obj_channel or io_obj_channel raises such
an exception, the exception will fall through the calling chain.
If the filter is closed, the io_obj_channel will be closed, too,
but not the source in_obj_channel (so you can still read further
data from it).
If you have the choice, prefer output_filter over input_filter.
The latter is slower.
The primary application of filters is to encode or decode a channel on the fly. For example, the following lines write a BASE64-encoded file:
   let ch = new output_channel (open_out "file.b64") in
   let encoder = new Netencoding.Base64.encoding_pipe ~linelength:76 () in
   let ch' = new output_filter encoder ch in
   ... (* write to ch' *)
   ch' # close_out();
   ch  # close_out();  (* you must close both channels! *)
All bytes written to ch' are BASE64-encoded and the encoded bytes are
written to ch.
There are also pipes to decode BASE64, and to encode and decode the "Quoted printable" format. Encoding and decoding work even if the data is delivered in disadvantageous chunks, because the data is "re-chunked" if needed. For example, BASE64 would require that data arrive in multiples of three bytes, and to cope with that, the BASE64 pipe only processes the prefix of the input buffer that is a multiple of three, and defers the encoding of the extra bytes till the next opportunity.