An extension of the standard StringLabels. If you open Core.Std, you'll get these in the String module.
Caseless compares and hashes strings ignoring case, so that for example Caseless.equal "OCaml" "ocaml" and Caseless.("apple" < "Banana") are true, and Caseless.Map and Caseless.Table lookup and Caseless.Set membership are case-insensitive.
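The behaviour described above can be modelled with the standard library alone. The following is a minimal sketch, not Core's implementation: the names caseless_compare, caseless_equal and Caseless_map are hypothetical, and it assumes ASCII case folding only.

```ocaml
(* Stdlib-only model of case-insensitive string comparison.
   Assumption: ASCII case folding only; this is not Core's implementation. *)
let caseless_compare (a : string) (b : string) : int =
  String.compare (String.lowercase_ascii a) (String.lowercase_ascii b)

let caseless_equal (a : string) (b : string) : bool = caseless_compare a b = 0

(* A map whose keys are compared without regard to case. *)
module Caseless_map = Map.Make (struct
  type t = string
  let compare = caseless_compare
end)

let () =
  assert (caseless_equal "OCaml" "ocaml");
  assert (caseless_compare "apple" "Banana" < 0);
  (* The ordinary byte-wise order disagrees: 'B' (66) sorts before 'a' (97). *)
  assert (String.compare "apple" "Banana" > 0);
  let m = Caseless_map.add "Key" 1 Caseless_map.empty in
  assert (Caseless_map.find_opt "KEY" m = Some 1)
```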
Maximum length of a string.
Substring search and replace functions. They use the Knuth-Morris-Pratt algorithm (KMP) under the hood.
The functions in the Search_pattern module allow the program to preprocess the searched pattern once and then use it many times without further allocations.
create pattern preprocesses pattern as per KMP, building an int array of length length pattern. All inputs are valid.
pos < 0 or pos >= length string result in no match (hence index returns None and index_exn raises).
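To make the preprocessing step concrete, here is a minimal stdlib-only sketch of KMP. It is not Core's code: the record type pattern, its field names and the function bodies are assumptions made for illustration. The failure table has one entry per pattern character, and index follows the rule stated above for out-of-range pos.

```ocaml
(* Hypothetical sketch of KMP preprocessing and search; not Core's code. *)
type pattern = { pat : string; fail : int array }

(* fail.(i) is the length of the longest proper prefix of pat[0..i]
   that is also a suffix of pat[0..i]. *)
let create (pat : string) : pattern =
  let n = String.length pat in
  let fail = Array.make n 0 in
  let k = ref 0 in
  for i = 1 to n - 1 do
    while !k > 0 && pat.[i] <> pat.[!k] do
      k := fail.(!k - 1)
    done;
    if pat.[i] = pat.[!k] then incr k;
    fail.(i) <- !k
  done;
  { pat; fail }

(* First match starting at or after pos, or None. *)
let index ?(pos = 0) (p : pattern) (text : string) : int option =
  let n = String.length p.pat and len = String.length text in
  if pos < 0 || pos >= len then None (* the rule stated in the text above *)
  else if n = 0 then Some pos
  else begin
    let k = ref 0 (* number of pattern characters currently matched *)
    and i = ref pos
    and found = ref None in
    while !found = None && !i < len do
      while !k > 0 && text.[!i] <> p.pat.[!k] do
        k := p.fail.(!k - 1)
      done;
      if text.[!i] = p.pat.[!k] then incr k;
      if !k = n then found := Some (!i - n + 1);
      incr i
    done;
    !found
  end

let () =
  let p = create "abra" in
  (* The same preprocessed pattern is reused for several searches. *)
  assert (index p "abracadabra" = Some 0);
  assert (index ~pos:1 p "abracadabra" = Some 7);
  assert (index ~pos:11 p "abracadabra" = None)
```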
Substring search and replace convenience functions. They call Search_pattern.create and then forget the preprocessed pattern when the search is complete. pos < 0 or pos >= length t result in no match (hence substr_index returns None and substr_index_exn raises). may_overlap indicates whether to report overlapping matches, see Search_pattern.index_all.
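The effect of may_overlap is easiest to see on a repetitive input. The sketch below is a naive quadratic scan written against the standard library, not the KMP-based implementation. The function name index_all_naive and its argument order are hypothetical, and the non-overlapping behaviour shown (resume scanning just after the previous match) is an assumption about the intended semantics.

```ocaml
(* Naive illustration of overlapping vs non-overlapping match reporting. *)
let index_all_naive ~(may_overlap : bool) (text : string) (pat : string) : int list =
  let n = String.length pat and len = String.length text in
  let rec go i acc =
    if i + n > len then List.rev acc
    else if String.sub text i n = pat then
      (* After a match, either move one step or jump past the match. *)
      go (if may_overlap then i + 1 else i + max n 1) (i :: acc)
    else go (i + 1) acc
  in
  go 0 []

let () =
  assert (index_all_naive ~may_overlap:true "aaaa" "aa" = [0; 1; 2]);
  assert (index_all_naive ~may_overlap:false "aaaa" "aa" = [0; 2])
```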
lfindi ?pos t ~f returns the smallest i >= pos such that f i t.[i], if there is such an i. By default, pos = 0.
rfindi ?pos t ~f returns the largest i <= pos such that f i t.[i], if there is such an i. By default, pos = length t - 1.
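A minimal stdlib-only sketch of the two searches follows. It mirrors the descriptions above but is not Core's code. Clamping an out-of-range pos is this sketch's own choice, not documented behaviour.

```ocaml
(* Smallest i >= pos with f i t.[i]; scans left to right. *)
let lfindi ?(pos = 0) (t : string) ~(f : int -> char -> bool) : int option =
  let len = String.length t in
  let rec go i =
    if i >= len then None
    else if f i t.[i] then Some i
    else go (i + 1)
  in
  go (max pos 0) (* clamping is an assumption of this sketch *)

(* Largest i <= pos with f i t.[i]; scans right to left. *)
let rfindi ?pos (t : string) ~(f : int -> char -> bool) : int option =
  let last = String.length t - 1 in
  let pos = match pos with Some p -> min p last | None -> last in
  let rec go i =
    if i < 0 then None
    else if f i t.[i] then Some i
    else go (i - 1)
  in
  go pos

let () =
  let is_l _ c = c = 'l' in
  assert (lfindi "hello" ~f:is_l = Some 2);
  assert (rfindi "hello" ~f:is_l = Some 3);
  assert (lfindi ~pos:3 "hello" ~f:is_l = Some 3);
  assert (rfindi ~pos:2 "hello" ~f:is_l = Some 2)
```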
foldi works similarly to fold, but also passes the index of each character to f.
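As an illustration, an index-aware fold can be sketched on top of the standard library's String.iteri. This is a sketch only, and the argument order of f (index, accumulator, character) is an assumption.

```ocaml
(* Fold over a string, passing each character's index to f. *)
let foldi (t : string) ~init ~f =
  let acc = ref init in
  String.iteri (fun i c -> acc := f i !acc c) t;
  !acc

let () =
  (* Collect the positions of every 'a' in "banana". *)
  let positions =
    foldi "banana" ~init:[] ~f:(fun i acc c -> if c = 'a' then i :: acc else acc)
  in
  assert (List.rev positions = [1; 3; 5])
```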
tr_inplace target replacement s destructively modifies s (in place!), replacing every instance of target in s with replacement.
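In current OCaml, string values are immutable, so an in-place translation has to operate on Bytes.t. The sketch below shows the idea under that assumption. The labelled arguments are a choice made for this example, not a confirmed signature.

```ocaml
(* In-place character translation on a mutable byte sequence. *)
let tr_inplace ~(target : char) ~(replacement : char) (s : Bytes.t) : unit =
  for i = 0 to Bytes.length s - 1 do
    if Bytes.get s i = target then Bytes.set s i replacement
  done

let () =
  let b = Bytes.of_string "a.b.c" in
  tr_inplace ~target:'.' ~replacement:'-' b;
  (* b itself was modified; no new buffer was allocated. *)
  assert (Bytes.to_string b = "a-b-c")
```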
Operations for escaping and unescaping strings, with parameterized escape and escapeworthy characters. Escaping/unescaping using this module is more efficient than using Pcre. Benchmark code can be found in core/benchmarks/string_escaping.ml.
escape_gen_exn escapeworthy_map escape_char returns a function that will escape a string s as follows: if (c1,c2) is in escapeworthy_map, then all occurrences of c1 are replaced by escape_char concatenated to c2.
Raises an exception if escapeworthy_map is not one-to-one. If escape_char is not in escapeworthy_map, then it will be escaped to itself.
escape ~escapeworthy ~escape_char s is escape_gen_exn ~escapeworthy_map:(List.zip_exn escapeworthy escapeworthy) ~escape_char. Duplicates and escape_char will be removed from escapeworthy, so no exception will be raised.
unescape_gen_exn is the inverse operation of escape_gen_exn. That is,
  let escape = Staged.unstage (escape_gen_exn ~escapeworthy_map ~escape_char) in
  let unescape = Staged.unstage (unescape_gen_exn ~escapeworthy_map ~escape_char) in
  assert (s = unescape (escape s))
always succeeds when ~escapeworthy_map does not cause an exception.
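The round trip can be tried out with a small stdlib-only model. This is a sketch, not Core's implementation. The names escape_gen, unescape_gen and with_escape_char are hypothetical, the one-to-one check on the map is omitted, and the functions are returned directly rather than wrapped in Staged.

```ocaml
(* Per the description, escape_char escapes to itself unless already mapped. *)
let with_escape_char map escape_char =
  if List.mem_assoc escape_char map then map
  else (escape_char, escape_char) :: map

let escape_gen ~escapeworthy_map ~escape_char : string -> string =
  let map = with_escape_char escapeworthy_map escape_char in
  fun s ->
    let buf = Buffer.create (String.length s) in
    String.iter
      (fun c ->
        match List.assoc_opt c map with
        | Some c' ->
          Buffer.add_char buf escape_char;
          Buffer.add_char buf c'
        | None -> Buffer.add_char buf c)
      s;
    Buffer.contents buf

let unescape_gen ~escapeworthy_map ~escape_char : string -> string =
  let map = with_escape_char escapeworthy_map escape_char in
  (* Invert the map: escaped form -> original character. *)
  let inverse = List.map (fun (c1, c2) -> (c2, c1)) map in
  fun s ->
    let len = String.length s in
    let buf = Buffer.create len in
    let rec go i =
      if i < len then
        if s.[i] = escape_char && i + 1 < len then begin
          let c = s.[i + 1] in
          Buffer.add_char buf
            (match List.assoc_opt c inverse with Some c' -> c' | None -> c);
          go (i + 2)
        end else begin
          Buffer.add_char buf s.[i];
          go (i + 1)
        end
    in
    go 0;
    Buffer.contents buf

(* The simpler variant: every escapeworthy character maps to itself. *)
let escape ~escapeworthy ~escape_char =
  escape_gen ~escapeworthy_map:(List.map (fun c -> (c, c)) escapeworthy) ~escape_char

let () =
  let escapeworthy_map = [ ('\n', 'n'); (',', 'c') ] and escape_char = '_' in
  let esc = escape_gen ~escapeworthy_map ~escape_char in
  let unesc = unescape_gen ~escapeworthy_map ~escape_char in
  let s = "a,b_c\n" in
  assert (esc s = "a_cb__c_n");
  assert (unesc (esc s) = s);
  assert (escape ~escapeworthy:[ ',' ] ~escape_char:'_' "a,b" = "a_,b")
```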
Any char in an escaped string is either escaping, escaped or literal. For example, for the escaped string "0_a0__0" with escape_char '_', positions 1 and 4 are escaping, 2 and 5 are escaped, and the rest are literal.
is_char_escaping s ~escape_char pos returns true if the char at pos is escaping, false otherwise.
is_char_escaped s ~escape_char pos returns true if the char at pos is escaped, false otherwise.
is_literal s ~escape_char pos returns true if the char at pos is neither escaped nor escaping.
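One way to model the three categories is to count the run of escape characters immediately before a position: an odd count means the character is escaped. The following is a stdlib-only sketch under that assumption. The helper names preceding_escapes and classify are hypothetical, and this is not necessarily how Core computes the result.

```ocaml
(* Number of consecutive escape_char occurrences immediately before pos. *)
let preceding_escapes s ~escape_char pos =
  let rec go i n =
    if i >= 0 && s.[i] = escape_char then go (i - 1) (n + 1) else n
  in
  go (pos - 1) 0

let is_char_escaped s ~escape_char pos =
  preceding_escapes s ~escape_char pos mod 2 = 1

let is_char_escaping s ~escape_char pos =
  s.[pos] = escape_char && not (is_char_escaped s ~escape_char pos)

let is_literal s ~escape_char pos =
  not (is_char_escaped s ~escape_char pos)
  && not (is_char_escaping s ~escape_char pos)

(* Label each position: 'E' escaping, 'e' escaped, '.' literal. *)
let classify s ~escape_char =
  String.init (String.length s) (fun pos ->
    if is_char_escaping s ~escape_char pos then 'E'
    else if is_char_escaped s ~escape_char pos then 'e'
    else '.')

let () =
  (* Positions 1 and 4 are escaping, 2 and 5 escaped, the rest literal. *)
  assert (classify "0_a0__0" ~escape_char:'_' = ".Ee.Ee.")
```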
index s ~escape_char char finds the first literal (not escaped) instance of char in s, starting from 0.
rindex s ~escape_char char finds the first literal (not escaped) instance of char in s, starting from the end of s and proceeding towards 0.
index_from s ~escape_char pos char finds the first literal (not escaped) instance of char in s, starting from pos and proceeding towards the end of s.
rindex_from s ~escape_char pos char finds the first literal (not escaped) instance of char in s, starting from pos and proceeding towards 0.
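The four searches differ only in starting point and direction. A minimal stdlib-only sketch follows. It is not Core's code. It assumes the functions return an option, and it decides whether a character is literal by counting the escape characters before it.

```ocaml
(* A position is literal if an even number of escape characters precede it
   and it is not itself the escape character. *)
let is_literal s ~escape_char pos =
  let rec count i n =
    if i >= 0 && s.[i] = escape_char then count (i - 1) (n + 1) else n
  in
  count (pos - 1) 0 mod 2 = 0 && s.[pos] <> escape_char

let index_from s ~escape_char pos char =
  let len = String.length s in
  let rec go i =
    if i >= len then None
    else if s.[i] = char && is_literal s ~escape_char i then Some i
    else go (i + 1)
  in
  go pos

let rindex_from s ~escape_char pos char =
  let rec go i =
    if i < 0 then None
    else if s.[i] = char && is_literal s ~escape_char i then Some i
    else go (i - 1)
  in
  go pos

let index s ~escape_char char = index_from s ~escape_char 0 char
let rindex s ~escape_char char =
  rindex_from s ~escape_char (String.length s - 1) char

let () =
  (* The commas at positions 2 and 7 are escaped; only position 4 is literal. *)
  let s = "a_,b,c_,d" in
  assert (index s ~escape_char:'_' ',' = Some 4);
  assert (rindex s ~escape_char:'_' ',' = Some 4);
  assert (index_from s ~escape_char:'_' 5 ',' = None)
```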
split s ~escape_char ~on returns a list of substrings of s that are separated by literal versions of on. Consecutive on characters will cause multiple empty strings in the result. Splitting the empty string returns a list of the empty string, not the empty list. For example, with escape_char '_' and on ',', splitting "foo,bar_,baz" yields "foo"; "bar_,baz".
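Escape-aware splitting can be sketched with a single left-to-right pass that skips over each escaped character. The code below is a stdlib-only illustration, not Core's implementation. It also covers the multi-separator variant split_on_chars, of which split is the single-separator case. The example strings follow the examples in this section, with '_' assumed as the escape character where the text does not state it.

```ocaml
(* Cut s at every literal occurrence of a character from on. *)
let split_on_chars (s : string) ~(escape_char : char) ~(on : char list) : string list =
  let len = String.length s in
  let rec go start i acc =
    if i >= len then List.rev (String.sub s start (len - start) :: acc)
    else if s.[i] = escape_char then
      go start (i + 2) acc (* skip the escape and the character it escapes *)
    else if List.mem s.[i] on then
      go (i + 1) (i + 1) (String.sub s start (i - start) :: acc)
    else go start (i + 1) acc
  in
  go 0 0 []

let split s ~escape_char ~on = split_on_chars s ~escape_char ~on:[ on ]

let () =
  (* The escaped "_," stays inside its field. *)
  assert (split "foo,bar_,baz" ~escape_char:'_' ~on:',' = [ "foo"; "bar_,baz" ]);
  (* Consecutive separators give empty strings; the empty string gives [""]. *)
  assert (split "a,,b" ~escape_char:'_' ~on:',' = [ "a"; ""; "b" ]);
  assert (split "" ~escape_char:'_' ~on:',' = [ "" ]);
  assert (
    split_on_chars "foo_|bar,baz|0" ~escape_char:'_' ~on:[ ','; '|' ]
    = [ "foo_|bar"; "baz"; "0" ])
```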
split_on_chars s ~on returns a list of all substrings of s that are separated by one of the literal chars from on. The chars in on are not grouped, so a grouping of on chars in the source string will produce multiple empty string splits in the result. For example, with on = [','; '|'] and '_' as the escape character:
  "foo_|bar,baz|0" -> "foo_|bar"; "baz"; "0"