Piqi for OCaml User’s Manual

Table of Contents

 1. Overview

 2. Piqi compiler and generated OCaml code

      2.1. Protocol Buffers serialization

      2.2. Customized runtime library for Protocol Buffers serialization

      2.3. Multi-format serialization

      2.4. Command-line parameters

 3. Piqi to OCaml mapping

      3.1. Modules

           3.1.1. Includes

           3.1.2. Imports

      3.2. Primitive types

      3.3. User-defined types

      3.4. Custom OCaml types

      3.5. Piqi extensions

 4. Examples

 5. Limitations

 6. Supported OCaml and Protocol Buffers versions

1. Overview

Piqi includes a data serialization system for OCaml. It can be used for serializing OCaml values in 4 different formats: Google Protocol Buffers, JSON, XML and Piq.

A typical Piqi usage scenario involves the following steps:

1. Build and install Piqi libraries for OCaml

Piqi source code is distributed as a self-contained package with no external dependencies. Builds have been tested on all major platforms.

The installation instructions are available here.

2. Describe data structures using the Piqi data definition language

The Piqi data definition language can describe many OCaml types, both primitive and user-defined. This includes int, uint32 and uint64, floats, bools, strings, lists, arrays, records and polymorphic variants (including sub-variants).

In addition to types supported by default, Piqi has a mechanism for adding support for arbitrary monomorphic OCaml types. It can be used, for example, to add support for OCaml’s nativeint or char.

Refer to the "Piqi to OCaml mapping" section below for details.

3. Call piqic-ocaml, the Piqi compiler for OCaml, to generate OCaml type definitions and serialization code : See the next section for the detailed description.

4. Use generated serializes/deserializers/printers in a user’s program

When multi-format serialization mode is used, one can specify a desired format at runtime.

5. Link the user’s program with the Piqi runtime library

There are two Piqi runtime libraries: piqirun.pb and piqirun.ext. The first one is for Protocol Buffers serialization. The second one is used for multi-format serialization. See the next section for more details.

The Examples section contains links to several sample OCaml projects that use Piqi for data serialization and demonstrate usage of steps 2–5.

2. Piqi compiler and generated OCaml code

piqic-ocaml, the Piqi compiler for OCaml, can generate two flavors of OCaml code. The first one is used only for Protocol Buffers serialization. The second flavor can be used for serializing multiple formats, including Protobuf, XML, JSON and Piq.

Multi-format serialization is an extension of the basic Protocol Buffers serialization mode. It requires linking with a different runtime library.

2.1. Protocol Buffers serialization

When called without --multi-format, piqic-ocaml generates OCaml type definitions and code for Protocol Buffers serialization.

piqic-ocaml command takes a Piqi module <dir path>/<file>.piqi and produces an output <ocaml-module>.ml file in the current working directory. By default, <ocaml-module> is a valid OCaml module name equivalent to <file>_piqi.

For each specified, imported or included module <m>.piqi, the compiler tries to load and automatically include <m>.ocaml.piqi. This mechanism is called Extension Modules. It is described in detail in the Piqi language section.

Output directory can be overridden using the -C command-line option.

Generated <ocaml-module>.ml file contains OCaml type definitions and functions for serializing and deserializing OCaml values.

For each defined data type <typename>, piqic-ocaml produces several functions:

  • parse_<typename> — for deserializing a value

  • gen_<typename> — for serializing a value

  • default_<typename> — type constructor: returns a minimally serializable value of this type

Compiled .ml files should be linked with the piqilib.pb findlib package. For example:

# generate "test_piqi.ml"
piqic-ocaml test.piqi

# compile and link it with the runtime library using findlib/ocamlfind
ocamlfind ocamlc -linkpkg -package piqilib.pb test_piqi.ml

2.2. Customized runtime library for Protocol Buffers serialization

Under the hood, the piqilib.pb package resolves to a single compiled module named Piqirun. This is the module the generated _piqi.ml OCaml code uses.

Sometimes, it may be useful to use a modified version of the Piqirun module. For example, a customized version could have extra optimizations, improve error handling or implement serialization code for Custom OCaml types.

To swap the default runtime for a customized one, call piqic-ocaml with --runtime <module-name> option, where <module-name> is the name of the module to use instead of Piqirun.

2.3. Multi-format serialization

When piqic-ocaml is called with the --multi-format flag, it generates an additional code for serializing values in XML, JSON, Protobuf and Piq formats.

The additional module generated by this compiler is named <ocaml-module>_piqi_ext.ml.

The parse_<typename> and gen_<typename> functions from this module take an additional parameter specifying which serialization format to use:

type input_format = [ `piq | `json | `xml | `pb | `wire ]

type output_format = [ input_format | `json_pretty | `xml_pretty ]

In addition, piqic-ocaml --multi-format generates some other functions:

  • print_<typename> — for printing a value to stdout in Piq format

Each parse_<typename> and gen_<typename> function accepts an optional ?opts argument representing a set of serialization options that can be constructed using Piqirun_ext.make_options:

(* Construct serialization options to be passed as an optional argument to
 * gen_<typename> and parse_<typename> functions. Available options:
 *
 * pretty_print
 *
 *      Pretty-print generated JSON and XML output (default = true)
 *
 * json_omit_missing_fields
 *
 *      Omit missing optional and empty repeated fields from JSON
 *      output instead of representing them as {"field_name": null} and
 *      {"field_name", []} JSON fields (default = true)
 *
 * use_strict_parsing
 *
 *      Treat unknown and duplicate fields as errors when parsing JSON,
 *      XML and Piq formats (default = false)
 *
 * piq_frameless_output
 *
 *      Print a frame (i.e. :<typename> []) around a single output Piq object
 *      (default=false)
 *
 * piq_frameless_input
 *
 *      Expect a frame around a single input Piq object (default=false)
 *
 * piq_relaxed_parsing
 *
 *      Parse Piq format using "relaxed" mode (default=false);
 *
 *      For instance, when set to `true`, single-word string literals don't have
 *      to be quoted
 *)
val make_options:
        ?pretty_print:bool ->
        ?json_omit_missing_fields:bool ->
        ?use_strict_parsing:bool ->
        ?piq_frameless_output:bool ->
        ?piq_frameless_input:bool ->
        ?piq_relaxed_parsing:bool ->
        unit -> options

Compiled .ml files should be linked with the piqirun.ext findlib package. For example:

# generate "test_piqi.ml" and "test_piqi_ext.ml"
piqic-ocaml --multi-format test.piqi

# compile and link them with the runtime library using findlib/ocamlfind
ocamlfind ocamlc -linkpkg -package piqirun.ext test_piqi.ml test_piqi_ext.ml

2.4. Command-line parameters

piqic-ocaml accepts the following command-line parameters.

  • --multi-format generate extended OCaml stubs for multi-format (JSON/XML/Piq/Pb) serialization, i.e. <module>_piqi_ext.mla file

  • --ext same as --multi-format

  • --normalize-names true|false — convert "CamelCase"-style identifiers from the original type spec into "camel_case" OCaml names (names will be capitalized when appropriate). When the argument is false, the original identifiers will be lowercased without performing any additional transformations, e.g. "CamelCase" turns into "camelCase". The default value is true.

  • --reserved-name — add a reserved name in addition to the standard OCaml keywords. Can be used several times. Such names will be prefixed with underscores in the generated OCaml code.

  • --runtime <module> name of the Protobuf serialization runtime module (default = Piqirun)

  • -C <dir> — specify output directory for the generated .ml files.

  • -I <dir> — add directory to the list of imported .piqi search paths

  • -e <name> — try including extension for all loaded modules (can be used several times)

  • --gen-preserve-unknown-fields — generate code that preserves unknown Protobuf fields when they are serialized back. When enabled, unknown (unrecognized) Protobuf fields are captured during de-serialization in a special ‘piqi_unknown_pb’ field and automatically written back when the record is serialized to Protobuf.

  • --strict treat unknown and duplicate fields as errors

  • --no-warnings — don’t print warnings

  • --trace — turn on tracing (verbose output)

  • --version — print piqi-ocaml version and exit

  • --piqi-version — print piqi (piqilib) version and exit

  • -h, --help — print command-line options help

3. Piqi to OCaml mapping

The following sections describe how different Piqi constructs such as modules and types are mapped to OCaml.

3.1. Modules

The name of OCaml module is derived from Piqi module name unless overridden by ocaml-module top-level field.

If Piqi module name is "example.com/foo/bar", then "Bar" (the last part of the Piqi module name) will be used as the OCaml module name. It is possible to override such default name assignment by specifying .ocaml-module "<OtherName>" in the Piqi module.

3.1.1. Includes

Piqi takes all "include" directives of a Piqi module, resolves them internally and produces a compound Piqi module which is then mapped to the resulting OCaml module.

3.1.2. Imports

Piqi "import" directives are mapped to OCaml modules in the following way.

Imports that do not specify local module name are used directly as a part of Ocaml type names for imported types. For example,

.import [ .module example.com/foo/bar ]
.variant [
    .name v
    .option [
        .name o
        .type bar/t
    ]
]

is mapped to:

type v = [ `o of Bar.t ]

Imports that do have imported module name result in generation of module alias. For example,

.import [ .module example.com/foo/bar .name fum ]

is mapped to:

module Fum = Bar

It is possible to override ocaml names of an imported module by using .ocaml-name and .ocaml-module properties in the import statement. For example, given an import statement like this:

.import [
    .module m
    .ocaml-module "Foo.Bar"
    .ocaml-name "Fum"
]

piqic-ocaml will generate the following statement at the top of the generated _piqi.ml:

module Fum = Foo.Bar

3.2. Primitive types

The table below represents correspondence between Piqi primitive types and OCaml types.

(Mapping between Piqi and Protocol Buffers primitive type is documented here).

Piqi type(s) OCaml type Protobuf type(s)
bool bool bool
string string string
binary string bytes
int, uint int sint32, uint32
int32, uint32, int32-fixed, uint32-fixed, protobuf-int32 int32 sint32, uint32, sfixed32, fixed32, int32
int64, uint64, int64-fixed, uint64-fixed, protobuf-int64 int64 sint64, uint64, sfixed64, fixed64, int64
float, float64, float32 float double, float

If there is a need to add serialization support for other OCaml types, such as char, nativeint or bigint, refer to Custom OCaml types section which describes a method for mapping custom OCaml types to Piqi types.

3.3. User-defined types

  • Type names

    Each user-defined type is identified by its name. Piqi type names are converted to OCaml type name using the following rule.

    By default, Piqi identifiers are normalized and all hyphen characters are replaced by underscores. Normalization means converting "CamelCase" to "camel-case".

    If --normalize false command-line option is specified, then instead of full normalization, the first letter of the type name is uncapitalized.

    Sometimes it is necessary to override this rule and specify a custom OCaml name for a type. For example, when a Piqi type name conflicts with one of OCaml keywords. In such case, custom OCaml name can be specified using .ocaml-name "<ocaml name>" field next to the original .name <name> entry. (This feature also works for field names, option names, import names and function names).

    For those Piqi fields or options that do not specify names, OCaml name is derived from the name of the Piqi type for that field.

  • Records are mapped to OCaml records.

    As a workaround for OCaml’s flat namespace for record labels, Piqi puts each record definition in a separate OCaml module. The module’s name is set to the record’s name.

    For example, Piqi record

    .record [
        .name r
        .field [ .name a .type int ]
    ]

    will be mapped to the following OCaml module:

    module R =
      struct
        record t = { mutable a : int }
      end

    (In fact, the real example would be more verbose, because Piqi uses recursive modules which require signature definition in addition to module implementation.)

    To make working with records defined in separate modules easier, you can use "local opens" introduced in OCaml 3.12. For example, records can be created as

    R.({ a = 10; b = ... })

    instead of

    {R.a = 10; b = ...}

    Similarly, if you have a significant portion of code working with some record’s fields, you can open the record’s module before the code:

    let open R in
    ...
    
    (let open R in ... is a full equivalent of R.( ... ))

    This way, you can refer to record felds as x.a instead of x.R.a. Note that this doesn’t work with several records simultaneously.

    required Piqi fields are mapped directly to OCaml record fields.

    optional Piqi fields of type <t> are mapped to fields with type <t> option.

    optional Piqi fields with specified default values are mapped to OCaml fields the same way as required fields unless .ocaml-optional flag is specified in which case the field will have type <t> option.

    If a value of such field is not defined in serialized object, it will be set to the default value during deserialization. Also, see Limitations section below.

    optional Piqi fields without type (i.e. flags) are mapped to bool OCaml fields. The value of the field will be set to true if the flag is present in the record.

    repeated Piqi fields of type <t> are mapped to fields with type <t> list. It is possible to use OCaml array instead of list by specifying an additional .ocaml-array property in the field definition.

  • Enums and Variants are mapped directly to OCaml polymorphic variants.

    For example, these definitions:

    .enum [
        .name e
        .option [ .name a ]
        .option [ .name b ]
    ]
    .varint [
        .name v
        .option [ .type e ]
        .option [ .name f ]
        .option [ .name i .type int ]
    ]

    are mapped to:

    type e = [ `a | `b ]
    type v = [ e | `f | `i of int ]
  • List type is mapped OCaml list type.

    For example,

    .list [
        .name l
        .type x
    ]

    is mapped to:

    type l = x list

    It is possible to use OCaml array instead of list by specifying an additional .ocaml-array property in the field definition, e.g.:

    .list [
        .name l
        .type x
        .ocaml-array
    ]

    is mapped to:

    type l = x array
  • Aliases are mapped to OCaml type definitions.

    For example,

    .alias [
        .name a
        .type x
    ]

    is mapped to:

    type a = x

3.4. Custom OCaml types

Piqi provides a way to define mappings between custom OCaml types and Piqi types. Such mechanism is useful when there is a need to automatically serialize an OCaml type using some relevant Piqi type, but there is no way to describe the desired OCaml type using Piqi.

Inability to use Piqi to define an OCaml type would mean that the OCaml type is either a primitive built-in or abstract type (e.g. char or bigint), or some higher-order parametric type (e.g. string Map.Make(String).t).

The mapping mechanism works as follows. Suppose we need to add support for serializing OCaml’s char type as Piqi int. This can be done in a few steps:

  1. First, define Piqi alias for such mapping:

    .alias [
        % the new Piqi type
        .name char
    
        % the original Piqi type
        .type int
    
        % OCaml type (should be point to the namespace with the mapping
        % implementation -- see below)
        .ocaml-type "Piqirun_custom.char"
    
        % optionally, define a custom OCaml name for this type
        % .ocaml-name "char"
    ]
  2. Second, implement runtime functions for mapping the custom OCaml type to the Piqi type:

    In module piqirun_custom.ml:

    type char = Char.t
    
    let char_of_int: int -> char = Char.chr
    let char_to_int: char -> int = Char.code

After that, the only thing that’s left is to compile and link piqirun_custom.ml with your OCaml program.

More examples of how to map various OCaml types to Piqi types can be found here.

3.5. Piqi extensions

There is no direct notion of Piqi extensions in OCaml: Piqi extensions are all resolved and applied to Piqi types before generating OCaml types from them.

4. Examples

  • The first example is based on the "addressbook" example from Protocol Buffers source distribution. It contains OCaml implementation of two simple programs: for adding a record to an addressbook and for listing addressbook contents. The programs implement the same functionality as programs from the Protocol Buffers examples written in C++, Java and Python.

    examples/ocaml

  • Data serialization in XML, JSON and Piq formats using piqic-ocaml --multi-format

    In the same directory, there is the io_json_xml_pb.ml OCaml module. It reads and writes the addressbook data structure from the previous example in a variety of formats.

  • Piqi implementation itself makes another example

    Piqi is implemented in OCaml but the Piqi language and Piqi internal representation are defined in a series of Piqi specifications which are mapped to OCaml types.

    Piqi self-specifications

  • More complicated example demonstrating complex types and module imports

    Piqi compiler for OCaml (piqic-ocaml) produces OCaml parsers and generators from Piqi self-specification (piqi.piqi). After that, an OCaml program reads (and writes back) Piqi self-specification represented as a binary object.

    tests/ocaml_piqi

  • Examples of serializing custom OCaml types using Piqi

    examples/ocaml-custom-types

  • Example of using Piq as a config file format

    examples/ocaml-piq-config

5. Limitations

The way how Piqi records are mapped to OCaml records introduces several limitations.

  • Limited support for defaults.

    There is no way to tell if the value of an optional field came from the original serialized object or it is the default value.

    There is no way to skip default values when serializing an optional field since concrete value for that field must be always present in the OCaml record. This behavior may be optimized in the future, but at the moment, it will produce somewhat excessive serialized representation for optional fields with default values.

  • No other dynamic properties.

    For example, in Protocol Buffers, there is a way to get the count of repeated fields and access them using field index. (Update: now it is possible to do that by specifying to use OCaml arrays instead of lists for repeated fields).

An alternative method would use OCaml objects for representing records, providing "setters" and "getters" for object fields. Such method is used, for example, in Protocol Buffers mappings for C++, Java and Python languages.

Although it is possible (and even easier) to implement this method for OCaml, the current method has several advantages:

  • Native syntax for record construction and field access.

  • Pattern matching works with records.

  • No runtime overhead of calling constructors, setters and getters.

Other limitations:

  • Piqi runtime library hasn’t been heavily optimized for performance yet.

6. Supported OCaml and Protocol Buffers versions

Piqi works with OCaml >= 3.12 and Protocol Buffers >= 2.3.0