Camlp5 syntax extensions¶
A few syntax extensions are used in this project.
For testing we use the one from logger-p5
opam package. It allows to convert OCaml
expression to its string form. For example, it rewrites let _ = REPR(1+2)
to
$ echo 'REPR(1+2)' > /tmp/a.ml
$ camlp5o `ocamlfind query logger`/pa_log.cmo pr_o.cmo /tmp/a.ml
let _ = "1 + 2", 1 + 2
For OCanren itself we use syntax extension to simplify writing relational programs
$ echo 'let _ = fresh (x) z' > /tmp/a.ml
$ camlp5o _build/default/camlp5/pa_ocanren.cma pr_o.cmo /tmp/a.ml
let _ = OCanren.Fresh.one (fun x -> delay (fun () -> z))
Camlp5 syntax extension is actually two syntax extensions
the one that make syntax similar to Scheme’s miniKanren. PPXes do the same.
another one, which makes it more ocaml-like (@dboulytchev likes this syntax very much)
Here we will try to describe the second one.
Note
Frankly speaking, PPXes are documented better.
OCanren vs. miniKanren¶
The correspondence between original miniKanren and OCanren constructs is shown below:
miniKanren |
OCanren |
---|---|
|
success |
|
failure |
|
|
|
|
|
|
|
|
In addition, OCanren introduces explicit disjunction (|||
) and conjunction
(&&&
) operators for goals.
OCanren-specific Camlp5 syntax extension¶
(Also known as regular extension)
This section is a part of Yue Li’s tutorial on OCanren.
We take a look at the
implementation of
the ocanren{}
quotation which we will call the formula parser in
the rest of this lesson. Our terminology follows Camlp5 Reference
Manual.
We take a top-down approach, starting with an overview of the structure of the parser, then
explain its individual parts.
The structure of the parser: an overview¶
We describe the formula parser as the reader (a Camlp5 novice) sees it, and then putting it in perspective, briefly explain how it works.
What we see¶
This code links with Camlp5 libraries, in particular the one that provide ways to extend syntax: pa_extend. Loading them amounts to extending the OCaml syntactic category expression with several sub-categories one of which is named extend:
expr = ... | extend ;
extend = "EXTEND", extend-body, "END" ;
An expression that belongs to the category “extend” would be called an EXTEND statement.
Our formula parser has only one EXTEND statement, whose extend-body starts with a global indicator followed by a semicolon separated list of entries (whose names are, exhaustively, long_ident, expr, ocanren_embedding, ocanren_expr, ocanren_term, ocanren_term’ and ctyp). An entry is a (vertical bar separated) list of levels (with a pair of enclosing square brackets); a level is a (vertical bar separated) list of rules (with a pair of enclosing square brackets); a (non-empty) rule is a (semicolon separated) list of “psymbols” (collectively acting as a pattern) followed by an optional semantic action that produces an abstract syntax tree (or AST, of any string that matches the pattern specified by the list of psymbols). The details on the syntax and semantics of the “extend” category can be found in the Extensible Grammars section of the Camlp5 Manual.
Besides the EXTEND statement our formula parser has some auxiliary functions such as decapitalize, ctor and fix_term etc.
How it works¶
The syntax extension kit pa_extend
is fundamental for the
cascade of extensions described below. The entries expr
and ctyp
origin from the module Pcaml that is the core of Camlp5 and is
opened by the
formula parser. Pcaml initializes the (empty) grammar entries expr
and ctyp
. The standard OCaml parsing kit of Camlp5 then defines them
by means of an EXTEND statement and accordng to the standard syntax of
OCaml. Our EXTEND statement further extends these global entries with
locally defined entries — entries other than expr
and ctyp
in
our EXTEND statement are locally defined, such as ocanren_embedding
,
ocanren_expr
and ocanren_term
etc. Below you can find
three stages of extension with links to documentation
.. The following table summarizes the stages of extension, providing links to copies of relevant files from either OCanren ource or Camlp5 source, together with their documentations.
Initialization is documented in The Pcaml module and happend in file pcaml.ml : expr and ctyp.
As a preprocessing tool, Camlp5 defines its own parser pa_o.ml
for
standard OCaml, so that any standard OCaml code can be converted by it
into an AST recongnizable by the OCaml
compiler. Is
pa_o.ml
a redundant piece of work for we can just use the OCaml
compiler to build the AST? Not exactly, because besides pa_o.ml
,
Camlp5 also provides EXTEND statments so that syntactic categories
defined in pa_o.ml
can be extended. The result is that using the
combination of pa_ocanren.ml
and pa_o.ml
we can convert code
that is not wholly in OCaml into a purely OCaml AST.
Conclusion¶
The OCanren formula parser has the EXTEND statement as its core, which
consists of a list of entries, notably the global entries expr
and
ctyp
that extend the corresponding predefined entries that conform
to standard OCaml. Such extension is characterized by the locally
defined entries such as ocanren_embedding
.
We will next focus on the extension of expr
and leave ctyp
.
As far as the semantics is concerned entries are
parsers for syntactic categories. From now on we use the words “entry”
and “parser” interchangeably.
The global entry: expr
¶
This is the major entry of the OCanren formula parser, which starts like:
expr: LEVEL "expr1" [ ...
where we see the entry name expr and the position LEVEL "expr1"
.
We now use OCanren-expr to refer to the expr
entry in the
OCanren formula parser, and OCaml-expr to refer to the predefined
entry expr
in the Camlp5 parsing kit for standard OCaml.
OCanren-expr extends OCaml-expr in the position
LEVEL "expr1"
: the first level of the OCanren-expr is merged
with the level named “expr1” of the
OCaml-expr, i.e., their rules are put together and grouped as a
single level named “expr1”; other levels from OCanren-expr are
inserted into OCaml-expr as new levels, right below the extended
“expr1” level. There are three levels in the OCanren-expr, the third
of which is:
[ e=ocanren_embedding -> e ]
This third level of OCanren-expr is inserted as a new level in
OCaml-expr, and the entry ocanren_embedding
directly corresponds
to the ocanren{}
quotations we see in the library implementation, so
that we can mix ocanren{}
quotations with standard OCaml
expressions, and Camlp5 will take care to convert such mixture into
standard OCaml AST. We now explain the local entry
ocanren_embedding
.
Local entries I: ocanren_embedding
and ocanren_expr
¶
The entry ocanren_embedding
directly corresponds to the
ocanren{}
quotations we see in the library implementation, and it
further calls the entry ocanren_expr
to parse the content between
the braces:
ocanren_embedding: [[ "ocanren"; "{"; e=ocanren_expr; "}" -> e ]];
The ocanren_expr
entry has four levels which strongly reminds us of
the recursive definition of a formula, i.e, a formula is either atomic,
or a conjunction/ disjunction of two formulae, or an existential
quantification over a formula, or an explicitly delimited formula (with
braces).
1. The first level parses a disjunction:
"top" RIGHTA [ l=SELF; "|"; r=SELF -> <:expr< OCanren.disj $l$ $r$ >> ]
The second level parses a conjunction:
RIGHTA [ l=SELF; "&"; r=SELF -> <:expr< OCanren.conj $l$ $r$ >> ]
3. The third level parses a fresh variable introduction (i.e., existential quantification):
[ "fresh"; vars=LIST1 LIDENT SEP ","; "in"; b=ocanren_expr LEVEL "top" ->
List.fold_right
(fun x b ->
let p = <:patt< $lid:x$ >> in
<:expr< OCanren.call_fresh ( fun $p$ -> $b$ ) >>
)
vars
b
]
4. The fourth level parses atomic, named and grouped formulae (and else):
"primary" [
p=prefix; t=ocanren_term -> let p = <:expr< $lid:p$ >> in <:expr< $p$ $t$ >>
| l=ocanren_term; "==" ; r=ocanren_term -> <:expr< OCanren.unify $l$ $r$ >>
| l=ocanren_term; "=/="; r=ocanren_term -> <:expr< OCanren.diseq $l$ $r$ >>
| l=ocanren_term; op=operator; r=ocanren_term -> let p = <:expr< $lid:op$ >> in
let a = <:expr< $p$ $l$ >> in
<:expr< $a$ $r$ >>
| x=ocanren_term -> x
| "{"; e=ocanren_expr; "}" -> e
(* other rules omitted *)
...
]
The order of the levels determines the precedence of the logic connectives: the parser first sees if the formula is a disjunction at the top level, if not, sees if it is conjunction, and so on, implying that disjunction has the least precedence, above which is conjunction, then existential quantification, and finally syntactic equality, disequality and braced groups (among others) enjoy the highest precedence. We can justly call a level: a “precedence level”.
The first and second level also have the associativity indicator
RIGHTA
, requiring that the conjunction and disjunction connectives
associate to the right.
The third level refers back to the first level (named “top”) when
parsing the <body>
part of a formula of the form
fresh <vars> in <body>
, implying that the scope of fresh
extends
to the right as far as possible.
Quotations and antiquotations¶
In every rule above we could see at least one quotation:
quotation = "<:", quotation name, "<", quotation body, ">>"
Within a quotation body we may see an antiquotation:
antiquotation = "$", antiquotation body, "$"
If antiquotations are not allowed, then a quotation body is any
expression in the revised
syntax of OCaml. At
parse time a quotation is expanded by the
(loaded and
predefined)
quotation expander q_MLast.cmo
into an AST of the quotation body. An
antiquotaion body is usually a pattern variable bound to some other AST
which is inserted into the the quotation body’s AST.
Local entries II: ocanren_term
and ocanren_term'
¶
The values that we write in an ocanren{}
quotation, such as
"this is a string"
, 'c'
(a single character), true
(a
boolean value), S (S O)
(a constructor application), (O, S O)
(a
tuple), 15
(an integer), [1;2;3]
(a list) and false :: []
(amending a list) etc., are converted into the injected level from the
ground level where they seem to be. For example, the occurrence of
S (S O)
in the expression below is transformed into
s (s (o ()))
:
ocanren { fresh x in S (S O) == x }
Such conversion bridges the gap between the programmer’s intuition of
writing OCaml values and OCanren’s internal representation of the same
values, Inspecting the entries ocanren_term
, ocanren_term'
and
their auxiliary functions help us know precisely how the conversion is
performed.
Below is the definition of the entry ocanren_term
:
ocanren_term: [[ t=ocanren_term' -> fix_term t ]];
where the ocanren_term'
parser is called immediately to process
expressions like S (S O)
and the intermediate result (an AST) is
bound to the pattern variable t
and then passed to the auxiliary
function fix_term
. The AST returned by fix_term
is returned by
the parser ocanren_term
.
The ocanren_term'
parser has four levels, namely:
1. “app”, for applications.
"app" LEFTA [ l=SELF; r=SELF -> <:expr< $l$ $r$ >> ]``
Applications are treated as being left associative as indicated by
LEFTA
. This level not yet converts constructor applications into
injection function applications. Instead it simply builds the AST of the
application in a straightforward manner, not distinguishing a
constructor application from a function application.
“list” , for non-empty lists with
::
as the top level constructor.
"list" RIGHTA [ l=SELF; "::"; r=SELF -> <:expr< OCanren.Std.List.cons $l$ $r$ >> ]
The constructor ::
is replaced by the OCanren standard library
function cons
which is the injection function for the constructor
OCanren.Std.List.Cons.
3. “primary”, which has rules for:
"!"; "("; e=expr; ")" -> eSo that the
ocanren{}
quotation would take any<value>
from!(<value>)
as is without further processing. In other words, the<value>
will be parsed using the entryexpr
.c=INT -> let n = <:expr< $int:c$ >> in <:expr< OCanren.Std.nat $n$ >>Thus, occurrences of integers like
15
within theocanren{}
quotation would be converted to values of the Peano number type that is provided by the OCanren standard library OCanren.Std.Nat.
characters and strings
c=CHAR -> let s = <:expr< $chr:c$ >> in <:expr< OCanren.inj (OCanren.lift $s$) >> | s=STRING -> let s = <:expr< $str:s$ >> in <:expr< OCanren.inj (OCanren.lift $s$) >>Characters and strings are injected using the primary injection function
!!
(see its signature and implementation).| true" -> <:expr< OCanren.Std.Bool.truo >> | "false" -> <:expr< OCanren.Std.Bool.falso >>Boolean values are converted into the corresponding injected values from the OCanren standard library LBool.
lists delimited by
[]
and;
<https://github.com/JetBrains-Research/OCanren/blob/master/camlp5/pa_ocanren.ml#L273>`__"["; ts=LIST0 ocanren_term' SEP ";"; "]" -> ( match ts with | [] -> <:expr< OCanren.Std.nil () >> | _ -> List.fold_right (fun x l -> <:expr< OCanren.Std.List.cons $x$ $l$ >> ) ts <:expr< OCanren.Std.nil () >> )The entry ocanren_term’ is recursively called to process the list members and the injection functions for list constructors are applied.
operators (which are not qualified)
"("; op=operator_rparen -> <:expr< $lid:op$ >>Operators are specified by the auxiliary function
`is_operator
<.https://github.com/JetBrains-Research/OCanren/blob/master/camlp5/pa_ocanren.ml#L87>`__ and extracted by another auxiliary function operator_rparen (the name of which reads “operator right parenthesis”).
(possibly empty) tuples
"("; ts=LIST0 ocanren_term' SEP ","; ")" -> (match ts with | [] -> <:expr< OCanren.inj (OCanren.lift ()) >> | [t] -> t | _ -> <:expr< ( $list:ts$ ) >> )There is a recursive call of the entry itself to process members of the tuple, and then the AST of the tuple is built.
The level for long identifiers.
[ long_ident ]
This level calls the entry long_ident to build AST’s of (possibly qualified) upper / lower case identifiers and operators which are taken as is.
Therefore, given S (S O)
the ocanren_term'
parser would return a straightforward translation into an AST. The interesting thing is done by fix_term
and its helper ctor
(read “C-tour”). The latter tests the input: if it is a (possibly qualified) uppercase identifier then sets the initial letter to lowercase and wraps the whole thing by
Some
, e.g., Mod1.Mod2.ABC
becomes (roughly) Some Mod1.Mod2.aBC
; if the input is not a (qualified) uppercase
identifier then returns None
:
let rec ctor e =
let loc = MLast.loc_of_expr e in
match e with
| <:expr< $uid:u$ >> -> Some (<:expr< $lid:decapitalize u$ >>)
| <:expr< $m$ . $e$ >> -> (match ctor e with Some e -> Some (<:expr< $m$ . $e$ >>) | _ -> None)
| _ -> None
The fix_term function then recurses down the structure of applications to systematically replace uppercase identifiers with lowercase identifiers
produced by ctor
. After a constant constructor is changed to lowercase, it is provided with the unit value ()
as the argument, e.g., O
becomes o ()
. A non-constant constructor is not only set to lowercase, but also has its argument list transformed, e.g., Cons(a,b)
becomes (roughly) cons a b
. Tuples are also replaced
by their OCanren standard library counterpart — logic tuples.
These lowercase identifiers converted from constructors are supposed to be injection functions, which must be defined by the programmer somewhere in the program, otherwise there would be some compile-time error like “unbound identifier”. This explains why the injection function names are always differ from the corresponding constructor names by one letter: the initial letter.