Syntax of first-order logic

From Learning Logic for Computer Science

A formula of first-order logic describes a situation in terms of the objects involved, the relations between them, and quantifications such as "for all" and "for at least one".


Example $\forall x_0 (R_0(x_0) \rightarrow R_1(x_0, x_0))$ is a first-order formula which specifies that on all objects for which $R_0$ holds, $R_1$ is reflexive.

Example $\forall x_0 \forall x_1 (f(x_0, x_1) \doteq f(x_1, x_0))$ describes that the order of the arguments of a binary function does not matter when it comes to evaluating this function, in other words, that the binary function is symmetric.

Clearly, in both examples a formula is a sequence of symbols (a character string). From a technical point of view, it is, however, more reasonable to imagine and define a formula as a tree, where

  • a tree has a root and a finite positive number of nodes,
  • each node has a label, and
  • the children of a node are linearly ordered.

The first formula as a tree

In this example the root has only one child. This node has two children, the first one is labelled $R_0$ and has one child labelled $x_0$, the second one is labelled $R_1$ and has two children, both labelled $x_0$.

For simplicity, when a tree is drawn its label is shown, because every node is labelled (and an empty label is not allowed).

It is important to note that the syntax of the formulas depends on the current signature. In one signature, a symbol can be a constant symbol, in another one, it can be a relation symbol.

The second example also shows that there are meaningful parts of a formula which are not formulas themselves: $\text{add}(x_0, x_1)$ stands for an object of the current domain, but not for a truth value. Such expressions are called terms.

The definition of formulas presupposes the definition of terms.

Formal definition of terms

A first-order variable is of the form \(x_i\) where \(i\) stands for a natural number. The set of all first-order variables is denoted \(\text{FOVars}\). There are two symbols for boolean constants, \(\bot\) and \(\top\), which stand for false and true, respectively, and are read false and true, respectively.

The definition of the set of all terms over a given signature $\mathcal S$, also called $\mathcal S$-terms, is an inductive definition.

Base elements:

  • A one-node tree labelled with a first-order variable is an $\mathcal S$-term.
  • A one-node tree labelled with a constant symbol $c \in \mathcal C$ is an $\mathcal S$-term.

These terms are called atomic $\mathcal S$-terms.

Inductive rule:

  • If $f/\!/n \in \mathcal S$ and $t_0, \dots, t_{n-1}$ are $\mathcal S$-terms, then the tree with root labelled $f$ and the $n$ subtrees $t_0$, ..., $t_{n-1}$ is a $\mathcal S$-term.

These terms are called composite $\mathcal S$-terms.

Example When $\mathcal S = \{\text{zero}, \text{one}, \text{add}/\!/2, \text{mult}/\!/2, \text{LT}/2\}$, then Syntax-term.svg is a composite $\mathcal S$-term.

It is important to note that in the definition of the terms over a given signature $\mathcal S$ with $\Sigma$, the relation symbols (and the boolean symbols) do not play a role.

A term is a ground term if no variables occur in it.

Formal definition of formulas

The definition of the set of all first-order formulas over a given signature $\mathcal S$, also called $\mathcal S$-formulas, is an inductive definition.

Base elements:

  • The one-node trees where the only node is labelled with one of the two constant symbols, \(\bot\) and \(\top\), are first-order formulas.
  • If $t_0$ and $t_1$ are $\mathcal S$-terms, then the tree with root labelled $\doteq$ and the subtrees $t_0$ and $t_1$ is a first-order formula.
  • If $R/n \in \mathcal S$ and $t_0, \dots, t_{n-1}$ are terms, then the tree with root labelled \(R\) and the \(n\) subtrees $t_0, \dots, t_{n-1}$ is a first-order formula.

These formulas are called atomic $\mathcal S$-formulas.

Inductive rule:

  • If \(C\) is a connective of arity \(n\) and \(\varphi_0, \dots, \varphi_{n-1}\) are first-order formulas, then the tree with root labelled \(C\) and the \(n\) subtrees \(\varphi_0\), ..., \(\varphi_{n-1}\) is a first-order formula.
  • If $x_i$ is a first-order variable and $\varphi$ a $\mathcal S$-formula, then
    • the tree with the root labelled $\exists x_i$ and the subtree $\varphi$ and
    • the tree with the root labelled $\forall x_i$ and the subtree $\varphi$
are first-order formulas.

These formulas are called composite $\mathcal S$-formulas.

Example Syntax-tree-first-order-formula.svg is an $\mathcal S$-formula for $\mathcal S = \{R_0/1, R_1/2\}$ and $\mathcal S = \{R_0/1, R_1/2, f/\!/5\}$, but not for $\mathcal S = \{R_0/1, R_1/1\}$.

Commonly used notation

To simplify notation, formulas are often written as sequences of symbols (strings) using parantheses rather than drawn as trees. There is a general notation, which is used in definitions and proofs, and a more human-readable one.

Prefix notation

The general notation is the commonly used prefix notation, where a quantification, a relation symbols, a function symbol, or a connective is prefixed to the sequence of the strings for the children and the elements of the sequence of children are separated by commas. Beginning and end of this sequence are marked by an opening and a closing parenthesis.

Example $\forall x_0 (\rightarrow (R_0(x_0), R_1(x_0, x_0)))$ is a good example for this notation.

Human-readable notation

The starting point for human-readable notation is the above prefix notation. The difference is that some parentheses are saved to avoid cluttering and connectives as well as certain function and relation symbols are written infix to get rid of commas.

The rules for human-readable notation are:

  1. If a connective is used with arity two or more, then it is written in infix notation with parentheses around the string. At the same time, capital connectives become small connectives (which is consistent with the semantics). Example: $\neg (R(x_0) \wedge R(x_2) \wedge S(x_0, x_1))$ instead of $\neg (\bigwedge(R(x_0), R(x_2), S(x_0, x_1)))$.
  2. These parantheses can be ommitted for the connective at the root (in the corresponding tree) and if the connective of the corresponding parent node has lower precedence. For the precedence of the connectives, see list of connectives. Example: \(R(x_0) \wedge R(x_2) \wedge S(x_0, x_1)\) instead of \(\bigwedge(R(x_0), R(x_2), S(x_0, x_1))\).
  3. Parentheses for a connective with arity \(1\) or a quantification can be omitted when the child is a constant symbol or a variable or a connective with arity \(1\). Example: $\exists x_0 R(x_0)$ instead of $\exists x_0 (R(x_0))$.
  4. Similar rules apply to terms, but there are no precedences. Function and relation symbols are only written in infix notation when they are of arity $2$ and, in addition, they are really symbols rather than names. So $x_0 \doteq f(x_0)$ is used instead of $\doteq(x_0, f(x_0))$.
  5. Symbols for terms bind stronger than all other symbols. Example: $\exists x_0 x_0 \doteq x_1$ is a valid way of writing the formula which is otherwise written $\exists x_0(\doteq(x_0, x_1))$ in prefix notation.

Reading formulas

The second inductive rule in the definition of formulas introduces what is called quantification. From the semantics it will become clear that it is reasonable to read these quantifications as follows.

First, an existential quantification $\exists x_i$ is read "there exists $x_i$ such that ...". Second, a universal quantification $\forall x_i$ is read "for all $x_i$ ...".