# A Formal Introduction to Models of Computation in Coq (Part 1)

## Introduction

This series will work through some introductory material on models of computation (mostly based on this course. In particular, this article follows this). There are two goals: to teach the material itself, as well as teach how to use Coq/the basic theory behind it.

You should probably be somewhat familiar with some functional programming language, though which one is probably not important. You should also probably have some knowledge of basic math/logic, but other than that, it should be mostly self-contained.

This article will simply introduce the idea of using Coq to prove things, then doing some simple proofs involving strings to set things up for next time.

## Introduction to Coq

Coq is a system for theorem proving, called a proof assistant. It includes several sub-programming languages for variable specifications of definitions and proofs.

### Curry-Howard Isomorphism

The whole thing is based on the Curry-Howard isomorphism; the basic idea is that types are propositions, programs are proofs. What does that mean?

Essentially, if it is possible to construct a value of some type, $$\tau$$, then that type is "true", in the sense that we can produce a proof of it. And what is the proof? The value of type $$\tau$$. This probably seems weird, because it's weird to think of the type Int as being "true", but it just means that I can prove that an Int exists: just pick your favorite one (mine is 0).

In this case, a slightly more complicated example will probably make more sense. Let's consider implication: if $$P$$, then $$Q$$. What does this correspond to as a type? A function! Proving if $$P$$, then $$Q$$ is the same as writing a function that takes a value of type $$P$$ and produces a value of type $$Q$$. In Java we might write:

public <P, Q> Q f(P p) {
// ...
}

but since this series is about Coq, I'll mostly write in Coq from now on:

Definition f : P -> Q := (* ... *).
(* alternatively *)
Definition f (p : P): Q := (* ... *).

As a more concrete example of implication, take the following introduction rule in intuitionistic logic:

$$\frac{A~~~B}{A \wedge B}$$

This rule says that if $$A$$ is true, and $$B$$ is true, then $$A \wedge B$$ is true. This is a fairly obvious definition for what "and" means, and it's a nice and simple example to demonstrate how the Curry-Howard isomorphism works. Recalling that "types are propositions", let's consider what the proposition here is: if $$A$$ and $$B$$, then $$A \wedge B$$. What would this mean in a programming lanugage? If you have a value of type a, and a value of type b, then you can create a value which contains values of both type a and b. That is, a pair, of the two values.

In Haskell, we would write this as:

prop :: a -> b -> (a, b)
prop a b = (a, b)

The first line, the type, is the proposition. The second line, the definition of the function, is the proof: a program that shows how to go from the assumptions (a value of type a exists, and a value of type b exists), to the conclusion (so there is a value which contains values of both types).

Translating to Coq, this becomes:

Definition conjunc (a : A) (b : B) : A /\ B := conj a b.

## Definining Strings

We will define strings inductively, because it's the most natural way to do it. We're actually going to define lists, not strings, but the two behave very similarly, at least for the purposes of this article.

Inductive list (A : Type) : Type :=
| nil : list A
| cons : A -> list A -> list A

This is actually already defined for us in Coq, so we'll just use the following lines to import it (and some nicer notation):

Require Import List.

Import ListNotation.

The new notation essentially gives us the following equivalences (note the below is not valid Coq code):

[] = nil
x :: xs = cons x xs

Similarly, we also want to use natural numbers (defined as follows), because we care about the length of our strings, and the length of a list is well-represented by a natural number.

Inductive nat : Type :=
| O : nat (* note this is the letter O, not the number 0 *)
| S : nat -> nat

Similar to above, we import a module to gain access to definitions, and, more importantly, proofs of obvious properties of natural numbers, like commutativity of addition. Without these proofs, it's a pain to work with any numbers because of the sheer quantity of seemingly "obvious" statements about natural numbers that we take for granted, such as associativity or commutativity of addition.

Require Import PeanoNat.

Finally, we define an arbitrary type, which essentially functions as the alphabet for our strings. That is, each "letter" (element) of our string (list) is a value of type A.

Variable A : Type.

### Defining Basic Operations

Let's now define some basic operations: length, concatenation, and reversals.

Each of these definitions will be recursive, so we'll use the Fixpoint keyword to indicate this to Coq. Note that Coq requires structural recursion, which ensures that all our functions will terminate. Basically, structural recursion means that you can only do recursive calls on "smaller" values: if someone calls $$f(n)$$, you can recursively call $$f(n-1)$$, but not $$f(n+1)$$. Without this, nonterminating functions may be used to prove anything, which is not really what we want. We lose Turing completeness like this, but as you'll see, we can still prove plenty of things.

#### Length

The first operation to define is the length of a string. As you will see is quite common, we define the function for each case or constructor of the type we are recursing on; in this case, we are recursing on a list, so we need to cover both the empty list and the non-empty list. Doing this, we can define that a list has a length whose value is given by:

Fixpoint length (s : list A) : nat :=
match s with
| []      => 0
| _ :: xs => 1 + length xs
end.

That is, if we have the empty string, [] (or, mathematically $$\epsilon$$), then it's length is just 0. Otherwise, if we have some symbol followed by a string $$xs$$, the total length is simply one greater than the length of $$xs$$. Note that when we don't care about the value of a particular variable, in this case, the first symbol in the string, we can simply write an underscore to indicate that we don't care. This is good practice because it's a hint to whoever is reading our definitions that they can mentally ignore that part altogether.

#### Concatenation

This operation will take two strings, "hello " and "world" and join them into "hello world". You are probably already familiar with this operation, but may not have seen the recursive definition before. Similar to above, we define it both for the case where a is the empty list, and when it is not empty. While recursive definitions like this may seem confusing, they actually make the proofs much more natural.

Fixpoint concat (a b : list A) : list A :=
match a with
| []      => b
| x :: xs => x :: concat xs b
end.

Note that, even though there are two strings, we still only recurse on one of them, specifically, the first one, a. We also introduce new notation to make our lives easier (this line allows us to write x ++ y instead of concat x y)):

Notation "x ++ y" := (concat x y).

#### Reversing

Finally, the reversal of a string. This definition is not computationally the most efficient (given our concatenation definition, it has to traverse the entire list to append the element to the end. We could fairly easily write a $$O(n)$$ version using an accumulator, but this definition is not only simpler to state, but it's also equivalent to the other definition. More importantly, given that we're just interested in it for proof purposes (for the moment, at least), we only care that it works and that it's easy to prove theorems with it. In this case, both criteria are satisfied, so we're good.

Fixpoint reverse (s : list A) : list A :=
match s with
| []      => []
| x :: xs => reverse xs ++ [x]
end.

As before, we handle the case of the empty list and non-empty list separately. Reversing an empty list has no effect. Reversing a non-empty list simply takes the first element, reverses the rest of the elements, and places this first element at the end of the list. If you are unsure of how this definition (or any of the preceding definitions) works, you should stop and take the time to figure them out.

## Proving Basic String Properties

### Basic Lemmas

In this section, we will prove the following lemmas, which will be useful to us throughout the rest of this article.

#### Empty String is Concatenation Identity

The first of these is that the empty string, $$\epsilon$$ functions as a right identity for concatentation (it is already a left identity by definition). We write the following lines, which put us into Coq's interactive proof evironment. Here we use a set of proof tactics which take our current propositions and transform them until we can apply axioms to finish the proof.

Lemma concat_empty_string_id : forall (s : list A), s ++ [] = s.
Proof.

After this line, the goals we have are:

forall (s : list A), s ++ [] = s.

This is exactly what we wrote, and to start the proof, we must introduce variables, so that we can talk about the variables in our hypothesis. In this case, we have only one variable, s. Continuing the idea that proofs are functions, we can think of this variable as a parameter to the function, concat_empty_string_id To add this to our goals, we can write:

intro s.

Now we have:

1 subgoal
s : list A
______________________________________(1/1)
s ++ [] = s

Above the bar is our hypotheses, and below the line is what we are trying to prove. This means we are trying to prove, for some arbitrary list s, that s ++ [] = s. We will do so by structural induction on the list (this is one of the things defined for us in the List module). Again, going back to programming, this is like case anaylsis: if the list is empty, then we our function. Otherwise, if the list is not empty, we write our function by recursively calling our function on the smaller list. To use induction, we simply write:

induction s.

Now we get:

2 subgoals
______________________________________(1/2)
[] ++ [] = []
______________________________________(2/2)
(a :: s) ++ [] = a :: s

This means that we have two subgoals to prove: the base case, when the list is empty, and the inductive case, when it is not. In this inductive case, we know that the list of is of the form a :: s, that is, there is some element, a, followed by possibly many more characters, that is, another string, called s.

By default, we prove the subgoals in order, though it is possible to switch the order if you really need to. In this case (and most), the default order works well. Thinking about what the first subgoal is saying, [] ++ [] = [], we know that it is quite obviously that this is true—but we must still provide a proof. Luckily, simply by definition (in the first case), we can see that the left hand side reduces to [] (this is the base case in our definition of the ++ operation). In Coq, to apply function definitions, we can use the simpl tactic, like so:

simpl.

Now we are left with:

2 subgoals
______________________________________(1/2)
[] = []
______________________________________(2/2)
(a :: s) ++ [] = a :: s

Whenever you see the same thing on both sides of the equality, you can use the reflexivity tactic to prove your goal, since the proposition is true by reflexivity. So we end the proof of this case with:

reflexivity.

This resolves the first subgoal, and now we are left with:

1 subgoal
a : A
s : list A
IHs : s ++ [] = s
______________________________________(1/1)
(a :: s) ++ [] = a :: s

This looks considerably more complicated, so let's consider each part on it's own. By doing induction, we've handle the case of the empty string, so now we just need to handle the case that the string is not empty, that is, it starts with some symbol a and ends in a string s. The part below the line is simply the thing we are trying to prove, x ++ [] = x, but with a :: s instead of x. Finally, IHs is the inductive hypothesis, which tells us that the lemma is true for smaller list, s.

Usually, if we have an equality, we will want to make a substitution in our goal, either substituting the left hand side for the right, or vice versa. However, in this case, substitution doesn't seem to help us, as the only way we can substitute, the right hand side for the left, only serves to complicate the equation. Instead, we want to simplify the equation first. It's often a good idea to simplify your equation whenever you can, and in this case, we can apply the second case in our concat function (via the simpl tactic) to simplify this as follows.

1 subgoal
a : A
s : list A
IHs : s ++ [] = s
______________________________________(1/1)
a :: (s ++ []) = a :: s

I have added the parentheses myself for clarity. Now we can use the inductive hypothesis, IHs to replace s ++ [] with s. To do so, we use the rewrite tactic, like so:

rewrite IHs.

Note that to substitute the right hand side for the left, that is, to substitute s ++ [] for s, we would write:

rewrite <- IHs.

After rewriting, we have:

1 subgoal
a : A
s : list A
IHs : s ++ [] = s
______________________________________(1/1)
a :: s = a :: s

Now we have an equality with both sides the same, and we can finish the proof with an application of reflexivity. When you see:

No more subgoals.

When you see this, you can finish the proof with:

Qed.

The entire proof is shown below, with the tactics for each induction subgoal on the same line:

Lemma concat_empty_string_id : forall (s : list A), s ++ [] = s.
Proof.
intuition.
induction s.
simpl. reflexivity. (* empty list case *)
simpl. rewrite IHs. reflexivity. (* non-empty list (inductive) case *)
Qed.

The rest of the proofs will be done in far less detail.

#### Length with Concatenation

The next lemma is that the length of x concatenated with y, is the length of x added to the length of y. In Coq we write:

Lemma length_concat : forall (a b : list A), length (a ++ b) = length a + length b.

The proof is nearly identical to the first proof we did, except that we only do induction on a. This makes sense, given that concatenation is defined recursively only on the first argument. We begin as we did before:

Proof.
intros a b.
induction a.

The base case is again obvious by simplification:

simpl. reflexivity.

The other case is proved similarly:

simpl. rewrite IHa. reflexivity.

Note that we use IHa rather than IHs, because we did induction on a, rather than s. Had we named our variables differently, the inductive hypothesis will have a different name.

#### Concatenation is Associative

This lemma is an important fact that is very useful for proving things, as well as essentially finishing the proof that strings form a monoid under concatenation. In Coq:

Lemma concat_assoc : forall (a b c : list A), a ++ (b ++ c) = (a ++ b) ++ c.

The proof is identical to the previous two (but you should still do it!).

## Various Lemmas

In this section we continue to prove several more lemmas involving the definitions and lemmas above.

### Reversing a string doesn't change it's length

Lemma length_reverse : forall (s : list A), length s = length (reverse s).

This proof will require slightly more effort, though we begin with induction just like the other proofs. In particular, we'll need the use of a lemma from above and the fact that addition is commutative for natural number. Beginning by induction, the base case is trivial because the length of an empty string is 0 by definition, and the reversal of an empty string is also an empty string.

intros s.
induction s.
simpl. reflexivity.

Our inductive case is now to show (after a simplification):

S (length s) = length (reverse s ++ [a])

S denotes the successor function, so our goal is $$1 + |s| = |s^R \cdot a|$$, where $$s^R$$ is the reverse of $$s$$. Luckily, we have a lemma from above that allows us to simplify the right hand side (that is, $$|x \cdot y| = |x| + |y|$$. We can apply it like so, and then applying the inductive hypothesis almost finishes the proof.

rewrite length_concat.
simpl.
rewrite IHs.

The final step we need is to rewrite $$|s^R| + 1$$ as $$1 + |s^R|$$, which can be accomplished by a built-in theorem, imported from PeanoNat. This property is simply commutativity of addition, and is appropriately named Nat.add_comm. After application, simplification finishes the proof for us:

rewrite Nat.add_comm.
simpl.
reflexivity.
Qed.

### Reversing Concatentation

Lemma reverse_concat : forall (a b : list A), reverse (a ++ b) = reverse b ++ reverse a.

As all the other proofs in this article, we begin via induction:

intros a b.
induction a.
simpl.

This time, however, the base case is not as trivial. We must show $$b^R = b^R \cdot \epsilon$$, which is obviously true, but is not evident from the definition. Luckily, we proved this earlier, so we can reuse our proof, which finishes up this case:

rewrite concat_empty_string_id.
reflexivity.

For the inductive step, we immediately simplify and use the inductive hypothesis. This makes our goal into:

(reverse b ++ reverse a0) ++ [a] = reverse b ++ reverse a0 ++ [a]

The only difference here is the grouping—luckily we proved that concatenation is associative earlier, so here we simply use it:

rewrite <- concat_assoc.
reflexivity.
Qed.

Note that we rewrite the "opposite" way: replacing the right hand side with the left, so we use an arrow <-. This finishes our proof.

### Reversing a reversed string gives the original string

Lemma reverse_reverse_id : forall (s : list A), reverse (reverse s) = s.

Again, we proceed by induction. Here the base case is trivial, as it all simplifies by simple application of the definitions:

intros s.
induction s.
simpl. reflexivity.

The inductive case is also relatively straightforward with an application of the previous theorem, followed by the use of the inductive hypothesis.

simpl.
rewrite reverse_concat.
simpl.
rewrite IHs.
reflexivity.
Qed.

## Conclusion

As a parting note, the tactic intuition can often simplify multiple steps at a time, as well as serving to introduce variables and hypotheses. So we can rewrite the first proof we did as, though in this case, it provides only a minor benefit:

Lemma concat_empty_string_id : forall (s : list A), s ++ [] = s.
Proof.
intuition.
induction s.
intuition.
simpl. rewrite IHs. reflexivity.
Qed.`

Also, note that while we wrote our proofs using proof tactics with the interactive proving environment, we can also write proofs as normal functions (it just takes a little more work).

This series will continue soon, going more deeply into strings and languages, followed by automata and more. The following parts will also be visible on this blog.