In this article we introduce the theory of computation and lay the foundation for articles to come by going over mathematical concepts and how to prove theorems which will deem useful in understanding the domain of compiler design.
Table of contents.
 Introduction.
 Mathematical groundwork.
 Proofs.
 Summary.
 References.
Introduction.
The purpose of the theory of computation is to develop mathematical models of computation which reflect realworld machines.
It aims to answer questions such as the following:
 What is a computation?
 Can everything be computed?
 What are the mathematical properties involved with computer hardware and software.
This theory is applicable in very many areas in computer science such as compiler design, string and pattern matching, artificial intelligence, computer security and design of new programming languages just to name a few.
The theory of computation is divided into complexity theory, computability theory and automata theory.
Complexity theory.
The question answered here is this, What makes some problems hard to compute while others easy to compute?
First we classify problems in degrees of difficulty then come up with proofs to prove that they are indeed hard problems.
We define an easy problem as a problem which can be efficiently solved. An example is trying to find a name in a large phone book or sorting names or numbers.
A problem is considered hard if it can't be solved efficiently.
For example, computing VLSI chip layouts or getting the prime factors of a 300digit number.
Computability Theory.
Can a problem be solved by a computer? Here we classify problems as either solvable or unsolvable.
Godel, Turing and Church in the 1930s discovered that some problems were unsolvable by a computer for example, determining if an arbitrary mathematical statement is true or false.
It is these theoretical models which were proposed to understand solvable and unsolvable problems that prompted the development of real computers.
Automata Theory.
This aims to answer the question, Do the defined models have similar capabilities or can one model solve more problems than another.
These models include:
 Finite Automata: which is used in text processing, hardware and compiler design.
 ContextFree Grammars: Which define programming languages. They are also applicable in artificial intelligence.
 Turing Machines: Which form an abstract model of a computer.
Mathematical groundwork.
The following mathematical concepts are important to know as we proceed:

A set is a collection of welldefined objects for example the set of all prime numbers.
The set of natural numbers is defined as follows,
$\mathbb{N}$={1, 2, 3, 4, ...} 
$\mathbb{Z}$ = {..., 5, 4, 3, ..., 1, 0, 1, 2, 3, 4, ...} is the set of integers.

$\mathbb{Q}$ = {m/n: mâˆˆ$\mathbb{Z}$, nâˆˆ$\mathbb{Z}$, nâ‰ 0} is the set of rational numbers.

$\mathbb{R}$ is the set of all real numbers.

If A and B are sets, A is a subset of B and we write it as AâŠ†B, that is if for every element in A there exists an element of B.
An example: The set of all even numbers which is a subset of all natural numbers. 
If B is a set, power set P(B) of B is the subset of all subsets of B.
P(B) = {A: A âŠ† B}.
Note that âˆ… âˆˆ P(B) and B âˆˆ P(B). 
If A and B are two sets, then;
** Their union is A âˆª B={x : x âˆˆ A or x âˆˆ B}
** Their intersection is A âˆ© B={x : x âˆˆ A and x âˆˆ B}
** Their difference is A \ B = { x : x âˆˆ A and x âˆ‰ B}
** Their cartesian product is A Ã— B={(x, y) : x âˆˆ A and y âˆˆ B}
** And their complement $\stackrel{\xe2\u02c6\u2019}{A}$ ={ x : x âˆ‰ A} 
A binary relation between two sets A and B is a subset of A x B.

A function f from A to B denoted by f: A â†’ B is a binary relation of R with the property that for each element a âˆˆ A there exists exactly one ordered pair R whose first component is a.
We also state that f(a) = b or f maps to a or b or the image of a under f is b.
This set A is referred to as the domain of f and the set is the range of f.
The set: {b âˆˆ B : there is an a âˆˆ A with f(a) = b} is referred to as the range of f. 
A function f : A â†’ B is onetoone(injective) if for any two distinct elements a and a' in A, we have f(a) â‰ f(a').
The function f is onto(surjective) if for each element b âˆˆ B there exists an element a âˆˆ A such that f(a) = b.
Thus the range of f is equal to the set B. 
A function f is bijection if f is both injective and surjective.

A binary relation R âŠ† A x A is a equivalence relation if it satisfies the listed conditions;
** R is reflective: that is, for every element a âˆˆ A, we have (a, a) âˆˆ R.
** R is symmetric: meaning that for all a and b in A, if (a, b) âˆˆ R, then (b, a)âˆˆ R.
** R is transitive: that is, for all a, b and c in A, if (a, b) âˆˆ R and (b, c) âˆˆ R then also (a, c) âˆˆ R. 
A graph G = (V, E) is a pair which consists of a set of vertices(V) and edges(E).
A degree of a vertex V denoted by deg(V) is the number of edges which are incident on V.
A path in a graph is formed by a sequence of vertices connected by edges.
A cycle is formed when a path starts and end at the same place otherwise it is a simple path meaning that there are no repeated vertices.
A connected graph has paths between all pairs of vertices. 
Coming to strings, the alphabet is a finite set with symbols, example alphabets are Î£ = {0,1} and Î£ = {a, b, c, d, ...., y, z}.

A string over an alphabet Î£ is a finite sequence of symbols whereby each is an element of Î£. Length of a string w is denoted by w, that is the number of symbols in w while an empty string is denoted by âˆˆ and has length of zero. E.g if an alphabet Î£ = {0, 1}, then 10, 1000, 0, 101, âˆˆ are strings over Î£ with the respective lengths 2, 4, 1, 3, 0.

A language is a set of strings.

Boolean values include 1 and 0 representing true and false. Basic operations include:
** negation or NOT(Â¬)
** conjunction or AND(âˆ§)
** disjunction or OR (âˆ¨)
** exclusiveor or XOR (âˆ…)
** equivalence, (â‡” or â†”)
** implication, (â‡’ or â†’)
Proofs.
A theorem is considered a true statement in mathematics and a proof is a sequence of mathematical statements which form an argument showing a theorem is indeed correct.
We come up with proofs using axioms which are assumptions about underlying mathematical structures, hypotheses and previously proved theorems.
Generally there is no specific way to come up with a proof however there strategies used.
Here are some strategies:
Direct proofs.
Here we approach a problem directly.
For example given the following theorem:
Theorem 1: If n is an odd positive integer, then ${n}^{2}$ is also odd.
Proof An odd positive integer n can be written n = 2k + 1 for an integer k â‰¥ 0, then
${\mathrm{n}}^{2}={\left(2k+1\right)}^{2}=4{\mathrm{k}}^{2}+4k+1=2(2{\mathrm{k}}^{2}+2k)+1$
Since $2(2{\mathrm{k}}^{2}+2k)$ is even and even + 1 = odd we conclude that ${\mathrm{n}}^{2}$ is odd.
Constructive proofs.
Here we not only show the existence of an object but also provide a method for creating it.
Theorem 2: For every integer n â‰¥ 4, there exists a 3regular graph with n vertices.
Proof: We define V = {0, 1, 2, ... , n  1}
and E = { {i, i + 1} : 0 â‰¤ i â‰¤ n âˆ’ 2} âˆª { {n âˆ’ 1, 0} } âˆª { {i, i + n/2}: 0 â‰¤ i â‰¤ n/2 âˆ’ 1}.
Then a graph G = (V, E) is 3regular.
Now we convince ourselves that the graph is indeed 3regular so as to assist us to come up with a graph for e.g n = 10.
Nonconstructive proofs.
We show that a certain objects exist even without creating it.
Theorem 3: There exists irrational numbers x and y such that ${\mathrm{x}}^{\mathrm{y}}$ is rational.
Proof:
Case 1: ${\sqrt{2}}^{\sqrt{2}}\xe2\u02c6\u02c6\mathbb{Q}$
Where we take x = y = $\sqrt{2}$
And prove by contradiction that $\sqrt{2}$ is irrational.
Case 2: ${\sqrt{2}}^{\sqrt{2}}\xe2\u02c6\u2030\mathbb{Q}$
Here we take x = ${\sqrt{2}}^{\sqrt{2}}$ and y = $\sqrt{2}$ since ${\mathrm{x}}^{\mathrm{y}}=({\sqrt{2}}^{\sqrt{2}}{)}^{\sqrt{2}}={\sqrt{2}}^{2}=2$ the claim in the theorem.
This proof proves the stated theorem but doesn't provide an example of a pair of irrational numbers x and y such that ${\mathrm{x}}^{\mathrm{y}}$ is rational.
Proofs by contradiction.
We will explain this strategy using the following examples:
Theorem 4: Let n be a positive integer. If ${\mathrm{n}}^{2}$ is even then n is even.
Proof: Assume ${\mathrm{n}}^{2}$ is even but n is odd and we know from theorem 1 that ${\mathrm{n}}^{2}$ is odd. We have a contradiction following our assumption that ${\mathrm{n}}^{2}$ is even.
Theorem 5: $\sqrt{2}$ is irrational, that is, $\sqrt{2}$ cannot be written as a fraction of two integers m and n.
Proof: To prove this by contradition we follow the steps;
Assume $\sqrt{2}$ is rational therefore it can be written as a fraction of two integers $\sqrt{2}$ = m/n where m â‰¥ 1 and n â‰¥ 1.
Assume that m and n don't share common factors, that is, the GCD of m and n is 1 and if not the case we get rid of common factors.
By squaring $\sqrt{2}$ = m/n, we have ${\mathrm{2n}}^{2}$ = ${\mathrm{m}}^{2}$ which implies that ${\mathrm{m}}^{2}$ is even(Theorem 4)
meaning we can write m as m = 2k for some positive integer k.
Following this, we have ${\mathrm{2n}}^{2}$ = ${\mathrm{m}}^{2}$ = ${\mathrm{4k}}^{2}$ which implies that ${\mathrm{n}}^{2}$ = ${\mathrm{2k}}^{2}$ and therefore ${\mathrm{n}}^{2}$ is even(Theorem 4)
And we have shown m and n are both even but we know they are not even and this is a contradiction and therefore our assumption of $\sqrt{2}$ is rational is wrong and we conclude that $\sqrt{2}$ is irrational.
Proofs by induction.
This is very useful and important strategy for proving theorems.
For each positive integer n, P(n) is a mathematical statement that depends on n now assuming we want to prove that P(n) is true for all positive integers n. A proof by induction would be:
Base: Here we prove that P(1) is true.
Induction Step: We prove that for all n â‰¥ 1, the following holds:
If P(n) is true then P(n+1) is also true.
Here we choose an arbitrary integer n â‰¥ 1 and assume P(n) is true  induction hypothesis.
Then we prove P(n+1) is also true.
The pigeon hole principle.
Say is n+1 or more objects are placed in n boxes, then there exists at least a box which contains two or more objects.
In other words if A and B are two sets such that A > B then there is no onetoone function from A to B.
Summary.
This article presents a brief overview of the theory of computation. The concepts we have learned here play an important part in helping us understand following articles related to compiler design.
We have discussed the questions this theory aims to answer, established a mathematical foundation to understand all symbols which will be used to explain compiler design concepts, and gone over how to prove theorems which are very common in algorithmic design and computer science/mathematical fields.
References.
 Finite Automata and Regular Languages
 Mathematical Thinking in Computer Science
 Proofs from THE BOOK Book  Martin Aigner