Commit 67acceec authored by Vladimír Uhlík's avatar Vladimír Uhlík
Browse files

Add changes from ntb.

parent 1adc71df
Loading
Loading
Loading
Loading
Loading
+11 −0
Original line number Diff line number Diff line
@@ -35,3 +35,14 @@
    author={Přispěvatelé Wikibooks},
    url={https://en.wikibooks.org/wiki/LaTeX}
}

@book{dragonbook,
    author    = {Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman},
    date      = {2007},
    title     = {Compilers : principles, techniques, and tools},
    edition   = {2}, % should not be stated
    location  = {},
    publisher = {},
    isbn      = {0-321-48681-1},
    langid    = {english}
}
 No newline at end of file
+129 −13
Original line number Diff line number Diff line
\chapter{C programming language}

Since its inception in the early 70s, C has become one of the most
Since its creation in the early 70s, C has become one of the most
popular languages ever. Thanks to its simplicity, speed, and
versatility, it is still widely used in various areas such as operating
systems, embedded systems, real-time systems, etc.
C is an imperative, procedural and compiled programming language, which
means that the code is translated into machine code of particular machine.
This thesis will focus on its standard \textit{ISO/IEC 9899:1999}, also known as \textit{C99}.
In this chapter, we will describe several concepts of the language that
In this chapter, we will briefly describe several concepts of the language that
we will need in further chapters. You can obtain more information in~\cite{cstandard}
or~\cite{cppreference} for further reading.

\section{Object, declarations and types}
An object is an abstraction of memory in the execution environment.
Objects can be created, destroyed, accessed and manipulated.
It has its size, alignment, type, lifetime, storage duration etc.
There are several ways how to create objects~: 
It can be created, destroyed, accessed and manipulated.
Among its properties belong its size, alignment, type, lifetime,
storage duration etc. There are several ways how to create objects~: 
\begin{itemize}
    \item \textbf{Declaration}
    \item \textbf{Allocation}
    \item \textbf{String literal}
    \item \textbf{Compoud literal}
\end{itemize}
However, with objects we can not work directly, even though
they have their own identity, in form of address in memory,
but they do not have name. And that is where variables come
However, we can not work with objects directly. Even though
they have their own identity\footnote{As an object identity is consider its address},
we can not work with them on syntactical level. And that is where variables come
into play. Variable is a named object. The assigned name
can not be changed and has its syntactical scope.

A scope represents a portion of C program where each identifier
(variable, function etc.) is visible and valid only within this portion.
There are four kinds of scopes in C:
\begin{itemize}
      \item \textbf{Block scope}
      \item \textbf{File scope}
      \item \textbf{Function scope}
      \item \textbf{Function prototype scope}      
\end{itemize} 

As mentioned before, object must have its type. Type, resp.
data type determines the properties of particular objects.
For example, it's size or permitted operations. In C, we
have several built-in data types.
For example, it's size or permitted operations. Objects are assigned
data types when they are created. In C, we have several built-in data types.
\begin{itemize}
    \item \textbf{Character types:} char witch its signed and unsigned form.
    \item \textbf{Integer types:} short, int, long in their signed and unsigned form.
          Mentioned types may differ in their sizes.
    \item \textbf{Real floating types:} float and more precise double.
\end{itemize}

Besides basic types mentioned above, there are several derived types:
\begin{itemize}
    \item \textbf{Array types:} represents continous allocated nonempty
@@ -60,7 +72,7 @@ specifies a computation. In C, we have several kinds of operators:
    \item \textbf{Assignment} 
    \item \textbf{Incerement} in prefix or postfix version.
    \item \textbf{Arithmetic} including basic mathematical and bitwise operations.
    \item \textbf{Logical}
    \item \textbf{Logical} in form of conjuction, disjunction and negation.
    \item \textbf{Comparison}
    \item \textbf{Member access} such as array subscript, pointer dereference, address of
          or member acces of struct or union.
@@ -68,6 +80,110 @@ specifies a computation. In C, we have several kinds of operators:
\end{itemize}
Every expression must be evaluated into a value.

\section{Control flow}
\section{Statements}
Statements are fragments of the C program that are executed in sequence.
Each statement is either terminated with semicolon or it is a compound statement
put in braces. Compoind statements introduce their own block scope.

Another category represent selection statemens \textbf{if}, \textbf{else} and
\textbf{switch}. First two mentioned are ususally coupled together in sense that
it is not possible to use \textbf{else} without \textbf{it}. \textbf{If} as such
is used for conditional code execution: condition body is only executed when a
certain condition holds. If it does not, optional \textbf{else} can be chosen.
However, \textbf{switch} works a bit differently: it executes code according
to value of integral expression.

Iteration statemens, or loops are used for repetitive execution of a statement
or compound statement. In C, there are three kinds of loops:
\begin{itemize}
      \item \textbf{For}
      \item \textbf{While}
      \item \textbf{Do~\cdots~while}
\end{itemize}
The control flow within a loop can by modified by \textbf{continue}\footnote{Skips to another iteration.}
and \textbf{break}\footnote{Exits current loop. It can be used within switch, too.} statemens.

Unconditional local\footnote{They can be used only within a single function.}
jumps performed by \textbf{goto} transfers control to desired location.
They are mostly used for handling exceptional states. However, they can be only used
within a single function.

\section{Functions}
A function associates its body (compound statement) with an identifier (function name).
Every C program must have function called \textbf{main} which serves as an entry point
to each program. Each function is introduced by its delcaration and definition. It 
may accept zero or more parameters that are initialized from arguments when the function
is called. If needed, we can return a value from a function with \textbf{return} statement.

\section{Example of C program}
After a brief summary of various parts of C language we will finish this chapter
with properly commented snippet of a C program, where we show how aforementioned
contructs can be used in practise.

\begin{minted}{c}

/* On the next line, you can see the function declaration.
 * All declarations must define the return type of the function,
 * function name and type of its arguments, if there are any.
 *
 * The const qualifier in type of arr argument means that
 * the value of the variable can not be changed. In this case,
 * it states that the value the pointer is referencing to can
 * not be changed.
 */
int sum_array( const int *arr, int size )
{
      /* Declaration and definition of the variable result.
       * More precisely, we create an object with value 0, which
       * we assing name result and type int.
       */
      int result = 0;

      /* Header of a for loop consists of three sections:
       *    - initialization ( int i = 0 )
       *    - condition      ( i < 10 )
       *    - iteration      ( ++i )
       * ++i represents prefix increment of variable i.
       * Semantically it has same effect as i += 1.
       */
      for ( int i = 0; i < 10; ++i )
      /* We can access elements of an array with an subscription operator. */
            result += arr[ i ];

      /* Return statement allows us to return a value from function. */
      return result;
}

/* There might be functions that do not return some value.
 * In such cases, we use void as a return type of the function,
 * to express that the function does not return a value.
 */
void categorize( int sum, int *category )
{
      if ( sum < 0 )
      /* Through dereference we can assign a value to the
       * object to writh the pointer variable category points.
       */
            *category = -1;
      
      /* With else if we can join several conditions in a sequence. */
      else if ( sum == 0 )
            *category = 0;

      /* If none of above conditions, we can use a default else branch. */
      else
            *category = 1;
}

/* The main function which serves a an entry point to every C program. */
int main()
{
      /* Array declaration and initialization. */
      const int *array[] = { 0, 1, 2, 3, 4, 5, 6, 7 };
      int category;
      int sum = sum_array( array, 8 );
      categorize( &category );
      return 0;
}

\end{minted}
+18 −0
Original line number Diff line number Diff line
\chapter{Compiling C to Cthulhu}
Let's move from theory to practise. In this chapter,
we are going to decsribe the implementation and desing
of C compiler to Cthulhu language. Our compiler is based
on light-weight C compiler called \textbf{chibicc} available
on []. We used its front end and implemented our own
code generation part. The implementation is pretty minimalistic,
the whole source bundle consists of roughly 10000 lines of code.
However, current implementation does not suppor C standard library.

\section{Preprocessor}
\section{Front end description}
\section{Memory model}
\section{Compiling expressions}
\section{Adding data types}
\section{Compiling statemens}
\section{Adding control flow}
\section{Compiling functions}
\section{Compiling whole programs}
 No newline at end of file
+25 −0
Original line number Diff line number Diff line
\chapter{Compilers in nutshell}

In this chapter, we will provide a brief introduction to
compilers theory. Compiler is a software program that 
translates a program in source language to target language.
Compilation as such consists of several parts, which we will
introduce one-by-one. All provided theory is based on
"legendary" book \textit{Compilers principles, techniques and tools}[].
For further information, we encourage the reader to look for
in the book.

\section{Symbol table}
Symbol table is a data structure that is used by compiler to hold information
about source-program constructs. Its entries are created and used during the
analysis phase by the lexical analyzer, the parser, and the semantic analyzer.

\section{Lexical analysis}
The main task of the lexical analyzer is to read the input characters of the
source program, group them into lexemes, and produce as output a sequence of
tokens for each lexeme in the source program.
\section{Syntax analysis}
\section{Semantical analysis}
\section{Intermediate code generation}
\section{Machine indenpendent code optimization}
\section{Code generation}
\section{Machine dependent code optimization}
 No newline at end of file
+11 −79
Original line number Diff line number Diff line
\chapter{Cthulhu}

\section{Linear type system}

Before explaining Cthulhu features, we need to cover several important concepts.
As the first one, we choose a linear type system. The description of such a system
is quite straightforward: Linear type system ensures that each value of a particular
type is used exactly once. Having such restriction on types plays a key role in safe
memory management [1]. As an example, consider the following snippet of code written in~C~:

\begin{minted}{c}
    #include <stdlib.h>
Cthulhu is low-level, functional and minimalistic programming
language aimed for program transformation, testing and verification. It has linear
type system, abstraction support and a few built-in types. In this
chapter, we will describe its syntax, type system, computation model,
and provide an example of Cthulhu's initial state.

    void foo( int *p )
    {
        free( p );
    }

    int main()
    {
        int *ptr = malloc( 4 );
        foo( ptr );
        free( ptr );
    }
\end{minted}
\section{Linear type system}
Before we start explaining Cthulhu's semantics, we define the linearity
of type system. Formal definition of linearity is as follows:

Even though you might find the example silly, it perfectly shows the problem
with non-linear types. As you can see, the memory allocated through \texttt{malloc} is
double-freed, which causes the program to fail. Now the question is, how can
linear types help us. Provided that type \texttt{int *} was linear, the value
\texttt{ptr} could not be used twice as an argument in the call of \texttt{foo} and \texttt{free}.
Therefore, \texttt{free} could not be called more than once on the same pointer.
Since all the information regarding the variable's usage is known at compile time,
compiler with a checker of the correct usage of linear types should report an error.
Informally, we can say that linear type system ensures that each value is
used exactly once [1]. In Cthulhu's case, this restriction is meaningful
to ensure...

\section{Signature and structure}

As indicated in the previous section, Cthulhu's type system will fulfill the
linearity principle. In the current section, we will introduce three new terms,
which will serve as building blocks in applying abstractions in Cthulhu's programs.

In many programming languages we can find various means of packing several
operations into one abstract module. As an example, consider interfaces known
from languages like Java or C\#. However, we are looking for a formalism that
will allow us to reason about the programm. We can find such formalism in 
\texttt{many-sorted~,,first-order``\footnote{Also known as predicate logic.} logic}.
This logic allows quantification over a variety of domains (called sorts)~\cite{msl}.
Formall definitions of syntax and semantics are as follows.
Both definitions are based on the work of Dr.~Jonáš~\cite{matajon}:

\begin{definition}[Syntax]
Let $\mathcal{S}$ be a countable set if \texttt{sort symbols} that contains a distinguished
sort symbol \texttt{Bool}. For each sort $\sigma \in \mathcal{S}$, we have a countably infine
set of variables $vars_{\sigma} = \{ x_1^{\sigma}, x_2^{\sigma}, \cdots \}$
such that all such sets are pairwise disjoint. We denote the union of all  $vars_{\sigma}$
as $vars$. A $signature$ $\Sigma$ is a triple $(\Sigma^{\mathcal{S}}, \Sigma^f, \tau)$
consisting of
\begin{itemize}
    \item a set of sort symbols $\Sigma^{\mathcal{S}} \subseteq \mathcal{S}$ that contains the sort \texttt{Bool}
    \item a set $\Sigma^f$ of function symbol
    \item a sort mapping $\tau$, which to each function symbol $f \in \Sigma^f$ assings its sort
          $\tau(f) = (\sigma_1, \sigma_2,~\cdots, \sigma_n, \sigma)$ for some $n \geq 0$ and
          $\sigma, \sigma_1, \sigma_2,~\cdots, \sigma_n \in \Sigma^{\mathcal{S}}$
\end{itemize}
The number $n$ in the previous definition is called the $arity$ of the corresponding function
symbol. Function symbols with arity $0$ are called $constants$.
\end{definition}

\begin{definition}[Semantics]
A $\Sigma - structure$\footnote{$\Sigma$ represents arbitrary signature $(\Sigma^{\mathcal{S}}, \Sigma^f, \tau)$}
$\mathcal{M}$ is a pair $(\nu, (\_)~^\mathcal{M})$, where
\begin{itemize}
    \item $\nu$ is a function that assigns to each sort $\sigma$ $\in$ $\sigma \in \mathcal{S}$ a non-empty
          set $\nu(\sigma)$, which is called its domain. The function has to assign dijoint sets
          $\nu(\sigma) \cap \nu(\sigma') = \varnothing$ to all sorts $\sigma \neq \sigma'$ and the sort
          \texttt{Bool}, it has to assign the two element set $\nu(Bool) = \{1, 0\}$;
    \item $(\nu, (_)~^\mathcal{M})$ is a function that assigns to each function symbol $f \in \Sigma^f$
          of a sort $\tau(f) = (\sigma_1, \sigma_2,~\cdots, \sigma_n, \sigma)$ a function
          $f^\mathcal{M} : \nu(\sigma_1) \times \nu(\sigma_2) \times \cdots \times \nu(\sigma_n) \rightarrow \nu(\sigma) $.
\end{itemize}
The elements of $\nu(\sigma)$ for some sort $\sigma \neq Bool$ are called $objects$ of the given $\Sigma - structure$.
    
\end{definition} 

Informally, we can describe \texttt{signatures} as abstract modules
that collect related data types and operations on these altogether.
Concrete implementation of particular signature is known as a \texttt{structure}.
Structures provide semantics to signatures that are syntactical constructs.
Therefore, while writing programs, we only need to know signatures; for execution,
their corresponding structures must be known.

\section{Data types}
\subsection{Stack}