Ajla tutorial
Contents
Introduction
Hello World
Fizz-buzz
Statements
Types
Operators
Clauses
Automatic parallelization
Caching
Functions
Lists
Arrays
Strings
Exceptions
I/O
Units
Type classes
Threads
FFI
Performance considerations
Hacking Ajla
Introduction
Ajla is a purely functional programming language that has look-and-feel like
traditional imperative languages. The return value of every function in Ajla
depends only on its arguments. Ajla has mutable local variables and control flow
statements (if, while, for, goto) that are known from imperative languages. Ajla
doesn't have mutable global variables because they break purity and they may be
subject to race conditions.
Ajla is memory-safe — i.e. you can't create a segmentation fault in Ajla.
Ajla doesn't have garbage collection; it uses reference counts to track
allocated memory. Ajla has mutable arrays — if the array reference count
is one, it is mutated in place, if it is different from one, a copy of the array
is created.
Unpacking Ajla
Before compiling Ajla, install the packages libgmp-dev and libffi-dev. If
libgmp-dev is not installed, Ajla will use slow built-in version. If libffi-dev
is not installed, the possibility to call C functions from Ajla will be
disabled.
Download the newest Ajla version from the
downloads directory. Unpack
the tar.gz archive and compile it with ./configure && make
. Install
it with make install
.
If you cloned the git repository, there is no ./configure
script.
You can generate it with the ./rebuild
script. You need autoconf,
automake and ed installed.
It is recommended to use gcc &mdash the compilation will take several minutes
and it will consume about 4GB memory when compiling the files ipret.c and
ipretc.c. Compilation with clang works, but it is very slow, it may take an hour
or more to compile the files ipret.c and ipretc.c.
When running Ajla programs, it is recommended to set the cpu governor to
'performance'. Ajla starts and stops threads rapidly and the 'ondemand' governor
underclocks the cores, resulting in slower performance.
You can compile and run a program directly by executing "ajla
program.ajla
". The program will be compiled as it runs and the result
will be saved in the file ~/.cache/ajla/program.sav
. When you run
the program second time, it will be faster because there will be no compilation.
If you change the source code file, the cache file will be invalidated and the
program will be compiled again.
There's a --compile
switch — it will compile the whole
program without running it and save it to the ~/.cache/ajla
directory. You should use this switch when you are developing a program, because
it will scan all the source code units and look for syntax errors.
The standard library is placed in the stdlib
subdirectory, you can
read it to find out what types and functions are there. The
system.ajla
file is implicitly included before compiling any Ajla
source file.
In "programs/acmd/
" there's Ajla Commander — a Midnight
Commander clone written in Ajla. You can run it with "./ajla
programs/acmd/acmd.ajla
".
Hello World
Programming language tutorials usually start with a program that prints "Hello
World". Let's have look at "Hello World" in Ajla:
fn main(w : world, d : dhandle, h : list(handle), args : list(bytes),
env : treemap(bytes, bytes)) : world
[
w := write(w, h[1], "Hello World!" + nl);
return w;
]
Copy this piece of code to a file "hello.ajla" and run "ajla hello.ajla" to
execute it.
Here we declare a function main with the following arguments. The symbol before
the dot is the name of the argument and the expression after the dot is the type
of the argument.
w : world
- This is a token that must be passed to and returned from all functions that
do I/O to sequence the I/O properly.
d : dhandle
- This is a handle to the current working directory. The handle can be used
when the user needs to open files relative to the working directory.
h : list(handle)
- The list of handles that were passed to the program when it was executed.
Usually, it contains 3 entries, the first for standard input, the second for
standard output and the third for standard error.
args : list(bytes)
- The arguments that were passed to the program. The type "
bytes
"
represents a sequence of bytes, so list(bytes)
is a list of
sequences of bytes.
env : treemap(bytes, bytes)
- This represents the environment variables for the program. The environment
is represented as a tree that maps a sequence of bytes (i.e. the variable name)
to another sequence of bytes (i.e. the variable content). This is implemented as
an AVL tree, so that searching it is efficient.
The function contains the following statements:
w := write(w, h[1], "Hello World!" + nl);
- This statement writes the string "Hello World!" and a newline to the
handle 1 (standard output). The statement takes a
w
variable as an
I/O token and returns the w
variable back. Passing the world
variable back and forth between functions that do I/O is required to maintain
I/O ordering. return w;
- This statement returns the world variable to the caller.
Passing the world variable
In order to show how the world passing works, let's split the "Hello World"
program to three write statements.
fn main(w : world, d : dhandle, h : list(handle), args : list(bytes),
env : treemap(bytes, bytes)) : world
[
w := write(w, h[1], "Hello ");
w := write(w, h[1], "World!");
w := write(w, h[1], nl);
return w;
]
In a functional language that has non-strict evaluation, we can't just do I/O
anywhere because the I/Os could be reordered or not executed at all. We need
some mechanism to maintain I/O ordering. Haskell uses monads to maintain I/O
ordering, Ajla uses a different mechanism — world passing. Every function
that performs I/O takes "world" as an argument and returns "world" as a return
value (functions can have multiple return values in Ajla). The "world" variable
makes sure that the functions can't be reordered. In this example, w :=
write(w, h[1], nl)
may be executed only after w := write(w, h[1],
"World!")
finished. And w := write(w, h[1], "World!")
may be
executed only after w := write(w, h[1], "Hello ")
finished.
Using implicit variables
Now, let's have look at another Ajla feature — implicit variables. We add
the implicit
keyword to the function argument w :
world
and we drop the w
variable from the code that does
writes.
fn main(implicit w : world, d : dhandle, h : list(handle), args : list(bytes),
env : treemap(bytes, bytes)) : world
[
write(h[1], "Hello ");
write(h[1], "World!");
write(h[1], nl);
]
If a variable is declared as implicit
, the compiler will
automatically add the variable to the function calls where does it fit. The
write
function should have three arguments, here it has only two
arguments, so the compiler will search for all implicit variables and it will
use an implicit variable that fits into the remaining argument.
If we don't specify a return value for the write
function call, the
compiler will search for all implicit variables and assign the return value to a
variable where does it fit.
If we don't use the return
statement at the end of the function,
the compiler will search for implicit variables and automatically return a
variable that fits into the return value.
Here we can see that with the implicit variables the code really looks as if it
were written in a procedural programming language. But it is not procedural
— the code is translated to this and that is what is
running on the virtual machine.
Omitting the arguments
The compiler already knows what arguments should the main
function
have, so you can omit them. So, we can simplify our program to this:
fn main
[
write(h[1], "Hello World!" + nl);
]
This is the simplest way how to write a "Hello World" program in Ajla.
Internally, it is translated to this.
Fizz-buzz
Fizz-buzz is another standard programming test. The goal is to write a program
that iterates from 1 to 100. If the number is divisible by 3, "Fizz" is written;
if the number is divisible by 5, "Buzz" is written, otherwise the number is
written.
fn main
[
for i := 1 to 101 do [
if i mod 15 = 0 then write(h[1], "Fizz Buzz ");
else if i mod 3 = 0 then write(h[1], "Fizz ");
else if i mod 5 = 0 then write(h[1], "Buzz ");
else write(h[1], ntos(i) + " ");
]
write(h[1], nl);
]
The for
statement iterates a variable over a given range. The
starting value is inclusive, the ending value is exclusive — thus, if we
want to iterate from 1 to 100, we need to specify 1 and 101. The operator
mod
is an arithmetic remainder, the if
statements test
if the value is divisible by 15, 3 and 5. If it is not divisible by these
numbers, we print the number; the ntos
function converts a number
to a string.
You can see that it looks very much like a procedural language.
Statements
Ajla has the following statements:
var x := 10;
- Creates a variable and assigns a value to it. The type is inferred, if we
don't want to infer the type, we use "
var x : int := 10;
"
x := 20;
- Modifies the variable.
const x := 10;
- Creates a constant and assigns a value to it. A constant can't be modified.
-
if condition then statement;
if condition then statement; else statement;
- The "if" statement.
for i := 0 to 10 do statement;
- The "for" statement. It iterates from 0 to 9.
for i in [ 10, 20, 30, 40 ] do statement;
- The "for in" statement can iterate over a collection. In this example it
iterates over a list that holds four values: 10, 20, 30, 40.
while condition do statement;
- The "while" statement.
break;
- Exits the current "for" or "while" loop.
continue;
- Starts a new iteration of the current "for" or "while" loop.
return expression;
- Exits the current function and returns the expression.
goto label;
- The "goto" statement. It doesn't break purity, so it is allowed.
The if
and while
statements can accept multiple
expressions separated by a comma — in this case, the expressions are
evaluated from the first to the last and if one of them is evaluated as
false
, the evaluation is terminated and the conditional branch is
not taken.
The assignment also accepts multiple expressions — for example, "x,
y := y, x
" will swap the variables x
and y
.
Types
Ajla has the following primitive types:
int
- A general integer number with arbitrary length. It is implemented as a
32-bit or 64-bit signed number. If some arithmetic operation overflows, it is
converted to a long integer (using the gmp library) and the arithmetic is done
with arbitrary precision.
On 32-bit architectures, int
is 32-bit. On 64-bit architectures,
int
is 64-bit. On 64-bit architectures when the
"--ptrcomp
" switch is used, int
is 32-bit. Note that
these cases are semantically equivalent, because overflows are handled
transparently.
int8
, int16
, int32
,
int64
, int128
- These types behave in the same way as the "int" type. The difference is in
their implementation.
int8
is implemented as an 8-bit integer and
if it overflows, long integers using the gmp library are used.
int16
is implemented as a 16-bit integer (with overflows handled by
the gmp library), etc. If the compiler that was used to build Ajla doesn't
support 128-bit integers, int128
is equivalent to
int64
.
sint8
, sint16
, sint32
,
sint64
, sint128
- This is a signed integer with a given size. If some arithmetic operation
overflows, it is wrapped modulo the size. If the compiler doesn't support
128-bit integers, emulation using gmp is used.
uint8
, uint16
, uint32
,
uint64
, uint128
- This is an unsigned integer with a given size. If some arithmetic operation
overflows, it is wrapped modulo the size. If the compiler doesn't support
128-bit integers, emulation using gmp is used.
real16
, real32
, real64
,
real80
, real128
- A floating point number with a given size. If the compiler has 128-bit
floating point numbers and doesn't have 80-bit floating point numbers, then
real80
is an alias for real128
. If the compiler has
neither 80-bit nor 128-bit floating point numbers, a slow software emulation is
used.
Floating point constants can have a suffix that specifies the type — 'h'
for real16, 's' for real32, no suffix for real64, 'l' for real80, 'q' for
real128.
bool
- A Boolean type — it can hold values
true
or
false
.
type
- Ajla can pass types to functions as well as other values. We use the keyword
type
to specify that an argument is a type.
The following types are defined in the standard library:
byte
- An alias for
uint8
.
char
- An alias for
int32
.
real
- An alias for
real64
.
rational
- A rational number — with an integer numerator and denominator. It is
declared as:
record rational [
num den : int;
]
fixed_point(base, digits)
- A fixed point number with the specified base and the specified number of
digits after the dot. The number of digits before the dot may be large —
if it doesn't fit, the gmp library is used.
decimal(digits)
- An alias for
fixed_point(10, digits)
.
sint(bits)
- A signed integer with the specified number of bits. If some arithmetic
operation overflows, it is wrapped modulo the size.
uint(bits)
- A unsigned integer with the specified number of bits. If some arithmetic
operation overflows, it is wrapped modulo the size.
floating(ex_bits, sig_bits)
- An arbitrary-precision floating-point number. The exponent has
ex_bits
and the mantissa has sig_bits
.
Ajla has the following composite types:
list(t)
- A list of elements where each element has a type
t
. Lists can
be appended or sliced.
array(t, [ 10, 20, 30 ])
- An array with arbitrary number of dimensions. In this example, it has three
dimensions with the sizes 10, 20 and 30 elements. Arrays can't change their size
after they are created.
record [ element1 : type1; element2 : type2; element3 : type3; ...
]
- Record — it groups different types into a single type.
option [ element1 : type1; element2 : type2; element3 : type3; ...
]
- An option holds only one of the specified types. In this example, it can
hold either an element of the type
type1
or an element of the type
type2
or an element of the type type3
.
There's an operator "is
" that tests if the option holds a specified
value. For example "o is element2
" returns "true
" if
the option holds the value "element2
".
There's an operator "ord
" that returns the ordinal number of the
value that the option holds (starting from 0). For example, if "o
"
holds the value "element3
", then "ord o
" returns the
value 2.
The following composite types are defined in the standard library:
bytes
- An alias for
list(byte)
.
string
- An alias for
list(char)
.
maybe(t)
- It can hold either the value of type
t
or nothing. It is
declared as:
option maybe(t : type) [
j : t;
n;
]
tuple2(t1, t2)
- A tuple holding 2 values of types
t1
and t2
.
tuple3(t1, t2, t3)
- A tuple holding 3 values of types
t1
, t2
and
t3
.
tuple4(t1, t2, t3, t4)
- A tuple holding 4 values of the specified types.
tuple5(t1, t2, t3, t4, t5)
- A tuple holding 5 values of the specified types. If you need larger tuples,
you must declare them on your own with a
record
type.
treemap(key_type, value_type)
- A key-value store with the specified key type and value type. It is
implemented as an AVL tree.
treeset(key_type)
- A set containing values of the specified type. It is implemented as an AVL
tree.
heap(key_type)
- A binary heap that can quickly insert an element or extract the lowest
element. It is implemented as a list.
unit_type
- This type may hold only one value — unit_value.
bottom_type
- This type can't hold any value, it can only hold exceptions. It is used for
functions that never return (for example for message loops). It is declared as:
option bottom_type [
]
Operators
Operator | Priority | Description |
Unary + | 1000 | It just returns the passed value |
Unary - | 1000 | Negation |
* | 2000 | Multiplication |
/ | 2000 | Floating point division |
div | 2000 | Integer division |
mod | 2000 | Integer remainder |
+ | 3000 | Addition (or append when applied to lists) |
- | 3000 | Subtraction |
x +< y | 3000 | Append a value y to the list x |
shl | 4000 | Bit shift left |
shr | 4000 | Bit shift right |
rol | 4000 | Bit rotation left |
ror | 4000 | Bit rotation right |
x bts y | 4000 | Set y -th bit in x |
x btr y | 4000 | Clear y -th bit in x |
x btc y | 4000 | Invert y -th bit in x |
x bt y | 4000 | Test if y -th bit in x is set |
Unary bswap | 4000 | Reverse bytes in a number |
Unary brev | 4000 | Reverse bits in a number |
Unary bsf | 4000 | Finds the lowest set bit |
Unary bsr | 4000 | Finds the highest set bit |
Unary popcnt | 4000 | Count the number of set bits |
Unary is_negative | 5000 | Test if a real number is negative |
Unary is_infinity | 5000 | Test if a real number is infinite |
Unary is_exception | 5000 | Test if a value is an exception |
= | 6000 | Test for equality |
<> | 6000 | Test for non-equality |
> | 6000 | Test if the first argument is greater |
>= | 6000 | Test if the first argument is greater or equal |
< | 6000 | Test if the first argument is less |
<= | 6000 | Test if the first argument is less or equal |
not | 7000 | Logical negation |
and | 8000 | Logical and |
xor | 9000 | Logical exclusive or |
or | 10000 | Logical or |
If we pass different types to an operator, the second argument is converted to a
type of the first argument. For example 2.5 + 1
will return a
floating point value 3.5
. 1 + 2.5
will return an
integer value 3
.
Clauses
Every Ajla source file consists of clauses. This is the list of the clauses:
fn
- Declares a function. For example:
fn maximum(a b : int) : int := select(a < b, a, b);
or
fn maximum(a b : int) : int
[
if a < b then
return b;
else
return a;
]
operator
- Declares an operator with a given priority. For example, this declares a
unary postfix operator "
!
" that calculates a factorial:
operator postfix ! 1000 (n : int) : int
[
var v := 1;
for i := 2 to n + 1 do
v *= i;
return v;
]
const
- Declares a constant. A constant is a function that has no arguments. For
example:
const ten := 10;
const hello := `Hello`;
type
- Declares a type. For example
type byte := uint8;
declares the
type "byte
" as an alias to "uint8
".
record
- Declares a record. For example:
record person [
name : string;
surname : string;
age : int;
]
option
- Declares an option. For example:
option bool [
false;
true;
]
uses
- Imports a unit from the standard library or from the program directory. For
example "
uses heap;
" imports the file
"stdlib/heap.ajla
".
define
- Defines a macro. See for example "
define int_instance
" from
stdlib/system.ajla
.
Function, const and type declarations may be prefixed with:
private
- This declaration is only usable in the unit where it appears. It will not be
imported to other units.
implicit
- If you pass less arguments to a function than what was specified in the
function header, the compiler will attempt to infer the remaining arguments. The
"
implicit
" keyword makes this function a candidate for inferring.
conversion
- This function converts one type to another. If there is a type mismatch, the
compiler will scan all the "
conversion
" functions and try to
resolve the mismatch automatically by adding the appropriate conversion.
Automatic parallelization
Let's have a look at this program that does Fibonacci number calculation. It is
deliberately written in an inefficient recursive way.
fn fib(n : int) : int
[
if n <= 1 then
return n;
else
return fib(n - 2) + fib(n - 1);
]
fn main
[
var x := ston(args[0]);
var f := fib(x);
write(h[1], "fib " + ntos(x) + " = " + ntos(f) + nl);
]
If you run this program with some higher value, for example 45, you will notice
that all the cores are busy. That's because Ajla does automatic parallelization.
How does automatic parallelization work? We should parallelize only functions
that take long time. If we parallelized every function call, the overhead of the
parallelization would cause massive slowdown.
Ajla scans the stack every tick (by default, the tick is 10ms, it can be changed
with the --tick
argument). If some function stays on the stack for
two ticks, it took long enough and it can be parallelized. For example, suppose
that "Frame 4" in this diagram is there for 2 timer ticks. The stack is broken
down into two stacks and both of these stacks are executed concurrently. The
function "Frame 3" in the upper stack needs some return value, but we don't know
the return value yet (the return value would be returned by the topmost function
in the lower stack) — so we return a structure called thunk to the
upper stack. If the lowermost function in the upper stack attempts to evaluate
the thunk, it waits for the lower stack to finish (in this situation,
parallelization is not possible). If the lowermost function in the upper stack
doesn't attempt to evaluate the thunk, both stacks run concurrently.
Automatic parallelization can be disabled with the "--strict-calls
"
switch.
Caching
Let's have a look at the Fibonacci number example again:
fn fib~cache(n : int) : int
[
if n <= 1 then
return n;
else
return fib(n - 2) + fib(n - 1);
]
fn main
[
var x := ston(args[0]);
var f := fib(x);
write(h[1], "fib " + ntos(x) + " = " + ntos(f) + nl);
]
We added a ~cache
specifier to the function fib. Because Ajla is
purely functional, every function will return the same value if the same
arguments are supplied. Thus, we can cache the return values. This is what the
~cache
specifier does.
Now, you can see that you can pass large values to the function and the function
will complete quickly. That's because the Ajla virtual machine remembers what
value was returned for what argument and if you call the function again with the
same argument, it will just return a cached value.
In this example, the caching just turned an algorithm with O(2n)
complexity to an algorithm with O(n log n) complexity. The cache is
implemented as a red-black tree, so operations on it have logarithmic
complexity.
Functions
Function call specifiers
Ajla has the following function call specifiers:
~normal
- Default — attempt to parallelize after two timer ticks
~strict
- Don't attempt to parallelize
~spark
- Parallelize immediately
~lazy
- Evaluate when needed (like in Haskell)
~inline
- Inline the function (i.e. insert it into the caller)
~cache
- Cache the results
~save
- Cache the results and save them to
~/.cache/ajla/
The specifiers may be specified either at function declaration or at function
call. If different specifiers are specified at function declaration and at
function call, the specifier from the function call wins.
Nested functions
Functions may be nested. The nested function has access to variables of the
parent function that were visible at the point where the nested function was
declared. If the variable is later changed in the parent function, the change is
not promoted to the nested function. If the variable is later changed in the
nested function, the change is not promoted to the parent function. This is an
example of a nested function:
fn main
[
fn sum(a b : int) : int
[
return a + b;
]
write(h[1], ntos(sum(10, 20)) + nl);
]
Nested functions can't be recursive.
Lambda functions
Lambda functions are anonymous functions that are declared inside an expression
in the parent function. Like nested functions, they may use parent function
variables that were visible when the lambda function was declared. This is an
example of lambda functions:
fn main
[
var l := [ 10, 15, 20, 25, 30, 35, 40 ];
l := list_filter(l, lambda(x : int) [ return not x bt 0; ]);
var add := 1;
l := map(l, lambda(x : int) [ return x + add; ]);
var m := map(l, lambda(x : int) [ return ntos(x); ]);
write(h[1], list_join(m, nl));
]
We start with a list of seven elements: 10, 15, 20, 25, 30, 35, 40. The function
"list_filter
" takes a list and a function that returns a Boolean
value and returns the elements for which the function returned
"true"
. In this example, it selects even numbers (the operator
"bt 0
" tests if bit 0 is set). So, the list has now only four
elements: 10, 20, 30, 40. The function "map
" takes a list and a
function, applies the function to every element of the list and returns the list
of the results. In this example, the first "map
" function will add
1 to every element of the list. The next "map
" takes a list and
applies the "ntos
" function to every element of the list —
i.e. it converts the list of numbers to the list of byte strings. The
"list_join
" function joins the byte strings and separates them with
the second arguments — that is a newline. The program will print this:
11
21
31
41
Currying
Currying is the operation where we take a function, pass fewer arguments to the
function than what was specified in the function header and create a new
function that takes the remaining arguments. In Ajla, currying is done by
passing empty arguments from the right end of the argument list.
fn main
[
fn sum(a b : int) : int
[
return a + b;
]
var add_ten := sum(10,);
write(h[1], ntos(add_ten(20)) + nl);
]
For example, here we have a function "sum
" that takes two arguments
and returns their sum. If we write "sum(10,)
", we create a new
function that takes one argument and adds the value 10 to the argument. We
assign this new function to the variable "add_ten
". Finally, we
call "add_ten(20)
", which returns the value 30.
Lists
list(t)
represents a list type whose elements have a type of
t
. All the elements in a list must have the same type.
var l := list(int).[ 10, 20, 30, 40 ];
- Creates a list with four members — 10, 20, 30, 40.
var l := [ 10, 20, 30, 40 ];
- Creates a list with four members — 10, 20, 30, 40. We can omit the
type of the list — the type will be derived from the type of the first
member.
var l := empty(int);
- Creates an empty list of integers.
var l := fill('a', 10);
- Creates a list with 10 elements equal to 'a'.
var l := sparse('a', 1000000000);
- Functionally, it is equivalent to
fill
. But unlike
fill
, sparse
creates a compressed list that consumes
little memory even when it is very large. If you modify the compressed list, it
will be stored as a b+tree with consecutive runs of the same value compressed
into a single b+tree node. Sparse lists are slower than flat lists because the
virtual machine has to walk the b+tree on every access.
var l := [ 10, 20, 30, 40 ] + [ 50, 60, 70, 80 ];
- Append two lists.
var l := [ 10, 20, 30, 40 ] <+ 50;
- Append one value to a list.
var m := l[3];
- Pick a member at index 3. The indices start from 0.
l[3] := 100;
- Modify the list. If the list has a reference count different from 1, the
copy of the list is created and modified. If the list has a reference count 1,
it is modified in place.
var m := l[2 .. 4];
- Take a slice of the list, starting with member 2 (inclusive) and ending with
member 4 (exclusive).
var m := l[ .. 4];
- Take a slice of the list, starting at the beginning of the list and ending
with member 4 (exclusive).
var m := l[4 .. ];
- Take a slice of the list, starting with member 4 (inclusive) and ending at
the list end.
var a := len(l);
- Get a length of the list.
var b := len_at_least(l, 10);
- Returns true if the length is 10 or more elements.
var b := len_greater_than(l, 10);
- Returns true if the length is greater than 10 elements.
len_at_least
and len_greater_than
are useful when
dealing with infinite lists. We cannot use "if len(l) >= 10 then
...
" on an infinite list, because it would attempt to evaluate the whole
list and get into an infinite loop and memory hog. If we use "if
len_at_least(l, 10) then ...
", the virtual machine will attempt to
evaluate the first 10 entries and it returns true
without
attempting to evaluate further entries.
for i in [ 10, 20, 30, 40 ] do ...
- Iterate over a list. The loop body will be executed 4 times, with
i
being 10, 20, 30 and 40.
Infinite lists
This is an example that creates an infinite list, iterates over it and prints
the result.
fn inf_list~lazy(i : int) : list(int)
[
return [ i ] + inf_list(i + 1);
]
fn main
[
var l := inf_list(0);
for e in l do
write(h[1], ntos(e) + nl);
]
Note that we must not use len(list)
because it would force
evaluation of the whole list — such evaluation never finishes and it blows
memory.
Infinite lists can be created with these functions:
infinite(10)
- Creates an infinite list containing the values 10.
infinite_repeat([ 1, 2, 3])
- Creates an infinite list containing the values 1, 2, 3, 1, 2, 3, 1, 2, 3 ...
etc.
infinite_uninitialized(int)
- Creates an infinite lists with all members being exceptions. It may be
useful to create associative arrays. You can test if a member is uninitialized
with the function
is_uninitialized
.
Arrays
array(t, shape)
represents an array type whose elements have a type
of t
. shape
is a list of integers that represents
dimensions of the array.
var a := array_fill(1, [ 3, 3, 3 ]);
- Creates a three-dimensional array and fill it with value 1.
var a := array_sparse(1, [ 3, 3, 3 ]);
- Functionally, it is equivalent to array_fill. But it creates a compressed
array.
m := a[0, 1, 2];
- Pick a value at a give index.
a[0, 1, 2] := 100;
- Modify a value at a give index.
list_to_array
- Converts a list to an array.
array_to_list
- Converts an array to a list.
Note: arrays are just syntactic sugar for lists. Internally, the virtual machine
treats arrays as if they were lists.
Strings
Ajla has two kinds of strings. Byte strings are represented by the type
"bytes
" which is an alias for "list(byte)
" which is an
alias for "list(uint8)
". Character strings are represented by the
type "string
", which is an alias for "list(char)
"
which is an alias for "list(int32)
".
Character constants are specified using single quotes, for example
'C'
. Byte constants can be specified using quotation marks, for
example "hello"
. String constants are specified using backquotes,
for example `Hello`
. String constants are always considered as
UTF-8, regardless of the system locale — so that if the user moves the
source file between systems with different locales, we get consistent result.
For byte strings, the characters are stored system-defined locale. It is usually
UTF-8, but it may be different, depending on the operating system and the
"LANG
" variable.
The character strings are stored in Unicode. They use the "int32
"
type — that is arbitrary-precision integer. If there are Unicode combining
characters, they are not stored as a separate character, they are superimposed
to the character they belong to. In the unit charset
, there is
"const combining_shift : char := 21
— that means that a
combining character is shifted by 21 bits to the left and added to the base
character. The reason why is it done this way is to make sure that text editors
can treat each "char
" as one visible character and they don't have
to deal with combining characters in their logic.
The unit charset
(stdlib/charset.ajla
) contains the
conversion routines between ascii, utf-8, locale-specific encoding and strings.
If we want to write or read strings, we need to convert them to or from bytes
using the system locale. The system locale is obtained with the function
locale_init
or locale_console_init
. These functions
are almost equivalent, the only difference is on Windows 9x, where locale_init
returns the ANSI character set and locale_console_init returns the OEM character
set. On Windows NT, both of these functions return UTF-8 locale and the Ajla
runtime will translate UTF-8 names to UTF-16 names that are used by the Windows
NT syscalls. On Unix-based systems, both of these functions return the character
set as set by the variables "LC_ALL
", "LC_CTYPE
" or
"LANG
" and there is no translation of byte strings when they are
passed to syscalls.
For example, this program converts my name to the system locale and prints it:
uses charset;
fn main
[
var loc := locale_console_init(env);
write(h[1], string_to_locale(loc, `Mikuláš Patočka`) + nl);
]
The first statement loads the current locale based on the environment variables
being set. The function string_to_locale
will covert the
string
to bytes
represented by the current locale. It
will work not only on UTF-8 system, but on ISO-8895-2 system as well. If the
system locale doesn't have the characters 'á', 'š' or 'č',
they are converted to appropriate ascii characters.
Exceptions
Because Ajla can parallelize or reorder function calls, exceptions as we know
them from Java or C++ wouldn't be useful because they could be triggered at
random points. Exceptions in Ajla are implemented differently. Exception is just
a special value that can be stored in any variable.
For example "var x := 0 div 0;
" will store the "invalid
operation" exception into the variable x
.
If we don't use the variable x
, the exception is quietly discarded.
If we perform arithmetic using the exception, the exception is propagated. For
example, if we execute "var y := x + 1;
", the variable
y
will hold the exception as well. There is one exception to this
rule — the operators "and
" and "or
" don't always
propagate exception if one of the arguments is known. "false and
exception
" or "exception and false
" evaluates to
"false
". "true or exception
" or "exception or
true
" evaluates as "true
".
If we attempt to perform a conditional branch that depends on the exception
value, the current function is terminated and the exception is returned to the
caller. For example "if x = 3 then something;
" will terminate the
current function.
There's an operator "is_exception" that returns true if the argument is an
exception and that returns false otherwise. It allows us to "handle" the
exception. For example, we could write this code to report the exception to the
user:
if is_exception x then
write(h[1], "Exception occurred" + nl);
else
write(h[1], "There's no exception, the value is " + ntos(x) + nl);
Floating point "NaN" values are treated like exceptions —
is_exception
will return true
if the value is a NaN.
Every exception contains three values:
- Class
ec_sync
, ec_async
, ec_syscall
or
ec_exit
ec_sync
- The exception happened due to execution of the program. For example, invalid
numeric calculation or index out of array/list size.
ec_async
- The exception happened due to conditions not related to the program. For
example, memory allocation failure falls into this category.
ec_syscall
- The exception happened because some syscall failed.
ec_exit
- The exception holds the return value that should be returned when the
program exits.
- Type
- This is an exception code. Exception types are listed in the file
stdlib/ex_codes.ajla
— see the constants
"error_*
".
- Code
- This is auxiliary value. It's meaning depends on the exception type.
- Type:
error_system
- Code is one of the
system_error_*
values.
- Type:
error_errno
- Code is the
errno
value.
- Type:
error_os2
- Code is the OS/2 error number.
- Type:
error_os2_socket
- Code is the OS/2 socket error number.
- Type:
error_win32
- Code is the Windows error number.
- Type:
error_h_errno
- Code is the
h_errno
error number.
- Type:
error_gai
- Code is the
getaddrinfo
return value.
- Type:
error_subprocess
- Code is the subprocess exit number, if the code is negative, it is the
signal number that terminated the subprocess.
- Type:
error_exit
- Code is the return value that should be returned from the current process.
Note that because different systems (POSIX, OS/2, Windows) have different error
codes, Ajla tries to translate common error codes to one of the
system_error_*
values. For example, a "file exists" error gets
translated to system_error_eexist
, so that the program that tests
for it can be portable. However, not all error codes could be translated and if
an unknown error code is received, it is reported as error_errno
,
error_os2
or error_win32
depending on the operating
system.
Additionally, exceptions may contain an optional error string and an optional
stack trace, so that the user can determine where did the exception happen.
Operators that examine exceptions
is_exception
- Returns true if the argument is an exception.
exception_class
- Returns class of an exception.
exception_type
- Returns type of an exception.
exception_aux
- Returns auxiliary value of an exception.
exception_string
- Returns a string representing the type and aux values, you can use it to
display the exception to the user.
exception_payload
- Returns the raw string attached to the exception.
exception_stack
- Returns the stack trace attached to the exception.
Function that manipulate exceptions
The unit exception
(located in the file
stdlib/exception.ajla
) contains the following functions:
exception_make
- Makes an exception with the given class, type and code and optional stack
trace.
exception_make_str
- Makes an exception with the given class, type, code and string and optional
stack trace.
exception_copy
- Copies the exception from a variable
s
that has a type
src_type
to the return value that has a type dst_type
.
The variable s
must hold an exception, if not, invalid
operation exception is returned.
Statements that manipulate exceptions
eval expression
- Evaluate a given expression (or more expressions separated by a comma), and
discards the result. It may be used to print debugging messages, for example
eval debug("message")
. The debug
statement writes the
message to the standard error handle.
xeval expression
- Evaluate a given expression (or more expressions separated by a comma). If
the result is non-exception, the result is discarded. If the result is an
exception, the current function is terminated and the exception is returned as a
return
value.
abort
- Terminate the current function with with
ec_sync
,
error_abort
.
abort expression
- Evaluate a given expression (or more expressions separated by a comma). If
the result is an exception, the current function is terminated and the exception
is returned as a return value. If the result is non-exception, the current
function exits with
ec_sync
, error_abort
. It may be
used with the statement "internal
" to terminate the whole process
if some internal error happens — abort internal("this shouldn't
happen")
.
keep variables
- Doesn't evaluate the variables, it just marks the variables as live, so
that the optimizer won't discard them.
Syntax errors
Note that syntax errors are also treated as exceptions — if the function
with syntax error is never called, the error is ignored; if it is called, the
exception is returned as a return value. In this program, we define a function
"syntax_error
" that contains a syntax error:
fn syntax_error : int
[
bla bla bla;
]
fn main
[
var q := syntax_error;
if is_exception q then [
write(h[1], "Exception happened" + nl);
]
]
This program will write: "Exception happened
".
Reporting exceptions lazily when the function is called may not be useful during
program development — when you are developing a program, it is recommended
to use the "--compile
" flag. It will attempt to compile all the
functions in the program and it will report an error if any of them fails.
I/O
We have already seen the function write
to perform an I/O. Let's
have a look at other I/O functions. I/O functions take and return the value of
type world
, this ensures that they are properly sequenced and that
they are not reordered or executed in parallel. I/O functions specify a handles
on which the I/O is to be performed, handle
represents a handle to
a file (or pipe, character device, block device, socket). dhandle
represents a handle to a directory. Handles are automatically closed when the
handle variable is no longer referenced by the program.
The function main
receives a dhandle
argument that is
the handle to the current working directory and a list of handle
values that represents the standard input, output and error streams.
Handles may be manipulated in three modes:
- Read mode
- The handle is being read sequentially
- Write mode
- The handle is being written sequentially
- Block mode
- You can perform read and write operations on arbitrary offsets in the file.
This mode only works for files and block devices.
You shouldn't mix these modes on a single file because it may result in bugs
— for example, some operating systems have the functions
pread
and pwrite
and they use them when doing I/O on
the handle in the block mode. However, other operations systems don't have these
functions, so they will lock the file handle, perform lseek
,
perform read
or write
and unlock the file handle. If
you mixed block mode with read mode, the read mode would read from invalid file
offsets that were set up by the block mode.
The I/O functions are defined in the unit io
. This unit is
automatically included in the main program. If you need I/O in other units, you
must import the io
unit explicitly.
fn ropen(w : world, d : dhandle, f : bytes, flags : int) : (world,
handle);
- This function will open a file in a read mode and return a handle to the
file.
w
is the world token, d
is the base directory
that is used for file name lookup, f
is the file name,
flags
is one of the open_flag_*
flags. For this
function, only the flag open_flag_no_follow
is allowed.
fn read(w : world, h : handle, size : int) : (world, bytes);
- Read the specified number of bytes from the handle. If end-of-file is
detected, it returns less bytes. If not enough bytes is available (in case of a
pipe or a character device), the function will sleep until the specified number
of bytes is read.
fn read_partial(w : world, h : handle, size : int) : (world,
bytes);
- Read the specified number of bytes from the handle. If not enough bytes are
available, the function returns less bytes. If no bytes are available, the
function sleeps until at least one byte is returned.
fn wopen(w : world, d : dhandle, f : bytes, flags : int, mode : int) :
(world, handle);
- Opens a file in a write mode and return a handle to it.
flags
contain one or more of open_flag_append
,
open_flag_create
, open_flag_must_create
,
open_flag_no_follow
. open_flag_append
specifies that
we want to append to the file rather than overwrite it,
open_flag_create
specifies that we want to create the file if it
doesn't exist, open_flag_must_create
specifies that we want to fail
with an error if the file exists, open_flag_no_follow
suppresses
the dereferencing of the last symlink in a file name. mode
represents the permissions of the file if it is created, it may be
open_mode_ro_current_user
, open_mode_ro_all_users
,
open_mode_rw_current_user
, open_mode_read_all_users
,
open_mode_default
or other value.
fn write(w : world, h : handle, s : bytes) : world;
- Writes the bytes to the write handle.
fn wcontiguous(w : world, h : handle, size : int64) : world;
- Allocates a contiguous space for
size
bytes. Only some of the
operating systems (Linux and OS/2) support preallocation of file data. If the
operating system doesn't support it, the function returns with success and does
nothing.
fn pipe(w : world) : (world, handle, handle);
- Creates a pipe and returns two handles. The first handle is used for reading
from the pipe and the second handle is used for writing to the pipe.
fn bopen(w : world, d : dhandle, f : bytes, flags : int, mode : int) :
(world, handle);
- Opens a file in a block mode.
flags
is a combination of
open_flag_*
. mode
represents the permissions of the
file if it is created.
fn bread(w : world, h : handle, position : int64, size : int) :
(world, bytes);
- Read bytes from the specified position. If end-of-file is encountered, the
function returns less bytes.
fn bwrite(w : world, h : handle, position : int64, s : bytes) :
world;
- Write bytes to the specified position. If we write beyond file end, the file
is extended.
fn bsize(w : world, h : handle) : (world, int64);
- Returns the size of the file.
fn bdata(w : world, h : handle, off : int64) : (world, int64);
- Skips over a hole in the file. Returns a next offset where some data are
allocated. Some filesystems do not support holes; for them this function returns
off
.
fn bhole(w : world, h : handle, off : int64) : (world, int64);
- Skips over data in the file. Returns a next offset where there is a hole in
the file. Some filesystems do not support holes; for them this function returns
fn bsetsize(w : world, h : handle, size : int64) : world;
- Truncates a file to the specified size or extends it.
fn bcontiguous(w : world, h : handle, pos : int64, size : int64) :
world;
- Allocates a contiguous space at the specified offset. Some operating systems
do not support file preallocation, for them, this function return success
without doing anything.
fn bclone(w : world, src_h : handle, src_pos : int64, dst_h : handle,
dst_pos : int64, size : int64) : world;
- Clones a byte range from
src_h
starting at src_pos
to dst_h
starting at dst_pos
. Only some filesystems
support cloning, if the filesystem doesn't support it, an error is reported.
fn droot(w : world) : dhandle;
- Returns a handle to the root directory. On Windows or OS/2, it returns a
handle to
C:\
.
fn dnone(w : world) : dhandle;
- Returns an invalid directory handle. It is useful if you want to open a file
and you know that the file path is absolute — in this case, you can pass
dnone()
to ropen
, wopen
,
bopen
or dopen
.
fn dopen(w : world, d : dhandle, f : bytes, flags : int) : (world,
dhandle);
- Open a directory that is relative to an existing directory. The only allowed
flag is
open_flag_no_follow
.
fn dread(w : world, d : dhandle) : (world, list(bytes));
- Reads the directory and returns the list of directory entries.
fn dpath(w : world, d : dhandle) : (world, bytes);
- Returns a path that the directory points to. Note that we cannot return a
path for file handles, because there may be multiple names referring to a single
inode.
fn dmonitor(w : world, d : dhandle) : (world, world);
- Monitor the specified directory for changes. If the operating system doesn't
support directory monitoring, exception is returned. If the operating system
supports directory monitoring, the first value is returned immediately and the
second value is returned when the directory changed.
There are more functions that manipulate files and directories, see the file
stdlib/io.ajla
.
I/O error handling
If some I/O operations fails, an exception is stored into the resulting
world
. If you attempt to perform more I/O with the world tag that
is an exception, no I/O is performed, and the exception is returned back in the
world tag. Consequently, you don't need to test for exception after every I/O
operation — you can perform several I/O operations with no exception
checking between them and check exceptions only once at the end of the sequence.
If you attempt to write to a file and the array that is being written contains
an exception, the data up to the exception is written and then the exception is
returned.
If you need to recover from the exception, you need to take a world value that
existed before the exception happened and use it as a current world value
— there's a function recover_world
that does it.
This is an example program that shows how exceptions could be recovered. It
takes two arguments, coverts both of them to integer numbers, tries to divide
them and writes the result.
fn main
[
var old_w := w;
var x1 := ston(args[0]);
var x2 := ston(args[1]);
write(h[1], ntos(x1) + " / " + ntos(x2) + " = " + ntos(x1 div x2) + nl);
if is_exception w then [
var msg := exception_string w;
recover_world(old_w);
write(h[1], "An exception " + msg + " happened and was recovered" + nl);
]
]
If the second argument is zero, you get this output "1 / 0 = An exception
Invalid operation happened and was recovered
". The msg
variable will capture the exception message. Then, we recover the world tag, so
that it points to an older value before the exception happened. As we recovered
it, we can write to the standard output again.
If you don't pass any arguments to this programs, you get "An exception
Index out of range happened and was recovered
" — here, the
exception happens when we attempt to evaluate "args[0]
" and
"args[1]
".
Lazy functions for I/O
In io.ajla
there are functions with _lazy
suffix. They
do not take world as an argument and do not return it. They are intended to be
used in situations where the data accessed are not supposed to change during
program execution. If the files and directories don't change, we don't need to
sequence the I/O using the world variable.
For example, the function read_lazy
will read a handle and return a
list of bytes, reading more data as they are needed. This program will read
lines from standard input, echoing the lines back, and prints "read all
the lines, exiting...
" when the user terminates the input stream with
Ctrl-D.
fn main
[
var lines := list_break_to_lines(read_lazy(h[0]));
for line in lines do [
write(h[1], "read a line: """ + line + """." + nl);
]
write(h[1], "read all the lines, exiting..." + nl);
]
The list_break_to_lines
functions takes a list of bytes, breaks it
to separate lines and returns list of lists of bytes representing the lines.
Units
Larger program can be broken up into multiple units. Unit starts with the
"unit
" keyword and the unit name (the unit name must match the file
name), then, there's an interface section listing all public functions and
types. Then there's the "implementation
" keyword and then there are
private functions and types and implementations of the public versions.
The "uses
" keyword will import a unit and make all public functions
and types visible. You can import units from your program or from the standard
library. The standard library is located in the directory "stdlib
".
Some of these units are declared with "private unit
" keywords,
these cannot be imported by your program, they can only be imported by other
units in the standard library.
Every Ajla source file automatically imports the unit "system.ajla
.
The main program source file also imports "io.ajla
" and
"treemap.ajla
".
Let's have look at the Fibonacci number calculation and let's
move the calculation to a separate unit. Now, we have a file
"fibunit.ajla
" with
this content:
unit fibunit;
fn fib(n : int) : int;
implementation
fn fib(n : int) : int
[
if n <= 1 then
return n;
else
return fib(n - 2) + fib(n - 1);
]
And a file "fib.ajla
" with this content:
uses fibunit;
fn main
[
var x := ston(args[0]);
var f := fib(x);
write(h[1], "fib " + ntos(x) + " = " + ntos(f) + nl);
]
The interface section in the "fibunit
" file specifies that the unit
exports one function, "fib
". The file "fib.ajla
"
imports the unit using the "uses fibunit;
" statement.
For larger programs, units may be located in different directories, when you
import them, you use dot as a directory separator. For example "uses
ui.curses;
" imports the unit from the file
"stdlib/ui/curses.ajla
".
Name clashes
If you declare a unit with a name that's already present in the standard
library, your unit will take precedence and it will be imported instead of the
unit in the standard library. However, if some file in the standard library
imports a unit that's defined both in the standard library and in your program,
the version from the standard library will be used.
The same rule applies when function or type names clash. Your program will use a
function that's declared in your program, however the standard library will use
a version that's declared in the standard library.
In new version of Ajla, the standard library can be extended with new units,
functions and types. However, it shouldn't break existing programs because they
will preferentially reference units, functions and types that are declared in
them.
Hiding the implementations of types
Just like you can hide function implementations in the implementation section,
you can hide type implementations as well. For example, see the unit
heap
from the standard library (i.e. the file
stdlib/heap.ajla
). In the interface section, there's
type heap(key : type, cls : class_ord(key));
In the implementation section, there's
type heap(key : type, cls : class_ord(key)) := list(key);
When you import the unit using "uses heap
", the compiler will read
the statements up to the "implementation
" keyword and then stop.
The compiler will see that "heap
" is a type that has one
"key
" parameter and one "cls
" parameter, but it won't
know how this type is defined. So, it will allow you to pass the variables with
the "heap
" type back and forth between functions, but it won't let
you modify these variables directly.
When the compiler will be parsing the heap.ajla
file directly, it
will process the line "type heap(key : type, cls : class_ord(key)) :=
list(key);
". From this point on, it knows that "heap
" is
implemented as a list and it will allow all list operations to be performed on
the "heap
" type.
We can hide implementations of types from common code, so that the
implementations may be changed without disrupting the program. For example, we
could change the implementation of heap from a list to a binary tree and
existing code could use the same interface functions.
Type classes
Type classes allow us to write functions that can operate on different types.
For example, let's have a look at this function that multiplies matrices
containing double-precision numbers:
fn matmult(const n : int, a b : array(real64, [n, n])) : array(real64, [n, n])
[
var result := array_fill(0., [n, n]);
for i := 0 to n do
for j := 0 to n do
for k := 0 to n do
result[i, j] += a[i, k] * b[k, j];
return result;
]
The variable n
is the size of the matrices — note that if we
want to use a variable in a type definition, we must declare it with a
const
specifier, so that the variable cannot be changed. If we
changed n
, it would no longer match the size of the arrays.
a
and b
are two matrices — they are square
arrays of real64
with both dimensions equal to n
. The
function return another square array of real64
.
What if we want to multiply matrices containing single-precision numbers? We
could copy the code and replace real64
with real32
,
but copying code is considered antipattern. Or, we can use type classes:
fn matmult~inline(t : type, implicit cls : class_unit_ring(t), const n : int,
a b : array(t, [n, n])) : array(t, [n, n])
[
var result := array_fill(cls.zero, [n, n]);
for i := 0 to n do
for j := 0 to n do
for k := 0 to n do
result[i, j] += a[i, k] * b[k, j];
return result;
]
In this second example, we can see new arguments. The t
argument
represents a type. The cls
argument represents operations that can
be done on the type t
. We need to perform addition and
multiplication on the type t
, so we use the "unit ring" class that
has these operations. The arguments n
, a
and
b
are similar to the previous example.
Type class is a record that is parameterized by a type and that contains
constants or function references. We can look at the standard library
(stdlib/system.ajla
) for the definition of
class_unit_ring
:
record class_unit_ring(t : type) [
add : fn(t, t) : t;
zero : t;
neg : fn(t) : t;
subtract : fn(t, t) : t;
multiply : fn(t, t) : t;
one : t;
]
The function matmult
can accept any type for which
class_unit_ring
exists — it accepts floating-point numbers,
integers, rational numbers and fixed-point numbers. You can declare
class_unit_ring
for your own type and then, you can pass this type
to the function matmult
as well. In order to improve performance,
we should define the function matmult
with "~inline
".
Without "~inline
", the compiler would do indirect function call
through the class_unit_ring
record for every multiplication and
addition.
Note that if we pass less arguments than what was specified in the function
prototype, Ajla attempts to infer the remaining arguments. So, you can write
this:
var m1 : array(real32, [ 10, 10 ]);
var m2 : array(real32, [ 10, 10 ]);
... fill m1 and m2 with some data ...
var m3 := matmult(m1, m2);
The first argument (the type) is inferred from the type in the arguments
m1
and m2
. The second argument (the class) is inferred
from the standard library. The third argument (the dimension) is inferred from
the dimensions of arguments m1
and m2
.
Standard type classes
This diagram shows some of the default type classes defined in the standard
library:
Magma, monoid, group, unit ring and division ring come from abstract algebra.
Magma has one binary operation. Monoid has one binary operation and a zero
element. Group is a monoid that has inverse element. Unit ring is a commutative
group that has multiplication and a "1" element. Division ring is a unit ring
that has a reciprocal.
Real number class has additional mathematical functions, you can look at
class_real_number
in stdlib/system.ajla
for the list
of them. Integer number has operations that can be done on integers (see
class_integer_number
), fixed integer number has all the operation
of integer number, and adds operations that can be only done on integers that
have fixed size — rol
, ror
, bswap
and brev
.
class_eq
represents types for which equality is defined.
class_org
represents types where we can compare elements.
class_logical
represents types with the "and
",
"or
", "xor
" and "not
" operations.
For example, this is a prototype of the function list_sort
that
sorts a list:
fn list_sort(t : type, implicit c : class_ord(t), l : list(t)) :
list(t);
It takes a type, an ordered class and a list of elements of type t
.
It returns the sorted list. Because we specified that the type has an ordered
class, we can use comparison operators =
, <>
,
<=
, >=
in the function body. If you need to sort
a list with a custom comparison operation, you can define your own
"class_ord
" containing a pointer to your comparison functions and
pass it as an argument to the "list_sort
" function.
Note that standard operators are defined using type classes. For example the
multiplication operator is defined as "operator * 2000 ~inline (t : type, c
: class_unit_ring(t), val1 val2 : t) : t := c.multiply(val1, val2);
", so
that we can use the operator on any type that has class_unit_ring defined.
Threads
Ajla uses world-passing to sequence I/O. If we need to create more threads, we
can split the variable "w
" to more variables and the virtual
machine will execute them concurrently.
This is an example of a loop that prints numbers from 1 to 10 and waits one
second between them:
fn main
[
for i := 1 to 11 do [
w := sleep(w, 1000000);
w := write(w, h[1], ntos(i) + nl);
]
]
Now, we convert this example to two loops that run concurrently:
fn main
[
var w1, w2 := fork(w);
for i := 1 to 11 do [
w1 := sleep(w1, 1000000);
w1 := write(w1, h[1], "thread 1: " + ntos(i) + nl);
]
for i := 1 to 11 do [
w2 := sleep(w2, 1000000);
w2 := write(w2, h[1], "thread 2: " + ntos(i) + nl);
]
w := join(w1, w2);
]
The "fork
" function takes a "world
" argument and
splits it into two world arguments — "w1
" and
"w2
". The first loop will iterate from 1 to 10 and sequence the I/O
using "w1
" and the second loop will iterate from 1 to 10 and
sequence the I/O using "w2
". The "join
" function takes
two world arguments, evaluates them both, and if one of them is an exception,
the exception is returned. If both of them are not exceptions, a valid
"world
" variable is returned. The output of this program is:
thread 1: 1
thread 2: 1
thread 1: 2
thread 2: 2
thread 1: 3
thread 2: 3
thread 1: 4
thread 2: 4
thread 1: 5
thread 2: 5
thread 1: 6
thread 2: 6
thread 1: 7
thread 2: 7
thread 1: 8
thread 2: 8
thread 1: 9
thread 2: 9
thread 1: 10
thread 2: 10
What is happening here? The loops are not really run in parallel, they are
executed sequentially — Ajla can't parallelize statements within a single
function. The first "sleep
" function is executed and it waits for 1
second. But before it returns, two timer ticks elapse and, as we have seen in
the automatic parallelization section,
the "main
" function resumes execution and the "w1
"
variable is pointing to a thunk. Next, we execute the "write
"
function, but because the variable "w1
" is not known yet, the
"write
" function blocks. After two timer ticks, we return back to
the "main
" function, and we go to second iteration of the first
loop. "w1
" now points to a thunk that references the
"write
" function and this thunk references the
"sleep
" function. In the second iteration, we do exactly the same
steps as in the first iteration — and so on up to the tenth iteration: we
have gone through the first loop without printing anything, just creating a
chain of thunks.
The second loop is executed as the first loop — it generates a chain of
thunks referring functions "sleep
" and "write
" without
printing anything. Then, we go to the "join
" function — it
waits on "w1
". As it waits longer than two timer ticks, it is
parallelized too and the variable "w
" will be pointing to a thunk
that references the "join
" function.
Now, "w
" is returned from the "main
" function and the
virtual machine will attempt to evaluate it. The evaluation will force the
execution of the "sleep
" and "wait
" functions,
resulting in numbers being printed to the standard output.
We can use the "~spark
"
specifier to make sure that the functions are
parallelized immediately and that they don't wait for two timer ticks before the
parallelization starts. This makes this example execute faster:
fn main
[
var w1, w2 := fork(w);
for i := 1 to 11 do [
w1 := sleep~spark(w1, 1000000);
w1 := write~spark(w1, h[1], "thread 1: " + ntos(i) + nl);
]
for i := 1 to 11 do [
w2 := sleep~spark(w2, 1000000);
w2 := write~spark(w2, h[1], "thread 2: " + ntos(i) + nl);
]
w := join(w1, w2);
]
Infinite loops
Beware of infinite loops:
fn main
[
while true do [
w := write(w, h[1], "Hello World!" + nl);
]
return w;
]
This will not print anything, because the dead code elimination pass will remove
the "w
" variable as well as the function "write
" that
sets it because the variable can't be returned from the function. A proper way
how to write it is to use a "xeval w
" statement in an infinite
loop, so that the loop is terminated if the "write
" function fails:
fn main
[
while true do [
w := write(w, h[1], "Hello World!" + nl);
xeval w;
]
return w;
]
Another possibility how to fix it is to test the "w
" variable for
exception in the loop condition.
fn main
[
while not is_exception w do [
w := write(w, h[1], "Hello World!" + nl);
]
return w;
]
Note that the "w
" variable may be omitted, as shown in the
Hello World section.
Other thread functions
any
- Start evaluating both values and wait until any of them becomes evaluated.
Returns "
false
" if the first value is evaluated and
"true
" if the second value is evaluated. If both of them are
evaluated, "false
" is returned.
any_list
- Start evaluating all the values in the list and wait until any of the
becomes evaluated. Returns the index of the first value that is evaluated.
is_ready
- Returns "
true
" if the value is evaluated.
never
- Blocks and never completes.
atomic_enter
- Increases the atomic count — when the atomic count is non-zero, the
thread will not be killed.
atomic_exit
- Decreases the atomic count — when the atomic count is zero, the thread
may be killed if its return value is not used.
Message queues
We can use message queues to communicate between threads. They are defined in
the file "stdlib/msgqueue.ajla
".
msgqueue_new
- Creates a new message queue holding entries of the specified type.
msgqueue_send
- Sends a message to the message queue. Every message has a tag and a value.
The tag may be used to filter messages on the receive side.
msgqueue_replace
- Replace the content of the queue with a given message.
msgqueue_receive
- Waits until the queue is non-empty and returns the first message.
msgqueue_receive_tag
- Waits until the queue has at least one message with a given tag and returns
this message.
msgqueue_receive_nonblock
- If the queue is non-empty, return the first message, otherwise return the
exception "
error_not_found
".
msgqueue_receive_tag_nonblock
- If the queue contains a message with the specified tag, return it, otherwise
return the exception "
error_not_found
".
msgqueue_peek_nonblock
- Return a message without removing it from the queue. If the queue is empty,
the exception "
error_not_found
" is returned.
msgqueue_peek_tag_nonblock
- Return a message with the specified tag without removing it from the queue.
If there is no such message, the exception "
error_not_found
" is
returned.
msgqueue_wait
- Wait until the queue becomes non-empty.
msgqueue_is_nonempty
- Returns true if the queue is non-empty.
msgqueue_any
- Wait until any of the two message queues becomes non-empty.
Note that there is one gotcha when using message queues — when you insert
a record containing a message queue to the message queue itself, the message
queue will leak. Ajla uses reference counts to track memory and in this
situation, there will be circular reference dependence that prevents the message
queue from being freed. The memory leak will be detected and resolved when Ajla
exits, however while it is running, there is no way how to detect it and free
the leaked message queue.
This is an example program that uses a message queue to read characters from the
keyboard:
uses ui.termcap;
uses ui.event;
fn main
[
var tc := termcap_init(d, env);
var loc := locale_console_init(env);
var q := msgqueue_new(event);
var kbd_thread := event_get_keyboard(h[0], tc, loc, q);
while not is_exception w do [
var tag, e := msgqueue_receive(q);
if e is keyboard then [
write(h[1], "keyboard event " + ntos(e.keyboard.key) + ", " + ntos(e.keyboard.flags) + nl);
if e.keyboard.key = 'q' then break;
]
keep kbd_thread;
]
]
var tc := termcap_init(d, env);
- Initialize the termcap database and store it to the variable
"
tc
".
var loc := locale_console_init(env);
- Initialize the locale according to environment variables.
var q := msgqueue_new(event);
- Create a message queue of events. "
event
" is defined in the
file "stdlib/ui/event.ajla
".
var kbd_thread := event_get_keyboard(h[0], tc, loc, q);
- Create a thread that reads the keyboard from handle 0. "
tc
" is
the termcap database, "loc
" is the current locale and
"q
" is the message queue, where events will be sent.
while not is_exception w do
- Loop while there is no exception.
var tag, e := msgqueue_receive(q);
- Read the message queue; wait if it is empty.
if e is keyboard then
- If the event is a keyboard event, take this branch.
write(h[1], "keyboard event " + ntos(e.keyboard.key) + ", " +
ntos(e.keyboard.flags) + nl);
- Print the keyboard event.
if e.keyboard.key = 'q' then break;
- Exit if the user pressed 'q'.
keep kbd_thread;
- This statement prevents the optimizer from removing the thread. Without this
statement, the optimizer would conclude that the variable
"
kbd_thread
" is not used anymore, it would free it and that would
terminate the "event_get_keyboard
" thread.
FFI
FFI (foreign function interface) allows us to call functions that are defined in
some of the system libraries. It is the only unsafe part of Ajla — it
cannot be memory-safe because C pointers aren't safe as well. FFI only works if
the library "libffi
" was present when Ajla was compiled. If not,
the FFI functions return the exception "error_not_supported
".
This is an example program that uses FFI to print the string "Hello World!" to
the standard output, using the "write
" function imported from the
standard library.
uses ffi;
fn main
[
var str := "Hello World!" + nl;
var destr := ffi_destructor_new();
var message := ffi_destructor_allocate(destr, len(str), 1, false);
ffi_poke_array(message, str);
var wrt := ffi_create_function("", "write", ffi_error.e_errno, 3,
ffi_type.t_ssize, [ffi_type.t_sint, ffi_type.t_pointer, ffi_type.t_usize ]);
var xh := ffi_handle_to_number(destr, h[1]);
var r, e := ffi_call_function(wrt, [ xh, message, len(str) ]);
ffi_destructor_destroy(destr);
if r = -1 then [
write(h[2], "error " + ntos(e) + " occurred" + nl);
exit(1);
] else if r <> len(str) then [
write(h[2], "wrote only " + ntos(r) + " bytes" + nl);
exit(1);
]
]
Ajla functions may be terminated any time, so in order to reliably free memory,
the FFI interface introduces so-called destructors. The destructor may contain
function calls and/or file handles and/or allocated memory. When the destructor
variable is freed, the stored calls are performed and then the allocated memory
is freed and the associated file handles are closed (if no one else refers to
them).
var str := "Hello World!" + nl;
- The string to be written.
var destr := ffi_destructor_new();
- This line allocates a destructor. So far, it is empty, so no operation is
done when the destructor is freed.
var message := ffi_destructor_allocate(destr, len(str), 1,
false);
- This line allocates memory from the destructor and returns a pointer to the
allocated memory.
len(str)
is the size of the allocated memory,
1
is the alignment of the memory, false
specifies that
the memory doesn't have to be cleared. When the destr
variable is
freed, the allocated memory is automatically freed as well.
ffi_poke_array(message, str);
- This copies the string to the pointer returned by the previous function.
Note that the function
ffi_poke_array
may be used to overwrite
arbitrary memory in the process.
var wrt := ffi_create_function("", "write", ffi_error.e_errno, 3,
ffi_type.t_ssize, [ffi_type.t_sint, ffi_type.t_pointer, ffi_type.t_usize
]);
- This statement loads a pointer to the function "
write
" from the
standard library. The first argument is a library name (empty string for libc),
the second argument is the name of the function, the third argument specifies
that we are interested in the errno
value returned by the function,
the fourth argument is the number of fixed arguments, the fifth argument is the
return type, the sixth argument is the list containing types for the three
arguments — the first argument has type signed integer, the second
argument has type pointer, the third argument has type unsigned size_t.
var xh := ffi_handle_to_number(destr, h[1]);
- This statement converts an Ajla handle to a system handle. The result will
be "1" because we are writing to the standard output. A reference to the handle
is stored in the destructor, so that it will be dropped when the destructor is
dropped.
var r, e := ffi_call_function(wrt, [ xh, message, len(str)
]);
- Call the "
write
" function, r
is the return value,
e
is the errno
value.
ffi_destructor_destroy(destr);
- This function indicates that the
destr
variable should be freed
at this point. If we didn't use ffi_destructor_destroy(destr)
, the
destr
variable would be freed when it goes out of scope (i.e. after
the call to ffi_handle_to_number
) and the function
ffi_call_function
would access invalid memory and invalid handle.
- if r = -1 then ...
- The rest of the program prints an error if the write failed.
Ajla uses both interpreter and machine code generator when running the program.
The interpreter handles all the constructs of the language and the machine code
generator handles only the common constructs. For example, if we add two
integers, it is translated to the instruction "add" followed by the instruction
"jo" that jumps to the interpreter on overflow. The interpreter will perform the
overflowed operation on long integers using the gmp library.
In order to get decent performance, you must make sure that the hot spot of the
program is running in the machine code and not in the interpreter. You shouldn't
use long integers, lazy evaluation, exceptions, sparse lists, infinite lists in
the hot spot, because these constructs cause escape from the machine code to the
interpreter. There are functions "list_flatten
" and
"array_flatten
" that convert sparse list or array to a flat
structure — you can use these functions before the hot spot to make sure
that the hot spot can access the array without escaping to the interpreter.
There are following profiling parameters:
--profile=function
- It will count how many times each function is called and the time spent in
the function and it will display the functions sorted by the time.
--profile=escape
- It will show locations where escape from the machine code to the interpreter
happened. It shows the function, the line number, the number of escapes and the
opcode that triggered the escape.
--profile=memory
- It will show where memory was allocated.
--profile
- Enable "function", "escape" and "memory".
If you enable "--debug=leak
", Ajla will maintain a list of all
memory blocks that are allocated. You can dump this list at various points in
your program to see where is it allocating most memory. You can use:
eval report_memory_summary("hello");
- Report the summary of allocated memory, for example "
DEBUG MESSAGE:
allocated memory at hello: 689030 / 4054 = 169
".
eval report_memory_most("hello");
- Report locations where most memory was allocated, for example this
"
DEBUG MESSAGE: pcode.c:3298 284314 / 234 = 1215
" means that
there is 284314 bytes in 234 blocks allocated at pcode.c:3298.
eval report_memory_largest("hello");
- Report the largest allocated blocks, for example this
"DEBUG MESSAGE:
pcode.c:3298 35378"
means that there is a block of 35378 bytes being
allocated at pcode.c:3298.
Other options
--compile
- Compile the whole program without running it. It is useful if you need to
check the source code for syntax errors. Without this switch, the program is
being compiled as it runs and syntax errors in unused parts of the program are
not reported.
--nosave
- Do not save and load compiled code. This option also inhibits saving and
loading cache of functions with the
~save
specifier.
--ptrcomp
- Use pointer compression — pointers will have only 32 bits and the
maximum allocatable memory is 32GiB. It reduces memory consumption slightly.
--strict-calls
- Turn off auto-parallelization. It is useful if you want to get longer stack
trace for debugging purposes. Normally, the stack trace is chopped at a point
where auto-parallelization happens.
--system-malloc
- Use system malloc instead of Ajla malloc. It reduces memory consumption, and
it decreases performance slightly.
--thread-tick
- Use a thread for timer ticks instead of using a signal.
--threads=n
- Use the specified number of threads. By default, Ajla detects the number of
hardware threads in the system and uses this value.
--tick=n
- The timer tick in microseconds. By default, it is 10000. If you specify too
small value, you should enable
--thread-tick
as well. If the tick
value is smaller that the time it takes to process a signal, the signal would be
hammering the worker threads so heavily, that they can't make any progress.
Hacking Ajla
Configure options
--enable-threads=pthread
- Compile with pthreads (default, if pthreads are detected).
--enable-threads=win32
- Compile with Windows threads.
--disable-threads
- Compile without threads.
--disable-computed-goto
- Don't use computed goto. By default, computed goto is used if the compiler
supports it.
--enable-bitwise-frame
- Use bitwise tags on the stack rather than byte tags. It may or may not
improve performance depending on the actual workload. By default, bitwise tags
are disabled, because they perform slightly worse.
--enable-debuglevel=0
- No debugging at all.
--enable-debuglevel=1
- Default option. This enables debugging assertions in
non-performance-critical parts of the code. It also enables
"
--debug
" command line flags that turn on debugging of various
subsystems of Ajla.
--enable-debuglevel=2
- This enables all debugging assertions even in performance-critical parts of
the code.
--enable-debuglevel=3
- This enables all debugging checks that degrade performance seriously.
If you need fine-grained control of debugging, you can modify the file
"debug.h
".
Debugging options
If you configured Ajla with "--enable-debuglevel=1
",
"--enable-debuglevel=2
" or "--enable-debuglevel=3
",
the following options are available:
--debug=magic
- Put a magic value at the start every memory block. The magic value is
verified when reallocating or freeing the block. If there is a mismatch,
internal error is reported and the core is dumped.
--debug=redzone
- Put a redzone value at the end of every memory block and verify it when
reallocating or freeing the block.
--debug=fill
- Fill the allocated and freed blocks with a byte pattern.
--debug=leak
- Maintain a list of allocated blocks and test for memory leaks when Ajla
exits. It will display a file and line of the allocation that leaked.
--debug=memory
- Enable "magic, redzone, fill, leak".
--debug=mutex-errorcheck
- Set the pthread attribute PTHREAD_MUTEX_ERRORCHECK on mutexes.
--debug=mutex
- Check the correct usage of mutexes. Also, enable
"
mutex-errorcheck
".
--debug=cond
- Check the correct usage of condition variables.
--debug=thread
- Check the correct usage of threads.
--debug=tls
- Check the correct usage of thread-local storage.
--debug=handles
- Check the correct usage of handles.
--debug=objects
- Enable "mutex-errorcheck, mutex, cond, thread, tls, handles".
--debug
- Enable all debugging options.
Rebuilding the standard library
The standard library and the compiler source code is located in the directories
"stdlib
" and "newlib
". These directories have
identical content. The file "builtin.pcd
" contains compiled
standard library and the compiler itself.
You shouldn't modify the content of the directory "stdlib
" because
the directory would not match the file "builtin.pcd
" and it would
result in crashes.
If you need to modify the standard library, you should modify files in the
directory "newlib
" and then run the script
"./scripts/update.sh
" or "./scripts/update.sh all
"
— this will rebuild "builtin.pcd
" and then copy the content
of "newlib
" to "stdlib
" to make sure that it matches
newly generated "builtin.pcd
".