In Elk, there exists a one-to-one relationship between Scheme primitives and C functions: each Scheme primitive--whether predefined or user-defined--is implemented by a corresponding C function. This includes special forms, which are treated as a special kind of primitives in Elk. Extensions and applications use the function Define_Primitive() to register a new Scheme primitive with the interpreter, supplying its name and the C function that implements it. In case of dynamically loadable extensions or application modules, the calls to Define_Primitive() are placed in the extension initialization functions that are called automatically as the object file is loaded. Define_Primitive() is declared as
void Define_Primitive((Object (*func)()), const char *name,
int minargs, int maxargs,
enum discipline disc);
Define_Primitive() creates a Scheme variable of the specified name in the current (i.e. the caller's) lexical environment and binds it to the newly created procedure. Each C function that implements a primitive has a return type of Object and, for a calling discipline of EVAL, zero or more arguments of type Object which are bound to the evaluated arguments passed to the Scheme primitive when it is called. The calling discipline must be one of the following:
Figure @(defprim) shows a simple example for defining a new
Scheme primitive.
#include "scheme.h"
Object p_vector_reverse(Object vec) {
Object tmp, *s, *t;
Check_Type(vec, T_Vector);
for (s = VECTOR(vec)->data, t = s+VECTOR(vec)->size; --t > s; s++)
tmp = *s, *s = *t, *t = tmp;
return vec;
}
void elk_init_vector(void) {
Define_Primitive(p_vector_reverse, "vector-reverse!", 1, 1, EVAL);
}
Figure 2: Defining a new Scheme Primitive
The primitive vector-reverse! defined by the example extension reverses the elements of a Scheme vector in place and returns its argument (note the final exclamation mark indicating the destructive operation). Check_Type() is a simple macro that compares the type field of the first argument (an Object) with the second argument and signals and error if they do not match. This macro is used primarily for type-checking the arguments to Scheme primitives. A call to the macro Check_Mutable() with the vector as an argument could have been inserted before the loop to check whether the vector is read-only and to automatically raise an error if this is the case. The example code forms a complete extension including an extension initialization function and could be linked with the interpreter, or loaded dynamically into the interpreter as follows:
% cc -c -I/usr/elk/include vec.c; makedl vec.o vec.o % scheme > (load 'vec.o) > (define v '#(hello word)) v > (vector-reverse! v) #(world hello) > v #(world hello) >
Consider the non-destructive version of the primitive
vector-reverse shown in Figure @(vecrev1), which returns a new
vector instead of altering the contents of the original vector.
Object p_vector_reverse(Object vec) {
Object ret;
int i, j;
Check_Type(vec, T_Vector);
ret = Make_Vector(VECTOR(vec)->size, False);
for (i = 0, j = VECTOR(vec)->size; --j >= 0; i++)
VECTOR(ret)->data[i] = VECTOR(vec)->data[j];
return ret;
}
Figure 3: Non-destructive Scheme primitive vector-reverse
The code in Figure @(vecrev1) is identical to that shown in Figure @(defprim), except that a new vector is allocated, filled with the contents of the original vector in reverse order, and returned as the result of the primitive. Make_Vector() is declared by Elk:
Object Make_Vector(int size, Object fill);
Although the C function may look right, there is a problem when it comes to garbage collection. To understand the problem and its solution, it may be helpful to have a brief look at how the garbage collector[note 3] works (the following description presents a simplified view; the real algorithm is more complex). In Elk, a garbage collection is triggered automatically whenever a request for heap space cannot be satisfied because the heap is full, or explicitly by calling the primitive collect from within Scheme code. The garbage collector traces all ``live'' objects starting with a known root set of pointers to reachable objects (basically the interpreter's global lexical environment and its symbol table). Following these pointers, all accessible Scheme objects are located and copied to a new heap space in memory (``forwarded''), thereby compacting the heap. Whenever an object is relocated in memory during garbage collection, the contents of the pointer field of the corresponding C Object is updated to point to the new location. After that, any constituent objects (e.g. the elements of a vector) are forwarded in the same way.
As live objects are relocated in memory, all pointers to an object need to be updated properly when that object is forwarded during garbage collection. If a pointer to a live object were not in the root set (that is, not reachable by the garbage collector), the object would either become garbage erroneously during the next garbage collection, or, if it had been reached through some other pointer, the original pointer would now point to an invalid location.[note 4] This is exactly what happens in the example shown in Figure @(vecrev1).
The call to Make_Vector() in the example triggers a garbage collection if the heap is too full to satisfy the request for heap space. As the Object pointer stored in the argument vec is invisible to the garbage collector, its pointer field cannot be updated when the vector to which it points is forwarded during the garbage collection started inside Make_Vector(). As a result, all further references to VECTOR(vec) will return an invalid address and may cause the program to crash (immediately or, worse, at a later point). The solution is simple: the primitive just needs to add vec to the set of initial pointers used by the garbage collector. This is done by inserting the line
GC_Link(vec);
Object p_vector_reverse(Object vec) {
Object ret;
int i, j;
GC_Node;
GC_Link(vec);
Check_Type(vec, T_Vector);
ret = Make_Vector(VECTOR(vec)->size, False);
for (i = 0, j = VECTOR(vec)->size; --j >= 0; i++)
VECTOR(ret)->data[i] = VECTOR(vec)->data[j];
GC_Unlink;
return ret;
}
Figure 4: Non-destructive Scheme primitive vector-reverse, corrected version
Appendix A lists the C functions which can trigger a garbage collection. Any local variable or argument of type Object must be protected in the manner shown above if one of these functions is called during its lifetime. This may sound more burdensome than it really is, because most of the ``dangerous'' functions are rarely or never used from within C/C++ extensions or applications in practice. Most primitives that require calls to GC_Link() use some function that creates a new Scheme object, such as Make_Vector() in the example above.
To simplify GC protection of more than a single argument or variable, additional macros GC_Link2(), GC_Link3(), and so on up to GC_Link7() are provided. Each of these can be called with as many arguments of type Object as is indicated by the digit (separate macros are required, because macros with a variable number of arguments cannot be defined in C). A corresponding macro GC_Node2, GC_Node3, and so on, must be placed in the declarations. Different GC_Link*() calls cannot be mixed. All local variables passed to one of the macros must have been initialized. GC protection is not required for ``pointer-less'' objects such as booleans and small integers, and for the arguments of primitives with a variable number of arguments (as described in section @(ch-varargs)). Section @(ch-gcglobal) will describe how global (external) Object variables can be added to the root set.
Here is how the implementation of the primitive cons uses GC_Link2() to protect its arguments (the car and the cdr of the new pair):
Object P_Cons(Object car, Object cdr) {
Object new_pair;
GC_Node2;
GC_Link2(car, cdr);
new_pair = allocate heap space and initialize object;
GC_Unlink;
return new_pair;
}
There are a few pitfalls to be aware of when using ``dangerous'' functions from within your C/C++ code. For example, consider this code fragment which fills a Scheme vector with the program's environment strings that are available through the null-terminated string array environ[]:
Object vec = new vector of the right size; int i; GC_Node; GC_Link(vec); for (i = 0; environ[i] != 0; i++) VECTOR(vec)->data[i] = Make_String(environ[i], strlen(environ[i]));
for (i = 0; environ[i]; i++) {
Object temp = Make_String(environ[i], strlen(environ[i]));
VECTOR(vec)->data[i] = temp;
}
Object obj; ... GC_Link(obj); ... some_function(obj, P_Cons(car, cdr));
temp = P_Cons(car, cdr); some_function(obj, temp);
Primitives with a variable number of arguments are registered with the interpreter by calling Define_Primitive() with the calling discipline VARARGS and with different values for minargs and maxargs. The special symbol MANY can be given as the maximum number of arguments to indicate that there is no upper limit on the primitive's number of actual arguments. The C/C++ function implementing a primitive with a variable number of arguments is called with two arguments: an integer count that specifies the number of actual arguments, and the Scheme arguments as an array of Objects (that is, a pointer to Object). The objects passed as the argument vector of VARARGS primitives are already registered with the garbage collector; calls to GC_Link() are not required. As an example for a primitive with an arbitrary number of arguments, here is the definition of a simplified variant of append! (which does not handle empty lists):
Object p_append_set (int argc, Object *argv); {
int i;
for (i = 0; i < argc-1; i++)
(void)P_Set_Cdr (P_Last_Pair (argv[i]), argv[i+1]);
return *argv;
}
Define_Primitive(p_append_set, "append!", 0, MANY, VARARGS);
Besides implementing primitives with an indefinite maximum number of arguments, the VARARGS discipline is frequently used for primitives with an optional argument. For example, a primitive encapsulating the UNIX open() system call, which has two fixed arguments (filename, flags) and an optional third argument (the mode for newly created files, i.e. calls with the flag O_CREAT), could be defined as follows:
Object p_unix_open(int argc, Object *argv) {
char *name = get_file_name(argv[0]);
int flags = get_flags(argv[1]);
mode_t mode;
if (flags & O_CREAT) {
if (argc < 3)
error--too few arguments
mode = get_mode(argv[2]);
...
Define_Primitive(p_unix_open, "unix-open", 2, 3, VARARGS);