9.  Defining New Scheme Primitives  

      In Elk, there exists a one-to-one relationship between Scheme primitives and C functions: each Scheme primitive--whether predefined or user-defined--is implemented by a corresponding C function. This includes special forms, which are treated as a special kind of primitives in Elk. Extensions and applications use the function Define_Primitive() to register a new Scheme primitive with the interpreter, supplying its name and the C function that implements it. In case of dynamically loadable extensions or application modules, the calls to Define_Primitive() are placed in the extension initialization functions that are called automatically as the object file is loaded. Define_Primitive() is declared as

void Define_Primitive((Object (*func)()), const char *name,
                      int minargs, int maxargs,
                      enum discipline disc);
The arguments are:
func
a pointer to the C function implementing the new primitive;
name
the name of the primitive as a null-terminated C string;
minargs
the minimum number of arguments accepted by the primitive;
maxargs
the maximum number of arguments (identical to minargs in most cases);
disc
the calling discipline (usually EVAL).

      Define_Primitive() creates a Scheme variable of the specified name in the current (i.e. the caller's) lexical environment and binds it to the newly created procedure. Each C function that implements a primitive has a return type of Object and, for a calling discipline of EVAL, zero or more arguments of type Object which are bound to the evaluated arguments passed to the Scheme primitive when it is called. The calling discipline must be one of the following:

EVAL
The primitive expects a fixed number of arguments; minargs and maxargs must be identical[note 2] .
VARARGS
The primitive has a variable number of arguments, and the underlying C function is called with an argument count and an array of arguments. Defining primitives with a variable number of arguments will explained in more detail in section @(ch-varargs).
NOEVAL
The arguments are passed as a Scheme list of unevaluated objects--a single argument of the type Object. Primitives using this discipline will then use Eval() as described in section @(ch-funcall) to evaluate some or all of the arguments. NOEVAL is only rarely used (with the exception of the built-in special forms of Elk); extensions and applications mostly use macros as a more convenient way to defined new syntactical forms.

Figure @(defprim) shows a simple example for defining a new Scheme primitive.


#include "scheme.h"

Object p_vector_reverse(Object vec) {
	Object tmp, *s, *t;

	Check_Type(vec, T_Vector);
	for (s = VECTOR(vec)->data, t = s+VECTOR(vec)->size; --t > s; s++)
		tmp = *s, *s = *t, *t = tmp;
	return vec;
}

void elk_init_vector(void) {
	Define_Primitive(p_vector_reverse, "vector-reverse!", 1, 1, EVAL);
}
Figure 2: Defining a new Scheme Primitive


      The primitive vector-reverse! defined by the example extension reverses the elements of a Scheme vector in place and returns its argument (note the final exclamation mark indicating the destructive operation). Check_Type() is a simple macro that compares the type field of the first argument (an Object) with the second argument and signals and error if they do not match. This macro is used primarily for type-checking the arguments to Scheme primitives. A call to the macro Check_Mutable() with the vector as an argument could have been inserted before the loop to check whether the vector is read-only and to automatically raise an error if this is the case. The example code forms a complete extension including an extension initialization function and could be linked with the interpreter, or loaded dynamically into the interpreter as follows:

% cc -c -I/usr/elk/include vec.c; makedl vec.o vec.o
% scheme
> (load 'vec.o)
> (define v '#(hello word))
v
> (vector-reverse! v)
#(world hello)
> v
#(world hello)
>

9.1.  Making Objects Known to the Garbage Collector  

      Consider the non-destructive version of the primitive vector-reverse shown in Figure @(vecrev1), which returns a new vector instead of altering the contents of the original vector.


Object p_vector_reverse(Object vec) {
	Object ret;
	int i, j;

	Check_Type(vec, T_Vector);
	ret = Make_Vector(VECTOR(vec)->size, False);
	for (i = 0, j = VECTOR(vec)->size; --j >= 0; i++)
		VECTOR(ret)->data[i] = VECTOR(vec)->data[j];
	return ret;
}
Figure 3: Non-destructive Scheme primitive vector-reverse


      The code in Figure @(vecrev1) is identical to that shown in Figure @(defprim), except that a new vector is allocated, filled with the contents of the original vector in reverse order, and returned as the result of the primitive. Make_Vector() is declared by Elk:

Object Make_Vector(int size, Object fill);
size is the length of the vector, and all elements are initialized to the Scheme object fill. In the example, the predefined global variable False is used as the fill object; it holds the boolean Scheme constant #f (any Object could have been used here).

      Although the C function may look right, there is a problem when it comes to garbage collection. To understand the problem and its solution, it may be helpful to have a brief look at how the garbage collector[note 3] works (the following description presents a simplified view; the real algorithm is more complex). In Elk, a garbage collection is triggered automatically whenever a request for heap space cannot be satisfied because the heap is full, or explicitly by calling the primitive collect from within Scheme code. The garbage collector traces all ``live'' objects starting with a known root set of pointers to reachable objects (basically the interpreter's global lexical environment and its symbol table). Following these pointers, all accessible Scheme objects are located and copied to a new heap space in memory (``forwarded''), thereby compacting the heap. Whenever an object is relocated in memory during garbage collection, the contents of the pointer field of the corresponding C Object is updated to point to the new location. After that, any constituent objects (e.g. the elements of a vector) are forwarded in the same way.

      As live objects are relocated in memory, all pointers to an object need to be updated properly when that object is forwarded during garbage collection. If a pointer to a live object were not in the root set (that is, not reachable by the garbage collector), the object would either become garbage erroneously during the next garbage collection, or, if it had been reached through some other pointer, the original pointer would now point to an invalid location.[note 4] This is exactly what happens in the example shown in Figure @(vecrev1).

      The call to Make_Vector() in the example triggers a garbage collection if the heap is too full to satisfy the request for heap space. As the Object pointer stored in the argument vec is invisible to the garbage collector, its pointer field cannot be updated when the vector to which it points is forwarded during the garbage collection started inside Make_Vector(). As a result, all further references to VECTOR(vec) will return an invalid address and may cause the program to crash (immediately or, worse, at a later point). The solution is simple: the primitive just needs to add vec to the set of initial pointers used by the garbage collector. This is done by inserting the line

GC_Link(vec);
at the beginning of the function before the call to Make_Vector(). GC_Link() is a macro. Another macro, GC_Unlink, must be called later (e.g. at the end of the function) without an argument list to remove the object from the root set again. In addition, a call to GC_Node (again without an argument list) must be placed in the declarations at the beginning of the enclosing function or block. Figure @(vecrev2) shows the revised, correct code.


Object p_vector_reverse(Object vec) {
	Object ret;
	int i, j;
	GC_Node;

	GC_Link(vec);
	Check_Type(vec, T_Vector);
	ret = Make_Vector(VECTOR(vec)->size, False);
	for (i = 0, j = VECTOR(vec)->size; --j >= 0; i++)
		VECTOR(ret)->data[i] = VECTOR(vec)->data[j];
	GC_Unlink;
	return ret;
}
Figure 4: Non-destructive Scheme primitive vector-reverse, corrected version


      Appendix A lists the C functions which can trigger a garbage collection. Any local variable or argument of type Object must be protected in the manner shown above if one of these functions is called during its lifetime. This may sound more burdensome than it really is, because most of the ``dangerous'' functions are rarely or never used from within C/C++ extensions or applications in practice. Most primitives that require calls to GC_Link() use some function that creates a new Scheme object, such as Make_Vector() in the example above.

      To simplify GC protection of more than a single argument or variable, additional macros GC_Link2(), GC_Link3(), and so on up to GC_Link7() are provided. Each of these can be called with as many arguments of type Object as is indicated by the digit (separate macros are required, because macros with a variable number of arguments cannot be defined in C). A corresponding macro GC_Node2, GC_Node3, and so on, must be placed in the declarations. Different GC_Link*() calls cannot be mixed. All local variables passed to one of the macros must have been initialized. GC protection is not required for ``pointer-less'' objects such as booleans and small integers, and for the arguments of primitives with a variable number of arguments (as described in section @(ch-varargs)). Section @(ch-gcglobal) will describe how global (external) Object variables can be added to the root set.

      Here is how the implementation of the primitive cons uses GC_Link2() to protect its arguments (the car and the cdr of the new pair):

Object P_Cons(Object car, Object cdr) {
	Object new_pair;
	GC_Node2;

	GC_Link2(car, cdr);
	new_pair = allocate heap space and initialize object;
	GC_Unlink;
	return new_pair;
}

      There are a few pitfalls to be aware of when using ``dangerous'' functions from within your C/C++ code. For example, consider this code fragment which fills a Scheme vector with the program's environment strings that are available through the null-terminated string array environ[]:

Object vec = new vector of the right size;
int i;
GC_Node;

GC_Link(vec);
for (i = 0; environ[i] != 0; i++)
	VECTOR(vec)->data[i] = Make_String(environ[i], strlen(environ[i]));
(Make_String() creates and initializes a new Scheme string.) The body of the for-loop contains a subtle bug: depending on the compiler used, the left hand side of the assignment (the expression involving vec) may be evaluated before Make_String() is invoked. As a result, a copy of the contents of vec might be, for instance, stored in a register before a garbage collection is triggered while evaluating the right hand side of the assignment. The garbage collector would then move the vector object in memory, updating the--properly GC-protected--variable vec, but not the temporary copy in the register, which is now a dangling reference. To avoid this, the loop must be modified along these lines:
for (i = 0; environ[i]; i++) {
	Object temp = Make_String(environ[i], strlen(environ[i]));
	VECTOR(vec)->data[i] = temp;
}
A related pitfall to watch out for is exemplified by this code fragment:
Object obj;
...
GC_Link(obj);
...
some_function(obj, P_Cons(car, cdr));
Here, the call to P_Cons()--just like Make_String() above--can trigger a garbage collection. Depending on the C compiler, the properly GC-protected object pointer obj may be pushed on the argument stack before P_Cons() is invoked, as the order in which function arguments--just like the operands of the assignment operator--are evaluated is undefined in the C language. In this case, if a garbage collection takes place and the heap object to which obj points is moved, obj will be updated properly, but the copy on the stack will not. Again, the problem can be avoided easily by assigning the result of the nested function call to a temporary Object variable and use this variable in the enclosing function call:
temp = P_Cons(car, cdr);
some_function(obj, temp);

9.2.  Primitives with Variable-Length Argument Lists  

      Primitives with a variable number of arguments are registered with the interpreter by calling Define_Primitive() with the calling discipline VARARGS and with different values for minargs and maxargs. The special symbol MANY can be given as the maximum number of arguments to indicate that there is no upper limit on the primitive's number of actual arguments. The C/C++ function implementing a primitive with a variable number of arguments is called with two arguments: an integer count that specifies the number of actual arguments, and the Scheme arguments as an array of Objects (that is, a pointer to Object). The objects passed as the argument vector of VARARGS primitives are already registered with the garbage collector; calls to GC_Link() are not required. As an example for a primitive with an arbitrary number of arguments, here is the definition of a simplified variant of append! (which does not handle empty lists):

Object p_append_set (int argc, Object *argv); {
	int i;

	for (i = 0; i < argc-1; i++)
		(void)P_Set_Cdr (P_Last_Pair (argv[i]), argv[i+1]);
	return *argv;
}
The corresponding call to Define_Primitive() would read:
Define_Primitive(p_append_set, "append!", 0, MANY, VARARGS);

      Besides implementing primitives with an indefinite maximum number of arguments, the VARARGS discipline is frequently used for primitives with an optional argument. For example, a primitive encapsulating the UNIX open() system call, which has two fixed arguments (filename, flags) and an optional third argument (the mode for newly created files, i.e. calls with the flag O_CREAT), could be defined as follows:

Object p_unix_open(int argc, Object *argv) {
	char *name = get_file_name(argv[0]);
	int flags = get_flags(argv[1]);
	mode_t mode;

	if (flags & O_CREAT) {
		if (argc < 3)
			error--too few arguments
		mode = get_mode(argv[2]);
		...
The call to Define_Primitive() could then be written as:
Define_Primitive(p_unix_open, "unix-open", 2, 3, VARARGS);


Markup created by unroff 1.0,    September 24, 1996,    net@informatik.uni-bremen.de