The d2c Dylan-to-C compiler

make(<class>) is not implemented.
slot-initialized?(), applicable-method?() and sorted-applicable-methods() are not implemented.
class and each-subclass slot allocation are not supported.
The keyword clauses in "define class" for defining initargs (e.g. keyword foo:, ...) are parsed but ignored.
The special syntax for aref, element, and singleton is supposed to look up the name in the context of the operation. Instead, we always look it up in the Dylan library.
In macro templates, unhygienic name references (e.g. ?=name) are not implemented.
In macro property list patterns, the key default must be either a literal constant or a variable reference instead of any kind of expression as the DRM claims or a basic-fragment as Moon suggests.
The "for" macro is supposed to evaluate the types interleaved with the init expressions, but it does not. In fact, in some cases, it will evaluate the type expression each time through the loop.
Anything involving the runtime creation of classes is not supported. This means that the expressions in all class superclass lists must obviously be compile-time constants.
Many violations of "define sealed domain" are not detected.

Command-line arguments

    d2c -gMndT {-tdirectory} {-Ldirectory}*
        {-Dfeature}* {-Ufeature}*
        lid-file

d2c compilation is driven by a LID (Library Interchange Description) file describing the library contents and various compilation options. It serves a similar purpose to a makefile (though we use make too). See the description of the LID file format below.

In operation, d2c reads and processes the dylan files, generating .c and .s files and a temporary .mak makefile. It then runs gmake on this makefile.

d2c recognizes these switches:

-g: Dump definitions needed to support debugging with gdb/dig.
-Ldirectory: Add directory to the library search path.
-Dfeature -Ufeature: Define or undefine a feature for #if conditional compilation. Features may also be specified in the LID file by the "Features:" option.
-M: Generate dependency info to be included in makefiles. This is included in higher level makefiles such as those generated by the Perl gen-makefile script used to compile the runtime system and compiler.
-ppathname: Used to specify the location for the platform description database file. The default is "$DYLANDIR/etc/platforms.descr".
-no-binaries: Inhibits compilation of the generated C code for cross-compilation. You can later compile by running make on the "cc-unit prefix-files.mak" or by using the Makegen created "cc_files" target.
-Ttarget: Generate code for the given target machine. Normally defaults to this platform. See the platforms.descr file for the names of the supported platforms. Often used with -no-binaries.
-d: Compiler debug mode (for debugging this compiler)

LID file format

A LID file is composed of entries of the form "keyword: value", similar to mail headers and to the Dylan file header format. Currently d2c expects the list of source files to appear as the "main body" of the LID file, after the header and a blank line. In the Harlequin LID format, there is a "Files: " entry which is used instead, and which we do not yet support.

d2c recognizes these LID entries:

Library: dylan-library-name: The Dylan name for the library that we are defining. There must be a corresponding "define library" somewhere in the source for this library.
Unit-prefix: c_legal_identifier_fragment: This prefix is used to make the C translation of names in this library unique w.r.t. any other libraries that might be used. This defaults to the library name, so only needs to be specified if the library name contains illegal C name characters (such as "-").
Unique-ID-base: decimal-integer: Unique class identifiers for classes defined in this library are assigned sequentially starting with the specified integer. This should always be specified, but you won't get a sensible error if it is missing. The base should be sufficiently far from the base for any other library so that class IDs won't overlap. You will get a compile-time error if overlap occurs. A good base for user code would be 30000.
Executable: result-file-name: Specifies that we are building a runnable application rather than a library. The executable is generated with the specified name.
Entry-Point: dylan-module:dylan-variable: When generating an executable, this LID option specifies which dylan function is called as the main entry point. You can also have no main entry point, in which case the program exits after running all of the top level forms. This entry-point function is called with two arguments, argc (an integer) and argv (a raw pointer). Note that this is incompatible with Mindy, and rather brutal as well. You can get the Mindy semantics of calling Extensions:Main by using the Extensions module in your main module and then specifying: "Entry-Point: mymodule:%main" in the LID file. The %Main function parses the arguments and then calls Main.
Linker-options: various "ld" flags: This option specifies flags which must be passed to ld when linking against this library. This is primarily used when a foreign library is called via one of the undocumented callout mechanisms. For example, Dylan.lid specifies "-lm" so that it can use the math library. This dependency is automatically propagated to users of the library.
Features: {feature | ~feature}*: The argument is a space-separated list of features or misfeatures. If the token begins with "~", then the rest of the token is interpreted as a feature to remove. Otherwise, the token is added as a feature.
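
The Features: rule can be sketched in C as follows. This is only an illustration of the add/remove behaviour described above, not d2c's own code: the names feature_set, has_feature, apply_feature_token and apply_features are hypothetical, and the fixed size limits are arbitrary assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_FEATURES 32
#define MAX_NAME 64

/* Hypothetical fixed-size feature set; not d2c's own data structure. */
typedef struct {
    char names[MAX_FEATURES][MAX_NAME];
    int count;
} feature_set;

/* Index of name in the set, or -1 if it is not present. */
static int find_feature(const feature_set *fs, const char *name)
{
    for (int i = 0; i < fs->count; i++)
        if (strcmp(fs->names[i], name) == 0)
            return i;
    return -1;
}

bool has_feature(const feature_set *fs, const char *name)
{
    return find_feature(fs, name) >= 0;
}

/* Apply one token: "~name" removes name, anything else adds it. */
void apply_feature_token(feature_set *fs, const char *token)
{
    if (token[0] == '~') {
        int i = find_feature(fs, token + 1);
        if (i >= 0) {
            /* Fill the removed slot with the last entry. */
            fs->count--;
            if (i != fs->count)
                strcpy(fs->names[i], fs->names[fs->count]);
        }
    } else if (!has_feature(fs, token) && fs->count < MAX_FEATURES
               && strlen(token) < MAX_NAME) {
        strcpy(fs->names[fs->count++], token);
    }
}

/* Apply a whole space-separated "Features:" value, left to right. */
void apply_features(feature_set *fs, const char *value)
{
    char buf[512];
    assert(strlen(value) < sizeof buf);
    strcpy(buf, value);
    for (char *tok = strtok(buf, " \t"); tok != NULL; tok = strtok(NULL, " \t"))
        apply_feature_token(fs, tok);
}
```

Because tokens are applied left to right, a later "~foo" cancels an earlier "foo" in the same list.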

Unrecognized LID entries are quietly ignored. This handles any comment-like LID entries such as "Author: ", etc. d2c also recognizes // as a comment-to-end-of-line sequence (equivalent to whitespace.)

Here is a sample LID file:

rcs-header: $Header: /afs/cs.cmu.edu/project/gwydion-9/dylan/docs/htdocs/RCS/d2c.html,v 1.3 97/06/04 19:50:59 ram Exp Locker: ram $
library: my-program
unit-prefix: myprog
unique-id-base: 30000
executable: mp
entry-point: main:%main

myprog-exports.dylan
myprog.dylan
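
As a rough illustration of the header format, here is a small C sketch that splits one header line into its keyword and value. It is not taken from d2c: parse_lid_entry and same_keyword are hypothetical names, and treating keywords as case-insensitive is an assumption inferred from the sample above (which writes "library:" where the entry list says "Library:"). It is meant for header lines only, not for the file list in the main body.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Case-insensitive keyword comparison (assumed behaviour, see lead-in). */
static bool same_keyword(const char *a, const char *b)
{
    for (; *a != '\0' && *b != '\0'; a++, b++)
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return false;
    return *a == *b;
}

/*
 * Split one header line in place into keyword and value.
 * A trailing // comment is dropped first. Returns false for lines that
 * are not of the form "keyword: value".
 */
bool parse_lid_entry(char *line, char **keyword, char **value)
{
    char *comment = strstr(line, "//");
    if (comment != NULL)
        *comment = '\0';

    /* Only the first colon separates keyword from value. */
    char *colon = strchr(line, ':');
    if (colon == NULL || colon == line)
        return false;
    *colon = '\0';

    /* Trim whitespace on both sides of the value. */
    char *v = colon + 1;
    while (*v == ' ' || *v == '\t')
        v++;
    char *end = v + strlen(v);
    while (end > v && isspace((unsigned char)end[-1]))
        end--;
    *end = '\0';

    *keyword = line;
    *value = v;
    return true;
}
```

Splitting at the first colon only is what lets a value such as "main:%main" survive intact.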

Extensions and Libraries

d2c has been written so that most of the Dylan code can be shared between Mindy and d2c. Dylan extensions (such as conditional compilation) that are implemented by both Mindy and d2c are described here. The common libraries are documented here.

Mindy compatibility notes

In d2c the "main" entry is specified by the Entry-Point: LID file option, and can be called whatever you want. This differs from Mindy, where there is a standard variable "dylan:extensions:main". However, you can get the Mindy semantics by using %main; see Entry-Point:.

The "define library" and "define module" forms for a library must be in a separate file which is the first file specified in the LID file. This file should specify "Module: Dylan-User" in its header.

In general, d2c is much more picky about all sorts of errors. It enforces type-related stuff to a much greater degree, and is even in some cases more picky about syntax.

Unlike Mindy, d2c does implement macros. In some cases this means that syntax error messages are not as good. If you can't figure out a d2c syntax error, try Mindy.

The d2c runtime does not automatically force output on *standard-output* on process exit, so you may need to add explicit calls to force-output.

d2c is missing the TK and Inspector libraries. Also, the d2c Random library is missing the more exotic functionality of the Mindy Random library, like random-gaussian() and random-exponential().

Mindy has some Dylan extensions which d2c does not implement. The most significant omission is threads.

Debugging D2C code with DIG

Dig is a wrapper for GDB which incorporates some specialized domain knowledge concerning the Dylan language and the D2C compiler. For the most part, you will seem to still be debugging the generated C code with GDB. However, some commands have been modified to allow a richer interface to Dylan objects and functions.

Dig currently only compiles on HP/UX and Win32, and is probably not useful on the latter. Feel free to get it working on other platforms. Dig is not strictly necessary in any case, but it does sugarcoat some of the naming issues.

DIG Commands

In general, every GDB command still exists within DIG (although you may not be able to abbreviate it as expected -- for example, the "interactive" command shadows the "info" command, so that you must type at least "inf"). However some commands have been modified or added to facilitate debugging of Dylan code. The commands below reflect only the added capabilities.

print

This command extends the existing print command by allowing it to print Dylan values and call Dylan functions. This actually comprises three different special capabilities:

DIG applies heuristics to translate Dylan variable names into their C equivalents. It guesses right most of the time, but the results are not guaranteed. Sometimes there are several possibilities -- in this case it will ask you for clarification.
If the expression value is a Dylan object then DIG invokes "print" to provide a meaningful description of the object. (The choice of which "print" to use depends upon the setting of *warning-output*.) Because DIG calls a function within your program, the program must be running before Dylan values may be printed.
If the expression contains a Dylan function call, then DIG invokes that Dylan function. Arguments of the form "foo: bar" are translated correctly.

find

prints the translation of a Dylan variable name into its C equivalent.

break

if you specify a Dylan generic function, then DIG will set breakpoints in all of that function's methods. Let me know if this feature proves to be useful.

interactive

by default, anything typed into DIG is assumed to be a DIG command. If you need to provide input to your program, you must use this command to toggle the "interactive mode". In interactive mode, you may type data into your process. However, any type-ahead may produce strange and unpredictable results.

prompt

change DIG's command prompt. (You should not use gdb's native "set prompt" command. Bad things will happen.)

quit

does about what you'd expect, but does some extra clean-up work.

gdb

passes the following text to GDB verbatim. This allows you, for example, to use GDB's "print" command instead of DIG's.

DIG Gotchas

Because of the challenges of name translation, the "print" and "break" commands may be noticeably slower than you expect. Since translations are cached, it will get faster as you go along.
If you try to print something that looks like a Dylan object, but isn't valid, DIG will encounter a "seg fault" and have to recover. This has the annoying side-effect of changing the "current frame" to be at the bottom of the call stack, regardless of where you were before.
Like GDB, DIG has problems with optimized code. Variables may be re-used (or eliminated), function calls may be inlined, and things will generally be less predictable. In addition, some Dylan functions may disappear if you do not pass the "-g" switch to D2C.
There are probably many other gotchas, but I don't know about them. Please tell me about anything not mentioned above.

Environment

These environment variables are used by D2C:

DYLANDIR: The root of the installed Gwydion tree. In the default configuration, this defaults to "/usr/local" on Unix and "c:\dylan" on win32. This variable in turn establishes the defaults for DYLANPATH and the "platforms.descr" file.
DYLANPATH: The search path for dylan libraries. Directories in the DYLANPATH are searched after any directories specified by explicit -L options. If not set, this defaults to ".:$DYLANDIR/lib/dylan" (".;%DYLANDIR%\lib\dylan" on Win32). If set, the value must include the directory where the "Dylan" library is to be found.
PATH: d2c expects to find make and the C compiler in PATH. On Unix we use the gnu tools gmake, gcc, and ldb. Other compilers can work, but at a minimum this requires a new platform description in "$DYLANDIR/etc/platforms.descr". You must also have some of the GNU-win32 tools to run d2c on Windows, though make and Visual C++ are normally used for compilation. To build d2c, you also need perl and the various scripts in the "tools/" directory.
The gnu assembler must be used in conjunction with the generated code from gcc. If you somehow end up running the HP/UX "as" with gcc, it will produce many errors about STAB entries, etc.
CCFLAGS: This variable holds the flags passed to the C compiler. The default is platform specific, but always includes "-I$DYLANDIR/include". If you do set this variable, you must also specify the Dylan system include directory. The default optimization flags for gcc are " -g -O4 -finline-functions". You can roughly halve the size of the executable by omitting the -g, but at the cost of debuggability. Leaving out the other optimize flags will speed compilation at the cost of runtime speed.
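
The library search order described under DYLANPATH can be sketched in C as below. This is a sketch under stated assumptions rather than d2c's implementation: search_path, add_dir and build_search_path are hypothetical names, the caller is assumed to have already resolved DYLANDIR, and only the Unix ":" separator is handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_DIRS 32
#define MAX_DIR_LEN 256

/* Hypothetical container for the ordered list of directories to search. */
typedef struct {
    char dirs[MAX_DIRS][MAX_DIR_LEN];
    int count;
} search_path;

/* Append the first len characters of dir; empty or oversized entries are skipped. */
static void add_dir(search_path *sp, const char *dir, size_t len)
{
    if (sp->count < MAX_DIRS && len > 0 && len < MAX_DIR_LEN) {
        memcpy(sp->dirs[sp->count], dir, len);
        sp->dirs[sp->count][len] = '\0';
        sp->count++;
    }
}

/*
 * Build the Unix search order: -L directories first, then the entries of
 * DYLANPATH. When dylanpath is NULL (variable not set), fall back to
 * ".:$DYLANDIR/lib/dylan".
 */
void build_search_path(search_path *sp, const char *const *l_dirs, int n_l_dirs,
                       const char *dylanpath, const char *dylandir)
{
    char fallback[MAX_DIR_LEN * 2];

    assert(sp != NULL && dylandir != NULL);
    sp->count = 0;
    for (int i = 0; i < n_l_dirs; i++)
        add_dir(sp, l_dirs[i], strlen(l_dirs[i]));

    if (dylanpath == NULL) {
        snprintf(fallback, sizeof fallback, ".:%s/lib/dylan", dylandir);
        dylanpath = fallback;
    }

    /* Split on ':' (Win32 would use ';' instead). */
    for (const char *p = dylanpath; *p != '\0'; ) {
        size_t len = strcspn(p, ":");
        add_dir(sp, p, len);
        p += len;
        if (*p == ':')
            p++;
    }
}
```

With a single "-L../common" option, DYLANPATH unset and DYLANDIR at "/usr/local", the resulting order is "../common", ".", "/usr/local/lib/dylan".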

Function Representations

In d2c, there are two distinct things that may be thought of as "the function". The first is the actual C code d2c generates for the function. The second is an actual Dylan object (a "function object"), which is a general instance of <function>. Function objects are not created if the compiler can prove it isn't necessary (which is usually the case for functions that aren't exported from a library), where "necessary" means that the function might be stored into a variable, passed to another function, or otherwise used as a first-class value.

The actual C code comes in three pieces, or entries, with each entry being a separate C function. At a call site, the compiler can either know exactly what function is being called or it might not have a clue (e.g. inside map where it calls the passed in function). So to keep from having to pay the penalty of runtime checking everything all the time instead of just when necessary, we generate multiple entry points for each function.

The main entry is the entry that is used when the compiler can determine that everything is fine. It doesn't have to check any argument types or figure out what values correspond to what keywords.

The general entry is the entry that is used in a random call where the compiler can't tell anything. It checks the argument types, decodes the keywords, and then calls the main entry.

The generic entry is like the general entry, except that it is only used when the method was invoked via some generic function. The generic function dispatch stuff already guarantees that the argument types are okay, so it only has to decode the keywords before calling the main entry.

Note: The various entries are just different pieces of C code. There is no Dylan object that corresponds to them. If the compiler can prove a given entry won't be used, d2c will omit that entry.
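
To make the division of labour concrete, here is a hand-written C sketch of the three entries for a hypothetical two-argument integer function. It is not actual d2c output: value_t is a simplified stand-in for the general representation, the signatures are assumptions, and the example has no keyword arguments, so the keyword-decoding step is omitted. Only the _MAIN, _GENERAL and _GENERIC suffixes follow the naming rules in this document.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a tagged Dylan value. */
typedef enum { TAG_INTEGER, TAG_OTHER } tag_t;
typedef struct { tag_t tag; long value; } value_t;

/* Main entry: arguments are already known to be integers, passed unboxed. */
long add_MAIN(long a, long b)
{
    return a + b;
}

/* Generic entry: dispatch has already checked the types, so just unpack. */
value_t add_GENERIC(const value_t *args, size_t nargs)
{
    assert(args != NULL && nargs == 2);
    value_t r = { TAG_INTEGER, add_MAIN(args[0].value, args[1].value) };
    return r;
}

/* General entry: unknown caller, so check count and types before calling main. */
bool add_GENERAL(const value_t *args, size_t nargs, value_t *result)
{
    if (nargs != 2)
        return false;                       /* wrong number of arguments */
    for (size_t i = 0; i < nargs; i++)
        if (args[i].tag != TAG_INTEGER)
            return false;                   /* type error */
    result->tag = TAG_INTEGER;
    result->value = add_MAIN(args[0].value, args[1].value);
    return true;
}
```

A call site that knows both arguments are integers calls add_MAIN directly and pays for none of the checks.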

Naming

d2c generates C code, and thus must come up with a unique, legal C identifier for each thing that is to be referenced. (We say "thing" because it isn't necessarily a Dylan object.) d2c starts by computing a unit prefix ("unit" being synonymous with "library"). The unit prefix can be user specified; if not specified, it defaults to the library name in all lowercase.

Character Set Translation

Since Dylan allows characters in identifiers that C does not, we must translate these punctuation characters into alphanumeric sequences. Because Dylan is case-insensitive we also fold all alphabetic characters to lowercase. This frees the uppercase characters to be used to represent the extra characters. We translate characters which aren't legal in C as follows:

' ' => "BLANK"
'!' => "D"
'%' => "PCT"
'$' => "C"
'&' => "AND"
'*' => "V"
'+' => "PLUS"
'-' => "_"
'/' => "SLASH"
'<' => "LESS"
'=' => "EQUAL"
'>' => "GREATER"
'?' => "QUERY"
'^' => "RAISE"
'_' => "X_"
'|' => "OR"
'~' => "NOT"
otherwise => "Xhex code"

As a special case to deal with the Dylan <class> naming convention, the brackets are stripped off of the variable name, and CLS_ is prefixed to the name. So <list> becomes CLS_list instead of LESSlistGREATER.
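
The translation table and the <class> special case can be sketched in C like this. It is an illustration, not the compiler's own code: translate_char and mangle_name are hypothetical names, the bracket test is a simple heuristic, and the exact hex format used for the "otherwise" row is an assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Replacement for one character, per the table above; NULL if none applies. */
static const char *translate_char(char c)
{
    switch (c) {
    case ' ': return "BLANK";
    case '!': return "D";
    case '%': return "PCT";
    case '$': return "C";
    case '&': return "AND";
    case '*': return "V";
    case '+': return "PLUS";
    case '-': return "_";
    case '/': return "SLASH";
    case '<': return "LESS";
    case '=': return "EQUAL";
    case '>': return "GREATER";
    case '?': return "QUERY";
    case '^': return "RAISE";
    case '_': return "X_";
    case '|': return "OR";
    case '~': return "NOT";
    default:  return NULL;
    }
}

/*
 * Translate a Dylan name into out (capacity cap).
 * Returns 0 on success, -1 if out is too small.
 */
int mangle_name(const char *name, char *out, size_t cap)
{
    assert(name != NULL && out != NULL);
    size_t len = strlen(name), used = 0;
    const char *p = name, *end = name + len;

    if (cap == 0)
        return -1;
    out[0] = '\0';

    /* <class> convention: strip the brackets and prefix CLS_. */
    if (len > 2 && name[0] == '<' && name[len - 1] == '>') {
        if (cap < 5)
            return -1;
        strcpy(out, "CLS_");
        used = 4;
        p++;
        end--;
    }

    for (; p < end; p++) {
        char piece[16];
        unsigned char c = (unsigned char)*p;
        const char *rep = translate_char(*p);
        if (rep != NULL)
            snprintf(piece, sizeof piece, "%s", rep);
        else if (isalnum(c))
            snprintf(piece, sizeof piece, "%c", tolower(c)); /* fold case */
        else
            /* "otherwise" row; this hex format is an assumption. */
            snprintf(piece, sizeof piece, "X%02x", (unsigned)c);

        size_t n = strlen(piece);
        if (used + n + 1 > cap)
            return -1;
        memcpy(out + used, piece, n + 1);
        used += n;
    }
    return 0;
}
```

Under these rules <list> comes out as CLS_list and *debugger* as VdebuggerV.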

Basic Name Translation

A basic name is a module binding (like "define variable" or "define constant"). The C name is formed by concatenating the unit prefix, module name and variable name, separated by uppercase Z's:

unit prefixZmodule nameZbasic name
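
A minimal C sketch of this concatenation, assuming the three components have already been through character set translation. The function name basic_c_name is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Join already-translated components with uppercase Z separators.
 * Returns 0 on success, -1 if out (capacity cap) is too small.
 */
int basic_c_name(const char *unit_prefix, const char *module,
                 const char *name, char *out, size_t cap)
{
    assert(unit_prefix != NULL && module != NULL && name != NULL && out != NULL);

    /* Three components, two Z separators, one terminating NUL. */
    size_t needed = strlen(unit_prefix) + strlen(module) + strlen(name) + 3;
    if (needed > cap)
        return -1;

    snprintf(out, cap, "%sZ%sZ%s", unit_prefix, module, name);
    return 0;
}
```

For the unit prefix "dylan", module "dylan_viscera" and name "CLS_type", this yields dylanZdylan_visceraZCLS_type.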

Derived Name Translation

d2c needs to create C names for many global definitions which are related to some Dylan variable but which are not the actual Dylan value of that variable. These derived names are created by adding suffixes to the basic name:

GF name_METH: Some method on the base name.
GF name_DISCRIM: Discriminator for a generic function.
method name_GENERAL: Default entry for a method.
method name_GENERIC: Method entry used by GF dispatch.
method name_MAIN: The actual body of a method.
method name_INT_local method name: Local method inside named method.
method name_INT_method: Some method form inside the named method.
slot name_DEFER: Deferred evaluation of a slot type.
slot name_INIT: Slot init function.
slot name_SETTER: Method used to implement setting a slot.
slot name_GETTER: Method used to implement getting a slot.
var name_TYPE: Holds the type of a variable when the type isn't constant.
var name_VAR: The actual value of a Dylan variable.
class name_MAKER: Internal constructor for a class.
LINE_542: Some function resulting from compiling line 542.
UNKNOWN: As above, except we don't know where it came from.
some name_542: The 542nd distinct instance of name. _VAR names are guaranteed never to have this uniquifier suffix.

As you might infer from the preceding, some suffixes can be combined, but except for _INT_ not to an arbitrary depth. Some examples:

        dylanZdylan_visceraZCLS_type    /* <type> */

        /* general-entry for maker for <type-error> */
        dylanZdylan_visceraZCLS_type_error_MAKER_GENERAL

        /* signal{<condition>} internal search */
        dylanZdylan_visceraZsignal_METH_INT_search_MAIN

        dylanZdylan_visceraZVdebuggerV_VAR     /* *debugger* */

For local variables, we simply add "L_" to the front of the name. This may result in a non-unique name, in which case a uniquifier is appended to the end; see the "some name_542" rule above.

Object representations

In general, d2c picks the most specific representation that it can be sure will work. For instance, if d2c is sure that a given object is an <integer>, then it will use the C type "long" to represent the object. If, however, d2c only knows for sure that the object is an <object>, d2c will use the descriptor_t representation, even if it later turns out the <object> is in fact an <integer>.

known dylan type	c type
<integer>	long
<single-float>	float
<double-float>	double
<extended-float>	long double
<raw-pointer>	void *
no data word	heapptr_t
<object>	descriptor_t

Note: There isn't a Dylan type that can describe the set of objects that use the heapptr_t representation. See above.
Note 2: If a functional class (see below) has exactly one data slot that can be magically represented, the class is also magically represented in the same way. <character> falls into this category.

An immediate representation is one where the actual data is stored directly, as opposed to a pointer representation, where the actual data lives in the heap and is referenced via a pointer.

The general representation is the fully general representation that can be used to represent any Dylan object. It consists of a heap pointer and a data word (i.e. descriptor_t). The heap pointer representation is used to represent anything that doesn't need the data word (i.e. heapptr_t).
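
In C terms, the two representations might look roughly like the sketch below. The type names heapptr_t and descriptor_t come from this document, but the field names, the dataword_t union, the struct heapobj contents and the box_integer/unbox_integer helpers are all assumptions made for illustration. The real definitions live in the d2c runtime headers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a heap object; the real layout is not shown here. */
struct heapobj { int class_id; };

/* Heap pointer representation: just a pointer into the heap. */
typedef struct heapobj *heapptr_t;

/* Data word: room for one immediate value (assumed layout). */
typedef union {
    long l;
    float f;
    void *ptr;
} dataword_t;

/* General representation: heap pointer plus data word. */
typedef struct {
    heapptr_t heapptr;
    dataword_t dataword;
} descriptor_t;

/* Pair a raw long with a heap object that identifies its type. */
descriptor_t box_integer(heapptr_t integer_proxy, long value)
{
    assert(integer_proxy != NULL);
    descriptor_t d;
    d.heapptr = integer_proxy;
    d.dataword.l = value;
    return d;
}

/* Valid only when the caller knows the descriptor holds an integer. */
long unbox_integer(descriptor_t d)
{
    return d.dataword.l;
}
```

Code that knows it has an <integer> can work with the bare long and skip the descriptor entirely, which is the point of choosing the most specific representation.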

"Boxed" and "unboxed" are somewhat vague terms that one will often hear on the 'net. Unboxed data is the raw data, the good stuff, with no overhead. The drawback is that if you don't know the type of the raw data (is it an integer, a character, or a float?), it's just a bunch of bits. Boxing means to add meta-data (the type of the data) so that the data can be interpreted unambiguously. The d2c immediate representation is an unboxed representation, while the d2c general representation is a boxed representation. Depending on the situation, the heap pointer representation might be considered either boxed or unboxed.

Special adjectives

inline: Methods can be declared inline. If a method is inline, then the body of the method is duplicated at all valid call sites. This allows optimization of the called code based on the calling context.
movable: Methods and generic functions can be declared movable. A movable function is one that doesn't depend on when it happens. In other words, it can't depend on any global state, just the arguments.
Plus is an example of a movable function: 2 + 2 is always 4 no matter when. movable implies flushable; see below.
flushable: Methods and generic functions can be declared flushable. This means that the function may depend on global state, but cannot change any global state. Extracting the value of a slot is a flushable operation (assuming the slot is guaranteed to be initialized). If the result of a flushable function isn't used, the call can be dropped.
functional: Classes can be declared functional. The slots of functional classes have to be constant (and in fact, default to constant). Furthermore, equality (==) is defined in terms of the slot values, not the pointer identity of the heap representation of the object. Currently, however, you have to define a functional-== method yourself to check whether two instances of a functional class are the same. So you could intentionally get it wrong, and then strange things would happen. But the idea is that the instances will be == iff the object-class for them both is the same and all slots are ==. Functional classes may have subclasses, but be sure to define functional-== methods accordingly.
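
For readers who know gcc, movable and flushable resemble gcc's "const" and "pure" function attributes. The C sketch below is only an analogy: it says nothing about how d2c implements these adjectives, and the function names and the global "scale" are made up for the example.

```c
#include <assert.h>

static long scale = 3;   /* global state */

/* Like "movable": the result depends only on the arguments. */
__attribute__((const)) long add(long a, long b)
{
    return a + b;
}

/* Like "flushable": reads global state but never changes it. */
__attribute__((pure)) long scaled(long x)
{
    return x * scale;
}

long demo(long x)
{
    (void)scaled(x);        /* result unused: the compiler may drop this call */
    long a = add(x, 2);
    scale = 5;              /* changing global state cannot affect add() */
    long b = add(x, 2);     /* so this call may be merged with the one above */
    assert(a == b);
    return a + b + scaled(x);   /* scaled() sees the new value of scale */
}
```

The call to scaled() after the assignment cannot be moved above it, which is the difference between a flushable function and a movable one.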

Optimizations

d2c performs the following optimizations:

common sub-expression elimination (CSE), optimistic type inferencing, compile time method selection, inlining, code motion.

Generated Files

*.c: d2c generates a .c file for each .dylan file it processes.
unitprefix-inits.c: contains code for performing various initializations for this particular library. This includes executing any top-level expressions contained in the library.
unitprefix-heap.s: contains the initial heap image for this particular library.
cc-unitprefix-files.mak: A makefile which contains rules for compiling all the .c and .s files, and linking either an executable or a library, depending on which the LID file specifies.
library.lib.du: (Only generated when not building an executable) Contains various information about the library that d2c needs to remember.
inits.c: (Only generated when building an executable) It invokes each library's initialization routines, then calls the entry-point function (if any).
heap.s: (Only generated when building an executable) It contains initial heap information that is of a global nature. For instance, because all symbols with the same value are ==, the literal #"foo" in the String-extensions library is == to the literal #"foo" in the Streams library, and so cannot go in either library's unitprefix-heap.s. Currently symbols are the only object with this property.

Compile Time Constants

In Dylan, the type in a type declaration is an expression just like any other. This means that in general, the compiler can't tell what a type declaration means without running the program. That won't work, so what d2c actually does is recognize simple type expressions and then exploit the obtained type information for optimization and type inference.

If the type expression isn't an obvious compile-time constant, then d2c gains no useful information from it, but is required to go to extra work to implement a run-time check. For this reason, your code will compile much better if you only use type expressions which d2c can recognize as being constant. The notion of compile time constant lies at the heart of many of d2c's optimizations.

To start with, some terminology:

ctype: The compile-time representation of a Dylan type. If a type is not constant, it is represented as an <unknown-ctype>, which frustrates any attempt at further type inference.
ct-value: A compile-time value. The phrase "compile time constant" is synonymous. <ct-value> is not a subclass of <ctype>, nor the other way around. However, there are many classes which are subtypes of both <ctype> and <ct-value>.
EQL-value: A value is an eql-value if members of its class can be compared with ==. <class>es, <integer>s, <character>s, and <symbol>s are eql-values; in d2c non-class <type>s are not.
eql-ct-value: A ct-value which is also an eql-value.

If we can figure out a ct-value equivalent to an expression parse, we say that expression is ct-evaluable. An expression is ct-evaluable if any of the following holds:

The expression is a literal.
The expression is a body (e.g. the guts of a begin/end), and each expression in the body is ct-evaluable. (A body whose last component expression is ct-evaluable, but which also has non-ct-evaluable expressions in it, might have side effects. Thus, the body can't be ct-evaluable because that might suppress the side effects.)
The expression is a reference to a binding, and that binding is to a module binding rather than a local binding, and the definition it is bound to is ct-evaluable (see below).
The expression is a function call, and the arguments to the function are all ct-evaluable, and the function itself is a direct reference to a module binding which is a function (i.e., we know which function to invoke), and the function has built-in support, and the function call meets the conditions of that specific function (see below).
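
The four rules above amount to a recursive check over the expression tree. The C sketch below shows that shape using a deliberately tiny, hypothetical node type. It is not d2c's parse tree: the boolean fields stand in for lookups that the real compiler would have to perform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { EXPR_LITERAL, EXPR_BODY, EXPR_BINDING_REF, EXPR_CALL } expr_kind;

/* Hypothetical, much simplified expression node. */
typedef struct expr {
    expr_kind kind;
    const struct expr *const *children; /* body expressions or call arguments */
    size_t n_children;
    bool module_binding;        /* BINDING_REF: module (not local) binding */
    bool definition_ct;         /* BINDING_REF: its definition is ct-evaluable */
    bool known_builtin;         /* CALL: direct reference to a supported function */
    bool extra_conditions_met;  /* CALL: function-specific conditions hold */
} expr;

static bool all_ct_evaluable(const expr *e);

/* Apply the four rules listed above. */
bool ct_evaluable(const expr *e)
{
    assert(e != NULL);
    switch (e->kind) {
    case EXPR_LITERAL:
        return true;
    case EXPR_BODY:
        /* Every expression in the body must qualify, not just the last. */
        return all_ct_evaluable(e);
    case EXPR_BINDING_REF:
        return e->module_binding && e->definition_ct;
    case EXPR_CALL:
        return e->known_builtin && e->extra_conditions_met
            && all_ct_evaluable(e);
    }
    return false;
}

/* True if every child expression is ct-evaluable. */
static bool all_ct_evaluable(const expr *e)
{
    for (size_t i = 0; i < e->n_children; i++)
        if (!ct_evaluable(e->children[i]))
            return false;
    return true;
}
```

A single reference to a local binding anywhere among the arguments is enough to make the whole call non-ct-evaluable.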

Some definitions are ct-evaluable, and some are not:

A "define variable" definition is never ct-evaluable, because it is variable.
A "define constant" is ct-evaluable if the type constraint is ct-evaluable and the initial value is ct-evaluable.
A "define generic" is ct-evaluable if all specializers, all result types, and the type constraints on #rest types are ct-evaluable.
A "define method" doesn't usually define a module binding; define generics do. However, if the define method introduces an implicit generic, that implicit generic is ct-evaluable.
"define function" in d2c is equivalent to "define method".
"define class" is ct-evaluable if all its superclasses are ct-evaluable. (Note that d2c currently will puke if a class has a non-ct-evaluable superclass, so effectively all classes in d2c are ct-evaluable.)
"define macro", "define module", and "define library" don't produce module bindings.

For function calls to most functions with built-in support, simply knowing that the arguments and the function are ct-evaluable is enough to make the function call ct-evaluable. These functions include:

type-union, false-or, subclass, direct-instance, negative, abs, \+, \-, \*, ash, \^, logior, logxor, logand, lognot

limited(<integer>) and limited(<collection>) are also ct-evaluable if their arguments are ct-evaluable.

However, two functions are different. In addition to requiring that the arguments and the function are ct-evaluable, they impose additional constraints:

singleton(obj) is ct-evaluable only if obj is an eql-ct-value.

one-of(obj1, obj2, ...) is ct-evaluable only if every arg is an eql-ct-value.

The most common way to construct a type which is not ct-evaluable is to create a singleton of a non-class type. For instance, although

        type-union(<foo>, <bar>)

is ct-evaluable,

        singleton(type-union(<foo>, <bar>))

is not.
