Melange Interface Generator

1. Introduction
2. A Concrete Example
3. Basic Use
- 3.1. Loading and Finding Objects
4. Importing Header Files
5. Specifying Object Names
- 5.1. Mapping functions
- 5.2. Prefixes
- 5.3. Explicit Renaming
- 5.4. Anonymous types
6. Type Definitions
- 6.1. Implicit class definitions
- 6.2. Specifying class inheritance
7. Translating Object Representations
- 7.1. Specifying low level transformations
- 7.2. Specifying high level transformations
8. Other File Options
9. Function Clauses
10. Struct and Union Clauses
11. Pointer Clauses
12. Constant Clauses
13. Variable Clauses
Appendix I -- Low level support facilities
- I.i. Predefined types
- I.ii. Locating native C objects
- I.iii. Pointer manipulation operations
Appendix II -- Static linking mechanisms
Appendix III -- Differences from Creole
Appendix IV -- Known limitations
Appendix V -- Proposed modifications
- V.i. Enumeration clauses
- V.ii. Inheritance of "map" and "equate" options
- V.iii. Remerging of the "equate:" and "map:" options
The Melange Interface Generator

Robert Stockton
School of Computer Science
Carnegie Mellon University
Pittsburgh PA 15213
04 June 1997
Abstract
The Melange interface generator gives Dylan programs access to native C code. It is modeled upon Apple Computer’s Creole, and shares Creole’s goal of automatically providing full support for a foreign interface based upon existing interface descriptions. Like Creole, it also provides mechanisms for explicitly adapting these interfaces to provide a closer match between the C and Dylan data models.
Melange, however, differs from Creole in a number of significant ways. This document, therefore, provides a gentle introduction to Melange without attempting to build upon any existing descriptions of Creole.
1. Introduction

Melange is an automatic interface generator which provides transparent access to both functions and data defined or generated by existing C libraries. It allows users to import "interfaces" [In fact, a C header file may contain arbitrary C code which Melange is unprepared to handle. By convention, however, ".h" files contain only "interface declarations" -- type declarations, function prototypes, global variable declarations, and "preprocessor constants". Since Melange can meaningfully process all of these, it is capable of handling the vast majority of header files which will be encountered in practice.] from existing C header files, controlled by the contents of a "define interface" top-level form which may be included in the same file as arbitrary Dylan code. The user may use the functions and data specified by this interface as if they were native Dylan objects, and may export them to other modules.
Melange provides reasonable interpretations for the various sorts of C declarations which may appear in a header file, as well as mechanisms for explicitly modifying the default interpretations when necessary. For example, users may:
specify rules for the translation of foreign names
explicitly specify new names for specific objects or routines
specify parameter passing conventions or mutability of foreign objects
specify mappings or equivalences between "foreign" data and native equivalents
choose to import only a subset of the declarations in the header file
All of these customizations, as well as the name of the C header file, are specified by a "define interface" clause. See the next section for an example.
The basic model for interface importation is based upon that used within Apple Computer’s "Creole" interface generator. There are, however, significant differences in some of the details. (For instance, the "equate", "map", and "object-file" directives described later in this document are unique to Melange. Likewise, Creole’s "type" directive would not be accepted by Melange.) You should, therefore, not expect Creole interface declarations to work within Melange without some modification.
2. A Concrete Example

In order to get a feel for using Melange, it is probably best to start with a concrete example. This section contains a complete program which will use native C libraries to list the contents of some directories. For now, you should simply skim this example to get a general overview of Melange’s capabilities. These will be described in more detail in later sections.
We begin with an "interface file" which contains a mixture of basic Dylan code and "define interface" forms which will be processed by Melange. We will name this file "dirent.intr".

module: Junk
synopsis: A poor imitation of "ls"
define library junk
  use dylan;
  use streams;
end library junk;
define module junk
  use dylan;
  use extensions;
  use extern;
  use streams;
  use standard-io;
end module junk;
define interface
  // This clause is more complex than it needs to be, but it does
  // demonstrate a lot of Melange’s features.
  #include "/usr/include/sys/dirent.h",
    mindy-include-file: "dirent.inc",
    equate: {"char /* Any C declaration is legal */ *" => <c-string>},
    map: {"char *" => <byte-string>},
    // The two functions require callbacks, which we don’t support.
    exclude: {"scandir", "alphasort", "struct _dirdesc"},
    seal-functions: open,
    read-only: #t,
    name-mapper: minimal-name-mapping;
  function "opendir", map-argument: {#x1 => <string>};
  function "telldir" => tell, map-result: <integer>;
  struct "struct dirent",
    prefix: "dt-",
    exclude: {"d_namlen", "d_reclen"};
end interface;
define method main (program, #rest args)
  for (arg in args)
    let dir = opendir(arg);
    for (entry = readdir(dir) then readdir(dir),
         until entry = null-pointer)
      write-line(entry.dt-d-name, *standard-output*);
    end for;
    closedir(dir);
  end for;
end method main;



We will then process this file through Melange to produce a file of pure Dylan code. If Melange is contained in the file "melange.dbc", we would use the following command line:

mindy -f melange.dbc dirent.intr dirent.dylan



This command will process "dirent.intr" and write a file named "dirent.dylan". In this case, it will also silently write a file named "dirent.inc", whose use will be explained later.



You can compile "dirent.dylan" normally, via mindycomp, but in order to execute it, you must make sure that the Mindy interpreter will be able to load the appropriate routines from the library containing the "dirent" routines. Ideally, you would simply let Mindy load the appropriate code dynamically. However, this is presently only available for a few machines. Therefore, we will follow a messier approach and build a new version of the interpreter which is aware of the desired functions.



Move to the build directory for the Mindy interpreter, and edit the Makefile so the "EXTERN-INCLUDES" line mentions "your_dir_path/dirent.inc" and then run "make mindy". In this case, this is all that is required to build a new interpreter which is aware of the dirent routines.



You can now put it all together, invoking the new interpreter on the compiled program, with:

mindy -f dirent.dbc .



This should print a list of all files in the current directory.



Because of the difficulty of relinking the interpreter for each new library, it is expected that administrators will build a set of "standard" library interfaces which are prelinked into the interpreter and exported as general Dylan library interfaces. In the future, as Melange (and the Gwydion environment) are extended to support better linking and loading capabilities, it should become easier to incorporate C libraries on an "as-needed" basis.




3. Basic Use



Although the "define interface" form provides a fairly rich sublanguage for specifying interfaces, it is often sufficient to use just the "minimal" form. For example, if "gc.h" contained the following code:


typedef char bool;
typedef struct obj obj_t;
typedef char *str;
extern obj_t alloc(obj_t class, int bytes);
extern void scavenge(obj_t *addr);
extern obj_t transport(obj_t obj, int bytes);
extern void shrink(obj_t obj, int bytes);
extern void collect_garbage(void);
extern bool TimeToGC;
#define ForwardingMarker ((obj_t)(0xDEADBEEF))



then you could import it by creating a file named "class.intr" which includes arbitrary Dylan code and the following: 


define interface
   #include "gc.h";
end interface;



You would then run "melange class.intr" [Or possibly "mindy -f melange.dbc class.intr", depending upon the installation on your particular machine.], which would produce a file of Dylan code which contains appropriate definitions for the classes "<bool>", "<obj>", "<obj_t>", and "<str>"; the variable "TimeToGC"; and the functions "alloc", "scavenge", "transport", "shrink", and "collect_garbage". (The constant "ForwardingMarker" will be excluded because it is not a simple literal.)



If you were running under the HPUX operating system and the named functions were already linked into Mindy, then this might be all that you would need. After compiling the resulting Dylan file, you could call the functions as if they had been written in native Dylan code, and you could access the "TimeToGC" variable by calling the function with that name. For example, you might run:


if (TimeToGC() ~= 0)
   collect_garbage();
end if;



This code fragment points out some of the hazards of "simple" imports. Melange has no way of knowing that "bool" should correspond to Mindy’s <boolean> class, so you are stuck with a simple integer. Likewise, the system wouldn’t be able to guess that "char *" should correspond to the Mindy class "<c-string>". We will explain in later sections how "map:" or "equate:" options may be used to provide this information to Melange.


3.1. Loading and Finding Objects



As mentioned above, the include directive in the previous example will only work for files which have been previously linked into Mindy. There are extra facilities available to handle other situations.



If your machine is one for which we support dynamic loading [Currently support is primarily for HPUX machines, but some work has been done on Macintoshes and ELF systems. Contact us for more details.], and you wish to load some declared objects from a shared library, you can add one or more "object-file:" options to the "#include" clause, as in the following:


define interface
   #include "gc.h",
      object-file: "/usr/lib/mindy/gc.sl";
end interface;



This would cause the code from "gc.sl" to be loaded into Mindy at run-time and make its functions and objects available just as they were in the previous example.



If you are running on a non-HPUX machine, you will have to statically link Mindy with the appropriate library and a list of mappings from names to addresses. This can be accomplished most easily by following these steps:

1. Add a "mindy-include-file:" option to your interface definition. This specifies the name of an "interface description file" which will be written by Melange, and which can later be linked into Mindy along with the appropriate library. 
2. Run Melange on the source file in the normal manner. You may wish to move the newly created interface description file into your Mindy build directory.
3. Change the Makefile in the Mindy build directory, by adding the imported library to LIBS and the interface description file to EXTERN-INCLUDES. 
4. Run "make" to rebuild Mindy with the new library information.
5. Compile and run the generated Dylan code as normal.



A typical interface definition for this approach might be:


define interface
   #include "gc.h",
      mindy-include-file: "/usr/local/mindy-build/gc.inc";
end interface;



4. Importing Header Files



You import C definitions into Dylan by specifying one or more header files in an "#include" clause. This may take one of two different forms:


define interface
   #include "file1.h";
end interface;



or


define interface
   #include {"file1.h", "file2.h"};
end interface;



Melange will parse all of the named files in the specified order, and produce Dylan equivalents for (i.e. "import") some fraction of the declarations in these files. By default, Melange will import all of the declarations from the named files, and any declarations in recursively included files (i.e. those specified via "#include" directives in the ".h" file) which are referenced by imported definitions. It will not, however, import every declaration in recursively included files. This ensures that you will see a complete, usable set of declarations without having to closely control the importation process. If you wish to exert more control over the set of objects to be imported, you may do so via the "import:", "exclude:", and "exclude-file:" options.



If you only need a small set of definitions from a set of imported files, you can explicitly specify the complete list of declarations to be imported by using the "import:" option. You could, for example, say:


define interface
   #include "gc.h",
      import: {"scavenge", "transport" => move};
end interface;



This would result in Dylan definitions for "scavenge", "move", "<obj_t>", and "<obj>". The latter two types would be dragged in because they are referenced by the two imported functions. (If you equated "obj_t" to <object>, as described below, then neither of the types would be imported.) The second import in the above example performs a renaming at the same time as it specifies the object to be imported.

Other forms specify global behaviors. "Import: all" will cause Melange to import every "top level" definition which is not explicitly excluded. "Import: all-recursive" causes it to import definitions from recursively included files as well. "Import: none" restricts importation to those declarations which are explicitly imported or are used by an imported declaration.



You may also use the "import:" option to specify importation behavior on a per-file basis. The options 


import: "file.h" => {"import1", ...}
import: "file.h" => all
import: "file.h" => none



work like the options described above, except that they only apply to the symbols in a single imported file.



The "exclude:" and "exclude-file:" options may be used to keep one or more unwanted definitions from being imported. For example, you could use:


define interface
   #include "gc.h",
      exclude: {"scavenge", "transport"},
      exclude-file: "gc1.h";
end interface;



This would prevent the two named functions and everything in the named file from being imported, while still including all of the other definitions from "gc.h". Note that these options should be used with care, as they can easily result in "incomplete" interfaces in which some declarations refer to types which are not defined. This could result in errors in the generated Dylan code. (The "import: file => none" option described above is a safer way of achieving an effect similar to "exclude-file:".)



You may also prevent some type declarations from being imported by using the "equate:" option (described in a later section). If, for example, you equated "obj_t" to <object>, then Melange would ignore the definition for "obj_t" and simply assume that the existing definition for <object> was sufficient.



You may have any number of "import:", "exclude:", and "exclude-file:" options, and may name the same declarations in multiple clauses. "Exclude:" options take priority over "import:" options. If no "import:" options are specified, the system will import all non-excluded symbols, just as if you had said "import: all".
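The precedence just described can be sketched in C. This is only an illustration of the stated rule, not Melange’s own code: the function names are hypothetical, a NULL import list is assumed to stand for "no import: options given", and the sketch ignores both "exclude-file:" and the way referenced types are dragged in by imported declarations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if name appears in the NULL-terminated list (NULL list = empty). */
bool name_in_list(const char *name, const char *const *list)
{
    if (list == NULL)
        return false;
    for (; *list != NULL; list++) {
        if (strcmp(name, *list) == 0)
            return true;
    }
    return false;
}

/*
 * Decide whether a top-level declaration would be imported.
 * imports == NULL models an interface with no "import:" options,
 * which the text says behaves like "import: all".
 */
bool should_import(const char *name,
                   const char *const *imports,
                   const char *const *excludes)
{
    assert(name != NULL);
    if (name_in_list(name, excludes))
        return false;               /* "exclude:" always wins */
    if (imports == NULL)
        return true;                /* default: import everything not excluded */
    return name_in_list(name, imports);
}
```

With imports {"scavenge", "transport"} and excludes {"transport"}, only "scavenge" passes; with no import list at all, "alloc" passes but "transport" is still rejected.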




5. Specifying Object Names



Because naming conventions differ between C and Dylan, Melange attempts to translate the names specified in C declarations into a form more appropriate to Dylan. This involves:


Adding angle brackets around type names.
Adding dollar signs at the beginning of constant names.
Translating (non-initial) underlines into hyphens.
Adding "struct-name$" prefixes to slot accessors.
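As a rough illustration of the default translations listed above, the following C sketch applies them to a few names. It is not Melange’s implementation: the function names, the category enumeration, and the decision to translate underscores in the structure name as well are assumptions made for this example, and the input names in the note below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Kinds of declaration that are named differently in Dylan. */
enum name_category { NAME_TYPE, NAME_CONSTANT, NAME_FUNCTION, NAME_VARIABLE, NAME_SLOT };

/* Copy src to dst, turning every underscore except a leading one into a hyphen. */
void hyphenate_underscores(const char *src, char *dst, size_t dst_size)
{
    size_t i;

    assert(dst_size > 0);
    for (i = 0; src[i] != '\0' && i + 1 < dst_size; i++)
        dst[i] = (src[i] == '_' && i > 0) ? '-' : src[i];
    dst[i] = '\0';
}

/* Build a Dylan-style name; struct_name is only consulted for NAME_SLOT. */
void default_dylan_name(enum name_category category, const char *c_name,
                        const char *struct_name, char *out, size_t out_size)
{
    char base[128];
    char owner[128];

    hyphenate_underscores(c_name, base, sizeof base);
    switch (category) {
    case NAME_TYPE:                 /* angle brackets around type names */
        snprintf(out, out_size, "<%s>", base);
        break;
    case NAME_CONSTANT:             /* dollar sign before constant names */
        snprintf(out, out_size, "$%s", base);
        break;
    case NAME_SLOT:                 /* "struct-name$" prefix on slot accessors */
        assert(struct_name != NULL);
        hyphenate_underscores(struct_name, owner, sizeof owner);
        snprintf(out, out_size, "%s$%s", owner, base);
        break;
    default:                        /* functions and variables: hyphens only */
        snprintf(out, out_size, "%s", base);
        break;
    }
}
```

Under these assumptions, a type named "file_info" comes out as "<file-info>", a constant "MAX_SIZE" as "$MAX-SIZE", and the slot "d_name" of "struct dirent" as "dirent$d-name".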




In many cases, this default behavior will be precisely what you want. However, Melange provides mechanisms for specifying different translations for some or all of the declarations.


5.1. Mapping functions



The translations described above are provided by calls to a built-in "name mapping function" named "minimal-name-mapping-with-structure-prefix". You may specify other mapping functions via a "name-mapper:" option. Our example interface might then look like this:


define interface
   #include "gc.h",
      object-file: "/usr/lib/mindy/gc.o",
      name-mapper: c-to-dylan;
end interface;



Table 1 describes the four standard mapping functions that are provided by Melange.

Table 1: Standard name mapping functions

minimal-name-mapping-with-structure-prefix
   Provides the translations described above.

minimal-name-mapping
   Same as above, but excludes the "struct-name$" prefixes.

c-to-dylan
   Like minimal-name-mapping, but:
   •  Adds hyphens to reinforce "CaseBased" word separation.
   •  Adds "get-" prefixes to slot accessors.

identity-name-mapping
   Does no translation.



Users may link new mapping functions into Melange. In the Mindy implementation, this is done as follows:


1. Create a new module which imports module "name-mappers" from library "c-parse".

2. Define methods on the "map-name" generic function, which accepts the following parameters:
•  mapper -- a <symbol> which is typically specialized by a singleton to select a specific name mapper method.
•  category -- a <symbol> which will always be one of: #"type", #"constant", #"variable", or #"function".
•  prefix -- a <string> which is typically prepended to the result string.
•  name -- a <string> which supplies the original C name.
•  sequence-of-classes -- a sequence of simple names for the classes which logically "contain" the given object. For example, if we were processing the declaration "struct str {int size; char *chars;}", one of the calls to the mapping function would have name bound to "size" and sequence-of-classes bound to #["str"].
and returns a <string> which will be used as the Dylan name for the declaration.

3. Compile this module and "link" it into Melange by concatenating it to the end of melange.dbc.




Mapping functions may call "hyphenate-case-breaks" which performs the same "CaseBased separation" as is done by "c-to-dylan". The trivial "identity-name-mapping" described above might be implemented by:


define method map-name
   (mapper == #"identity-name-mapping", category, prefix, name, classes)
=> (result :: <string>)
   name;
end method map-name;
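The text does not spell out the exact rule behind "hyphenate-case-breaks". The C sketch below assumes that a "case break" is any lowercase letter immediately followed by an uppercase letter, and it lowercases the result only so that it matches the "time-to-gc" spelling quoted in the section on prefixes. The function name and the buffer handling are likewise assumptions.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy src to dst, inserting '-' wherever a lowercase letter is followed
 * by an uppercase one, and lowercasing every character. Output is
 * truncated (but always NUL-terminated) if dst is too small.
 */
void hyphenate_case_breaks_sketch(const char *src, char *dst, size_t dst_size)
{
    size_t len = strlen(src);
    size_t out = 0;
    size_t i;

    assert(dst_size > 0);
    for (i = 0; i < len; i++) {
        unsigned char cur = (unsigned char)src[i];
        int is_break = i > 0
                       && islower((unsigned char)src[i - 1])
                       && isupper(cur);
        size_t needed = is_break ? 2 : 1;

        if (out + needed >= dst_size)
            break;                  /* keep room for the terminating NUL */
        if (is_break)
            dst[out++] = '-';
        dst[out++] = (char)tolower(cur);
    }
    dst[out] = '\0';
}
```

Given "TimeToGC" this produces "time-to-gc": the run "GC" stays together because neither letter is preceded by a lowercase one.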



You may specify different name mappers to be applied to the slots of "container types". This capability is described in a later section. 


5.2. Prefixes



As noted above, the name mapping function is passed a "prefix" argument. By default, it is an empty string, but users may specify a different value by adding a "prefix:" option to the interface definition. For example, we might expand the previous example to:


define interface
   #include "gc.h",
      object-file: "/usr/lib/mindy/gc.o",
      name-mapper: c-to-dylan,
      prefix: "gc-";
end interface;



This would cause Melange to tack "gc-" onto the beginning of every translated symbol. Because the system knows about the "standard" Dylan naming conventions, it can do this intelligently. You would, therefore, get names like "<gc-bool>", "gc-time-to-gc", and "gc-scavenge".



Note that the interpretation of the "prefix" is entirely up to the name mapping routine. Identity-name-mapping, for example, completely ignores the prefix. All of the other standard mapping functions prepend it to the name before adding brackets or dollar signs, but after performing all other transformations.
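The ordering described above (other transformations first, then the prefix, then brackets or dollar signs) can be sketched in C as follows. The enumeration and function name are hypothetical, and the input is assumed to be a name that has already been through the other transformations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Which outer decoration the final name receives. */
enum decoration { DECOR_NONE, DECOR_TYPE, DECOR_CONSTANT };

/*
 * Prepend prefix to an already-mapped name, then add the outer decoration.
 * Returns what snprintf returns: the length the full name requires.
 */
int decorate_with_prefix(enum decoration kind, const char *prefix,
                         const char *mapped_name, char *out, size_t out_size)
{
    assert(prefix != NULL && mapped_name != NULL);
    switch (kind) {
    case DECOR_TYPE:
        return snprintf(out, out_size, "<%s%s>", prefix, mapped_name);
    case DECOR_CONSTANT:
        return snprintf(out, out_size, "$%s%s", prefix, mapped_name);
    default:
        return snprintf(out, out_size, "%s%s", prefix, mapped_name);
    }
}
```

With the prefix "gc-", the mapped names "bool", "time-to-gc", and "scavenge" become "<gc-bool>", "gc-time-to-gc", and "gc-scavenge", matching the names quoted above. The prefix lands inside the angle brackets because it is applied before they are added.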



Facilities for adding "localized" prefixes to slot accessors, enumeration literals, etc. will be described in later sections.


5.3. Explicit Renaming



Although the automatic name mapping described above is sufficient for most objects named within a header file, there are cases in which you might wish to explicitly control the name of one or more specific objects. You can do this through a "rename:" option. This option specifies a list of translations between raw C names and Dylan identifiers. For example, we might have:


define interface
   #include "gc.h",
      object-file: "/usr/lib/mindy/gc.o",
      name-mapper: c-to-dylan,
      prefix: "gc-",
      rename: {"struct obj" => <C-Object>, "collect_garbage" => GC};
end interface;



Note that the "target" of the renaming is an ordinary Dylan variable and is therefore case-insensitive. However, the source is an "alien name", which is (like all C code) case sensitive. Alien names should refer to an object, function, or type in exactly the same way you would refer to them in C. We therefore say "struct obj" instead of simply "obj", and might also say "enum foo" or "union bar". Alien names are actually parsed according to the standard lexical conventions of C, so you may use arbitrary spacing and even include comments if you really wish.



Note that "rename:" options supply names for new objects (and types) that are being imported into Dylan. You cannot, therefore, simply rename "bool" to "<Boolean>" to make it equivalent to the existing type -- this would simply result in a name conflict. For these purposes, you would instead use the "equate" and "map" operations, which will be described later. (In fact, if the C declaration had defined a type name "boolean", you might have to explicitly rename it to something else in order to avoid name conflicts with the existing type. Of course, in the above example, the "gc-" prefix would be sufficient to make the name unique.)


5.4. Anonymous types



The alien names described above can also be used to refer to C’s so-called "anonymous types". You can therefore refer to "char *", "int [23]", or even "int (*) (char *foo)" (i.e. a pointer to a function which takes a string and returns an integer) [At present, function types are not fully supported. You should not depend upon them to work as expected.]. The ability to refer to anonymous types is useful because it allows you to use the "rename:" option to provide explicit names for such types. Normally Melange would simply generate an arbitrary "anonymous" identifier for the type. Without knowing the name of this type, you could not define new operations upon it. However, by saying, for example, "rename: {"char *" => <char-ptr>}", you can provide a convenient handle to use in defining new operations.




6. Type Definitions



When Melange encounters a "type definition" [The definition may be implicit, as in "char ** int" or "struct foo *bar". Simply by being present these code fragments supply implicit definitions for "char *", "char **" and "struct foo".] within a header file, it will typically create a new Dylan class which corresponds to that C type. Usually, this will be a subclass of <statically-typed-pointer>, which encapsulates the raw C pointer value (i.e. an object address). Each statically typed pointer class will have exactly the same structure (i.e. a single address), but the class itself can be used to determine what operations are supported on the data. This could include slot accessors for "struct"s and "union"s, dereference operations for "pointer" types, or general information about the object’s size, etc.



There are times when you will find that some of the types defined in a header file are not really "new". It might be that they are completely identical to some type defined in another interface definition, or they might be "isomorphic" to some existing type which has more complete support. Melange provides support for both of these cases. The first case is handled by "equating" the two types, while the second is handled by "mapping" (i.e. transforming) one type into the other.



For example, many header files contain definitions that use the types "char *" and "boolean". The declarations of these types don’t provide any semantic interpretations -- "char *" is simply the address of a character, and boolean is nothing but a one-byte integer. However, by equating "char *" to the predefined <c-string> type, we can tell Melange that it is actually a <string> and should inherit all of the operations defined upon <string>s. Likewise, we can map the integral "boolean" values into "#t" and "#f" to get a <boolean>. These integral values will be automatically translated into <boolean>s when they are returned by a C function, and <boolean>s will be translated back into integers when passed as arguments to C functions.


6.1. Implicit class definitions



Unless otherwise specified, new classes will be created for each type defined in a C header file. When the header file provides meaningful names for these types, then Melange will pass those names to the mapping functions to generate names for the Dylan classes. Otherwise, an anonymous name will be generated, limiting your ability to refer to the new type. For example, "struct foo" would typically generate the class "<foo>", while "struct foo ***" might generate the class "<anonymous-107>". In either case, you can explicitly specify the name for the new class by using the "rename:" option described above.



Different sorts of C declarations will yield different sorts of Dylan classes as well as different sets of operations defined upon them. Therefore, we will consider each variety separately:


Primitive types -- The types "int", "char", "long", "short" and their unsigned counterparts are simply translated into <integer>, while "float" and "double" are translated into <float>. However, Melange knows the sizes of each of these types so that pointers and native C "vectors" of them (described below) will work properly. No new types are created for these types.
Pointer types -- Declarations like "int *" or "struct foo ***" generate new subclasses of "<statically-typed-pointer>". Note that "struct foo *" is actually treated as a synonym for "struct foo", and does not get a distinct class, although any extra levels of indirection (i.e. "struct foo **") will generate new pointer classes. Three operations are supported upon pointer classes:

pointer-value (pointer, #key index) => (value)
This function "dereferences" the pointer and returns the value. If index is supplied, then "pointer" is treated as a vector of values and the appropriate element is returned.
content-size (cls) => integer
Returns the size of the value referenced by instances of "cls". If the size is not known, this is 0.
Note that these types are not automatically treated as vectors. You may, however, make them so by using a "superclasses:" option to make them <c-vector>s.

Vector types -- Declarations like "char [256]" are treated almost identically to pointer types, but they are automatically defined as subclasses of <c-vector>, so that all vector operations will be defined on them. However, because many systems depend upon the lack of bounds checking in C, vector types have a default size of "#f". You may explicitly define "size" functions to provide a more accurate size.
Structure types -- Declarations like "struct bar {int a; char *b;}" also generate new subclasses of "<statically-typed-pointer>". Melange will define all of the operations defined for pointer values (described above), as well as accessors for each of the structure slots. Structure objects are always accessed through "pointers" to them. Therefore, unless a non-zero index is specified, "pointer-value" will simply return the object passed to it. (The operation is still defined because non-zero indices can be used for vector access.)
Union types -- Declarations like "union bar {int a; char *b;}" are treated the same as struct declarations, except that the slot accessors all refer to the same areas in memory.
Enumeration types -- Declarations like "enum foo {one, two, three};" are simply aliased to <integer>. However, constants are defined for each of the enumeration literals.
Typedefs -- Declarations like "typedef struct foo bar" simply define new names for existing types.
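For readers more at home in C, the pointer operations and the union behavior described above correspond roughly to the following. The helper names are hypothetical, and the Dylan functions are generic rather than tied to "int *" or "struct bar *" as these C functions are.

```c
#include <assert.h>
#include <stddef.h>

struct bar { int a; char *b; };
union ubar { int a; char *b; };

/* Rough analogue of pointer-value(ptr, index: i) for an "int *". */
int int_pointer_value(const int *ptr, size_t index)
{
    assert(ptr != NULL);
    return ptr[index];              /* index 0 is a plain dereference */
}

/* Rough analogue of content-size for the pointer type "int *". */
size_t int_pointer_content_size(void)
{
    return sizeof(int);
}

/*
 * For a structure, the object is already handled through its address,
 * so index 0 yields the same object; other indices step through a vector.
 */
const struct bar *bar_pointer_value(const struct bar *ptr, size_t index)
{
    assert(ptr != NULL);
    return &ptr[index];
}

/* Every member of a union starts at offset 0, so its accessors share storage. */
int union_slots_share_storage(void)
{
    return offsetof(union ubar, a) == 0 && offsetof(union ubar, b) == 0;
}
```

This mirrors the statement that "pointer-value" on a structure simply returns the object passed to it unless a non-zero index is given.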


6.2. Specifying class inheritance



When Melange creates new "<statically-typed-pointer>" classes, it typically creates them as simple subclasses of "<statically-typed-pointer>", with no other superclasses. However, you might sometimes need more control over the class hierarchy. For example, you might wish to specify that a C type should be considered a subtype of the abstract class "<sequence>". You could accomplish this via the following declarations:


define interface
   #include "sequence.h";
   struct "struct cons_cell" => <c-list>,
      superclasses: {<sequence>};
   function "c_list_size" => size;
end interface;
define method forward-iteration-protocol (seq :: <c-list>)
....



Note that the type "<c-list>" will still be a subclass of "<statically-typed-pointer>" -- we have simply added "<sequence>" to the list of superclasses. If "<statically-typed-pointer>" is not explicitly included in the "superclasses:" option, then it will be added at the end of the superclass list.



As demonstrated in the above example, you are still responsible for specifying whatever functions are required to satisfy the contract for the declared superclasses. "<C-list>" will be declared as a sequence, but you must specify a forward iteration protocol before any of the standard sequence operations will work properly.



The "superclasses:" option may currently be used within "struct", "union", and "pointer" clauses.




7. Translating Object Representations



Whenever a native C object is returned from a function or a Dylan object is passed into a C function, it is necessary to translate between the object representations used by the two languages. From Melange’s standpoint, native C objects consist of an arbitrary bit pattern which can be translated to or from a small number of "low level" Dylan types -- namely <integer>, <float>, or any subclass of <statically-typed-pointer>. This translation is handled automatically, although the user may explicitly specify which of the possible Dylan types should be chosen for any given C object type. In some cases, a further translation may take place, converting the "low level" Dylan value to or from some arbitrary "high level" Dylan type. (For example, an <integer> might be translated into a <boolean> or a <character>, and a <c-string> might be translated into a <byte-string>.) These "high level" translations are automatically invoked at the appropriate times, but both the "target" types and the methods for performing the translation must be specified by the user.


7.1. Specifying low level transformations



The target Dylan type for "low level" translations is typically chosen automatically by Melange. Integer and enumeration types are translated into <integer>; floating point types are translated to <float>; and all other types are translated into newly created subclasses of <statically-typed-pointer>. However, you may explicitly declare the target Dylan type for any C type by means of an "equate:" option:


define interface
   #include "gc.h",
      equate: {"char *" => <c-string>};
end interface;



This declaration makes the very strong statement that any values declared in C as "char *" are identical in form to the predefined type "<c-string>" (which is described in Appendix I). The system will therefore not define a distinct type for "char *" and will ignore any structural information provided in the header file. You might also use an "equate:" option to equate a type mentioned in one interface definition with an identically named type which was defined in an earlier interface definition.



You should use caution when equating two types. Since Melange has no way of knowing when two types are equivalent, it must trust your declarations. No type checking can or will be done, so if you incorrectly equate two types, the results will be unpredictable. In some cases, you may wish to go with the less efficient but slightly safer technique of letting Melange create a new type and then "mapping" that new type into the desired type. (This is described in detail below.)



Note also that two types with identical purposes will not necessarily have identical representations. For example, C’s boolean types are simple integers and are not equivalent to Dylan’s <boolean>. Again, explicit "mapping" may be used to transform between these two representations.



In the current implementation, an "equate:" option only applies within a single interface definition. Other interface definitions will not automatically inherit the effects of the declaration. In future versions, we may add the ability to "use" other interface definitions (just as you would "use" another module within a module definition) and thus pick up the effects of the "equate:" (and "map:") options within those interfaces.


7.2. Specifying high level transformations



Sometimes you may wish to use instances of some C type as if they were instances of some existing Dylan class, even though they have different representations. In this case, you can specify a secondary translation phase which semi-automatically translates between a "low level" and a "high level" Dylan representation. In order to do this, you must provide a "map:" option:


define interface
   #include "gc.h",
      equate: {"char *" => <c-string>},
      map: {"bool" => <boolean>};
end interface;



This clause will cause any functions defined within the interface to call transformation functions wherever the original C functions accept or return values of type "bool". Two different functions may be called:


import-value (high-level-class :: <class>, low-level-value :: <object>)

This function is called to transform result values returned by C functions into a "high level" Dylan class. It should always return an instance of "high-level-class".

export-value (low-level-class :: <class>, high-level-value :: <object>)

This function is called to transform "high level" argument values passed to C functions into the "low level" representations which will be meaningful to native C code. It should always return an instance of "low-level-class".



Default methods, which simply call "as", are provided for each of these functions. This will be sufficient to transform C’s integral "char"s into <character>s, <c-string>s into other <string>s, or one "pointer" type into another. There is also a predefined method which will transform <integer>s into <boolean>s. However, if you wish to perform arbitrary transformations upon the values, you may need to define additional methods for either or both of these functions. For example, the default methods for transforming to and from <boolean> are:


define method export-value (cls == <integer>, value :: <boolean>)
 => (result :: <integer>);
   if (value) 1 else 0 end if;
end method export-value;
define method import-value (cls == <boolean>, value :: <integer>)
 => (result :: <boolean>);
   value ~= 0;
end method import-value;
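
The comparison against zero in "import-value" matters because C code often returns some nonzero value other than 1 to mean "true". The following C fragment is a minimal sketch of such an interface. The names "bool_t", "READY_FLAG", "struct device", "device_is_ready" and "device_set_ready" are invented for illustration and are not part of Melange or of any real library.

```c
#include <assert.h>
#include <stddef.h>

typedef int bool_t;               /* hypothetical C "boolean": just an int */
#define READY_FLAG 0x04           /* hypothetical status bit */

struct device { int status; };    /* hypothetical device record */

/* Returns nonzero (4, not 1) when the ready bit is set, else 0. */
bool_t device_is_ready(const struct device *dev)
{
    assert(dev != NULL);
    return dev->status & READY_FLAG;
}

/* Treats any nonzero argument as "true". */
void device_set_ready(struct device *dev, bool_t value)
{
    assert(dev != NULL);
    if (value)
        dev->status |= READY_FLAG;
    else
        dev->status &= ~READY_FLAG;
}
```

If "bool_t" were mapped to <boolean>, the default methods shown above would turn the 4 returned by "device_is_ready" into #t, and would pass #t to "device_set_ready" as 1.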



It is important to note that, unlike "equate:" options, "map:" options don’t prevent Melange from creating new types. You may, in fact, both equate and map the same type. This will cause low level values to be created as instances of the "equated" type and then transformed into instances of the "target" type of the mapping. For example, you might take advantage of the defined transformations between string types by declaring:


define interface
   #include "/usr/include/sys/dirent.h",
      equate: {"char *" => <c-string>},
       map: {"char *" => <byte-string>};
end interface;



This causes the system to automatically translate "char *" pointers into <c-string>s (i.e. a particular variety of statically typed pointer) and then to call "import-value" to translate the <c-string> into a <byte-string>. If we did not provide the "equate:" option, then we would have to explicitly provide a function to transform "pointers to characters" into <byte-string>s. The "equate:" option lets us take advantage of all of the predefined functions for <string>s, which includes transformation into other string types.




8. Other File Options



There are a few other options that may be specified within an "#include" clause, but which do not fit into any of the above categories. These options are "define:", "undefine:", "seal-functions:" and "read-only:".



The "define:" and "undefine:" options control the C preprocessor definitions which will be implicitly defined during parsing of the header files. If you specify neither of these options, Melange will use a default set of definitions which correspond to those used by a typical C compiler for the machine you are running on. [At present, the only set of definitions provided will be those appropriate for the HPUX OS. However, it is straightforward to add different sets of definitions to Melange.] The "define:" option takes a string containing a single C token and an optional string or integer literal, which will be used as the expansion. (If no literal is specified, the token will be expanded to "1".) The "undefine:" option removes one or more of the default definitions. You might, for example, use:


define interface
   #include "gc.h",
      define: {"PMAX", "BSD_VERSION" => "4.3"},
      undefine: {"HPUX"};
end interface;
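
To see what these options mean to the header being parsed, it may help to write the same definitions as ordinary preprocessor directives. The following is a rough C equivalent of the options above, followed by some hypothetical header content ("io_handle", "MAX_UNITS" and "max_units" are invented for illustration) whose meaning depends on them.

```c
#include <assert.h>

/* Rough C equivalent of  define: {"PMAX", "BSD_VERSION" => "4.3"}  */
#define PMAX 1              /* no expansion given, so it expands to 1 */
#define BSD_VERSION 4.3
/* Rough C equivalent of  undefine: {"HPUX"}  */
#undef HPUX

/* Hypothetical header content whose meaning depends on those symbols. */
#ifdef HPUX
typedef long io_handle;
#else
typedef int io_handle;
#endif

#if PMAX
#define MAX_UNITS 8
#else
#define MAX_UNITS 1
#endif

/* Reports which configuration the preprocessor selected. */
int max_units(void)
{
    assert(BSD_VERSION > 4.0);   /* the expansion is an ordinary C token */
    return MAX_UNITS;
}
```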



The "seal-functions:" option controls whether the various imported functions and slot accessors will be sealed or open. By default, functions are sealed, but you may explicitly specify this by using "seal-functions: sealed" or reverse it by using "seal-functions: open". Melange does not support Creole’s "inline" sealing option.



The "read-only:" option specifies whether setter functions should be defined for slot and object accessors. They will be defined by default, but if you specify "read-only: #t", no setters will be defined.



The effects of the "seal-functions:" and "read-only:" options can be modified for particular container types. We will explain how to do this in later sections.




9. Function Clauses



Imported functions can be easily invoked, in almost every case, without any additional declarations. However, by exerting explicit control over argument handling, the interfaces to some functions may be made cleaner. This control is exerted via function clauses. The primary purpose of these clauses is to specify additional type information for specific parameters or to specify alternative argument passing conventions. For example, if we had two alternate "read-integers" functions with the following declarations:


int ReadInts1(int **VectorPtr);          /* result is a count of integers */
int *ReadInts2(int *Count);                /* result is a vector of integers */



we might use the following interface definition:


define interface
   #include "readints.h",
      rename: {"int *" => <int-vector>};
   function "ReadInts1",
      output-argument: 1;
   function "ReadInts2" => Read-Integers-Vector,
      output-argument: Count;
end interface;



This would produce two functions, both of which take 0 arguments but return two values. The first would return an <integer> followed by an <int-vector>, while the second would return the <int-vector> first and the <integer> second.


let (count :: <integer>, values :: <int-vector>)
   = Read-Ints1();
let (values :: <int-vector>, count :: <integer>)
   = Read-Integers-Vector();
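
For comparison, here is a sketch of how the C side of these two functions might be written. Only the two prototypes come from the text above; the bodies, the "sample_data" array and the "copy_sample" helper are assumptions made up for illustration. The point is that each function writes one of its results through a pointer parameter, which is the convention that "output-argument:" describes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static const int sample_data[] = { 3, 1, 4 };   /* stand-in for real input */
enum { SAMPLE_COUNT = 3 };

/* Allocates a copy of the sample data; returns NULL on failure. */
static int *copy_sample(void)
{
    int *vector = malloc(sizeof sample_data);
    if (vector != NULL)
        memcpy(vector, sample_data, sizeof sample_data);
    return vector;
}

/* Result is a count; the vector comes back through VectorPtr. */
int ReadInts1(int **VectorPtr)
{
    assert(VectorPtr != NULL);
    *VectorPtr = copy_sample();
    return (*VectorPtr != NULL) ? SAMPLE_COUNT : 0;
}

/* Result is a vector; the count comes back through Count. */
int *ReadInts2(int *Count)
{
    int *vector;
    assert(Count != NULL);
    vector = copy_sample();
    *Count = (vector != NULL) ? SAMPLE_COUNT : 0;
    return vector;
}
```

A C caller has to declare a local variable and pass its address to receive the second result. With "output-argument:", that bookkeeping is hidden and the value simply appears as an extra Dylan return value.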



The function clause consists of a function name (which is a string), an optional renaming (as illustrated above), and an optional sequence of "options". The options include the following:


seal: -- specifies whether the resulting method should be sealed. Possible values are sealed or open, and the default is taken from the value specified in the initial file clause. (The "default default" is sealed.)
equate-result: -- overrides the default interpretation of the result type. The named type is assumed to be fully defined.
map-result: -- specifies that "import-value" should be called to map the result value to the named type.
ignore-result: -- specifies that the function’s result value should be ignored, just as if the function had been declared "void". Although you may specify any boolean literal, the only meaningful value is #t.
equate-argument: -- overrides the default interpretation of some argument’s type. The argument may be specified by name or by position.
map-argument: -- specifies that "export-value" should be called to map the given argument into the named type. Again, the argument may be specified by position or by name.
input-argument: -- indicates that the specified argument should be passed by value. This is the default.
output-argument: -- indicates that the specified argument should be treated as a return value rather than a "parameter". The effect is to declare that the C parameter will be passed by reference and that the reference variable need not be initialized to any object. This option assumes that the C parameter will have been declared as a "pointer" type, and will strip one "*" off of the argument type. Thus, if the parameter declaration specifies "int **", the actual value returned will have the Dylan type corresponding to "int *".
input-output-argument: -- indicates that the specified argument should be treated as an input argument and that its (potentially modified) value should also be returned as an additional result value. The effect is similar to that of "output-argument" except that the reference variable will be initialized with the argument value.




The following (nonsensical) example demonstrates all of the options, as they might be applied to the functions:


extern struct object *bar(int first, int *second, struct object **third);
extern baz(char first, struct object *second);

define interface
   #include "demo.h";
   function "bar",
      seal: open,
      equate-result: <object>,
      map-result: <bar-object>,
      input-argument: first,          // passed normally
      output-argument: 2,             // nothing passed in, second result value
                                      // will be <integer>
      input-output-argument: third;   // passed in as second argument,
                                      // returned as third result
   function "baz" => arbitrary-function-name,
      seal: sealed,                   // default
      ignore-result: #t,
      equate-argument: {second => <object>},
      map-argument: {2 => <baz-object>};
end interface;



10. Struct and Union Clauses



"Struct clauses" and "union clauses" (referred to collectively as "container clauses") are used to specify naming and inclusion of class slots in exactly the same way that the options in the file clause control the handling of global definitions. Like the function clauses described above, they consist of the reserved word "struct" or "union", a string which gives the full C name of the container declaration, an optional renaming, and a list of options. If we have a structure defined by

typedef struct cons {
   int index;
   struct object *head;
   struct cons *tail;
} cons_cell;



we could use the following interface definition:


define interface
   #include "cons.h";
   struct "struct cons" => <c-list>,
      superclasses: {<sequence>},
      prefix: "c-list-",
      name-mapper: identity-name-mapping,
      exclude: {"index"};
end interface;



Valid options for container clauses include: import:, prefix:, exclude:, rename:, seal-functions:, read-only:, equate:, and map:. These options act like the equivalent options which may be specified in a file clause, but they apply to the slots of a single "class" rather than to globally defined objects. Options specified within a container clause override any global defaults that might have been specified in the "#include" clause.



Container clauses also permit the "superclasses:" option described in section 6.2. 



Although the recommended method for specifying a container type is to use the full C name (i.e. "struct foo"), you may also use an alias defined by a typedef. Thus, in the above example, you could have specified either "struct cons" or "cons_cell", with identical results.
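
The reason the two spellings give identical results is that, in C, a typedef introduces another name for an existing type rather than a new type. The following sketch uses the "struct cons" declaration from above; the functions "cons_link" and "cons_length" are hypothetical helpers added only to show that both names can be mixed freely.

```c
#include <assert.h>
#include <stddef.h>

struct object;                     /* opaque, as in the header above */

typedef struct cons {
    int index;
    struct object *head;
    struct cons *tail;
} cons_cell;

/* Declared with the typedef name for the first argument and the struct
   tag for the second; both denote the same type. */
cons_cell *cons_link(cons_cell *cell, struct cons *rest)
{
    assert(cell != NULL);
    cell->tail = rest;
    return cell;
}

/* Counts cells by following the tail pointers. */
size_t cons_length(const struct cons *list)
{
    size_t count = 0;
    for (; list != NULL; list = list->tail)
        count++;
    return count;
}
```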




11. Pointer Clauses



"Pointer clauses" modify the definitions of pointer declarations such as "int *" or "struct foo ***", or vector declarations such as "char [256]". Like all such clauses, they may be used to specify renamings for the classes. This is particularly useful for pointer types since they are not automatically assigned user-meaningful names. It also allows specification of the "superclasses:" option described in 6.2. A typical use might be:


define interface
   #include "vec.h";

   pointer "int *" => <int-vector>,
      superclasses: {<c-vector>};
   pointer "struct person **" => <people>,
      superclasses: {<c-vector>};
   pointer "char [256]" => <fixed-string>;
end interface;



This clause is particularly useful for declaring pointer types to be subclasses of <c-vector> so that they can be indexed via "element". (Note that this is not necessary for vector declarations, since they are automatically declared to be <c-vector>s.)
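
One way to read the distinction between vector declarations and pointer declarations is in terms of what the C type itself knows. The sketch below is illustrative only ("fixed_string", "fixed_string_capacity" and "sum_ints" are invented names): a declaration such as "char [256]" carries its element count in the type, while a bare "int *" can be indexed but carries no size at all.

```c
#include <assert.h>
#include <stddef.h>

typedef char fixed_string[256];   /* a vector declaration: size is known */

/* A vector type carries its element count in the type itself. */
size_t fixed_string_capacity(void)
{
    return sizeof(fixed_string) / sizeof(char);
}

/* A bare pointer carries no size; the caller has to supply one. */
long sum_ints(const int *values, size_t count)
{
    long total = 0;
    size_t i;
    assert(values != NULL || count == 0);
    for (i = 0; i < count; i++)
        total += values[i];       /* indexing, like "element" in Dylan */
    return total;
}
```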




12. Constant Clauses



Constant clauses are used to override the values of constants specified in header files (i.e. "#define MAXINT 27"). The "value:" option, which is the only one supported, specifies a Dylan literal which will be taken as the value of the named constant. A typical use might be:


define interface
   #include "const.h";
   constant "MAXINT" => $maximum-fixed-integer,
      value: 9999999;
end interface;



13. Variable Clauses



Global variables declared within C header files are translated into "getter" functions which retrieve the value of the C variables and optional "setter" functions to modify those values. In effect, they are treated as slots of a "null object" -- the getter function takes no arguments and returns the value of the variable, while the setter function takes a single value which will be the new value of the variable. Type mapping takes place for the arguments and results of these functions, just as it would for slot accessors.



Variable clauses support the following options:


getter: -- specifies a Dylan variable name which will be used to hold the getter function.
setter: -- specifies either a Dylan variable name which will be used to hold the setter function, or #f to indicate that there should be no setter function.
read-only: -- specifies whether the variable should be settable. "read-only: #t" is equivalent to "setter: #f".
seal: -- specifies whether the getter and setter functions should be sealed. Possible values are "sealed" or "open", and the default is taken from the "seal-functions:" option in the "#include" clause (or "sealed" if not specified there).
map: -- specifies the high-level type to which the variable should be mapped.
equate: -- specifies the low-level type to which the raw C value should be implicitly converted.
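
The getter and setter model can be pictured in C as a pair of accessor functions wrapped around the variable. This is only an analogy for the shape of the generated Dylan functions, not a description of how Melange implements them; the variable "debug_level" and both function names are hypothetical.

```c
#include <assert.h>

int debug_level = 0;              /* hypothetical C global variable */

/* Getter: no arguments, returns the current value. */
int get_debug_level(void)
{
    return debug_level;
}

/* Setter: takes the new value, stores it, and returns it. */
int set_debug_level(int new_value)
{
    assert(new_value >= 0);       /* hypothetical constraint for this sketch */
    debug_level = new_value;
    return new_value;
}
```

Specifying "read-only: #t" or "setter: #f" corresponds to providing only the first of these two functions.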




Appendix I -- 



Low level support facilities



The high level functions for calling C routines or for accessing global variables are all built upon a relatively small number of built-in primitives which perform specific low-level tasks. You should seldom have any need to deal with these primitives directly, but they are nonetheless available should you need to make use of them.



To use these types and operations, you should "use" the module "extern" from the "Dylan" library.


I.i. Predefined types
<statically-typed-pointer> [class]



Unless otherwise specified, C pointers are implicitly "equated" to newly created subclasses of <statically-typed-pointer>. This class contains a single implicit slot which holds the raw C pointer. Because of implementation limitations in Mindy, you may not add any extra slots to subclasses of <statically-typed-pointer>, nor can such a subclass inherit slots from other classes. You may, however, create classes which are subclasses of both <statically-typed-pointer> and other (presumably abstract) classes which have no slots.



The "make" method for <statically-typed-pointer> takes three keywords. The "pointer:" keyword tells it to initialize the new variable with the given value, which must be a <statically-typed-pointer> or an <integer>. If no pointer value is specified, space will be allocated based upon the content-size of the specific type and upon the "extra-bytes:" and "element-count:" keywords. These keywords, which default to "0" and "1" respectively, tell how many bytes of extra memory (beyond that specified by "content-size") should be allocated for each element and how many objects are going to be stored in the memory.
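
The allocation rule described above can be sketched in C as follows. The formula is an assumption inferred from the wording "for each element" (each element receives content-size plus extra-bytes bytes); the text does not spell out the exact computation, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed formula: each element gets content_size + extra_bytes bytes. */
size_t allocation_size(size_t content_size, size_t extra_bytes,
                       size_t element_count)
{
    return element_count * (content_size + extra_bytes);
}

/* Allocates uninitialized storage of the computed size.  The defaults
   named in the text would be extra_bytes = 0 and element_count = 1. */
void *make_storage(size_t content_size, size_t extra_bytes,
                   size_t element_count)
{
    size_t bytes = allocation_size(content_size, extra_bytes, element_count);
    assert(bytes > 0);
    return malloc(bytes);
}
```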

<c-vector> [class]



<C-vector> is a subclass of <statically-typed-pointer> which inherits operations from <vector>. Because systems often depend upon C’s lack of bounds checking, the default size for <c-vector>s is "#f". However, subclasses of <c-vector> may provide a concrete size if desired. Types corresponding to declarations such as "char [256]" are automatically declared as subclasses of <c-vector>, but pointer declarations such as "char *" are not.

<c-string> [class]



<C-string> is a subclass of <statically-typed-pointer> which also inherits operations from <string>. It is implemented as a C pointer to a null-terminated vector of characters, and provides a method on forward-iteration-protocol which understands this implementation. This class may, therefore, be used for manipulating C’s native format for "string"s (i.e. "char *"). Note that the "null" string is considered to be a valid empty string. This is somewhat contrary to the semantics of many C operations, but provides a safer model for Dylan code.



The "make" method for <c-string>s accepts the "size:" and "fill:" keywords. 



There are a few surprising properties of <c-string>s which users should be aware of, all of which result from the "null-terminated" implementation. Firstly, the "size" of the string is computed by counting from the beginning of the string, and is therefore not nearly as efficient as you might expect. Secondly, you should expect odd results if you try to store "as(<character>, 0)" into such a string. Finally, the "element" and "element-setter" methods must scan the string in order to do bounds checking, and may therefore be fairly slow. If you wish to (unsafely) bypass this checking, you must use "pointer-value" instead.
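
These three properties follow directly from the C representation, as the following sketch shows. The function names are hypothetical and the code is an illustration of the behavior described above, not Melange's actual implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Size is found by scanning for the terminator; NULL counts as empty,
   matching the rule that the "null" string is a valid empty string. */
size_t cstring_size(const char *s)
{
    return (s == NULL) ? 0 : strlen(s);
}

/* Bounds-checked read: has to compute the size first, so it costs a
   scan of the string.  Returns -1 when the index is out of range. */
int cstring_element(const char *s, size_t index)
{
    return (index < cstring_size(s)) ? (unsigned char)s[index] : -1;
}

/* Storing character 0 silently shortens the string. */
void cstring_store(char *s, size_t index, char value)
{
    assert(index < cstring_size(s));
    s[index] = value;
}
```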

<c-function> [class]



<c-function>s, like <statically-typed-pointer>s, encapsulate a raw "C" pointer. However, <c-function>s also encode information about the calling conventions of the function which is (presumably) located at the given address. They may, therefore, be directly invoked just like any other function.

<foreign-file> [class]



The <foreign-file> class is used to store information about the contents of a particular object file. It is created by "load-object-file", and may be passed as an option to "find-c-function" and "find-c-pointer". (All of these functions are described below.)


I.ii. Locating native C objects



There are several functions provided which search for C functions or variables and return Dylan objects which refer to them. Note that Mindy does not have sufficient information to determine whether any given C object is a function, and therefore it depends upon the user (or, more often, Melange) to provide it with correct information.

load-object-file(files :: <list>, #key symbols) [function]



This function (which presently only works on HPUX machines) attempts to dynamically load a given object file (i.e. ".o" or ".a") into the current Mindy process and load its symbol table to allow its contents to be located by "find-c-pointer" or "find-c-function". If it successfully loads the file, it will return a <foreign-file> encapsulating the symbol table information. Otherwise, it will return #f.



If you are not running on an HPUX machine, you will have to statically link object files into Mindy, as described in Appendix II.

find-c-pointer(name :: <string>, #key file :: <foreign-file>) [function]



This function searches through the symbol table for the object file corresponding to the specified file (or for Mindy itself) and attempts to locate a symbol with the given name. If it finds such a symbol, it converts the corresponding address to a <statically-typed-pointer> and returns it. Otherwise, it returns #f.

find-c-function (name :: <string>, #key file) [function]
constrain-c-function (fun :: <c-function>,  [function]
 specializer :: <list>, rest? :: <boolean>,
 results :: <list>)



The function "find-c-function" is like "find-c-pointer", except that the result is a <c-function> (or #f). The resulting function is specialized to "fun(#rest args) :: <object>". However, it may be constrained to a different set of specializers via "constrain-c-function". This function accepts lists of types for the arguments and for the return values, and a boolean value which states whether optional arguments are accepted. The result declarations are particularly important, since they are used to coerce the raw C result value into an appropriate low level Dylan type. The possible types are <boolean>, <integer>, or any subclass of <statically-typed-pointer>. Note that although a list of result types is accepted, only the first can be meaningful since C does not support multiple return values.



Note: The functions in this section are likely to change drastically in the near future.


I.iii. Pointer manipulation operations



Each <statically-typed-pointer> encapsulates a pointer to some area of memory (i.e. a raw machine address). In itself, this does little good, except as an arbitrary token. However, Mindy provides a number of primitive operations which manipulate the contents of these addresses, or do basic comparisons and arithmetic upon the addresses themselves.

signed-byte-at (ptr :: <statically-typed-pointer>, #key offset) [function]
unsigned-byte-at (ptr :: <statically-typed-pointer>, #key offset) [function]
signed-short-at (ptr :: <statically-typed-pointer>, #key offset) [function]
unsigned-short-at (ptr :: <statically-typed-pointer>, #key offset) [function]
signed-long-at (ptr :: <statically-typed-pointer>, #key offset) [function]
unsigned-long-at (ptr :: <statically-typed-pointer>, #key offset) [function]
pointer-at (ptr :: <statically-typed-pointer>,  [function]
 #key offset, class)



These operations return an object which represents the value stored at the address corresponding to "ptr". The first six operations all return <integer>s -- the different versions are required because the same number may be represented in a variety of formats (differing in length and interpretation of the high-order bit) and because Mindy has no way of determining which might be used in a given situation. The final operation, "pointer-at", returns a new <statically-typed-pointer> encapsulating the address referenced by the original pointer. You may use the "class:" keyword to specify that the new object should be an instance of some particular subclass of <statically-typed-pointer>. (Thus, for example "pointer-at(foo, class: <bar>)" would be roughly equivalent to "as(<bar>, pointer-at(foo))".)



The offset parameter (if provided) is added to the integer corresponding to the machine address before the pointer is dereferenced. This is useful, for example, in loading an object from within a C "struct".
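
In C terms, reading a field through a base address and an offset looks roughly like the sketch below. The struct "packet" is hypothetical, and the two functions merely borrow the names of the Melange operations to show the idea; they are not the Mindy primitives themselves. The signed and unsigned versions read the same bytes and differ only in how the high-order bit is interpreted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct packet {                   /* hypothetical struct for illustration */
    long  id;
    short level;
};

/* Reads a signed short located offset bytes past base. */
int signed_short_at(const void *base, size_t offset)
{
    short value;
    assert(base != NULL);
    memcpy(&value, (const unsigned char *)base + offset, sizeof value);
    return value;
}

/* Same bytes, interpreted without a sign bit. */
unsigned int unsigned_short_at(const void *base, size_t offset)
{
    unsigned short value;
    assert(base != NULL);
    memcpy(&value, (const unsigned char *)base + offset, sizeof value);
    return value;
}
```

The offset of a field can be obtained in C with "offsetof(struct packet, level)".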



Setter functions are provided corresponding to each of the above functions. You can, therefore, say:


signed-short-at(ptr) := 32767;
pointer-at(ptr1) := ptr2;

as(cls == <integer>, ptr :: <statically-typed-pointer>) [G.F. Method]
as(cls == <statically-typed-pointer>, ptr :: <statically-typed-pointer>)
 [G.F. Method]
as(cls == <statically-typed-pointer>, int :: <integer>) [G.F. Method]



Methods upon "as" are provided for converting from <integer> to any statically typed pointer class and from any statically typed pointer class to <integer> or to another statically typed pointer class.

\+ (ptr :: <statically-typed-pointer>, int :: <integer>) [G.F. Method]
\- (ptr1 :: <statically-typed-pointer>, ptr2 :: <statically-typed-pointer>)
 [G.F. Method]
\= (ptr1 :: <statically-typed-pointer>, ptr2 :: <statically-typed-pointer>)
 [G.F. Method]



These functions do arithmetic upon the integers corresponding to the given pointers. The following code fragment


let new-ptr = ptr1 + 3;
let difference = ptr2 - ptr3;
let same = (ptr2 = ptr3);




is equivalent to


let new-ptr = as(ptr1.object-class, as(<integer>, ptr1) + 3);
let difference = as(<integer>, ptr2) - as(<integer>, ptr3);
let same = (as(<integer>, ptr2) = as(<integer>, ptr3));
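
Because this arithmetic works on the integer addresses, the offsets are counted in bytes. C programmers should note that this differs from C's own pointer arithmetic, which scales the offset by the size of the pointed-to type. The sketch below expresses the three operations in C using "uintptr_t"; the function names are hypothetical, and the round trip between pointers and integers is implementation-defined in C, although it behaves as shown on common platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Address plus a byte count, as in  as(<integer>, ptr1) + 3. */
void *address_add(const void *ptr, intptr_t bytes)
{
    return (void *)((uintptr_t)ptr + (uintptr_t)bytes);
}

/* Difference of two addresses, in bytes. */
intptr_t address_difference(const void *ptr1, const void *ptr2)
{
    return (intptr_t)((uintptr_t)ptr1 - (uintptr_t)ptr2);
}

/* Address equality, ignoring the static types of the pointers. */
int address_equal(const void *ptr1, const void *ptr2)
{
    return (uintptr_t)ptr1 == (uintptr_t)ptr2;
}

/* C's own "base + index" scales by the element size; here the same
   address is reached by spelling the scaling out in bytes. */
int *int_element_address(int *base, size_t index)
{
    assert(base != NULL);
    return (int *)address_add(base, (intptr_t)(index * sizeof(int)));
}
```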




Appendix II -- 



Static linking mechanisms



Because object file formats vary widely by architecture, Mindy does not support dynamic loading of object files or automatic symbol table look up for all architectures. In the general case, it is necessary to depend upon a less elegant technique for explicitly making certain C objects available. 



Simple instructions for using this mechanism from within Melange are given in section 3.1. This appendix simply provides more information on the underlying mechanism.



In order to make sure that the desired symbols can be located, it is necessary to build an explicit table which maps between the symbol’s name and its address. This table is automatically created by running the "make-init.pl" script [This requires you to have PERL installed on your system.]  upon a list of "interface definition files". This will create two files ",extern1.def" and ",extern2.def", which should then be renamed to "extern1.def" and "extern2.def" respectively. These files are automatically included by "ext-init.c" so that the table will be created after Mindy is rebuilt.



The interface definition files consist of zero or more lines of text, each of which should contain the name of one object. If the object is a function, it should be immediately followed by a set of parentheses. For example, the file which defines the memory allocation routines used by Melange’s support code contains the following four lines:


free()
malloc()
strcmp()
strlen()
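
Conceptually, the generated table pairs each listed name with the address of the corresponding object. The sketch below shows one way such a table and its lookup could be written in C. It is an assumption for illustration only: the actual contents of "extern1.def", "extern2.def" and "ext-init.c" are not shown in this document, and the names "symbol_entry", "symbol_table", "generic_function" and "lookup_function" are invented.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef void (*generic_function)(void);

struct symbol_entry {
    const char *name;
    generic_function address;     /* functions only, in this sketch */
};

/* One entry per line of the interface definition file shown above. */
static const struct symbol_entry symbol_table[] = {
    { "free",   (generic_function)free   },
    { "malloc", (generic_function)malloc },
    { "strcmp", (generic_function)strcmp },
    { "strlen", (generic_function)strlen },
};

/* Linear search by name; returns NULL when the symbol is not listed. */
generic_function lookup_function(const char *name)
{
    size_t i;
    assert(name != NULL);
    for (i = 0; i < sizeof symbol_table / sizeof symbol_table[0]; i++)
        if (strcmp(symbol_table[i].name, name) == 0)
            return symbol_table[i].address;
    return NULL;
}
```

A symbol that was never listed cannot be found at run time, which is why every required object has to appear in some interface definition file.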



The only other step required to make the objects available is simply to ensure that the library which contains them is linked into Mindy. The easiest way to accomplish all of this is to simply modify the Makefile in Mindy’s source directory. If you add the names of the required libraries to LIBS and the names of the interface definition files to EXTERN-INCLUDES, make will do the necessary work for you. You should be sure to leave "../compat/libcompat.a" or "-lm" in LIBS and "malloc.inc" in EXTERN-INCLUDES.




Appendix III -- 



Differences from Creole



It would be difficult to produce an exhaustive list of the differences between Creole and Melange. We can, however, include a brief examination of the most important incompatibilities between the two systems. 

1. Creole’s "type:" options have been replaced by Melange’s "equate:" and "map:" options.
2. Creole’s "access path" options have been replaced by "object-file:" and "mindy-include-file:".
3. The interface to "import-value" and "export-value" differ between the two systems.
4. Melange does not inherit type mappings from other "define interface" forms.
5. Creole does not import definitions from "recursively included" header files, even if they are referenced by definitions which are imported.
6. Creole does not support C vectors or "sub-structures" as first class objects.
7. Melange does not presently support callbacks, "export-temporary-value", "<pascal-string>", "with-stack-structure", "with-stack-block", or "alien-method".
8. Creole will never consider instances of two distinct statically typed pointer classes to be "=", even if they refer to the same address.



Appendix IV -- 



Known limitations



Although mostly complete, the current implementation of Melange is missing a few elements which might be required for some applications. The following capabilities probably should be present, but are not yet supported:

1. Floating point numbers. 
2. Callbacks.
3. Function types. (It is, however, possible to import a function as a simple <statically-typed-pointer> and then manipulate it like any other object.)



Appendix V -- 



Proposed modifications



Although Melange seems to be fairly useful in its present form, we are currently considering a number of ways in which it may be made more useful. This section contains a brief discussion of several potential changes which may be implemented in the future.


V.i. Enumeration clauses



At present, there is no way to modify the default handling of a C enumeration declaration. It is clear that you might wish a mechanism to specify several different explicit options: prefixes for the enumeration constants; respecification of constant values; and, of course, explicit "import:" and "exclude:" options.


V.ii. Inheritance of "map" and "equate" options



There are some cases in which a set of types imported within one interface definition might be used extensively within another. In the present implementation, the two interface definitions would be handled independently and equivalences between types would not be recognized in the absence of explicit "equate:" options.



One proposed solution would involve the ability to explicitly "use" one interface definition within another. This would result in all identically named types being implicitly equated and all top-level "map:" options being inherited. The "use" clause could support roughly the same syntax as the "use" clauses in library and module definitions. In order to make this work, it would be necessary to assign arbitrary names to interface definitions. This would have the added benefit of making them more consistent with other standard Dylan definition forms.



If this change were implemented, a typical interface definition might look something like the following:


define interface date
   #include "date.h";
   use time, import: {"struct time"};
end interface date;



A less ambitious version might remain compatible with the current syntax by replacing the interface name with an "interface-name" option, which would default to the root of the file name. Thus,


define interface
   #include "date.h",
      interface-name: "date";
end interface;



would yield the same effect as the previous example.


V.iii. Remerging of the "equate:" and "map:" options



It has been pointed out that the current method of specifying low-level and high-level mappings, while sufficiently expressive, is somewhat verbose and confusing. It would therefore be good to find an alternative notation. 



It has been suggested that definitions like:


define interface
   #include "dirent.h",
      equate: {"char *" => <c-string>},
      map: {"char *" => <byte-string>};
end interface;



might be replaced by something like:


define interface
   #include "dirent.h",
      equate-and-map: {"char *" => <c-string> => <byte-string>};
end interface;



or


define interface
   #include "dirent.h";
   transform "char *",
      low-level: <c-string>,
      high-level: <byte-string>;
end interface;


Copyright 1994, 1995, 1996, 1997 Carnegie Mellon University. All rights reserved.
Send comments and bug reports to gwydion-bugs@cs.cmu.edu
minimal-name-mapping-with-structure-prefix	Provides the translations described above.
minimal-name-mapping	Same as above, but excludes the "struct-name$" prefixes.
c-to-dylan	Like minimal-name-mapping, but: Adds hyphens to reinforce "CaseBased" word separation. Adds "get-" prefixes to slot accessors.
identity-name-mapping	Does no translation.