Wyvern

Wyvern: a Language for Engineering Mobile and Web Applications

Abstract. This document describes the rationale for the Wyvern programming language and is aimed at its potential users. It will grow to include a specification for Wyvern as well.

Motivation

Better programming languages have revolutionized software development: from Fortran, which freed programmers from assembly language, through Java, which brought type safety and garbage collection to the masses, to JavaScript, which made the web come alive. Yet current tools for building applications for the web and for mobile devices, two of the most vibrant software sectors today, are woefully inadequate. The problems are numerous and significant:

Existing industrial or research languages have made progress on some of these problems, but the solutions remain inadequate. For example, Ruby on Rails demonstrates that a single language and platform can, through the judicious use of internal domain-specific languages (DSLs), express a rich variety of artifacts, including code, presentation, navigation structure, and other features. However, developers should not have to give up the safety of typed languages--indeed, types are essential to improving the coordination among the artifacts used to describe a web or mobile application. Furthermore, integration should not be supported only on the server side, as with Ruby on Rails, but across the client and server.

Overview

A Wyvern is a two-legged, winged dragon. The Wyvern language emphasizes security: just as treasure guarded by a Wyvern ought to be secure, so should programs written in the Wyvern language.

Goal. The goal of Wyvern is to be an excellent programming language for engineering web and mobile applications. While the area of focus is important, the language is really driven by engineering needs. Engineers understand the need to balance multiple factors: with respect to a language, those factors include developer productivity, assurance of the end product, and run-time efficiency.

Target audience. Wyvern is targeted at software engineers who are developing applications for web and mobile platforms. Today, these developers are likely to be writing code in JavaScript on the client and in languages such as Python or Java on the server. Assurance, productivity, and efficiency are all important to our target audience.

Approach. Wyvern begins with a simple core language with good support for object-oriented programming as well as functional abstractions. It builds on this to address the challenges outlined above through a number of strategies:

Properties

Wyvern's design should have the following properties, which facilitate the overall goal of Wyvern as a language for engineering web and mobile applications. For each property we attempt to provide a basis for judging whether the language design adequately fulfills the property. This basis is ideally objective, but may be subjective in many cases by necessity. The properties are:

Research Goals

Wyvern is intended to be a useful, practical language, but also to be a means to investigate scientific questions. Through the design, implementation, and evaluation of the Wyvern language, we hope to pursue research in the following areas:

Lexical Structure

Goals

The lexical structure of Wyvern is intended to fulfill the following goals:

Design and Rationale

Wyvern is a whitespace-sensitive language. This can be implemented in a fairly simple and clearly specified way, as demonstrated by the Python language and by Adams et al.'s POPL 2013 paper on principled parsing of whitespace-sensitive languages. Many programmers, including ourselves, feel that whitespace sensitivity enhances readability. It avoids problems with matching parentheses and curly braces, as well as the dangling-else ambiguity of C's if statement. Finally, indentation levels provide a convenient way to delimit DSLs while placing few restrictions on the DSL itself: anything at all can appear in a DSL as long as it is indented relative to the surrounding text.

As a secondary point, whitespace sensitivity fits nicely with Wyvern's goal to support web programming, as several other languages in this space are whitespace sensitive.

Wyvern provides C-style and single-line comments. Line continuations are as specified in Python and in the C preprocessor (in C, newline characters are significant in macro definitions).

As the Python and C approach to line continuations seemed slightly ad-hoc, we considered alternatives such as allowing a line continuation when the next line was indented a specific amount. However, we are also using indentation to denote blocks and to delimit DSLs. We felt it would be ambiguous, overly restrictive, and/or too confusing to use indentation for two different purposes.

Specification

The input to lexical analysis is a stream of ASCII characters (but see the extensions below). The output of lexical analysis is a stream of tokens of the following kinds:

Comments. Wyvern supports C-style and single-line comments. In a C-style comment, all characters between a starting /* and an ending */ are ignored. C-style comments cannot be nested.

In a single-line comment, the characters from a starting // to the end of the line are ignored. However, the newline character at the end of a single-line comment is not ignored; it remains significant for the rest of lexical analysis.
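A minimal Python sketch of these two comment rules follows. The function name strip_comments is hypothetical, and two details are assumptions not fixed by the text above: string literals are "..." with no escape sequences, and comment markers inside a string literal are left alone. Note how the first */ always closes a C-style comment, because comments do not nest, and how the end-of-line after a // comment is kept.

```python
def strip_comments(src: str) -> str:
    """Drop Wyvern comments, keeping the end-of-line that ends a // comment.

    Assumed, not specified: string literals are "..." without escapes, and
    comment markers inside a string literal are not comments.
    """
    out = []
    i, n = 0, len(src)
    while i < n:
        if src.startswith("/*", i):
            end = src.find("*/", i + 2)
            if end == -1:
                raise SyntaxError("unterminated /* comment")
            i = end + 2  # no nesting: the first */ closes the comment
        elif src.startswith("//", i):
            while i < n and src[i] not in "\r\n":
                i += 1  # skip to, but not past, the end-of-line
        elif src[i] == '"':
            end = src.find('"', i + 1)
            if end == -1 or any(c in "\r\n" for c in src[i + 1:end]):
                raise SyntaxError("unterminated string literal")
            out.append(src[i:end + 1])  # copy the literal unchanged
            i = end + 1
        else:
            out.append(src[i])
            i += 1
    return "".join(out)
```

For example, strip_comments("/* /* */ x */") leaves " x */", since the inner /* does not open a nested comment.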

Lines and line joining. Our specification for explicit and implicit line joining is taken from part of the Python reference. A physical line is a sequence of characters terminated by an end-of-line sequence. An end-of-line sequence is one of: the ASCII LF character, the ASCII sequence CR LF, or the ASCII CR character.

Two or more physical lines may be joined into logical lines using backslash characters (\), as follows: when a physical line ends in a backslash that is not part of a string literal or comment, it is joined with the following line to form a single logical line, deleting the backslash and the following end-of-line character. A line ending in a backslash cannot carry a comment. A backslash does not continue a comment.

Newline characters are ignored inside matching parentheses, square brackets, or curly braces, as in Python. Following the Python spec, implicitly continued lines can carry comments. The indentation of the continuation lines is not important. Blank continuation lines are allowed.
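The explicit and implicit joining rules can be sketched in Python as below. This is an illustration under stated assumptions, not the reference lexer: the function name logical_lines is hypothetical, comments are assumed to have been removed already, and an implicitly joined newline is replaced by a single space (our assumption) so that tokens on adjacent lines stay separated. Explicit joining deletes the backslash and the end-of-line with nothing in their place, as specified.

```python
import re

EOL = re.compile(r"\r\n|\r|\n")  # the three end-of-line sequences

def logical_lines(src: str) -> list[str]:
    """Join physical lines into logical lines (comments assumed already removed)."""
    result, pending, depth = [], "", 0
    for line in EOL.split(src):
        in_string = False
        for ch in line:  # track bracket depth outside string literals
            if ch == '"':
                in_string = not in_string
            elif not in_string and ch in "([{":
                depth += 1
            elif not in_string and ch in ")]}":
                depth = max(depth - 1, 0)
        if line.endswith("\\") and not in_string:
            pending += line[:-1]  # explicit join: drop the backslash and the EOL
        elif depth > 0:
            pending += line + " "  # implicit join: newline inside brackets
        else:
            result.append(pending + line)
            pending = ""
    if pending:
        result.append(pending)
    return result
```

Blank logical lines are still returned here; they are discarded when indentation is computed.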

A logical line that contains only spaces, tabs, formfeeds and possibly a comment, is ignored (i.e., no NEWLINE token is generated).

Indentation. Our specification for indentation is adapted directly from Python's. Leading whitespace (spaces and tabs) at the beginning of a logical line is used to compute the indentation level of the line.

A line's indentation is denoted by the sequence of spaces and tabs preceding the first non-blank character of a line. Indentation cannot be split over multiple physical lines using backslashes; the whitespace up to the first backslash determines the indentation.

Although the language specification permits both tabs and spaces in indentation, different editors display tabs in different ways, so we recommend that programmers (and Wyvern editors) not use tabs in Wyvern files.

The indentation levels of consecutive lines are used to generate INDENT and DEDENT tokens, using a stack, as follows.

Before the first line of the file is read, the empty string "" is pushed on the stack; it will never be popped off again. Each string pushed on the stack always has the previous string on the stack as a prefix, with at least one whitespace character added. At the beginning of each logical line, the line's indentation level is compared to the top of the stack. If it is equal, nothing happens. If it is longer, it is pushed on the stack, and one INDENT token is generated. If it is shorter, it must be one of the strings occurring on the stack; all strings on the stack that are longer are popped off, and for each string popped off a DEDENT token is generated. At the end of the file, a DEDENT token is generated for each string remaining on the stack that is longer than the empty string.
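The stack algorithm above can be sketched directly in Python. In this hypothetical indent_tokens function, each non-blank logical line's own tokens are represented by its stripped text, and NEWLINE tokens are omitted to keep the output short. Because indentation is compared as strings rather than as column counts, a tab and four spaces are different indentation levels, and mixing them is reported as an error.

```python
def indent_tokens(lines: list[str]) -> list[str]:
    """Emit INDENT/DEDENT around each non-blank logical line's content."""
    stack = [""]  # the empty string is pushed first and never popped
    tokens = []
    for line in lines:
        if line.strip(" \t\f") == "":
            continue  # whitespace-only logical lines produce no tokens
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if indent == stack[-1]:
            pass
        elif indent.startswith(stack[-1]):
            stack.append(indent)  # longer, and extends the top: push
            tokens.append("INDENT")
        elif indent in stack:
            while stack[-1] != indent:  # shorter: pop back to the match
                stack.pop()
                tokens.append("DEDENT")
        else:
            raise IndentationError(f"inconsistent indentation: {indent!r}")
        tokens.append(line.strip())  # stand-in for the line's real tokens
    tokens.extend("DEDENT" for _ in stack[1:])  # close blocks still open at EOF
    return tokens
```

Applied to the Stack and IntCell declarations from the type examples, this yields the type Stack line, INDENT, its two member lines, DEDENT, the type IntCell line, INDENT, its member line, and a final DEDENT at end of file.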

Whitespace after the first non-whitespace character of a line serves as a delimiter between tokens.

Other tokens. An identifier is a sequence of characters that begins with a letter or underscore, and contains letters, underscores, and digits. An identifier may also be a sequence of operator characters, which include =, <, >, !, ~, ?, :, &, |, +, -, *, /, ^, and %. Operator identifiers may not contain the comment sequences /* or //.

A number is a sequence of digits. A string begins with ", includes any number of non-" characters but no end-of-line sequences, and ends with a ".
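The identifier, number, and string rules map naturally onto a regular-expression tokenizer. The Python sketch below assumes ASCII input with comments already removed, and it does not separate keywords from ordinary identifiers. The PUNCT class for parentheses, brackets, braces, commas, and dots is a hypothetical guess, because the full token list is not reproduced in this section.

```python
import re

TOKEN_RE = re.compile(r'''
    (?P<NAME>[A-Za-z_][A-Za-z_0-9]*)      # letter or underscore, then letters/digits/_
  | (?P<NUMBER>[0-9]+)                    # a sequence of digits
  | (?P<STRING>"[^"\r\n]*")               # no embedded " and no end-of-line
  | (?P<OP>[=<>!~?:&|+\-*/^%]+)           # operator identifier
  | (?P<PUNCT>[()\[\]{},.])               # hypothetical punctuation set
  | (?P<WS>[ \t\f]+)                      # whitespace between tokens
''', re.VERBOSE)

def tokenize(line: str) -> list[tuple[str, str]]:
    """Split one logical line into (kind, text) pairs, dropping whitespace."""
    tokens, pos = [], 0
    while pos < len(line):
        m = TOKEN_RE.match(line, pos)
        if not m:
            raise SyntaxError(f"unexpected character {line[pos]!r} at {pos}")
        kind, text = m.lastgroup, m.group()
        if kind == "OP" and ("//" in text or "/*" in text):
            raise SyntaxError("comment sequence inside operator identifier")
        if kind != "WS":
            tokens.append((kind, text))
        pos = m.end()
    return tokens
```

Operator identifiers are matched greedily, so -> and <= each come out as a single token, while Int? lexes as the identifier Int followed by the operator identifier ?.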

Likely extensions

Wyvern Core Language

This section defines the core constructs of the Wyvern language, together with their dynamic semantics and typechecking. We scope the "core constructs" as the constructs that are sufficient to define the compiler and to serve as the seed for the entire standard library (given also the extension interface and the foreign library interface). We describe built-in constants, but their interpretation is delayed to the discussion of the Wyvern Standard Library, below.

Goals

  1. Understandability trumps simplicity and elegance when these conflict. For example, o.m should generally not have effects, and the only exceptions should be obvious in the type (e.g. m is identifiably a property, not a function). We still need to make this true!

Types

Types in Core Wyvern consist of object types, function types, tuples, and option types. The 0-ary tuple "Unit" has only one value, written ().

Declarations in Wyvern include types, methods, values, and variables. A block of consecutive type and method declarations may be mutually recursive. Values and variables, on the other hand, are only in scope after their declarations.

An object type in Wyvern is declared with the keyword type, and consists of a set of method and property signatures. An object type can be referred to by name anywhere in its scope.

A method signature is declared with the keyword meth, and consists of a method name, method arguments, and method result type. The method arguments are a comma-separated list of pairs of each argument's name and type. If a result type is not specified then it is type Unit.

To be implemented later. The method result type may be a simple type, or it may be a tuple. If a tuple, the result type is enclosed in parentheses and includes a comma-separated list of pairs of a tuple element's name and type. Support first-class tuples in pattern matching and argument passing.

A property signature is declared with the keyword prop (for properties that are only readable) or var (for properties that can be directly written). Property signatures consist of the property name and type.

Future note. We probably want some way of saying that a property is immutable (not just missing a write accessor) but we have postponed the decision about whether this goes in the type system or a specification.

A function type is written A -> B, where A is the argument type and B is the result type. An option type is written T?, and indicates that a value is either present at type T or is null. ? binds more strongly than ->, so Int? -> Int is a function from Int? to Int. In the future, when polymorphic types are added, T? may be syntactic sugar for something like Option[T] or T Option.
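To make the precedence rule concrete, here is a small recursive-descent parser for type expressions, written in Python. Two points are assumptions not stated above: -> is treated as right-associative, as is conventional for function types, and parentheses may group a type. The classes Named, Option, and Arrow and the function parse_type are hypothetical names, and the input is a pre-split token list.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Named:
    name: str

@dataclass(frozen=True)
class Option:  # T?
    inner: object

@dataclass(frozen=True)
class Arrow:  # A -> B
    arg: object
    result: object

def parse_type(tokens: list[str]):
    """Parse a token list such as ["Int", "?", "->", "Int"] into a type tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def arrow():
        nonlocal pos
        left = postfix()
        if peek() == "->":
            pos += 1
            return Arrow(left, arrow())  # assumed right-associative
        return left

    def postfix():
        nonlocal pos
        t = atom()
        while peek() == "?":  # ? attaches before -> is considered
            pos += 1
            t = Option(t)
        return t

    def atom():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of type")
        pos += 1
        if tok == "(":  # assumed: parentheses group a type
            t = arrow()
            if peek() != ")":
                raise SyntaxError("expected )")
            pos += 1
            return t
        return Named(tok)

    result = arrow()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return result
```

Because ? is consumed inside postfix, before arrow looks for ->, Int? -> Int parses as Arrow(Option(Named("Int")), Named("Int")); an optional function would need to be written (Int -> Int)?.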

Example syntax for types is shown below:

type IntCell
    var contents : Int
	
type Stack
    prop top : Int?
    meth push(element : Int)
    meth pop() : Int?
	
type StackFactory
    meth make() : Stack
    meth makeWithFirst(firstElement : Int) : Stack

type ListUtilities
    meth map(f : Int -> Int, l : IntList) : IntList
    // meth map(f : Int -> Int) : IntList -> IntList // curried version
    // meth map(f : Int -> Int)(l : IntList) : IntList // curried version with sugar

Classes, Methods, and Fields

A class consists of a set of methods, fields, and class methods.

A method consists of a method signature, as described above, plus a method body. A method body is either a single expression after an = symbol, or a sequence of statements that is indented and starts on the following line. A class method is identical to a method except for the use of the keyword class, and it defines a method on the class object rather than on an object instantiated from the class.

A field consists of the keyword var (for mutable fields) or val (for immutable fields), a name, a type, and an optional initialization expression. If the initialization expression is omitted, the field is initialized to the empty constant (for numbers, strings, and options); otherwise it must be initialized in every object construction expression.

A class may be ascribed an object type and (separately) a class type. Object type ascription constrains the type of objects generated by the class. From outside the class body, the class appears to have only the elements in the ascribed type. Class type ascription constrains the type of the class itself. From outside the class, the class appears to have only the class methods mentioned (as ordinary methods) in the class type. If multiple class or object types are ascribed, the actual ascription is the type-theoretic intersection of the ascribed types (i.e. it has the union of the members in the ascribed types).

Each class defines a type that contains only that class's implementation. If no type is ascribed, this type is the principal type of the class; otherwise, it is a subtype of the ascribed type but has no additional members.
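The ascription rules can be modeled in a few lines of Python, under a deliberately simplified representation: a type is a dictionary from member names to signature text, and the subtyping check on each member is reduced to equality. The names intersect and external_view are hypothetical, and rejecting a member that two ascribed types declare with different signatures is our assumption. The sketch shows that multiple ascriptions combine by taking the union of their members, and that an ascribed class exposes exactly those members and nothing more.

```python
Signature = dict[str, str]  # member name -> signature text (simplified)

def intersect(*types: Signature) -> Signature:
    """Intersection of object types: the union of their members."""
    members: Signature = {}
    for t in types:
        for name, sig in t.items():
            if name in members and members[name] != sig:
                raise TypeError(f"conflicting signatures for {name!r}")
            members[name] = sig
    return members

def external_view(impl: Signature, *ascribed: Signature) -> Signature:
    """Members visible from outside a class with the given ascriptions."""
    if not ascribed:
        return dict(impl)  # no ascription: the principal type is used
    iface = intersect(*ascribed)
    missing = [name for name in iface if impl.get(name) != iface[name]]
    if missing:
        raise TypeError(f"class does not implement {missing}")
    return iface  # nothing beyond the ascribed members is visible
```

With StackImpl's implementation type including the list field, ascribing Stack hides list, while with no ascription the whole implementation type remains visible.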

Future: public/private as syntactic sugar? What is the signature of a package? Can you distinguish (A) type members that refer to a particular type implementation from (B) type members that do not? Then type members are the general syntax. ML abstype? Do we write abstype as "class" in the signature of a package?

To consider: class fields? Seems like a bad practice generally, but some uses are OK (e.g. to support hash-consing a.k.a. the Flyweight pattern). Default constructors. Destructors. Type members, bounding, and instantiation. Case of. Comprises. Tagged. Subtyping. Inheritance or delegation. Default method parameters, useful in particular for constructor calls?

In the present design, in order to access a field f (or method m) on the receiver, you must use "this.f". We may allow f to be used directly, but then we must use Newspeak's "lexical search first" rules to avoid capture (see Modules as Objects in Newspeak).

The standard name for class methods that act as constructors or factory methods is make.

Statements and Expressions

Statements start on a line, and formally include all following lines at the same or greater level of indentation. A val or var declaration consists of the keyword val or var, a name, an optional type, and an initialization expression. If the type is absent, it is inferred as the most precise type of the initialization expression. Val declarations define a read-only, let-bound variable, scoped to the statement starting on the following line at the same indentation level. Var declarations define a mutable variable with the same scope.

Expressions include variable reads and assignments, first-class functions, function applications, property reads and assignments, object creations, and method calls.

If a method is called or a property is accessed on null, using the special selector .?, the result is null. We may instead use some other form of syntactic sugar, such as "propNull x x.getOption()" where both x and the result of getOption are option types.
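The intended behavior of the .? selector can be modeled in Python, with None standing in for null. The helpers get_or_null and call_or_null are hypothetical names for receiver.?prop and receiver.?meth(args), and the Link class mirrors the Link class in the example below. The model assumes short-circuiting: when the receiver is null, the access or call is skipped entirely rather than raising an error.

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class Link:
    data: int
    next: Optional["Link"]

def get_or_null(receiver: Any, prop: str) -> Any:
    """Model of receiver.?prop: null in, null out; otherwise a normal read."""
    return None if receiver is None else getattr(receiver, prop)

def call_or_null(receiver: Any, meth: str, *args: Any) -> Any:
    """Model of receiver.?meth(args): the call is skipped when receiver is null."""
    return None if receiver is None else getattr(receiver, meth)(*args)
```

Chained uses propagate null: reading data two links past the end of a one-element list yields null rather than an error.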

Methods may be called with named parameters. Two syntax options are under consideration: x:5 (to avoid a clash with =), or using := for assignment so that = is free for named arguments.

To consider: where can new be used? Only inside the class? Only if no ascription (which would hide the new "operation") has been used? May want to autogenerate standard constructors in some cases. Notion of a principal constructor used for pattern matching. Rob: can we make "meth m(x) = e" sugar for "val m = fn x => e".

If class Link is defined with a method make, can we use the shorthand Link(0,null) in place of Link.make(0,null)? Constructors must be defined explicitly for each class; there are no defaults as in Java. We anticipate that IDEs will make this less painful.

Example syntax for classes, methods, fields, statements, and expressions is shown below:

class StackImpl
    implements Stack
    class implements StackFactory

    var list : Link?

    meth top() = list.?data

    meth push(element)
        list = Link(element, list)

    meth pop()
        val result = list.?data
        list = list.?next
        result

    class meth make() = new StackImpl

    class meth makeWithFirst(firstElement)
        new StackImpl
            list = Link(firstElement, null)


class Link
    val data : Int
    val next : Link?

    class meth make(d:Int, n:Link?) = new Link(data=d, next=n)
	
// a package-level method (method of the package object, if we have one)
meth stackClient()
    val s = StackImpl.make()
    s.push(5)
    print(s.top)
    val addOne : Int -> Int = fn(x:Int) => x+1
    print(addOne(s.pop()))

Constants

Built-in constants include integers, decimal numbers, floating-point numbers, strings, characters, and null. The empty constants for each type are as follows: for String, ""; for numbers, 0; for option types, null.

Packages

Signature ascription at the package level. What does "public" mean--public to package, or to file, or to class? Hierarchical signature ascription.

Reflection

Mirrors? Need something that is secure.

Wyvern Module System

Requirements

These requirements list the software engineering properties we want the module system to provide.

First, we list the properties supported by the initial design described here:

Some requirements we intend to support in future extensions of the module system:

Some additional requirements we may eventually want to consider:

Design

A formal description of the design can be found in the Module System section of the core-language document. We choose to model modules essentially as objects that have additional infrastructure that gives each module a name (denoted by a URL) and allows the module to import other modules by URL. Modules include type members, so we must add type members to the public interface of objects. Object type members therefore now include both defs and type declarations. Types can be arrow types, a named type in scope, or a type denoted by a path ending in a type.

In order for the type system to be sound, module paths that lead to a type must be constant, i.e. they must return the same type component each time they are evaluated (see Harper and Pierce's module system chapter in ATAPL for details). We therefore insist that a path start with an (unchangeable) variable, and that every field in the path be constant. Constant becomes an annotation that can decorate a def in a public object (or module) type. A constant def can only be implemented by a val declaration form (i.e. the declaration of a field that cannot be assigned after initialization).
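The path-constancy check can be sketched in Python over a hypothetical environment model: each name maps to a pair of a kind ("val" for constant, "var" for assignable, "type" for a type member) and, for module-valued members, a nested table of that module's members. The function is_stable_type_path is illustrative only; it accepts a path exactly when the root is an unchangeable variable, every intermediate field is constant, and the final name is a type member.

```python
def is_stable_type_path(path: list[str], env: dict) -> bool:
    """True if root.f1. ... .fn.T reaches a type member through constant members only."""
    if len(path) < 2:
        return False
    root, *fields, type_name = path
    kind, members = env.get(root, (None, None))
    if kind != "val":  # the path must start with an unchangeable variable
        return False
    for field in fields:
        kind, nested = (members or {}).get(field, (None, None))
        if kind != "val":  # every field along the path must be constant
            return False
        members = nested
    return (members or {}).get(type_name, (None, None))[0] == "type"
```

A path through a var field is rejected even if it currently leads to a type, because a later assignment could make the same path denote a different type component.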

The module declaration construct includes the name of the module, which takes the form of a URL. The module can optionally be ascribed a type, which means that external modules importing it see it as an object of that type, and any members of the module not mentioned in that type are hidden. A module then has a series of import statements and a series of declarations.

Import statements include a URL from which the module is imported. If the URL is relative, the location of the current module is used as the starting point, following the convention used in HTML links. An import may optionally be ascribed a type. If so, that type must be a supertype of the imported module's type, and within the importing module, the imported module has the type ascribed in the import statement. An import may be given a short name, by which it is known from within the module. If no short name is given, the last name in the URL is used (this is the same convention used in the Go language).
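Relative import URLs resolve the same way relative HTML links do, so Python's standard urljoin can illustrate the rule. In the hypothetical resolve_import function below, the default short name is the last path segment of the resolved URL; dropping its file extension is our assumption, since the text only says "the last name in the URL". The example.org URLs are placeholders.

```python
import posixpath
from typing import Optional
from urllib.parse import urljoin, urlparse

def resolve_import(current_module_url: str, import_url: str,
                   short_name: Optional[str] = None) -> tuple[str, str]:
    """Return (absolute URL, short name) for one import statement."""
    absolute = urljoin(current_module_url, import_url)  # HTML-style resolution
    if short_name is None:
        last = posixpath.basename(urlparse(absolute).path)
        short_name = posixpath.splitext(last)[0]  # assumed: extension dropped
    return absolute, short_name
```

For example, importing ../lib/stack.wyv from http://example.org/app/main.wyv resolves to http://example.org/lib/stack.wyv, and the module is known as stack unless another short name is given.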

Import statements may be used to import modules defined in other languages, such as JavaScript or Java. In the case of JavaScript, the type ascribed is used to give the module a type in Wyvern; this type is not checked (for now). The source language in an import may only be Wyvern, JavaScript, and Java for now, though others may be supported in the future. The source language is inferred from the MIME type of the URL (which in turn is based on the file extension in most URL infrastructures).
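The two-step inference from file extension to MIME type to source language might look like the Python sketch below. Both tables are hypothetical: the .wyv extension and the text/x-wyvern and text/x-java-source MIME types are placeholders chosen for illustration, not values fixed by this design. Anything outside the three supported languages is rejected.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical tables; the document does not fix these extensions or MIME types.
EXTENSION_TO_MIME = {
    ".wyv": "text/x-wyvern",
    ".js": "text/javascript",
    ".java": "text/x-java-source",
}
MIME_TO_LANGUAGE = {
    "text/x-wyvern": "Wyvern",
    "text/javascript": "JavaScript",
    "text/x-java-source": "Java",
}

def source_language(url: str) -> str:
    """Infer an imported module's source language from its URL."""
    ext = posixpath.splitext(urlparse(url).path)[1]
    mime = EXTENSION_TO_MIME.get(ext)
    if mime not in MIME_TO_LANGUAGE:
        raise ValueError(f"unsupported source language for {url!r}")
    return MIME_TO_LANGUAGE[mime]
```

In a real deployment the MIME type would come from the URL infrastructure (for example an HTTP Content-Type header) rather than from a fixed table.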

Declarations may be annotated as public. If a module is not ascribed a signature explicitly, a signature is generated from the public members of the module. Any member that is not public is implicitly not a part of its type. The same rule holds true for new statements used to initialize arbitrary objects. If a signature is ascribed to a module, either the module must have no members declared public, or all the elements exposed in the signature must be declared public (with subtypes of their types in the signature). Note that we choose private as the default rather than public in order to "nudge" developers to hiding things unless they should be exposed, rather than the other way around.
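The rules for a module's externally visible signature can be summarized in a Python sketch. In the hypothetical module_signature function, member types are plain strings and the subtype check defaults to equality; a real checker would pass in genuine subtyping. Without an ascription, the signature is generated from the public members. With an ascription, either no member is public, or every ascribed member must be public, and each ascribed member's actual type must be a subtype of its ascribed type.

```python
from typing import Callable, Optional

Types = dict[str, str]  # member name -> type (simplified to text)

def module_signature(members: Types, public: set[str],
                     ascribed: Optional[Types] = None,
                     is_subtype: Callable[[str, str], bool] = lambda s, t: s == t) -> Types:
    """Signature that importers see, following the public/ascription rules."""
    if ascribed is None:
        # no ascription: generate the signature from the public members
        return {n: t for n, t in members.items() if n in public}
    if public:
        # some members are public, so every ascribed member must be public
        hidden = [n for n in ascribed if n not in public]
        if hidden:
            raise TypeError(f"ascribed but not public: {hidden}")
    for name, t in ascribed.items():
        if name not in members or not is_subtype(members[name], t):
            raise TypeError(f"{name!r} does not match the ascribed type")
    return dict(ascribed)
```

Because members are private unless marked public, a module with no annotations and no ascription exports an empty signature.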

For now, there is a one-to-one correspondence between modules and files. This will quickly be broken when we support relative import.

Later extensions of the module system will have the following features:


Wyvern picture by Zigeuner. Picture made for the Blazon Project of French-speaking Wikipedia. CC-BY-SA-3.0.