So far we’ve seen two common ways to prototype a programming language: as a stand-alone interpreter, and as a library (or embedded language) in an existing host language. In these assignments, we’ll see a third common implementation style that involves translating a program that was written in the source language — i.e., the language you are implementing — to a program in some existing target language that has the desired behavior. This implementation style, known as source-to-source translation, can be viewed as a lightweight form of compilation: rather than translating all the way down to machine code, the language implementer gets to leverage the features in a high-level target language, which significantly simplifies the task.
This approach can be seen as a middle ground between the two other approaches we’ve seen in class. Like with an interpreter, the source language has its own syntax, and requires a parser to convert this syntax into some form of abstract syntax trees. Like with an embedded language, it’s often possible to represent features in the source language directly, using counterparts in the target language. For example, it might be possible to translate a function in the source language to a function in the target language that has the same behavior, which then allows function calls in the source language to be implemented simply as function calls in the target language.
Of course, some language features in the source language will not map directly to a semantically-equivalent construct in the target language. In that case, the translation has to construct some code in the target language that will behave as desired. Depending on the complexity of that code, it may be nicer to factor it out to a library so that the target code is as simple and readable as possible. Such a library is similar in spirit to the idea of an embedded language that we saw in the previous assignment, except that this library is meant only for use by the source-to-source translator rather than directly by the programmer.
In Assignments 10 and 11, you’ll “prototype” an object-oriented programming language by translating it to JavaScript. We’ll start out with a basic version of the language in Assignment 10, and throw in some fancier features in Assignment 11.
We’ll begin with a “vanilla” object-oriented language with single inheritance. Our language is dynamically typed and has Java-like syntax.
To declare a new class in our language, you use the `class` keyword. Our running example is a `Point` class with two instance variables, `x` and `y`.
By default, every new class that you declare is a direct subclass of `Obj`, which is the root of the class hierarchy in our language. You can optionally specify a superclass in a class declaration using the `extends` keyword, as in `class C extends S with x1, …, xn;`.
Our language supports open classes. This means that you can add new methods to a class without editing its declaration. In fact, the syntax of our language does not even allow programmers to write methods as part of a class declaration. Instead, you add a method called `init` with arguments `x` and `y` to our `Point` class with a separate method declaration of the form `def C.m(x1, …, xn) { s1 … sm }`.

The same mechanism lets you override `Point`’s `init` method for instances of `ThreeDeePoint`. Even though `ThreeDeePoint`’s version of `init` takes three arguments whereas `Point`’s `init` method takes only two, the former still overrides the latter.
To create a new instance of a class, you use the `new` keyword just like in JavaScript. When it evaluates a `new` expression, our language invokes the `init` method on the new instance with the arguments supplied.

Here’s what happens when the expression `new C(e1, …, en)` is evaluated:

1. A new instance of `C` is created;
2. The new instance’s `init` method is called with the arguments provided, i.e., `newInstance.init(e1, …, en)`;
3. The new instance becomes the value of the `new` expression.
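The evaluation of a `new` expression described above can be sketched in JavaScript roughly as follows. This is only an illustration under stated assumptions: the helper name `instantiate` and the use of a plain prototype object to stand for a class are hypothetical, not a required design.

```javascript
// Hypothetical helper: `classProto` stands for however a class is represented
// at run time (in this sketch, a prototype object holding its methods).
function instantiate(classProto, ...args) {
  const newInstance = Object.create(classProto); // create a new instance of C
  newInstance.init(...args);                     // call init with the arguments
  return newInstance;                            // the instance is the result
}

// A stand-in for Point whose init method assigns two instance variables.
const PointProto = {
  init(x, y) {
    this.x = x;
    this.y = y;
  },
};

const p = instantiate(PointProto, 1, 2);
console.log(p.x, p.y); // prints 1 2
```

Note that the value returned by `init` is ignored here; only the freshly created instance is handed back.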
Our language supports the kinds of statements and expressions that you’ll find in a typical OO language: sending a message to an object, accessing or updating the value of an instance variable, and so on. The `init` methods of `Point` and `ThreeDeePoint`, for example, assign to instance variables.
The next section includes a complete list of the statements and expressions in our language. We’ll describe the concrete and abstract syntax of each construct, as well as its expected behavior.
You know the drill. Here’s what the concrete syntax of the base language looks like, and how we’ll represent it as abstract syntax in JavaScript:
| | Concrete Syntax | JS AST |
|---|---|---|
| p ::= | `s1 … sn` (… `null` otherwise.) | `new …([s1, …, sn])` |
| s ::= | `class C extends S with x1, …, xn;` | `new ClassDecl(C, S, [x1, …, xn])` |
| | `def C.m(x1, …, xn) { s1 … sm }` | `new MethodDecl(C, m, [x1, …, xn], [s1, …, sm])` |
| | `var x = e;` | `new VarDecl(x, e)` |
| | `x = e;` | `new VarAssign(x, e)` |
| | `this.x = e;` | `new InstVarAssign(x, e)` |
| | `return e;` | `new Return(e)` |
| | `e;` | `new ExpStmt(e)` |
| e ::= | `primValue` | `new Lit(primValue)` |
| | `x` | `new Var(x)` |
| | `e1 op e2` | `new BinOp(op, e1, e2)` |
| | `this` | `new This()` |
| | `this.x` | `new InstVar(x)` |
| | `new C(e1, …, en)` | `new New(C, [e1, …, en])` |
| | `erecv.m(e1, …, en)` | `new Send(erecv, m, [e1, …, en])` |
| | `super.m(e1, …, en)` | `new SuperSend(m, [e1, …, en])` |
| x, m, C, S ::= | `sum` | `"sum"` |
| primValue ::= | …, `null` | |

Here `op` ∈ {`+`, `-`, `*`, `/`, `%`, `<`, `>`, `==`, `!=`}. These operators have the same semantics as they do in JavaScript.
Note that our language has no `if` statements or `while` loops. There is a very good reason for this, and we’ll tell you all about it in Homework 5. Stay tuned!
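To make the abstract syntax more concrete, here is a small sketch that builds AST nodes for a `Point` class and its `init` method. The class definitions below are minimal stand-ins written for this example: the constructor names come from the table, but the field names (`name`, `superName`, and so on) are assumptions, as is passing `"Obj"` explicitly as the superclass. The wrapper node for whole programs is omitted.

```javascript
// Minimal stand-ins for a few AST classes; field names are assumptions.
class ClassDecl {
  constructor(name, superName, instVarNames) {
    Object.assign(this, { name, superName, instVarNames });
  }
}
class MethodDecl {
  constructor(className, selector, params, body) {
    Object.assign(this, { className, selector, params, body });
  }
}
class InstVarAssign {
  constructor(x, e) {
    Object.assign(this, { x, e });
  }
}
class Var {
  constructor(x) {
    this.x = x;
  }
}

// Statements for: a Point class with instance variables x and y, plus an
// init method that stores its two arguments in those instance variables.
// Identifiers are represented as JavaScript strings.
const pointAst = [
  new ClassDecl("Point", "Obj", ["x", "y"]),
  new MethodDecl("Point", "init", ["x", "y"], [
    new InstVarAssign("x", new Var("x")),
    new InstVarAssign("y", new Var("y")),
  ]),
];

console.log(JSON.stringify(pointAst, null, 2));
```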
Your job is to write a translator from our language to JavaScript. The translator will be a function called `trans` that takes the AST of a program and returns a string containing the JavaScript code generated from that AST. Your translator should not need the `eval` feature of JavaScript (though unsurprisingly, our test code does).
It’s up to you to design an appropriate translation strategy, i.e., a mapping from
our language to JavaScript that will give the translated programs the desired behavior.
Here are a few tips to help you get started.
Re-read the section about the “class sugar” in our JavaScript Primer.
The desugaring of the class syntax in JavaScript, which we
explained in detail, is almost exactly what
we’re asking you to implement in this assignment. So take another look, see how
classes are represented, how super-sends work, etc. (You’ll have to think a little
bit more about how to represent instance variables, which work differently in our
language.)
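As a reminder of the general shape of that desugaring, here is a minimal sketch of single inheritance, open classes, and a super-send using plain constructor functions and prototypes. It is not the Primer’s exact code; the method name `describe` is hypothetical, and instance variables are left out entirely.

```javascript
// Root of a tiny hierarchy, represented as a constructor function.
function Obj() {}

// A subclass is a constructor whose prototype inherits from its superclass's.
function Point() {}
Point.prototype = Object.create(Obj.prototype);
Point.prototype.constructor = Point;

// Open classes: a method can be attached at any time after the declaration.
Point.prototype.describe = function () {
  return "a point";
};

function ThreeDeePoint() {}
ThreeDeePoint.prototype = Object.create(Point.prototype);
ThreeDeePoint.prototype.constructor = ThreeDeePoint;

// Overriding with a super-send: look the method up on the superclass's
// prototype and call it with the current receiver.
ThreeDeePoint.prototype.describe = function () {
  return Point.prototype.describe.call(this) + " in 3D";
};

console.log(new ThreeDeePoint().describe()); // prints "a point in 3D"
```

The key detail is that the super-send starts its lookup at the superclass of the class where the method was declared, while `this` stays bound to the original receiver.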
Divide and conquer.
Similar to the evaluators you wrote in earlier assignments, it is natural for your
translation to be compositional. That is, the translation of an expression or statement
should be defined in terms of the translations of its subparts (other expressions and
statements). This leads to a nice recursive solution, and it also ensures that you allow
the subparts themselves to be arbitrarily complex. For example, the arguments to a message
send can be arbitrary expressions, including other message sends.
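A compositional translation might look roughly like the sketch below, which handles only three kinds of expressions. The AST classes are stand-ins with assumed field names, `transExp` is a hypothetical helper, and mapping a message send directly onto a JavaScript method call is just one possible strategy (it would not work as-is for selectors that are not valid JavaScript identifiers).

```javascript
// Minimal stand-ins for three AST classes; field names are assumptions.
class Lit {
  constructor(primValue) {
    this.primValue = primValue;
  }
}
class Var {
  constructor(x) {
    this.x = x;
  }
}
class Send {
  constructor(recv, m, args) {
    this.recv = recv;
    this.m = m;
    this.args = args;
  }
}

// Hypothetical expression translator: each case only glues together the
// translations of its subexpressions.
function transExp(e) {
  if (e instanceof Lit) return JSON.stringify(e.primValue);
  if (e instanceof Var) return e.x;
  if (e instanceof Send) {
    const args = e.args.map(transExp).join(", ");
    return `${transExp(e.recv)}.${e.m}(${args})`;
  }
  throw new Error("unsupported AST node");
}

// A send whose argument is itself a send.
const ast = new Send(new Var("a"), "f", [
  new Send(new Var("b"), "g", [new Lit(1)]),
]);
console.log(transExp(ast)); // prints a.f(b.g(1))
```

Because `transExp` calls itself on the receiver and on every argument, nested sends need no special handling.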
We have included some unit tests for your translator below. As in the previous assignments, you can add your own test cases by editing `asst10-tests.js`.
In mainstream “object-oriented” languages like Java and C++, primitive values like `5` and `true` are not real objects. This is unfortunate because (among other things) it often forces programmers to write code in an unnatural way. Here are a couple of examples:

- Why can you write `getAge()` as a method of `Person`, but you can’t write `factorial` as a method of `int`?
- Why is it that primitive types like `int` are not classes?!?! In Java, this means that you can’t use them as type parameters of a generic class or interface. For example, you can’t have a `Set<int>`; instead, you’re stuck with `Set<Integer>`, i.e., a set of boxed `int`s. Is this really something the programmer should have to deal with?

As an aspiring language designer, you will (we hope) get the heebie-jeebies from this lack of uniformity, and we know you can do better! It shouldn’t matter how an integer is represented at the language implementation level. Our job is to help programmers, and we shouldn’t expose them to implementation details that make programming more complicated than it has to be.
In this homework assignment, you will modify your translator to make our language “purely” object-oriented, i.e., a language in which everything is an object. As we’ll see, this has some really nice benefits for expressiveness.
As a first step toward supporting pure OO programming, modify your implementation so that JavaScript’s primitive numbers, booleans, strings, and `null` can be used as first-class objects. Here’s what you’ll have to do in order to make that possible:

- Add a class for each kind of primitive value. (`Num` is the class of all numbers, `Null` the class of `null`, and so on.)
- Make `trans(new BinOp(op, e1, e2))` return the same value as `trans(new Send(e1, op, [e2]))`.
- At this point binary operators are message sends, but no class understands messages like `+` and `-` yet. Fix this by adding methods to `Obj` that correspond to each of the following operators: `+`, `-`, `*`, `/`, `%`, `<`, `>`, `==`, and `!=`. These methods should behave just like their corresponding JavaScript operators. After this, you will have restored the original semantics of our language, but now you’ll be able to override operators.

To add your own test cases for this part, edit `asst11-tests-part1.js`.
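One way to picture the steps above is a small run-time helper through which every message send goes. Everything in this sketch is an assumption made for illustration: the helper names `send` and `classOf`, the representation of classes as method tables linked by prototypes, and the convention that a method receives its receiver as an explicit first argument. Booleans are left out for brevity.

```javascript
// Hypothetical method tables; each class inherits from Obj's table.
const Obj = Object.create(null);
const Num = Object.create(Obj);
const Str = Object.create(Obj);
const Null = Object.create(Obj);

// Operators become ordinary methods on Obj that defer to JavaScript.
Obj["+"] = (recv, that) => recv + that;
Obj["-"] = (recv, that) => recv - that;
Obj["<"] = (recv, that) => recv < that;

// Map a JavaScript primitive to the class that describes it.
function classOf(value) {
  if (value === null) return Null;
  switch (typeof value) {
    case "number":
      return Num;
    case "string":
      return Str;
    default:
      throw new Error("no class for this value in this sketch");
  }
}

// A translated BinOp(op, e1, e2) could then become send(<e1>, "op", <e2>),
// exactly like any other message send.
function send(recv, selector, ...args) {
  const method = classOf(recv)[selector];
  if (typeof method !== "function") {
    throw new Error(`message not understood: ${selector}`);
  }
  return method(recv, ...args);
}

console.log(send(1, "+", 2)); // prints 3

// Because + is now a method, a class can override it.
Str["+"] = (recv, that) => `${recv}${that}!`;
console.log(send("a", "+", "b")); // prints ab!
```

Numbers still find `+` on `Obj` after the override, since only the `Str` table was changed.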
Borrowing from Smalltalk, our language also includes blocks, which are essentially an object-oriented version of lambdas, a.k.a. first-class functions. A block can take no arguments, one argument, or several (for example, two arguments `x` and `y`), and its body can consist of multiple statements.
| | Concrete Syntax | JS AST |
|---|---|---|
| e ::= | … | … |
| | `{ x1, …, xn \| s1 … sm }` | `new BlockLit([x1, …, xn], [s1, …, sm])` |
You evaluate a block by sending it a `call` message, to which you can pass the appropriate arguments. Unlike a method body, which requires an explicit `return` statement, a block implicitly returns the value of its last statement if it’s an expression statement, or `null` otherwise.

Calling a block can therefore produce an ordinary value such as `3` or `42`. A block whose body sends `m` and then `n` to `someObj` should result in calling `someObj`’s `m` method, then `someObj`’s `n` method, and evaluate to the result of the latter. A block whose last statement is not an expression statement should evaluate to `null`.
Just like lambdas, blocks can reference variables from their surrounding scope. A block also acts as a kind of lexical scope: any variable declarations made inside a block are not visible outside it. Conveniently, JavaScript’s functions have both of these properties…
So you can (and should!) avoid the need to implement the semantics of closures and lexical scopes from scratch by translating blocks to plain old JavaScript functions. As with the treatment of numbers, strings and booleans, you will need to add a class for blocks (`Block`) that supports a `call` method.
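A minimal sketch of this idea follows, assuming a hypothetical `Block` wrapper class around a JavaScript function. How blocks are actually represented, and how their bodies are generated, is up to your translation; the functions below are hand-written stand-ins for translated block bodies.

```javascript
// Hypothetical run-time class for blocks: it wraps a JavaScript function.
class Block {
  constructor(fn) {
    this.fn = fn;
  }
  call(...args) {
    return this.fn(...args);
  }
}

// A two-argument block whose last statement is the expression "x + y":
// the value of that expression becomes the function's return value.
const add = new Block((x, y) => x + y);
console.log(add.call(1, 2)); // prints 3

// A block whose last statement is a declaration evaluates to null, and the
// declared variable stays local to the function.
const declOnly = new Block(() => {
  var tmp = 42;
  return null;
});
console.log(declOnly.call()); // prints null

// Closures come for free: the block sees variables from its defining scope.
let counter = 0;
const bump = new Block(() => {
  counter = counter + 1;
  return counter;
});
bump.call();
console.log(bump.call()); // prints 2
```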
You’ve probably noticed that our language lacks control structures, e.g., it doesn’t have `if` or `while` statements. It turns out we don’t need any built-in control structures because it’s straightforward for programmers to define their own, as ordinary methods. This power comes from a combination of purity (the fact that everything in our language is an object) and support for open classes (the fact that a programmer can add new methods to any class in the system).
For example, an if-then-else “statement” can be defined as a method `thenElse` on `Bool`s that takes two blocks as arguments, one for each branch of the conditional. With appropriate implementations for the classes `True` and `False`, it is now possible to write a conditional as a `thenElse` message sent to a boolean, with one block per branch.
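The sketch below shows the idea at the JavaScript level. The `Block` wrapper, the method tables `True` and `False`, and the helper `thenElse` are all hypothetical stand-ins for whatever run-time representation your translation uses.

```javascript
// Hypothetical block wrapper (a block is just a JavaScript function).
class Block {
  constructor(fn) {
    this.fn = fn;
  }
  call(...args) {
    return this.fn(...args);
  }
}

// Each boolean class simply calls the block for its own branch.
const True = {
  thenElse(thenBlock, elseBlock) {
    return thenBlock.call();
  },
};
const False = {
  thenElse(thenBlock, elseBlock) {
    return elseBlock.call();
  },
};

// Dispatch a thenElse message on a JavaScript boolean.
function thenElse(cond, thenBlock, elseBlock) {
  const cls = cond ? True : False;
  return cls.thenElse(thenBlock, elseBlock);
}

const log = [];
const result = thenElse(
  3 > 2,
  new Block(() => { log.push("then"); return "yes"; }),
  new Block(() => { log.push("else"); return "no"; })
);
console.log(result, log); // prints yes [ 'then' ]
```

Only the chosen branch runs, because the other block is never sent `call`.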
`return` Inside a Block
As mentioned earlier, a block implicitly returns the value of its last expression statement. Sometimes it is more natural for a block to return directly from its enclosing method; this is especially the case when blocks are used to implement control structures. In our language, a `return` statement inside a block acts as such a non-local return. For example, consider an implementation of the absolute value method `abs` for `Number`s that uses `thenElse`, with `return` statements inside the blocks passed to it.

When such a `return` statement is executed, it returns the associated value from the `abs` method itself, and returns control to the caller of `abs`, rather than just returning from the block. While it may seem like there are two different kinds of `return` in our language, this isn’t really the case. A `return` inside a block means exactly the same thing as a `return` inside a method: return this value from (this particular activation of) the enclosing method.
One interesting issue is how to treat non-local `return`s in cases where the block is passed around before it is called. In our language, it’s a run-time error to try to execute a `return` from a block whose enclosing method has already returned. Otherwise, it is OK for a block to execute a `return`, regardless of where on the call stack the enclosing method’s activation record is. For instance, in the absolute value example, a `return` causes the activation records for `Block`’s `call` method and `Bool`’s `thenElse` method to be popped off the stack, and the return value is then associated with the original call to `Number`’s `abs` method.
To implement non-local `return` properly, the stack must be “walked,” popping off stack frames until the right activation record is found. Hint: Exceptions already walk the stack, so it is natural to use them to implement non-local returns. The main difficulty is to ensure that a `return` is always associated with the correct method invocation.
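Here is one way the hint could play out, sketched at the JavaScript level. The `NonLocalReturn` class, the `method` wrapper, and the per-call `activation` token are hypothetical names introduced for this example; the blocks are written as bare functions to keep the code short.

```javascript
// Hypothetical exception carrying a value and the identity of the method
// activation it should return from.
class NonLocalReturn {
  constructor(activation, value) {
    this.activation = activation;
    this.value = value;
  }
}

// Wrap a method body so that each call gets a fresh activation token.
function method(body) {
  return function (...args) {
    const activation = {}; // unique per invocation
    try {
      return body.call(this, activation, ...args);
    } catch (e) {
      if (e instanceof NonLocalReturn && e.activation === activation) {
        return e.value; // this return was meant for us
      }
      throw e; // another activation's return, or a genuine error
    }
  };
}

// abs-like example: each "block" throws in order to return from abs itself.
const abs = method(function (activation, n) {
  const ret = (v) => {
    throw new NonLocalReturn(activation, v);
  };
  const thenBlock = () => ret(-n);
  const elseBlock = () => ret(n);
  (n < 0 ? thenBlock : elseBlock)();
  return null; // not reached
});
console.log(abs(-5), abs(7)); // prints 5 7

// A block that escapes its method: calling it later is an error, because
// no matching activation is left on the stack to catch the exception.
const leak = method(function (activation) {
  return () => {
    throw new NonLocalReturn(activation, 1);
  };
});
const escapedBlock = leak();
let escapedFailed = false;
try {
  escapedBlock();
} catch (e) {
  escapedFailed = e instanceof NonLocalReturn;
}
console.log(escapedFailed); // prints true
```

Comparing the token by identity is what ties a `return` to one particular invocation, which matters for recursive methods where several activations of the same method are live at once.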
Here are a few unit tests for this part of the assignment. To add your own test cases, just edit `asst11-tests-part2.js`.
If you’re interested in pushing this project further, here are a few ideas you might try:

- Support first-class `Class` objects. (Food for thought: what is the superclass of `Class`?)