Comparative Programming Languages

You mean there's something else besides Java? Wow!
But why?

 

Machine Language becomes Assembler

Review: The Programming Process:
1. Write the program in some programming language (the source code).
2. Translate it into the computer's machine language (the object code).
3. Load the object code into memory and run it.

At the dawn of computing, programmers skipped the second step!
They knew the machine language coding scheme for a particular computer, and actually wrote object code programs in machine language (binary numbers).
They put the machine code directly into the computer's memory, pushed the "go" button, and the computer executed the program that was in memory.

For example, to save the decimal value 97 in a register (on a standard Intel machine), instead of writing "int al = 97;" as in Java, you would look up the op code for putting a constant into register 0, i.e. AL (B0), convert 97 into binary (61 in hexadecimal), and then write:

10110000 01100001 (Hexadecimal: B0 61)

This was really a pain.
But beyond the sheer inhumanity of it, programmer productivity can be improved by automating the really boring, repetitive parts, such as looking up instruction numbers and converting constants from decimal to binary.

So pretty quickly people switched to using assemblers and assembly languages.
An assembler is a program that lets you write in a more human-readable language, but each line of code still corresponds pretty much to a machine language instruction. So to use the same example, instead of "int al = 97;" you would write:

[Intel x86:]
mov al, 97
which gets turned by the assembler into:
[Intel x86:]
10110000 01100001 (Hexadecimal: B0 61)
You'd also have an assembler manual describing the instructions available on this computer:
[Intel x86:]
MOV reg8,imm8      ; B0+r ib

 

The Assembler becomes a Compiler/Interpreter

So for a while, programmers used assembly languages, and were just really happy not to have to write lots of 1s and 0s (well, hexadecimal numbers, really).

But once computers were a little more powerful, the question arose: Can we make life even easier for the programmer?
As an obvious example, you often want to execute a loop with a variable starting with one value, going until it hits a limit, and updating each time through the loop.
In assembly language, this takes at least four instructions, and there are lots of opportunities for typos (bugs).
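For instance, here is a sketch of what such a loop might look like in x86 assembly (the register choice, the limit of 8, and the label names are just illustrative):

[Intel x86:]
        mov ecx, 0        ; counter = 0
top:    cmp ecx, 8        ; hit the limit yet?
        jge done          ; if counter >= limit, exit the loop
        ; <body>
        inc ecx           ; update the counter
        jmp top           ; back to the test
done: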
Wouldn't it be nice to have just one line of code do that?

[FORTRAN:]
DO 1, I = 0, 7
<body>
1 CONTINUE

Wow! Much less pain, and more programmer productivity!

A program that converts a more abstract (high-level) language into machine code is called a compiler.
A related idea is that instead of compiling and then executing, you could write a program that just executes the high-level language as it reads it in. A program that does this is called an interpreter. BASIC is an early example of a language that is frequently interpreted.

[BASIC:]
50 FOR I = 1 TO N
60 LET TOTAL = TOTAL + I
70 NEXT I

 

Further Improvements in Programming Productivity

So improving productivity has been a big motivator in the development of programming languages.
An interesting insight: Programmer productivity seems to be roughly constant when measured in lines of code per day, regardless of the language.
(Similarly, a procedure should be at most one page of code, so you can see it all at once.)
So a more abstract (higher-level) language, that lets you say more in one line, immediately makes you more productive.

Another (famous) insight: "GO TO Statement Considered Harmful" (Dijkstra 1968).
Structured programming only lets you use well-defined control constructs, and won't let you write "rat's nest" or "spaghetti" code.
(You may never have used a language with a GOTO statement.)
An example of reducing the power of a language to keep you out of trouble.

Somewhat tangential, but really helpful: smart source code editors as part of Integrated Development Environments (IDEs).
(We've been using Eclipse to write our Java programs.)
It's really nice to have the editor warn you when you type something illegal!

 

Concept of a High-Level Language

Note that we are now treating the high-level language program as data for another program (the compiler/interpreter).
This is the fundamental concept of the "von Neumann architecture": programs can be stored and manipulated as data.

Since the compiler/interpreter for a high-level language is really just a program someone writes, there can be lots of them... and there are.
Hundreds! Well, actually, thousands (if you include less popular languages).
(If you have a year to spare, you can build your very own new language. The real trick is to get anyone else to really use it.)
I, Dr. Bob, counted up how many programming languages I actually have known. Over fourteen.

But are there good reasons to have so many different languages?
After all, they're all Turing equivalent, right?
Well, there are reasons, some of them good, some of them maybe not so great.

The question is now: How would you like to tell the computer what it should do?
Imagine you could say it however you wanted.
(Well, within reason. English is not a choice (yet). English might not be a good idea anyway, since it's so imprecise; that's why scientists use a lot of math, and mathematicians use formal logic.)

Different people have different ideas of what they'd like in a programming language, so you get different languages.
Some examples of varying purposes/situations and languages designed for them include:
Numerical computing for scientists and engineers (FORTRAN)
Business data processing (COBOL)
Symbolic computation and AI research (Lisp)
Systems programming (C)
Teaching programming (Pascal, BASIC)
Scripting and text processing (Perl, Python)

 

Making a new High-Level Language

Cool, I'm in. How do I do it?

Typically, two major steps:
1. Specify the language's syntax (grammar) and semantics (meaning).
2. Implement the program: the actual compiler/interpreter.
Sometimes only one of these is done! Some languages are never clearly defined; a few have been defined and never implemented.

Defining your new language

Define your syntax: What your compiler/interpreter will actually receive as input is just a big string of characters.
Dividing this up into significant "words" like variable names, reserved words (like while), and numbers is called tokenization or lexical analysis.
Example token definition:

integer  ::= [+-]?['0'-'9']+
(This style of notation is a variant of Backus-Naur Form (BNF), here extended with regular-expression operators like ? and +.)
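To make this concrete, here is a minimal sketch in Java of lexical analysis for that one token type (the class and variable names are just illustrative; a real lexer would also handle identifiers, reserved words, and so on):

[Java:]
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Tokenizer {
    // The rule [+-]?['0'-'9']+ written as a Java regular expression.
    private static final Pattern INTEGER = Pattern.compile("[+-]?[0-9]+");

    public static void main(String[] args) {
        Matcher m = INTEGER.matcher("x = -42 + 7");
        while (m.find()) {
            // Prints the integer tokens: -42, then 7.
            System.out.println("integer token: " + m.group());
        }
    }
}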

Once you have tokens, you need to define how they combine into larger meaningful pieces, like arithmetic expressions.
Example piece of syntax:

expr	::= ws term [ws ('+'|'-') ws term]
term	::= ws factor [ws ('*'|'/') ws factor]
factor	::= '(' ws expr ws ')' | '-' ws expr | number
Taking the series of tokens and analyzing it to get useful structures is called parsing.
When carefully defined, programming languages usually use context-free grammars, to make parsing efficient.
IDEs will usually do the parsing while you're typing, which is how they can catch many errors as you type.
(The language Lisp has essentially "no syntax", because you parenthesize everything. This enables seamless user extensions.
Yet it had one of the earliest IDEs, to avoid having to balance all those parentheses yourself!)

Define your semantics somehow: This is often English plus examples. Plus things like the conventions of arithmetic notation.
ML is unusual in having a true formal semantic definition, written out in mathematical notation.

Implementing your new language

Once you have the definition worked out, you typically build a compiler/interpreter that takes source code that complies with your specification, and
produces machine code that does what you asked for, according to the semantics you defined. This is not easy.
One popular technique: attach procedures to the parsing rules.
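Here is a minimal sketch in Java of that technique for the expression grammar above (all names are illustrative): each rule becomes a method, and the procedure attached to each rule just computes the value of that piece, which makes this a tiny interpreter. A real compiler would instead have each method emit machine code for its piece.

[Java:]
public class Eval {
    private final String src;
    private int pos = 0;

    Eval(String src) { this.src = src.replaceAll("\\s", ""); }  // handle ws by stripping it

    // expr ::= term [('+'|'-') term]...
    int expr() {
        int v = term();
        while (pos < src.length() && (src.charAt(pos) == '+' || src.charAt(pos) == '-')) {
            char op = src.charAt(pos++);
            v = (op == '+') ? v + term() : v - term();
        }
        return v;
    }

    // term ::= factor [('*'|'/') factor]...
    int term() {
        int v = factor();
        while (pos < src.length() && (src.charAt(pos) == '*' || src.charAt(pos) == '/')) {
            char op = src.charAt(pos++);
            v = (op == '*') ? v * factor() : v / factor();
        }
        return v;
    }

    // factor ::= '(' expr ')' | '-' expr | number
    int factor() {
        if (src.charAt(pos) == '(') {
            pos++;                          // consume '('
            int v = expr();
            pos++;                          // consume ')'
            return v;
        }
        if (src.charAt(pos) == '-') {
            pos++;                          // consume '-'
            return -expr();
        }
        int start = pos;
        while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
        return Integer.parseInt(src.substring(start, pos));
    }

    public static void main(String[] args) {
        System.out.println(new Eval("2 + 3 * (4 - 1)").expr());  // prints 11
    }
}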

Note that the language designer needs to figure out how to make all their amazing new features actually work.
And all the features have to work together, in any legal combination.
Seemingly simple things can get tricky:
Even FORTRAN let you have lots of different variable names.
But real machines only have a small number of hardware registers. So you need a symbol table to keep track of the variable names in the program, and where you've actually stashed them.
Serious compilers try hard to do a smart job of register allocation so that if you work on a particular variable a lot, it stays in a register.
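As a minimal sketch of the symbol table idea (the class name and naive slot-numbering scheme are just for illustration; a serious register allocator is far smarter), a symbol table can be as simple as a map from variable names to storage locations:

[Java:]
import java.util.HashMap;
import java.util.Map;

public class SymbolTable {
    private final Map<String, Integer> slots = new HashMap<>();

    // Look up a variable, assigning it the next free memory slot on first use.
    int slotFor(String name) {
        Integer slot = slots.get(name);
        if (slot == null) {
            slot = slots.size();   // next free slot
            slots.put(name, slot);
        }
        return slot;
    }

    public static void main(String[] args) {
        SymbolTable table = new SymbolTable();
        System.out.println(table.slotFor("i"));      // 0
        System.out.println(table.slotFor("total"));  // 1
        System.out.println(table.slotFor("i"));      // 0 again: same variable
    }
}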

Since this gets complicated, break it up into phases:
Lexical analysis (turning the characters into tokens)
Parsing (building structures out of the tokens)
Semantic analysis (e.g., type checking)
Optimization (making the code smaller and faster)
Code generation (emitting the actual machine code)

All of this will give you a basic optimizing compiler for a "normal" language.
But things can get even more complicated.

If you're building a serious new language for commercial use, a big issue is portability: it should work on "all" platforms, and work the same on all of them.
It should also be as highly optimized as possible.

These kinds of concerns led, in Java (and later Microsoft's .NET), to the idea of compiling the language into bytecodes that run on a virtual machine.
The front end that converts source code into bytecodes can be the same on all platforms, thus achieving portability.
The Virtual Machine analyzes the bytecode program, and decides whether any particular part of the program should be interpreted, precompiled, or Just-In-Time (JIT) compiled, depending on what would be the most efficient!
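For example, compiling the Java statement "int i = 97;" and disassembling the result with javap -c shows roughly this bytecode (the local variable slot number depends on the surrounding method):

bipush 97     // push the constant 97 onto the stack
istore_1      // store it into local variable slot 1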

 

Other Interesting PL Differences/Ideas

Static vs. dynamic typing: Some languages leave variables untyped, so that one variable can store values of different types at different times.
Some languages require you to specify a lot of type info to make sure you're doing the right thing.
A few languages try to infer types, to give you the advantages of type checking without having to write a lot of type declarations.
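A small illustration in Java, which is statically typed; Java 10's var gives a taste of type inference, where the compiler figures out the type but still checks it:

[Java:]
import java.util.ArrayList;

public class TypingDemo {
    public static void main(String[] args) {
        int count = 3;                        // explicit type declaration
        var names = new ArrayList<String>();  // inferred type: ArrayList<String>
        names.add("Ada");
        // names.add(42);  // rejected at compile time: wrong type
        System.out.println(count + " " + names);
    }
}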

Standard libraries: The language generally will include a bunch of predefined stuff beyond the basic structure of the language (like RobotWindow and String).
A major decision to make is how much and what kind of stuff you should include in your standard libraries. This again depends on your goals: a small library keeps the language simple and easy to port; a big one makes programmers productive right out of the box.

One-pass versus multi-pass compilers: if you're allowed to textually use something before it is defined, the language cannot be compiled in one pass.
That might be okay, but some languages (like Pascal) are carefully designed to avoid it, so a single pass suffices.
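Java, for example, happily allows this kind of forward reference, so a Java compiler effectively needs more than one pass over the class:

[Java:]
public class ForwardRef {
    public static void main(String[] args) {
        greet();  // used here...
    }

    static void greet() {  // ...but defined later in the file
        System.out.println("hello");
    }
}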

Lazy evaluation: Some languages let you work with infinite sets!
"How?!" you may ask.
Why, by only evaluating things that are actually used, "lazily".
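Java's streams give a taste of this. In the sketch below, Stream.iterate defines the infinite set of all natural numbers, but no element is computed until the terminal forEach actually demands values:

[Java:]
import java.util.stream.Stream;

public class LazyDemo {
    public static void main(String[] args) {
        Stream<Integer> naturals = Stream.iterate(0, n -> n + 1);  // conceptually infinite
        naturals.map(n -> n * n)  // still nothing computed
                .limit(5)         // only now is the work bounded
                .forEach(System.out::println);  // prints 0 1 4 9 16
    }
}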

Functions as "first-class objects": Some languages let you treat functions as values, and pass them as parameters or stick them in arrays. Whoa.
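A sketch in Java, where lambdas (since Java 8) make this easy; the names are just illustrative. We stick two functions in an array and pass one as a parameter:

[Java:]
import java.util.function.IntUnaryOperator;

public class FirstClass {
    static int applyTwice(IntUnaryOperator f, int x) {
        return f.applyAsInt(f.applyAsInt(x));  // the function arrived as a parameter
    }

    public static void main(String[] args) {
        IntUnaryOperator[] ops = {
            n -> n + 1,  // functions stored in an array
            n -> n * 2
        };
        System.out.println(applyTwice(ops[1], 3));  // (3*2)*2 = 12
    }
}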

 

Current Programming Language Research

Parallelism: Greater parallelism allows computing speed to increase, but makes programming more complicated. Can we design high-level languages that let you write parallel programs without having to know lots of details about the parallel hardware it will run on?
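Java's parallel streams are one small step in this direction: you ask for parallelism, and the runtime decides how to split the work across cores. A sketch:

[Java:]
import java.util.stream.IntStream;

public class ParallelDemo {
    public static void main(String[] args) {
        long sum = IntStream.rangeClosed(1, 1_000_000)
                            .parallel()       // the runtime picks how to divide the work
                            .asLongStream()   // use longs so the sum doesn't overflow
                            .sum();
        System.out.println(sum);  // 500000500000
    }
}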

Security/Reliability/Verifiability: These go together as "Trustworthy Computing". For crucial programs (such as autopilots), can we prove that the code is correct? Can we prove that software you want to download does not include viruses or spyware?

 

Other kinds of "Computer Languages"

We usually think of programming languages as languages for telling the computer what to do.

But we already saw one example of a language for describing things (HTML):

<html>
<p>I'm <tt>avrim.pc.cs.cmu.edu</tt>; my primary user is
	<a href="http://www.cburch.com/">Carl Burch</a>.</p>
</html>

There are also languages for asking questions. The most well-known one is probably SQL. It is used for writing database queries:

SELECT isbn, title, price, price * 0.06 AS sales_tax
FROM Book
WHERE price > 100.00
ORDER BY title;
This returns a list of the books that cost more than 100.00, with an additional "sales_tax" column containing a sales tax figure calculated at 6% of the price.