Tokens and Java Programs

Advanced Programming/Practicum
15-200


Introduction In this lecture we will learn about the lowest level of the Java language: its tokens. We will learn how to recognize and classify every category of token (which is like classifying English words into their parts of speech). Towards this end, we will employ our newly learned EBNF skills to write and analyze descriptions for each category of token. In later lectures we will learn about a programming language's higher level structures: phrases (expressions), sentences (statements), paragraphs (blocks/methods), chapters (classes), and books (packages).

The Family History of Java Before going on to study Java, let's take a brief look, through quotes, at the languages on which Java was based, traveling back over 30 years to do so.
Where it starts: C The earliest precursor of Java is C: a language developed by Dennis Ritchie at Bell Labs in the early 1970s. C was used as a system programming language for the DEC PDP-11. C began achieving its widespread popularity when Bell's Unix operating system was rewritten in C. Unix was one of the first operating systems written in a high-level language; it was distributed to universities for free, where it became popular. Linux is currently a popular (and still free!) Unix-like operating system.

"C is a general-purpose programming language which features economy of expression, modern control flow and data structures, and a rich set of operators. C is not a "very high level" language, nor a "big" one, and is not specialized to any particular area of application."

- B. Kernighan/D. Ritchie: The C Programming Language
(Ritchie designed and implemented C; Kernighan co-wrote this book with him)
From C to C++ "A programming language serves two related purposes: it provides a vehicle for the programmer to specify actions to be executed, and it provides a set of concepts for the programmer to use when thinking about what can be done. The first aspect ideally requires a language that is "close to the machine," so that all important aspects of a machine are handled simply and efficiently in a way that is reasonably obvious to the programmer. The C language was primarily designed with this in mind. The second aspect ideally requires a language that is "close to the problem to be solved" so that the concepts of a solution can be expressed directly and concisely. The facilities added to C to create C++ were primarily designed with this in mind"
- B. Stroustrup: The C++ Programming Language (2nd Ed)
(Stroustrup designed and implemented C++)
Java as a Successor to C++ "The Java programming language is a general-purpose, concurrent, class-based, object-oriented language. It is designed to be simple enough that many programmers can achieve fluency in the language. The Java programming language is related to C and C++ but it is organized rather differently, with a number of aspects of C and C++ omitted and a few ideas from other languages included. It is intended to be a production language, not a research language, and so, as C.A.R. Hoare suggested in his classic paper on language design, the design has avoided including new and untested features.
...
The Java programming language is a relatively high-level language, in that details of the machine representation are not available through the language. It includes automatic storage management, typically using a garbage collector, to avoid the safety problems of explicit deallocation (as in C's free or C++'s delete). High-performance garbage-collected implementations can have bounded pauses to support systems programming and real-time applications. The language does not include any unsafe constructs, such as array accesses without index checking, since such unsafe constructs would cause a program to behave in an unspecified way."
- J. Gosling, B. Joy, G. Steele, G. Bracha: The Java Language Specification

Overview of Tokens in Java: The Big 6 In a Java program, all characters are grouped into symbols called tokens. Larger language features are built from the first five categories of tokens (the sixth kind of token is recognized, but is then discarded by the Java compiler from further processing). We must learn how to identify all six kinds of tokens that can appear in Java programs. In EBNF we write one simple rule that captures this structure:

token <= identifier | keyword | separator | operator | literal | comment

We will examine each of these kinds of tokens in more detail below, again using EBNF. For now, we briefly describe in English each token type.

  1. Identifiers: names the programmer chooses
  2. Keywords: names already in the programming language
  3. Separators (also known as punctuators): punctuation characters and paired-delimiters
  4. Operators: symbols that operate on arguments and produce results
  5. Literals (specified by their type)
    • Numeric: int and double
    • Logical: boolean
    • Textual: char and String
    • Reference: null
  6. Comments
    • Line
    • Block
Finally, we will also examine the concept of white space which is crucial to understanding how the Java compiler separates the characters in a program into a list of tokens; it sometimes helps decide where one token ends and where the next token starts.

The Java Character Set The full Java character set includes all the Unicode characters; a Java char is 16 bits wide, so it has 2^16 = 65,536 possible values. Since this character set is very large and its structure very complex, in this class we will use only the subset of Unicode that includes all the ASCII (pronounced "Ask E") characters; there are 2^7 = 128 ASCII characters, of which we will still use a small subset containing alphabetic, numeric, and some special characters.

We can describe the structure of this character set quite simply in EBNF, using only alternatives in the right hand sides.

lower-case <= a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z
upper-case <= A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z
alphabetic <= lower-case | upper-case
numeric     <= 0|1|2|3|4|5|6|7|8|9
alphanumeric <= alphabetic | numeric
special       <= !|%|^|&|*|(|)|-|+|=|{|}|||~|[|]|\|;|'|:|"|<|>|?|,|.|/|#|@|`|_
graphic     <= alphanumeric | special

In the special rule, the bracket/brace characters stand for themselves (not EBNF options nor repetitions) and one instance of the vertical bar stands for itself too: this is the problem one has when the character set of the language includes special characters that also have meanings in EBNF.

White space consists of spaces (from the space bar), horizontal and vertical tabs, line terminators (newlines and formfeeds): all are non-printing characters, so we must describe them in English. White space and tokens are closely related: we can use white space to force the end of one token and the start of another token (i.e., white space is used to separate tokens). For example XY is considered to be a single token, while X Y is considered to be two tokens. The "white space separates tokens" rule is inoperative inside String/char literals, and comments (which are all discussed later).

Adding extra white space (e.g., blank lines, or spaces in a line, often for indenting) to a program changes its appearance but not its meaning to Java: it still comprises exactly the same tokens in the same order. Programmers mostly use white space for purely stylistic purposes: to isolate/emphasize parts of programs and to make them easier to read and understand. Just as a good comedian knows where to pause when telling a joke, a good programmer knows where to put white space when writing code.


Identifiers The first category of token is an Identifier. Identifiers are used by programmers to name things in Java: things such as variables, methods, fields, classes, interfaces, exceptions, packages, etc. The rules for recognizing/forming legal identifiers can be easily stated in EBNF.

id-start    <= alphabetic | $ | _
identifier <= id-start{id-start | numeric }

Although identifiers can start with and contain the $ character, we should never include a $ in identifiers that we write; such identifiers are reserved for use by the compiler, when it needs to name a special symbol that will not conflict with the names we write.

Semantically, all characters in an identifier are significant, including the case (upper/lower) of the alphabetic characters. For example, the identifiers Count and count denote different names in Java; likewise, the identifiers R2D2 and R2_D2 denote different names.
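The two EBNF rules for identifiers translate almost directly into code. The following is a minimal sketch, not part of the course library: the class name IdentifierCheck and its method names are made up for illustration, and it checks only the shape of an identifier over the ASCII letters used in this lecture (it does not exclude keywords, which share this shape).

```java
// A sketch of the identifier EBNF rules; all names here are illustrative only.
class IdentifierCheck {

    // id-start <= alphabetic | $ | _   (ASCII letters only, as in this lecture)
    static boolean isIdStart(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '$' || c == '_';
    }

    // numeric <= 0|1|2|3|4|5|6|7|8|9
    static boolean isNumeric(char c) {
        return c >= '0' && c <= '9';
    }

    // identifier <= id-start{id-start | numeric}
    static boolean isIdentifier(String s) {
        if (s.isEmpty() || !isIdStart(s.charAt(0))) {
            return false;
        }
        for (int i = 1; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!isIdStart(c) && !isNumeric(c)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isIdentifier("R2_D2"));     // true
        System.out.println(isIdentifier("2fast"));     // false: starts with a digit
        System.out.println("Count".equals("count"));   // false: case is significant
    }
}
```

The standard library offers Character.isJavaIdentifierStart and Character.isJavaIdentifierPart for the same purpose; they accept the full Unicode character set rather than just the subset we use here.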

When you read programs that I have written, and write your own programs, think carefully about the choices made to create identifiers.

  • Choose descriptive identifiers (mostly starting with lower-case letters).
  • Separate different words in an identifier with a case change: e.g., idCount; this is called "camel style", with each capital letter representing a hump.
  • Apply the "Goldilocks Principle": not too short, not too long, just right.
During our later discussions of programming style, we will examine the standard naming conventions that are recommended for use in Java code.

Carefully avoid identifiers that contain dollar signs; also avoid using pairs of identifiers that are

  • homophones (sound alike): aToDConvertor vs. a2DConvertor
  • homoglyphs (look alike): allOs vs. all0s and Allls vs. All1s
      which contain the letter (capital) O, number 0, letter (small) l, letter (capital) I, and number 1
  • mirrors: xCount vs. countX

Keywords The second category of token is a Keyword, sometimes called a reserved word. Keywords are identifiers that Java reserves for its own use. These identifiers have built-in meanings that cannot change. Thus, programmers cannot use these identifiers for anything other than their built-in meanings. Technically, Java classifies identifiers and keywords as separate categories of tokens.

The following is a list of all 49 Java keywords (as of Java 1.4); we will learn the meaning of many, but not all, of them in this course. It would be an excellent idea to print this table, and then check off the meaning of each keyword when we learn it; some keywords have multiple meanings, determined by the context in which they are used.

abstract    continue    goto          package         switch
assert      default     if            private         this
boolean     do          implements    protected       throw
break       double      import        public          throws
byte        else        instanceof    return          transient
case        extends     int           short           try
catch       final       interface     static          void
char        finally     long          strictfp        volatile
class       float       native        super           while
const       for         new           synchronized

Notice that all Java keywords contain only lower-case letters and are at least 2 characters long; therefore, if we choose identifiers that are very short (one character) or that have at least one upper-case letter in them, we will never have to worry about them clashing with (accidentally being mistaken for) a keyword. Also note that in the Metrowerks IDE (if you use my color preferences), keywords always appear in yellow (while identifiers, and many other tokens, appear in white).

We could state this same tabular information as a very long (and thus harder to read) EBNF rule of choices (and we really would have to specify each of these keywords, and not use "...") looking like

keyword <= abstract | boolean | ... | while

Finally, assert was recently added (in Java 1.4) to the original 48 keywords in Java.
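In a program, the keyword table is naturally stored as a set of strings: a word with the shape of an identifier is a keyword exactly when it appears in the set. The sketch below assumes the 49-keyword list given in this lecture (later versions of Java added more, for example enum); the class name KeywordTable and the method isKeyword are invented for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// A sketch: the 49 keywords listed in this lecture, stored in a set.
class KeywordTable {

    static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
        "abstract", "continue", "goto", "package", "switch",
        "assert", "default", "if", "private", "this",
        "boolean", "do", "implements", "protected", "throw",
        "break", "double", "import", "public", "throws",
        "byte", "else", "instanceof", "return", "transient",
        "case", "extends", "int", "short", "try",
        "catch", "final", "interface", "static", "void",
        "char", "finally", "long", "strictfp", "volatile",
        "class", "float", "native", "super", "while",
        "const", "for", "new", "synchronized"));

    // A word is a keyword only if it appears in the table, letter for letter.
    static boolean isKeyword(String word) {
        return KEYWORDS.contains(word);
    }

    public static void main(String[] args) {
        System.out.println(KEYWORDS.size());      // 49
        System.out.println(isKeyword("while"));   // true
        System.out.println(isKeyword("While"));   // false: contains an upper-case letter
    }
}
```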


Separators The third category of token is a Separator (also known as a punctuator). There are exactly nine single-character separators in Java, shown in the following simple EBNF rule.

separator <= ; | , | . | ( | ) | { | } | [ | ]

In the separator rule, the bracket/brace characters stand for themselves (not EBNF options or repetitions).

Note that the first three separators are tokens that separate/punctuate other tokens. The last six separators (3 pairs of 2 each) are also known as delimiters: wherever a left delimiter appears in a correct Java program, its matching right delimiter appears soon afterwards (they always come in matched pairs). Together, each pair delimits some other entity.

For example the Java code Math.max(count,limit); contains nine tokens

  1. an identifier (Math), followed by
  2. a separator (a period), followed by
  3. another identifier (max), followed by
  4. a separator (the left parenthesis delimiter), followed by
  5. an identifier (count), followed by
  6. a separator (a comma), followed by
  7. another identifier (limit), followed by
  8. a separator (the right parenthesis delimiter), followed by
  9. a separator (a semicolon)

Operators The fourth category of token is an Operator. Java includes 37 operators that are listed in the table below; each of these operators consists of 1, 2, 3, or at most 4 special characters.

=    >    <    !    ~    ?    :
==   <=   >=   !=   &&   ||   ++   --
+    -    *    /    &    |    ^    %    <<    >>    >>>
+=   -=   *=   /=   &=   |=   ^=   %=   <<=   >>=   >>>=

The keywords instanceof and new are also considered operators in Java. This double classification can be a bit confusing; but by the time we discuss these operators, you'll know enough about programming to take them in stride.

It is important to understand that Java always tries to construct the longest token from the characters that it is reading. So, >>= is read as one token, not as the three tokens > and > and =, nor as the two tokens >> and =, nor even as the two tokens > and >=.

Of course, we can always use white space to force Java to recognize separate tokens of any combination of these characters: writing >   >= is the two tokens > and >=.
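The "longest token" rule for operators can be sketched as a search: among all operators that match the characters starting at some position, pick the longest. The class and method names below (LongestOperator, longestAt) are hypothetical, and the sketch handles operators only, not the other kinds of tokens.

```java
// A sketch of the "longest token" rule, restricted to operators.
class LongestOperator {

    // The 37 operators listed in this lecture.
    static final String[] OPERATORS = {
        "=", ">", "<", "!", "~", "?", ":",
        "==", "<=", ">=", "!=", "&&", "||", "++", "--",
        "+", "-", "*", "/", "&", "|", "^", "%", "<<", ">>", ">>>",
        "+=", "-=", "*=", "/=", "&=", "|=", "^=", "%=", "<<=", ">>=", ">>>="
    };

    // Returns the longest operator that starts at index pos of text,
    // or the empty string if no operator starts there.
    static String longestAt(String text, int pos) {
        String best = "";
        for (String op : OPERATORS) {
            if (op.length() > best.length() && text.startsWith(op, pos)) {
                best = op;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(longestAt(">>=", 0));    // >>=
        System.out.println(longestAt("> >=", 0));   // >
        System.out.println(longestAt("> >=", 2));   // >=
    }
}
```

Note how the white space in "> >=" stops the first match after one character, exactly as described above.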

We could state this same tabular information as a very long (and thus harder to read) EBNF rule of choices (and we really would have to specify each of these operators, and not use "...") looking like

operator <=   = | > | ... | >>= | instanceof | new


Types and Literals The fifth, and most complicated, category of token is the Literal. All values that we write in a program are literals: each belongs to one of the four primitive types that we will use (int, double, boolean, char) or belongs to the special reference type String. All primitive type names are keywords in Java; the String reference type names a class in the standard Java library, which we will learn much more about soon. Each literal written in a Java program belongs in (or is said to have) exactly one type.

literal <= integer-literal | floating-point-literal | boolean-literal
                | character-literal | string-literal | null-literal

Here are some examples of literals of each of these types.

Literal     Type
1           int
3.14        double (1. is a double too)
true        boolean
'3'         char ('P' and '+' are char too)
"CMU ID"    String
null        any reference type

The next six sections discuss each of these types of literals, in more detail.


int Literals Literals of the primitive type int represent countable, discrete quantities (values with no fractions nor decimal places possible/necessary). We can specify the EBNF for an int literal in Java as

non-zero-digit     <= 1|2|3|4|5|6|7|8|9
digit                     <= 0 | non-zero-digit
digits                   <= digit{digit}
decimal-numeral <= 0 | non-zero-digit[digits]
integer-literal      <= decimal-numeral
                              | octal-numeral
                              | hexadecimal-numeral

This EBNF specifies only decimal (base 10) literals. In Java, literals can also be written in octal (base 8) and hexadecimal (base 16). I have omitted the EBNF rules for forming these kinds of numbers, because we will use base 10 exclusively. Thus, the rules shown above are correct, but not complete.

By the EBNF rules above, note that the symbol 015 does not look like a legal integer-literal; it is certainly not a decimal-numeral, because it starts with a zero. But, in fact, it is an octal-numeral (whose EBNF is not shown). Never start an integer-literal with a 0 (unless its value is zero), because starting with a 0 in Java signifies the literal is being written as an octal (base 8) number: e.g., writing 015 refers to an octal value, whose decimal (base 10) value is 13! So writing a leading zero in an integer can get you very confused about what you said to the computer.
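The effect of a leading zero is easy to check directly. Here is a small sketch (the class and method names are made up for illustration) that returns the same digits written with and without the leading 0.

```java
// A sketch showing how a leading zero changes the value of an int literal.
class LeadingZero {

    static int withLeadingZero() {
        return 015;   // octal numeral: 1*8 + 5
    }

    static int withoutLeadingZero() {
        return 15;    // decimal numeral
    }

    public static void main(String[] args) {
        System.out.println(withLeadingZero());      // 13
        System.out.println(withoutLeadingZero());   // 15
    }
}
```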

Finally, note that there are no negative literals: we will see soon how to compute such values from the negate arithmetic operator and a positive literal (writing -1 is exactly such a construct). This is a detail: a distinction without much difference.


double Literals Literals of the primitive type double represent measurable quantities. Like real numbers in mathematics, they can represent fractions and numbers with decimal places. We can specify the EBNF for a double literal in Java as

exponent-indicator   <= e | E
exponent-part           <= exponent-indicator [+|-]digits
floating-point-literal <= digits exponent-part
                                    | digits.[digits][exponent-part]
                                    | .digits[exponent-part]

This EBNF specifies a floating-point-literal to contain various combinations of a decimal point and exponent (so long as one, or both, are present); if neither is present then the literal must be classified as an integer-literal. The exponent-indicator (E or e) should be read to mean "times 10 raised to the power of".

Like literals of the type int, all double literals are non-negative (although they may contain negative exponents). Using E or e means that we can specify very large or small values easily: 3.518E+15 is equivalent to 3.518 times 10 raised to the power of 15, or 3518000000000000.; and 3.518E-15 is equivalent to 3.518 times 10 raised to the power of -15, or .000000000000003518. In fact, any literal with an exponent-part is a double: so even writing 1E3 is equivalent to writing 1.E3, and both are equivalent to writing the double literal 1000. (with its decimal point), not the int literal 1000!

Finally, we will write all double literals in base 10 (unlike int literals, which can also be written in octal or hexadecimal).
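We can ask Java itself how it classifies a numeric literal. The sketch below is only an illustration (the class name DoubleLiterals and the helper typeOf are invented here): it relies on the fact that an int literal is boxed as an Integer object and a double literal as a Double object when passed where an Object is expected.

```java
// A sketch comparing several double literals that denote the same value.
class DoubleLiterals {

    // Boxing reveals the literal's type: int literals box to Integer,
    // double literals box to Double.
    static String typeOf(Object boxed) {
        return boxed.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        System.out.println(1E3 == 1000.);                       // true
        System.out.println(1.E3 == 1000.);                      // true
        System.out.println(typeOf(1E3));                        // Double
        System.out.println(typeOf(1000));                       // Integer
        System.out.println(3.518E+15 == 3518000000000000.);     // true
        System.out.println(.5 == 0.5);                          // true
    }
}
```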


boolean Literals The type name boolean honors George Boole, a 19th century English mathematician who revolutionized the study of logic by making it more like arithmetic. He invented a method for calculating with truth values and an algebra for reasoning about these calculations. Boole's methods are used extensively today in the engineering of hardware and software systems.

Literals of the primitive type boolean represent on/off, yes/no, present/absent, ... data. There are only two values of this primitive type, so its EBNF rule is trivially written as

boolean-literal <= true | false

In Java, although these values look like identifiers, they are classified as literal tokens (just as all the keywords also look like identifiers, but are classified differently). Therefore, 100 and true are both literal tokens in Java (of type int and boolean respectively).

Students who are familiar with numbers sometimes have a hard time accepting true as a value; but that is exactly what it is in Java. We will soon learn logical operators that compute with these values of the type boolean just as arithmetic operators compute with values of the type int.


char Literals The first type of text literal is a char. This word can be pronounced in many ways: care, car, or as in charcoal (I'll use this last pronunciation). Literals of this primitive type represent exactly one character inside single quotes. Its EBNF rule is written

character-literal <= 'graphic' | 'space' | 'escape-sequence'

where the middle option is a space between single quotes. Examples are 'X', or 'x', or '?', or ' ', or '\n', etc. (see below for a list of some useful escape sequences).

Note that 'X' is classified just as a literal token (of the primitive type char); it is NOT classified as an identifier token inside two separator tokens!


String Literals The second type of text literal is a String. Literals of this reference type (the only one in this bunch; it is not a primitive type) represent zero, one, or more characters inside double quotes. Its EBNF rule is written

string-literal <= "{graphic | space | escape-sequence}"

Examples are: "\n\nEnter your SSN:", or "" (the empty String), or "X" (a one character String, which is different from a char).

Note that "CMU" is classified just as a literal token (of the reference type String); it is NOT classified as an identifier token inside two separator tokens!


Escape Sequences Sometimes you will see an escape-sequence inside the single-quotes for a character-literal or one or more inside double-quotes for a string-literal (see above); each escape sequence is translated into a character that prints in some "special" way. Some commonly used escape sequences are

Escape Sequence    Meaning
\n                 new line
\t                 horizontal tab
\b                 backspace
\r                 carriage return
\f                 form feed
\\                 \ (needed to denote \ in a text literal)
\'                 ' (does not act as the right ' of a char literal)
\"                 " (does not act as the right " of a String literal)

(Unlike C, Java has no \v (vertical tab) or \a (bell) escape sequence.)

So, in the String literal "He said, \"Hi.\"" neither escape sequence \" acts to end the String literal: each represents a double-quote that is part of the String literal, which displays as He said, "Hi."

If we output "Pack\nage", Java would print on the console

Pack
age
with the escape sequence \n causing Java to immediately terminate the current line and start at the beginning of a new line.

There are other ways in Java to write escape sequences (octal escapes, and Unicode escapes written with hexadecimal digits) that we will not cover here, nor need in the course. The only escape sequence that we will use with any frequency is \n.
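A useful way to convince yourself that each escape sequence denotes a single character is to ask for the length of a String that contains some. The following sketch reuses the two String literals discussed above; the class name EscapeDemo and its constant names are made up for illustration.

```java
// A sketch exploring the escape sequences used in the examples above.
class EscapeDemo {

    static final String QUOTED = "He said, \"Hi.\"";
    static final String TWO_LINES = "Pack\nage";

    public static void main(String[] args) {
        System.out.println(QUOTED);                  // He said, "Hi."
        System.out.println(QUOTED.length());         // 14: each \" is one character
        System.out.println(TWO_LINES);               // Pack and age on separate lines
        System.out.println(TWO_LINES.length());      // 8: \n is one character
        System.out.println("\\".length());           // 1: \\ denotes one backslash
        System.out.println('\'' == "'".charAt(0));   // true
    }
}
```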


The null Reference Literal There is a very simple, special kind of literal that is used to represent a special value with every reference type in Java (so far we know only one, the type String). For completeness we will list it here, and learn about its use a bit later. Its trivial EBNF rule is written

null-literal <= null

So, as we learned with boolean literals, null is a literal in Java, not an identifier.


Bounded Numeric Types Although there are an infinite number of integers in mathematics, values in the int type are limited to the range from -2,147,483,648 to 2,147,483,647. We will explore this limitation later in the course, but for now we will not worry about it.

Likewise, although there are an infinite number of reals in mathematics, values in the double type are limited to the range from
-1.79769313486231570 x 10^308 to 1.79769313486231570 x 10^308; the smallest non-zero, positive value is 4.94065645841246544 x 10^-324. Values in this type can have up to about 15 significant digits. For most engineering and science calculations, this range and precision are adequate.

In fact, there are other primitive numeric types (which are also keywords): byte, short, long, and float. These types are variants of int and double and are not as widely useful as these more standard types, so we will not cover them in this course.

Finally, there is a reference type named BigInteger, which can represent any number of digits in an integer (up to the memory capacity of the machine). Such a type is very powerful (because it can represent any integer), but costly to use (in execution time and computer space) compared to int. Most programs can live with the "small" integer values specified above; but, we will also study this reference type soon, and write programs using it.
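The bounds quoted above are available as named constants in the standard Java library, so we can print them rather than memorize them. The sketch below (its class and method names are hypothetical) also shows what happens at the top of the int range: int arithmetic silently wraps around, while BigInteger arithmetic keeps going.

```java
import java.math.BigInteger;

// A sketch of the limits of int and double, and of BigInteger going past them.
class NumericBounds {

    // Adds 1 using int arithmetic, which wraps around at the top of the range.
    static int addOneAsInt(int n) {
        return n + 1;
    }

    // Adds 1 using BigInteger arithmetic, which has no fixed upper bound.
    static BigInteger addOneAsBig(int n) {
        return BigInteger.valueOf(n).add(BigInteger.ONE);
    }

    public static void main(String[] args) {
        System.out.println(Integer.MIN_VALUE);                // -2147483648
        System.out.println(Integer.MAX_VALUE);                // 2147483647
        System.out.println(addOneAsInt(Integer.MAX_VALUE));   // -2147483648
        System.out.println(addOneAsBig(Integer.MAX_VALUE));   // 2147483648
        System.out.println(Double.MAX_VALUE);                 // 1.7976931348623157E308
        System.out.println(Double.MIN_VALUE);                 // 4.9E-324
    }
}
```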


Comments The sixth and final category of tokens is the Comment. Comments allow us to place any form of documentation inside our Java code. They can contain anything that we can type on the keyboard: English, mathematics, even low-resolution pictures. In general, Java recognizes comments as tokens, but then excludes these tokens from further processing; technically, it treats them as white space when it is forming tokens.

Comments help us capture aspects of our programs that cannot be expressed as Java code: things like goals, specifications, design structures, time/space tradeoffs, historical information, advice for using/modifying this code, etc. Programmers intensely study their own code (or the code of others) when maintaining it (testing, debugging or modifying it). Good comments in code make all these tasks much easier.

Java includes two styles for comments.

  • Line-Oriented: begins with // and continues until the end of the line.
  • Block-Oriented: begins with /* and continues (possibly over many lines) until */ is reached.
    • So, we can use block-oriented comments to create multiple comments within a line
          display(/*Value*/ x, /*on device*/ d);
      In contrast, once a line-oriented comment starts, everything afterward on its line is included in the comment.
    • We can also use block-oriented comments to span multiple lines
      /*
          This is a multi-line comment.
          No matter how many lines
          it includes, only one pair
          of delimiters is needed.
      */

      In contrast, a line-oriented comment stops at the end of the line it starts on.
Technically, both kinds of comments are treated as white space, so writing X/*comment*/Y has the same meaning in Java as writing the tokens X and Y, not the single token XY.

Typically Java comments are line-oriented; we will save block-oriented comments for a special debugging purpose (discussed later).

The EBNF rule for comments is more complicated than insightful, so we will not study it here. This happens once in a while.


Programs are a Sequence of Tokens Built from Characters In its first phase, a Java compiler tokenizes a program by scanning its characters left to right, top to bottom (there is a hidden end-of-line character at the end of each line; recall that it is equivalent to white space), and combining selected characters into tokens. It works by repeating the following algorithm (an algorithm is a precise set of instructions):
  • Skip any white space...
  • ...if the next character is an underscore, dollar, or alphabetic character, it builds an identifier token.
    • Except for recognizing keywords and certain literals (true, false, null) which all share the form of identifiers, but are not themselves identifiers
  • ...if the next character is a numeric character, ' or ", it builds a literal token.
  • ...if the next character is a period, that is a separator unless the character after it is a numeric character (in which case it builds a double literal).
  • ...if the next two characters are a // or /* starting a comment, it builds a comment token.
  • ...if the next character is anything else, it builds a separator or operator token (trying to build the longest token, given that white space separates tokens, except in a char or String literal).
Recall that white space (except when inside a textual literal or comment) separates tokens.

Also, the Java compiler uses the "longest token rule": it includes characters in a token until it reaches a character that cannot be included.

Finally, after building and recognizing each token, the Java compiler passes all tokens (except for comments, which are ignored after being tokenized) on to the next phase of the compiler.
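The algorithm above can be sketched in a few dozen lines of Java. This is a simplified illustration under stated assumptions, not how a real Java compiler is written: the class name TokenSketch is made up, only a handful of keywords are listed, numeric literals are recognized as runs of digits and periods (no exponents, octal, or hexadecimal), and each token is reported as a String of the form kind:text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// A simplified sketch of the tokenizing algorithm above; not a full Java lexer.
class TokenSketch {

    // Only a few keywords are listed, to keep the sketch short.
    static final List<String> SOME_KEYWORDS = Arrays.asList("int", "double", "if", "while", "new");

    // Operators ordered longest first, so the first match found is the longest one.
    static final String[] OPERATORS = {
        ">>>=", "<<=", ">>=", ">>>", "==", "<=", ">=", "!=", "&&", "||", "++", "--",
        "+=", "-=", "*=", "/=", "&=", "|=", "^=", "%=", "<<", ">>",
        "=", ">", "<", "!", "~", "?", ":", "+", "-", "*", "/", "&", "|", "^", "%"
    };

    static boolean isIdStart(char c) {
        return Character.isLetter(c) || c == '_' || c == '$';
    }

    static boolean digitAt(String src, int i) {
        return i < src.length() && Character.isDigit(src.charAt(i));
    }

    static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            int start = i;
            String kind;
            if (Character.isWhitespace(c)) {                    // skip white space
                i++;
                continue;
            } else if (isIdStart(c)) {                          // identifier shape
                while (i < src.length() && (isIdStart(src.charAt(i)) || digitAt(src, i))) i++;
                String word = src.substring(start, i);
                if (SOME_KEYWORDS.contains(word)) kind = "keyword";
                else if (word.equals("true") || word.equals("false") || word.equals("null")) kind = "literal";
                else kind = "identifier";
            } else if (Character.isDigit(c) || (c == '.' && digitAt(src, i + 1))) {
                while (i < src.length() && (digitAt(src, i) || src.charAt(i) == '.')) i++;
                kind = "literal";                               // numeric literal (simplified)
            } else if (c == '\'' || c == '"') {                 // char or String literal
                i++;
                while (i < src.length() && src.charAt(i) != c) {
                    if (src.charAt(i) == '\\') i++;             // skip the escaped character
                    i++;
                }
                i++;                                            // the closing quote
                kind = "literal";
            } else if (src.startsWith("//", i)) {               // line-oriented comment
                while (i < src.length() && src.charAt(i) != '\n') i++;
                kind = "comment";
            } else if (src.startsWith("/*", i)) {               // block-oriented comment
                int end = src.indexOf("*/", i + 2);
                i = (end < 0) ? src.length() : end + 2;
                kind = "comment";
            } else if (";,.(){}[]".indexOf(c) >= 0) {           // separator
                i++;
                kind = "separator";
            } else {                                            // operator: longest match
                kind = "unknown";
                i++;
                for (String op : OPERATORS) {
                    if (src.startsWith(op, start)) { kind = "operator"; i = start + op.length(); break; }
                }
            }
            tokens.add(kind + ":" + src.substring(start, Math.min(i, src.length())));
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : tokenize("Math.max(count,limit); //done")) {
            System.out.println(token);
        }
    }
}
```

Running main prints the nine tokens of Math.max(count,limit); one per line, followed by one comment token. Unlike the compiler, this sketch keeps comment tokens in its result so that we can see them.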


Common Mistakes I have seen the following mistakes made repeatedly by beginning students trying to tokenize programs. Try to understand each of these subtle points.
  • Tokenizing x as a char literal: it is an identifier.
  • Tokenizing 10.5 as two int literals separated by a period: it is a double literal.
  • Tokenizing int as a literal: it is a keyword, that happens to name a type in Java. Tokens like 1 are literals whose type is int; the token int is a keyword.
  • Tokenizing "Hi" as two separators with the identifier Hi in between: it is a single String literal.
  • Tokenizing something like }; as one separator token: it is really two separate separators.
  • Tokenizing something like += as two separate operator tokens (because + and = are operators): it is really one large token (because += is an operator).
  • Forgetting to tokenize parentheses, semicolons, and other separators: everything except white space belongs in some token.
  • Creating tokens inside comments: each comment is one big token that includes all the characters in the comment.

A Simple Program The following program will serve as a model of Input/Calculate/Output programs in Java. Here are some highlights
  • A large, multi-line comment (built from line-oriented comments) appears at the top of the program. Line-oriented comments also appear at various other locations in the program.
  • The Prompt class is imported from the edu.cmu.cs.pattis.cs151xx package.
  • The Application class is declared.
  • Its main method is declared; its body (the statements it executes) is placed between the { and } delimiters.
  • Each simple statement in the body is ended by a semicolon (;) separator.
  • Three variables storing double values are declared.
  • The user is prompted for the value to store in the first two variables.
  • The third variable's value is computed and stored.
  • The third variable's value is printed (after printing a blank line).
Besides just reading this program, practice tokenizing it.

//////////////////////////////////////////////////////////////////////////////
//////////////////////////////////////////////////////////////////////////////
//
// Description:
//
//   This program computes the time it takes to drop an object (in a vacuum)
// from an arbitrary height in an arbitrary gravitational field (so it can
// be used to calculate drop times on other planets). It models a straight
// input/calculate/output program: the user enters the gravitational field
// and then the height; it calculates the drop time and then prints it on
// the console.
//
//////////////////////////////////////////////////////////////////////////////
//////////////////////////////////////////////////////////////////////////////


import edu.cmu.cs.pattis.cs151xx.Prompt;


public class Application {


  public static void main(String[] args)
    {
      try {

        double gravity;        //meter/sec/sec
        double height;         //meters
        double time;           //sec
		  
		  
        //Input
		  
        gravity = Prompt.forDouble("Enter gravitational acceleration (in meters/sec/sec)");
        height  = Prompt.forDouble("Enter height of drop (in meters)");
		  
		  
        //Calculate
		  
        time = Math.sqrt(2.*height/gravity);
		  
		  
        //Output
		  
        System.out.println("\nDrop time = " + time + " secs");

		  
      }catch (Exception e) {
        e.printStackTrace();
        System.out.println("main method in Application class terminating");
        System.exit(0);
      }
    }

}
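The program above depends on the Prompt class from the course library, so it will not compile without that library installed. As a hedged alternative for experimenting, here is a sketch of just the Calculate and Output steps; the class name DropTime, the method dropTime, and the sample input values are assumptions made for illustration and are not part of the course code.

```java
// A sketch of the calculation step only, so it runs without the course's Prompt class.
class DropTime {

    // time = sqrt(2 * height / gravity), the same formula the program above uses
    static double dropTime(double gravity, double height) {
        return Math.sqrt(2. * height / gravity);
    }

    public static void main(String[] args) {
        double gravity = 9.8;   //meter/sec/sec (sample value chosen for illustration)
        double height  = 4.9;   //meters (sample value chosen for illustration)

        System.out.println("\nDrop time = " + dropTime(gravity, height) + " secs");
    }
}
```

Putting the formula in its own method makes it easy to try different values without retyping input each time.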

How Experts See Programs In the 1940s, a Dutch psychologist named DeGroot was doing research on chess experts. He performed the following experiment: He sat chess experts down in front of an empty chessboard, all the chess pieces, and a curtain. Behind the curtain was a chessboard with its pieces arranged about 35 moves into a game. The curtain was raised for one minute and then lowered. The chess experts were asked to reconstruct what they remembered from seeing the chessboard behind the curtain.

In most cases, the chess experts were able to completely reconstruct the board that they saw. The same experiment was conducted with chess novices, but most were able to remember only a few locations of the pieces. These results could be interpreted as, "Chess experts have much better memories than novices."

So, DeGroot performed a second (similar) experiment. In the second experiment, the board behind the curtain had the same number of chess pieces, but they were randomly placed on the board; they did not represent an ongoing game. In this modified experiment, the chess experts did only marginally better than the novices. DeGroot's conclusion was that chess experts saw the board differently than novices: they saw not only pieces, but attacking and defending structures, board control, etc.

In this class, I am trying to teach you how to see programs as a programmer sees them: not as a sequence of characters, but at a higher structural level. Tokens are where we start this process.


Problem Set To ensure that you understand all the material in this lecture, please solve the announced problems after you read the lecture.

If you get stumped on any problem, go back and read the relevant part of the lecture. If you still have questions, please get help from the Instructor, a CA, a Tutor, or any other student.

  1. Classify each of the following as a legal or illegal identifier. If it is illegal, propose a legal identifier that can take its place (a homophone or homoglyph).
    packAge      x12               2Lips
    xOrY         sum of squares    %Raise
    termInAte    u235              $Bill
    x_1          x&Y               1derBoys

  2. What tokens does Java build from the characters ab=c+++d==e? Be sure that you know your Operators.

  3. Classify each of the following numeric literals as int, or double, or illegal (neither); write the equivalent value of each double without using E notation; for each illegal literal, write a legal one with the "same" value.
    5.       3.1415      17
    17.0     1E3         1.E3
    .5E-3    5.4x10^3    50E-1
    1,024    0.087       .087

  4. What is the difference between 5, 5., five, '5', and "5"? What is the difference between true and "true"?

  5. Write a String literal that includes the characters I've said a million times, "Do not exaggerate!"

  6. How does Java classify each of the following lines
        "//To be or not to be"
        //"To be or not to be"

  7. Does the following line contain one comment or two?
       //A comment //Another comment?

  8. Explain whether X/**/Y is equivalent to XY or X   Y.

  9. Tokenize the following Java Code (be careful): -15

  10. Tokenize the following line of Java code: identify every Java token as either an Identifier, Keyword, Separator, Operator, Literal (for any literal, also specify its type), or Comment. Which (if any) identifiers are keywords?
    int X = Prompt.forInt("SSN",0,999999999); //Filter && use

  11. Choose an appropriate type to represent each of the following pieces of information
    • the number of characters in a file
    • a time of day (accurate to 1 second)
    • the middle initial in a name
    • whether the left mouse button is currently pushed
    • the position of a rotary switch (with 5 positions)
    • the temperature of a blast furnace
    • an indication of whether one quantity is less than, equal to or greater than another
    • the name of a company

  12. This problem (it is tricky, so do it carefully) shows a difficulty with using Block-Oriented comments. Tokenize the following two lines of Java code: identify every token as either an Identifier, Keyword, Separator, Operator, Literal, or Comment. What problem arises?
      x = 0;  /* Initialize x and y to
      y = 1;     their starting values */
    Rewrite the code shown above with Line-Oriented comments instead, to avoid this problem. How can our use of my Java preferences help us avoid this error?

  13. This problem (it is tricky, so do it carefully) shows another difficulty with using Block-Oriented comments. Tokenize the following Java code: identify every token as either an Identifier, Keyword, Separator, Operator, Literal, or Comment. What problem arises?
      /*
        here is an outer
        comment with an
        /* inner comment inside */
        and the finish of the outer
        comment at the end
      */
    Rewrite the code shown above with Line-Oriented comments instead, to avoid this problem. How can our use of my Java preferences help us avoid this error?

  14. Explain why language designers are very reluctant to add new keywords to a programming language. Hint: what problem might this cause in already-written programs?