15-213 (Spring 2005) Section A - Recitation #1

TA: Kun Gao
Email: kgao@cs.cmu.edu
Office Hours: Tuesday 1-2pm, Wednesday 2-3pm
Location: Doherty Hall 4302D
Section Webpage: http://www.cs.cmu.edu/~kgao/course/15213/

Hi, my name is Kun, the TA for Section A. Recitation is meant for you to ask questions. Please don't hesitate! The more participation, the better the recitation will be. Get to know your fellow students. Two heads are always better than one. But always do your own work. We will run software that is very good at finding cheating in code.

Taking notes can be a pain. I will make each recitation's notes (everything I will write on the board, and then some) available at the end of recitation (try to take some time to review this before the next recitation), so you can concentrate more on thinking and questioning, and less on copying off the board. In return, I ask you all to actively participate in class. This will make class more fun for all of us.

Make sure you frequently access all the course resources (course webpage, autolab, blackboard, course newsgroup, fish machines). Let me know if you have problems with any of these.

First Lab is due Thursday 11:59pm. Come to TA office hours (listed on the course webpage) for help. Get started today (if you haven't already started)!

Bits and Bytes
Everything in a computer is represented by 0's and 1's. This makes the electronics easy to build. However, people don't count with 0's and 1's (at least not normal people). So we need to figure out how to map between the 0's and 1's of computer speak and the integers, real numbers, and letters of human speak. This mapping is sometimes not straightforward. Computers only have a limited number of bits to play with (some length w), while there are infinitely many numbers. To become better programmers (to better talk to computers), we need to understand the mappings (how to convert between humanese and computerese) and their nuances (when the conversions do not work as we expect).

Computers talk in binary. But it becomes very inefficient for humans to read thirty-two 0's and 1's on a computer screen. Instead we use hexadecimal. This is just a more compact representation for binary. Hexadecimal is base 16. Conveniently, 4 binary digits (bits) can also represent 16 different values. Therefore, we use a single hexadecimal character to represent 4 bits. Since the decimal digits 0-9 cover only 10 of the 16 values, we need to introduce 6 new characters, A through F (if only we had 16 fingers).

0x5 = 0101, 0x9 = 1001, 0xA = 1010, 0xC = 1100, 0xE = 1110
(Hexadecimal on the left, binary is on the right)
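
The hex-to-binary mapping above can be sketched in C. This is only an illustration; the function name nibble_to_binary is made up for these notes and is not part of any course code.

```c
/* Write the 4-bit binary form of one hex digit value (0..15) into out.
   out must have room for 5 chars: 4 digits plus the terminating NUL.
   The name nibble_to_binary is hypothetical, chosen for this sketch. */
void nibble_to_binary(unsigned value, char out[5])
{
    for (int i = 0; i < 4; i++) {
        /* Bit 3 (the most significant) is written first, bit 0 last. */
        out[i] = ((value >> (3 - i)) & 1u) ? '1' : '0';
    }
    out[4] = '\0';
}
```

Calling nibble_to_binary(0xA, buf) fills buf with "1010", matching the table above. A 32-bit word is just eight of these hex digits in a row.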

There are some basic operations we can do with bits (&, |, ^, ~, <<, >>), regardless of what the bits might represent.
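
Here is a small sketch of these operators in C. The helper names (and4, or4, and so on) are invented for this example, and every result is masked to 4 bits so it reads as one hex digit.

```c
/* Each helper keeps only the low 4 bits of its result, so every value
   can be read as a single hex digit. Helper names are hypothetical. */
unsigned and4(unsigned a, unsigned b) { return (a & b) & 0xFu; }  /* 1 where both bits are 1 */
unsigned or4(unsigned a, unsigned b)  { return (a | b) & 0xFu; }  /* 1 where either bit is 1 */
unsigned xor4(unsigned a, unsigned b) { return (a ^ b) & 0xFu; }  /* 1 where the bits differ */
unsigned not4(unsigned a)             { return ~a & 0xFu; }       /* flip every bit */
unsigned shl4(unsigned a, unsigned n) { return (a << n) & 0xFu; } /* shift left, 0's enter on the right */
unsigned shr4(unsigned a, unsigned n) { return (a & 0xFu) >> n; } /* shift right, 0's enter on the left */
```

With a = 0xC (1100) and b = 0xA (1010): and4 gives 1000, or4 gives 1110, xor4 gives 0110, not4(a) gives 0011, shl4(a, 1) gives 1000, and shr4(a, 1) gives 0110.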

Unsigned Integer
We start with representing unsigned integers (0,1,2,3,...) in binary. The straightforward way is to do what we've been doing since grade school with decimals. If we have a binary number with bits [x_(w-1), x_(w-2),...,x_0], each position has some value. The 0th position has value 2^0, the 1st position has value 2^1, and the (w-1)th position has value 2^(w-1). The value of a binary number is then x_(w-1)*2^(w-1) + ... + x_0*2^(0).

w = 4, 1010 = 10(ten), 1111 = 15(ten)
w = 6, 011010 = 26(ten), 111111 = 63(ten)
((ten) means base10)
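
The positional sum above can be sketched as a short C function. The name bits_to_unsigned is made up for this example, and it assumes the string contains only '0' and '1' characters and is short enough to fit in an unsigned long.

```c
/* Interpret a string of '0'/'1' characters as an unsigned binary number.
   Hypothetical helper; assumes the input holds only '0' and '1'. */
unsigned long bits_to_unsigned(const char *bits)
{
    unsigned long value = 0;
    for (; *bits != '\0'; bits++) {
        /* Doubling shifts every earlier bit up one position, which is
           the same as giving it the next higher power of 2. */
        value = value * 2 + (unsigned long)(*bits - '0');
    }
    return value;
}
```

For example, bits_to_unsigned("1010") returns 10 and bits_to_unsigned("011010") returns 26, the same values as in the table above.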

Since w bits can encode 2^w different values, we can represent the integers 0 through 2^w-1, with 2^w-1 being the largest.

Signed Integer/Two's Complement
To represent signed integers (...,-3,-2,-1,0,1,2,3,...) in binary, we use Two's Complement (a particular way of representing signed integers). If we want a negative value out of the bits, the values at the positions can't all be positive. We decide to always give the (w-1)th bit the negative value -2^(w-1) instead, while the rest of the positions have the same significance as in an unsigned integer. A binary number with bits [x_(w-1), x_(w-2),...,x_0] has value x_(w-1)*(-2^(w-1)) + x_(w-2)*2^(w-2) + ... + x_0*2^(0).

w = 5, 00101 = 5(ten), 11010 = -16 + 8 + 2 = -6(ten)
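
As a sketch, the Two's Complement sum can be written in C like this. The name bits_to_signed is invented for these notes, and it assumes the string holds only '0' and '1' and has fewer bits than a long.

```c
#include <string.h>

/* Interpret a string of '0'/'1' characters as a w-bit Two's Complement
   number, where w is the length of the string. Hypothetical helper. */
long bits_to_signed(const char *bits)
{
    size_t w = strlen(bits);
    long value = 0;
    for (size_t i = 0; i < w; i++) {
        long weight = 1L << (w - 1 - i);   /* 2^(position) */
        if (bits[i] == '1') {
            /* Only the leftmost (w-1) position counts negatively. */
            value += (i == 0) ? -weight : weight;
        }
    }
    return value;
}
```

For example, bits_to_signed("00101") returns 5, and bits_to_signed("1000") returns -8 because only the negative position is set.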

A simple trick for finding the representation for a negative number -x is to start with the positive representation for x (since it's positive, its unsigned and signed representations are the same), invert the bits (0 becomes 1, 1 becomes 0), and then add 1 (0x00000...0001).

w = 5, 7(ten) = 00111
00111 inverted is 11000
11000 + 00001 = 11001 = -7(ten)

There is an asymmetry here though. The smallest number is -2^(w-1), but the largest number is 2^(w-1)-1. So while every positive number has a negative counterpart, the most negative number, -2^(w-1), has no positive counterpart (we would need more bits to represent it).
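
Both the invert-and-add-1 trick and the asymmetry can be tried out with a small sketch. The function negate_w is a made-up helper that works on w-bit patterns stored in an unsigned int, assuming 0 < w < the width of unsigned.

```c
/* Negate a w-bit Two's Complement pattern: invert the bits, add 1,
   and keep only the low w bits. Hypothetical helper; assumes
   0 < w < number of bits in unsigned. */
unsigned negate_w(unsigned x, unsigned w)
{
    unsigned mask = (1u << w) - 1u;   /* w ones: 0...011...1 */
    return (~x + 1u) & mask;
}
```

With w = 5, negate_w(0x07, 5) gives 0x19 (11001), and negating that gives 00111 back. The asymmetry shows up at the most negative pattern: negate_w(0x10, 5) gives 0x10 (10000) again, since +16 does not fit in 5 bits.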

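A logical >> shift (which fills the vacated high bits with 0's) on a signed integer might change the sign. Therefore, >> shifts on signed integers are usually arithmetic shifts (which replicate the most significant bit). Strictly speaking, C leaves the result of right-shifting a negative value implementation-defined, but almost all machines and compilers shift arithmetically.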
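
To see the difference between the two kinds of right shift without depending on what a particular compiler does with signed values, here is a sketch that performs both on 32-bit patterns held in unsigned integers. The function names are invented for this example, and n is assumed to be between 0 and 31.

```c
#include <stdint.h>

/* Logical right shift: 0's enter from the left. Assumes 0 <= n <= 31. */
uint32_t shift_right_logical(uint32_t x, unsigned n)
{
    return x >> n;
}

/* Arithmetic right shift done by hand: copies of the most significant
   bit enter from the left. Assumes 0 <= n <= 31. */
uint32_t shift_right_arithmetic(uint32_t x, unsigned n)
{
    uint32_t shifted = x >> n;
    if (x & 0x80000000u) {
        /* The sign bit was 1, so fill the top n bits with 1's. */
        shifted |= ~(0xFFFFFFFFu >> n);
    }
    return shifted;
}
```

Take 0xFFFFFFF8, which is -8 in Two's Complement. Shifting right by 1 logically gives 0x7FFFFFFC (a large positive number), while the arithmetic version gives 0xFFFFFFFC, which is -4.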

Integers in C
C has the construct 'unsigned int' for unsigned integers, and 'int' for integers represented with Two's Complement. Things get a little hairy when unsigned and signed integers are mixed. Whenever C sees a comparison between a signed and an unsigned integer, it automatically converts the signed integer to unsigned. In other words, it reinterprets the bits of the signed integer as an unsigned value, so a negative number turns into a very large positive one.
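
A minimal sketch of this surprise follows. The function name is made up for these notes; many compilers will warn about the mixed comparison inside it, which is exactly the behavior being shown.

```c
/* Returns 1 if s compares less than u under C's conversion rules.
   Hypothetical helper for illustration only. */
int signed_less_than_unsigned(int s, unsigned int u)
{
    /* s is converted to unsigned int before the comparison happens,
       so a negative s is treated as a huge positive value. */
    return s < u;
}
```

Calling signed_less_than_unsigned(-1, 0u) returns 0: the -1 is converted to the largest unsigned value, which is certainly not less than 0.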
Normal addition, multiplication and such do not always perform as expected. We can add or multiply two large numbers, and the result will not be representable. The arithmetic on numbers represented with w bits is arithmetic modulo 2^w. Basically, this means we keep only the low w bits of the result and discard any bits at position w or above.

w=4, unsigned arithmetic
7(ten) + 8(ten) = 0111 + 1000 = 1111 (mod 2^4) = 1111 = 15(ten) [ this is the correct result ]
7(ten) + 11(ten) = 0111 + 1011 = 10010 (mod 2^4) = 0010 = 2 (ten) [ this is the incorrect result ]

w=4, signed arithmetic
3(ten) + 5(ten) = 0011 + 0101 = 1000 (mod 2^4) = 1000 = (-8) [ this is the incorrect result ]
-8(ten) + -7(ten) = 1000 + 1001 = 10001 (mod 2^4) = 0001 = 1 (ten) [ this is the incorrect result ]
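
The w-bit examples above can be reproduced with a short sketch. Both function names are invented here, the bit patterns are held in ordinary unsigned ints, and w is assumed to satisfy 0 < w < 31.

```c
/* Add two w-bit patterns and keep only the low w bits (mod 2^w).
   Hypothetical helper; assumes 0 < w < 31. */
unsigned add_w(unsigned a, unsigned b, unsigned w)
{
    unsigned mask = (1u << w) - 1u;
    return (a + b) & mask;
}

/* Read a w-bit pattern as a Two's Complement value.
   Hypothetical helper; assumes 0 < w < 31. */
int to_signed_w(unsigned bits, unsigned w)
{
    unsigned sign_bit = 1u << (w - 1);
    if (bits & sign_bit) {
        /* With the top bit set, the pattern stands for bits - 2^w. */
        return (int)bits - (int)(1u << w);
    }
    return (int)bits;
}
```

With w = 4, add_w(7, 11, 4) gives 2 rather than 18, and to_signed_w(add_w(3, 5, 4), 4) gives -8 rather than 8. The same add_w serves both the unsigned and the signed case; only the interpretation of the resulting bits differs.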

Floating Point/IEEE Standard
We've just seen how integers are represented with bits. We now try to represent real numbers. Real numbers are a lot trickier. They can be as big as 3.7 x 10^47 or as small as 0.0000002341. We represent real numbers with the IEEE standard by writing them in floating point (like in scientific notation). This standard not only tells us how to represent floating point numbers, but also defines special values (such as infinity) and the behavior of rounding.

Let's say we have some number. We first want to write it in floating point (base 2 of course!), with each floating point number having a Sign, Exponent, and Mantissa. Note that we take the extra step of normalizing the number as well: we move the binary point until exactly one 1 sits in front of it, and adjust the exponent to match. Normally, floating point numbers are represented as normalized. This means that there is an implicit 1 in front of the mantissa. Why an implicit 1? Because any number that is not zero has a leading 1 somewhere in its binary form; only the significance of that 1 (the exponent) varies, so the 1 itself never needs to be stored.

½(ten) = +(…+0*2^1+0*2^0+1*2^-1+0*2^-2+...) = +0.1 x 2^0 = +1.0 x 2^-1
-7/8(ten) = -(…+0*2^1+0*2^0+1*2^-1+1*2^-2+1*2^-3+0*2^-4+...) = -0.111 x 2^0 = -1.11 x 2^-1
-17(ten) = -(…+1*2^4+0*2^3+0*2^2+0*2^1+1*2^0+...) = -10001.0 x 2^0 = -1.0001 x 2^4

Now that we know how to write down a floating point number mathematically in base 2, we are ready to see how this gets turned into bits. The IEEE standard specifies how many bits to use to represent each component. Obviously, we only need 1 bit for the Sign. However, we need to figure out, for the remaining bits, how many go to the exponent and how many go to the mantissa. There is a tradeoff: the more bits we give the exponent, the wider the range of numbers we can represent, but the fewer bits are left for the mantissa, so the less precise each number will be. The IEEE standard for 32 bit words uses 1 bit for the sign (S), 8 bits for the exponent (E), and 23 bits for the mantissa (M).

If the Sign is positive, then the S bit=0. If the Sign is negative, then the S bit=1. The exponent bits E are encoded as an unsigned number. We represent negative exponents by having a Bias such that E - Bias = Exponent. For k bits of E, the Bias is 2^(k-1) - 1. For example, for k=8, Bias is 127.
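
The Bias formula is small enough to check with a two-line sketch. Both function names are made up for these notes, k is assumed to be small enough that 2^(k-1) fits in an int, and the second function applies to the normalized case only.

```c
/* Bias for an exponent field of k bits: 2^(k-1) - 1.
   Hypothetical helper; assumes 1 <= k < number of bits in int. */
int bias_for(int k)
{
    return (1 << (k - 1)) - 1;
}

/* Recover the Exponent from the stored E field (normalized case). */
int exponent_from_field(unsigned e_field, int k)
{
    return (int)e_field - bias_for(k);
}
```

For k = 8, bias_for(8) returns 127, so a stored E of 126 means Exponent -1 and a stored E of 131 means Exponent 4.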

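In normalized mode (meaning the E bits are not all 0's or all 1's), there is an implicit '1' in front of M, so only the digits after the binary point are stored. Apart from dropping that leading 1, we encode M just as we wrote down the Mantissa (in the examples, {1} marks the implicit bit that is not stored). Now we can put together S, E, and M in that order for the binary representation of the number.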

Example (for IEEE floating point):
½(ten) = +1.0 x 2^-1
S = 0, E = 01111110 (Exponent(-1) = E(126) - Bias(127)), M = {1}.0000...000
Bits (0 01111110 00...) = 0x3F000000

-7/8(ten) = -1.11 x 2^-1
S = 1, E = 01111110 (Exponent(-1) = E(126) - Bias(127)), M = {1}.11000...000
Bits (1 01111110 1100...) = 0xBF600000

-17(ten) = -1.0001 x 2^4
S = 1, E = 10000011 (Exponent(4) = E(131) - Bias(127)), M = {1}.0001000...000
Bits (1 10000011 0001000...) = 0xC1880000
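
If you want to check encodings like these on a real machine, here is a sketch that copies the bits of a C float into an integer and pulls out the three fields. The function names are made up, and the code assumes float is a 32-bit IEEE single-precision value (true on the machines we use, but not guaranteed by C itself).

```c
#include <stdint.h>
#include <string.h>

/* Copy the raw bits of a float into a 32-bit integer.
   Assumes float is 32-bit IEEE single precision. */
uint32_t float_to_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* copy bytes, no numeric conversion */
    return bits;
}

/* Field extractors for the S | E (8 bits) | M (23 bits) layout. */
unsigned sign_field(uint32_t bits)     { return (unsigned)(bits >> 31); }
unsigned exponent_field(uint32_t bits) { return (unsigned)((bits >> 23) & 0xFFu); }
uint32_t mantissa_field(uint32_t bits) { return bits & 0x7FFFFFu; }
```

For example, float_to_bits(0.5f) returns 0x3F000000, whose exponent_field is 126 and whose mantissa_field is 0. Note that a plain cast like (uint32_t)f would convert the numeric value instead of copying the bits, which is why memcpy is used.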

Denormalized numbers are a special case for representing very small numbers (values very close to zero). In this case, the exponent bits are all 0's, the Exponent is 1-Bias (-126 for 32 bits), and there is no implicit 1 in front of the mantissa.

0 00000000 1101000... = + 0.1101 x 2^-126 = + 1.101 x 2^-127
0 00000000 00000010... = + 0.0000001 x 2^-126 = + 1.0 x 2 ^ -133
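
The denormalized rule can be sketched directly: take the 23 M bits as a fraction 0.M and scale by 2^-126. The names below are invented for this example, and the code assumes a 32-bit IEEE float and that the E bits of the input are all 0's.

```c
#include <stdint.h>
#include <string.h>

/* Decode a denormalized 32-bit pattern by hand: value = 0.M x 2^-126.
   Hypothetical helper; assumes the E bits of the input are all 0's. */
double decode_denormalized(uint32_t bits)
{
    double value = (double)(bits & 0x7FFFFFu);   /* M read as an integer */
    /* Dividing by 2 a total of 23 + 126 times turns the integer M into
       0.M (23 halvings) and then applies 2^-126 (126 more). */
    for (int i = 0; i < 23 + 126; i++) {
        value /= 2.0;
    }
    return (bits >> 31) ? -value : value;
}

/* Let the hardware interpret the same bits, for comparison.
   Assumes float is 32-bit IEEE single precision. */
float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

The pattern 0 00000000 1101000... is 0x00680000 in hex, and decode_denormalized(0x00680000) should agree exactly with bits_to_float(0x00680000), since every halving is exact in double precision.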

Finally, there are special cases for +/- 0 (E and M all 0's), +/- infinity (E all 1's, M all 0's), and NaN, not a number (E all 1's, M not all 0's).

100000...000 = -0
000000...000 = +0
111111111000...000 = -infinity
011111111000...000 = +infinity
011111111001...1000 = NaN
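
Putting the cases together, a 32-bit pattern can be sorted into its category by looking only at the E and M fields. This is a sketch; the enum and function names are made up for these notes.

```c
#include <stdint.h>

/* Categories of 32-bit IEEE patterns. Names are hypothetical. */
typedef enum {
    FP_KIND_ZERO,
    FP_KIND_DENORMALIZED,
    FP_KIND_NORMALIZED,
    FP_KIND_INFINITY,
    FP_KIND_NAN
} fp_kind;

/* Classify a pattern using only its E (8 bits) and M (23 bits) fields.
   The sign bit does not affect the category. */
fp_kind classify_bits(uint32_t bits)
{
    uint32_t e = (bits >> 23) & 0xFFu;
    uint32_t m = bits & 0x7FFFFFu;

    if (e == 0) {
        return (m == 0) ? FP_KIND_ZERO : FP_KIND_DENORMALIZED;
    }
    if (e == 0xFFu) {
        return (m == 0) ? FP_KIND_INFINITY : FP_KIND_NAN;
    }
    return FP_KIND_NORMALIZED;
}
```

For example, classify_bits(0x80000000) is FP_KIND_ZERO (that is -0), classify_bits(0xFF800000) is FP_KIND_INFINITY, and classify_bits(0x7FC00000) is FP_KIND_NAN.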

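With respect to rounding, by default IEEE specifies rounding to the nearest representable value, with ties (results exactly halfway between two representable values) going to the one whose last bit is even (0), so there is no statistical bias. In the examples here, the bits to be dropped are shown in braces. Rounding 11.0110{00} gets 11.0110 (nothing is lost). Rounding 11.0110{10} gets 11.0110 (a tie, and the last kept bit is already 0), while rounding 11.0111{10} gets 11.1000 (a tie, and rounding up makes the last kept bit 0). Interestingly, the IRS uses rounding up instead of rounding to even.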
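
Round-to-even can be sketched on plain integers by treating the low bits as the part to be dropped. The function name is made up, and drop is assumed to be at least 1 and smaller than the width of unsigned. This illustrates the rule itself, not how floating point hardware is built.

```c
/* Drop the low `drop` bits of x, rounding to nearest with ties going
   to the even result. Hypothetical helper; assumes
   1 <= drop < number of bits in unsigned. */
unsigned round_to_even(unsigned x, unsigned drop)
{
    unsigned kept = x >> drop;                 /* bits we keep */
    unsigned rest = x & ((1u << drop) - 1u);   /* bits we drop */
    unsigned half = 1u << (drop - 1);          /* the exact halfway point */

    if (rest > half) {
        kept += 1;                 /* more than halfway: round up */
    } else if (rest == half && (kept & 1u)) {
        kept += 1;                 /* tie and kept is odd: round up to even */
    }
    return kept;
}
```

Ignoring the binary point, 11.0110{10} is the pattern 11011010 (0xDA) with 2 bits to drop, and round_to_even(0xDA, 2) gives 110110 (0x36). For 11.0111{10}, round_to_even(0xDE, 2) gives 111000 (0x38).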

Group Exercises
1. What does 0x800F0000 represent in IEEE floating point (32 bits)?
2. What number does 0xC1B00000 represent in IEEE floating point (32 bits)?
3. What is the IEEE representation (in hex) for 5.25(ten)?

Lab Example
1. abs(int x) - given number x, return the absolute value of x
2. parity(int x) - given bit pattern x, return the parity of x
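
Here is one possible sketch of these two examples using only bit operations. It is not necessarily the approach expected for the lab. The names abs_bits and parity_bits are made up (to avoid clashing with the C library's abs), the code assumes a 32-bit int and an arithmetic >> on signed values, and parity is assumed to mean 1 when x has an odd number of 1 bits.

```c
/* Absolute value without branches. Assumes 32-bit int and arithmetic
   right shift on signed values. The most negative int is outside the
   function's domain, since it has no positive counterpart. */
int abs_bits(int x)
{
    int mask = x >> 31;          /* all 1's if x is negative, else all 0's */
    /* For negative x this is (~x) + 1, the invert-and-add-1 trick;
       for non-negative x it leaves x unchanged. */
    return (x ^ mask) - mask;
}

/* Parity: 1 if x has an odd number of 1 bits, 0 otherwise.
   Assumes 32-bit int. */
int parity_bits(int x)
{
    unsigned u = (unsigned)x;
    /* Fold the word in half with XOR until one bit remains; XOR of two
       bits is 1 exactly when an odd number of them are 1. */
    u ^= u >> 16;
    u ^= u >> 8;
    u ^= u >> 4;
    u ^= u >> 2;
    u ^= u >> 1;
    return (int)(u & 1u);
}
```

For example, abs_bits(-5) returns 5, and parity_bits(7) returns 1 because 7 is 111 in binary (three 1 bits).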