15-213 Lecture Q&A Through Test #1


Saturday, February 23, 2002

Question:

Do we need to know how to do problem 6 on the Spring '01 exam? Reading over it, it seemed a little bit unfamiliar, but I wante

Answer:

Nah, last spring had a different schedule than this spring. This question was all about linking and what linkers have to change such that compiled code works. Anyway, I don't think we've even gotten to that in lecture, so don't worry about it.


Saturday, February 16, 2002

Question:

Is it possible to do an arbitrary JMP command? In other words, can I give the assembler an explicit location in memory to jump to? All of the code that is generated by gcc uses relative JMPs (i.e. jump forward e0 bytes).

Answer:

Yes, it is -- and you've already seen it, but probably didn't notice it!

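Think back to how we translated loops. We always converted them to a form where they jumped to a label, such as "jmp .L1". That ".L1" is just a human-friendly representation of an address. After the code is assembled and linked, the label is resolved to a concrete target. For a direct jump like this one, the encoding is usually an offset relative to the next instruction, which is why the disassembly looks relative. If you want to jump to an absolute address that you supply, use an indirect jump such as "jmp *%eax", which takes its target from a register (or from memory).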
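
As a reminder of the "jump to a label" form mentioned above, here is a minimal sketch in C using goto. The function name sum_to_n is made up for illustration; the point is only that each goto corresponds to a jump to a label, roughly the shape a compiler might produce for a while loop.

```c
/* Sum 1..n, written in "goto form": every transfer of control is an
   explicit jump to a label, as in the assembly a compiler generates. */
int sum_to_n(int n)
{
    int sum = 0;
    int i = 1;
loop:
    if (i > n)
        goto done;   /* conditional jump out of the loop */
    sum += i;
    i++;
    goto loop;       /* unconditional jump, like "jmp .L1" */
done:
    return sum;
}
```

Calling sum_to_n(4) walks the loop four times and returns 10.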


Friday, February 9, 2002

Question:

When an item is pushed onto the stack, when is it "used?"

Answer:

Data on the stack can be used in any number of ways. It might be read off of the stack and removed by a pop operation.

Or, it might be accessed on the stack without removing it. For example, it is very common for a function to read its parameters using something like "movl 16(%ebp), %ebx". Remember that %ebp is the "base pointer", also called the "frame pointer". It holds the address of the bottom (beginning, least recently pushed, high address) of a function's stack frame. Above the base pointer, at higher addresses, a function can find its arguments, which were pushed by the caller. So, "movl 16(%ebp), %ebx" says, "read the 'double word' (4 bytes) beginning 16 bytes above the base pointer (in the caller's frame) into the %ebx register." This makes the parameter available in %ebx, while leaving it on the stack. (Although we can peek into the stack, we can't take things out of the inside, only from the top by popping.)

Values on the stack can also be removed by the "ret" (return) operation. "call" pushes the return address onto the stack before jumping. In a complementary way, "ret" pops this address off of the stack and jumps to it.
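
To make "pop removes, peek doesn't" concrete, here is a rough toy model in C. Everything in it (struct toy_stack, toy_push, toy_pop, toy_peek) is invented for illustration; it is not how the hardware is implemented, just a sketch of the same idea with an array that grows toward lower indices the way the IA32 stack grows toward lower addresses.

```c
#include <stdint.h>

#define STACK_WORDS 64

/* Toy stack: sp is the index of the current top element.
   No bounds checking, to keep the sketch short. */
struct toy_stack {
    uint32_t mem[STACK_WORDS];
    int sp;
};

/* An empty stack has sp just past the end of the array. */
void toy_init(struct toy_stack *s) { s->sp = STACK_WORDS; }

/* Like pushl: move the top down, then store. */
void toy_push(struct toy_stack *s, uint32_t v) { s->mem[--s->sp] = v; }

/* Like popl: read the top, then move the top back up (removes it). */
uint32_t toy_pop(struct toy_stack *s) { return s->mem[s->sp++]; }

/* Like "movl offset(%esp), reg": read without changing sp. */
uint32_t toy_peek(const struct toy_stack *s, int words_above_top)
{
    return s->mem[s->sp + words_above_top];
}
```

After pushing 10, 20, and 30, toy_peek(&s, 2) reads the 10 and leaves all three values in place, whereas toy_pop(&s) returns 30 and shrinks the stack by one.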

(Aside: If you are wondering why a "movl" (move long), moves only 1 word (4 bytes) and "movw" (move word) moves only 1/2 word (2 bytes), it is for backward compatibility. Although the IA32 architecture uses a 32-bit word, these are called "double words" for backward compatibility with the old 16-bit architecture, which had 2 byte words.)


Friday, February 8, 2002

Question:

What is the significance of this assembly instruction?
        cmpl   $0x0,0xfffffffc(%ebp)
    
I understand what it means to add '8' or some smaller number to ebp: you're looking up something in the stack. But what does it mean when you add something like this that will almost certainly cause an overflow?

Answer:

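The instruction looks up a location in the stack and compares the 4-byte value stored there with 0. Adding 0xfffffffc to %ebp is the same as adding -4 to the value of %ebp: 0xfffffffc is the 32-bit two's complement encoding of -4, and address arithmetic wraps around modulo 2^32, so the "overflow" is exactly what makes it work. Remember that %ebp points to the upper bound of the current procedure's stack frame, so this instruction accesses a memory location inside the current stack frame, 4 bytes below %ebp. Work out the structure of the stack frame and you will know what lives in that memory location.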
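
If you want to convince yourself of the arithmetic, here is a small sketch in C. The function names are made up for illustration, and the sample %ebp value used in the note below is just an assumed, plausible-looking stack address.

```c
#include <stdint.h>

/* Effective address of disp(%ebp): a 32-bit add that wraps modulo 2^32. */
uint32_t effective_address(uint32_t ebp, uint32_t disp)
{
    return ebp + disp;
}

/* Read a 32-bit displacement as a two's complement signed offset.
   Written out by hand so it does not rely on implementation-defined
   unsigned-to-signed conversion. */
int32_t as_signed(uint32_t disp)
{
    if (disp & 0x80000000u)
        return -(int32_t)(~disp) - 1;
    return (int32_t)disp;
}
```

With %ebp assumed to be 0xbffff000, effective_address(0xbffff000, 0xfffffffc) gives 0xbfffeffc, which is 4 bytes lower, and as_signed(0xfffffffc) gives -4.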


Monday, February 4, 2002

Question:

What do the _init function and .init section do? Are they created by the compiler, or are they actual user-defined functions?

Answer:

Executable programs can be packaged in any number of ways. One of these is known as the Executable and Linkable Format (ELF). Linux follows this standard. ELF files are composed of separate pieces, or sections. One of these sections is called ".init" and another ".fini".

".init" contains code that should be executed before main() is called and ".fini" contains code that should be executed after main() returns, if the program exits normally (that is, it isn't killed and doesn't crash).

You can think of the code within these two sections as the constructor and destructor for the program itself. For example, the code that loads shared libraries (library functions that bind at runtime, such as DLLs in Windows and .so's in UNIX) is located within the ".init" section.

In C++, these sections are also used to call global constructors and global destructors, which are, themselves, typically stored in separate sections, ".ctors" and ".dtors".

The _init() function is compiler generated. It lives in the ".init" section and is responsible for the initialization discussed above. The complement is true for _fini(), which lives in the ".fini" section and does the tear-down.

I don't get this "down and dirty" very often, so I don't have a tremendous amount of experience in this area. But, my understanding is that both _init() and _fini() are called by a function within the ".text" area, the program's executable code, called _start(). The big picture is that _start() basically does a little bit of preparation, then calls _init(), then main(), then _fini().

But, let me add a big footnote here that says that ELF only defines the organization of the executable file. I don't think it actually specifies all of the details of a program's preamble. As a result, I wouldn't be surprised to see several different compilers, each of which employs ELF, generating slightly different preamble code -- even where the differences aren't necessarily mandated by the underlying hardware architectures.
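
If you'd like to see "code that runs before main()" for yourself, here is a minimal sketch. It relies on the constructor and destructor function attributes, which are GCC/Clang extensions and not standard C. The names ran_before_main, before_main, and after_main are made up. Which section the toolchain actually uses for these hooks is an assumption that varies: depending on the compiler and platform it may be ".init", ".ctors", or something else, so treat this as an illustration of the idea rather than of the exact mechanism described above.

```c
#include <stdio.h>

/* Set by the constructor so that main() can observe it already ran. */
static int ran_before_main = 0;

/* GCC/Clang extension: run this function before main() is entered. */
__attribute__((constructor))
static void before_main(void)
{
    ran_before_main = 1;
}

/* GCC/Clang extension: run this after main() returns or exit() is called. */
__attribute__((destructor))
static void after_main(void)
{
    puts("tear-down after main");
}
```

By the time main() starts, ran_before_main is already 1, and the tear-down message prints only after main() has returned.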


Friday, February 1, 2002

Question:

How many condition flags does a computer have? Basically, each flag holds 0, and when a certain condition is met, the flag becomes 1? Is this what "set flag" means?

Answer:

There are 6 condition flags (status flags). They are part of the flags register, which is 32 bits wide. Each flag occupies one bit of it; the other bits hold control and system flags or are reserved. Most of these flags aren't really of interest to us. We're basically concerned with four flags: CF (carry), SF (sign), ZF (zero), and OF (overflow). These flags are typically set by instructions like testl and cmpl, and then read by conditional jumps and set instructions to control the flow of execution through conditionals, such as if-else.

Well, each of these flags is 1 bit, so it has a value of either 0 or 1, depending on the result of the last arithmetic or logical operation. To "set" a flag means to make it 1; to "clear" it means to make it 0. Most arithmetic and logical instructions write these flags again, overwriting the previous values.
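
Here is a rough sketch, in C, of how the four flags we care about could be computed for a 32-bit addition (the effect of "addl"). The struct and function names are invented for illustration; the hardware does this in circuitry, not in code.

```c
#include <stdint.h>

struct flags {
    int cf;  /* carry: unsigned overflow out of bit 31 */
    int zf;  /* zero: result is 0 */
    int sf;  /* sign: bit 31 of the result */
    int of;  /* overflow: signed (two's complement) overflow */
};

/* Flags as they would look after a 32-bit add of a and b. */
struct flags flags_after_add(uint32_t a, uint32_t b)
{
    uint32_t r = a + b;  /* wraps modulo 2^32 */
    struct flags f;
    f.cf = r < a;        /* wrapped around iff the sum is smaller than an input */
    f.zf = (r == 0);
    f.sf = (r >> 31) & 1;
    /* Signed overflow: inputs have the same sign, result has the other. */
    f.of = ((~(a ^ b) & (a ^ r)) >> 31) & 1;
    return f;
}
```

For example, adding 1 to 0x7fffffff sets SF and OF but not CF or ZF, while adding 1 to 0xffffffff sets CF and ZF but not SF or OF.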


Question:

Instruction "setns" checks condition ~SF. If ~SF is true, does it mean that SF has "1"? The "setns" sets a single byte based on the condition code. So, "setns %al" writes "0" to the low-order byte of the destination register when the condition is false?

Answer:

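No, it is the other way around: ~SF is true when SF is "0". You are right about the second part, though: "setns %al" writes 0 to the low-order byte when the condition is false.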

"setns" is interested in the sign flag, SF. If SF is 1, it sets the byte value to 0, otherwise it sets it to 1. "setne" does exactly the same thing, except that it inspects the zero flag (ZF), not the sign flag.


Question:

The program uses the "movzbl" instruction to fill the high-order bytes of the register after the "set" instruction. What does it mean to change the bits of the destination register according to the condition code? Does it mean that the destination register will have either 0x00000000 or 0x000000ff?

Answer:

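Since the set instruction only writes the low-order byte of the 4-byte register, the other three bytes keep whatever they held before, which could be anything. The "movzbl" instruction moves 1 byte and zeros everything else. As a result, it can be used to clear the other bytes of the 4-byte destination register to 0. And since a set instruction writes either 0 or 1 into that byte, the register ends up holding either 0x00000000 or 0x00000001, not 0x000000ff.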
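
The two steps can be modeled in C with a little masking. This is only a sketch: the function names are made up, and %eax is represented as a plain 32-bit value.

```c
#include <stdint.h>

/* Model of "setns %al": overwrite only the low byte of eax with
   1 if SF is 0, or 0 if SF is 1. The upper three bytes are untouched. */
uint32_t model_setns_al(uint32_t eax, int sf)
{
    uint32_t byte = sf ? 0u : 1u;
    return (eax & 0xffffff00u) | byte;
}

/* Model of "movzbl %al, %eax": keep the low byte, zero the rest. */
uint32_t model_movzbl_al_eax(uint32_t eax)
{
    return eax & 0x000000ffu;
}
```

Starting from leftover contents such as 0xdeadbeef with SF equal to 0, the first step gives 0xdeadbe01 and the second cleans that up to 0x00000001.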


Thursday, January 30, 2002

Question:

The leal instruction has two operands: the first takes the form of a memory reference and the second is the name of a register. Is that right?

Answer:

Yes, this sounds correct. But, let me repeat it back to you, just to make double-sure.
  • The first operand is the source. The second operand is the destination.

  • The first operand takes the form of a "scaled, indexed operand", Displacement(Base, Index, Scale), where the result is equal to (Base + (Scale * Index) + Displacement). Base and Index are registers, and Scale must be 1, 2, 4, or 8.

  • This can be used, for example, to compute the address of some element within an array. For example, if we have an array of integers beginning at address Arr, the address of the element at index 5 could be named (Arr, 5, 4), with Arr and 5 held in registers. Since the size of an integer is 4 bytes, that element is located 20 bytes past Arr, or (Arr + 4*5).

  • Since this instruction just generates a number, but does not dereference it as an address, it can actually be used for simple computation. Any expression of the form (Base + (Scale * Index) + Displacement) can be computed this way. The result is simply stored in the destination register.

  • Please note that a displacement of 0 is assumed, if it is not specified. Similarly, a scale of 1 is assumed, if not specified.

  • In other words, it computes the address just like a movl from memory would, except it doesn't dereference the result. Instead, it sticks the computed address directly into the destination register.
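
The computation described above fits in one line of C. Here is a minimal sketch; the function name lea and its parameter order (mirroring Displacement(Base, Index, Scale)) are just chosen for illustration.

```c
#include <stdint.h>

/* What leal computes for disp(base, index, scale): a plain 32-bit sum
   that wraps modulo 2^32. No memory is read.
   On real hardware, scale is restricted to 1, 2, 4, or 8. */
uint32_t lea(uint32_t disp, uint32_t base, uint32_t index, uint32_t scale)
{
    return base + scale * index + disp;
}
```

With an array assumed to start at 0x1000, lea(0, 0x1000, 5, 4) gives 0x1014, the address 20 bytes in. The same function does ordinary arithmetic too: lea(8, 100, 3, 2) is 100 + 2*3 + 8 = 114.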


Question:

Does that mean that in a leal operand, (%ebp) means an address in memory?

Answer:

With leal, the second operand must be the name of a register, for example, "%eax". It is not legal for the second operand, the destination, to be an indirect reference to memory via a register. For example, "(%eax)" is not a legal destination for leal.

For the first operand, something like "(%eax)" is legal, but it doesn't do what you might expect -- at least if you're expecting similar semantics to movl. Instead, it just uses the value of "%eax" -- it does not dereference it to get the value at the named memory location. Remember, this instruction loads an address, not memory.


Question:

Given "leal (%edx, %ebx), %eax", do %edx and %ebx hold values that are memory addresses?

Answer:

Maybe, maybe not. The leal instruction doesn't care. Instead, it just "crunches the numbers". This instruction was designed to play with addresses, but in practice, it is much more flexible. In the example above, where "%edx" holds the value of "a" and "%ebx" holds the value of "b", we don't know the type of the result. If "a" is a pointer and "b" is an offset, the result will be an address. If "a" and "b" are ints, the result will be an "int". (Although, note that leal does not update the condition flags, so you can't check for overflow afterward the way you can after addl.)


Question:

Given, "leal (%edx, %ebx), %eax", %eax will have a memory address decided by %edx + %ebx ?

Answer:

"%eax" will hold the result of adding "%edx" and "%ebx". The type of the result, be it a pointer (address), int, etc., depends on the types of the operands. Assembly isn't strongly typed -- it does what you ask with whatever you give it. In this case, "%eax = %edx + %ebx".


Question:

How does "leal (%edx, %ebx)" do actual arithmetic calculation with only memory addresses?

Answer:

Keep in mind here that assembly is not really typed. "leal" just sees the bits in the registers. Adding an int is much the same as adding an address. It just crunches the numbers. It is completely up to the programmer to interpret the meaning of the result.
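
Here is a small sketch of "same bits, different interpretation" in C. The names add32 and as_int_sum are made up, and the address used in the note below is just an assumed example value.

```c
#include <stdint.h>

/* The one operation the hardware performs: a 32-bit add of raw bits. */
uint32_t add32(uint32_t x, uint32_t y)
{
    return x + y;
}

/* Treat the inputs and the result as signed ints. The machine-level add
   is identical; only our reading of the bits changes. (The final cast is
   well defined here as long as the true sum fits in an int.) */
int as_int_sum(int a, int b)
{
    return (int)add32((uint32_t)a, (uint32_t)b);
}
```

add32(0x08048000, 0x10) can be read as "address plus offset" and gives 0x08048010, while as_int_sum(-3, 5) runs the very same add and gives 2.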


Thursday, January 24, 2002

Question:

Regarding unsigned and two's complement integers, why is UMax = 2 * TMax + 1?

Answer:

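In two's complement encoding, the MSB of a non-negative number must be zero, since it acts as the sign bit. So the largest two's complement value is TMax = 2^(w-1) - 1. An unsigned number gets to use the MSB as an ordinary bit worth 2^(w-1), so you gain back the 2^(w-1) that the sign bit cost you. That's why:

UMax = TMax + 2^(w-1) = TMax + (TMax + 1) = 2 * TMax + 1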
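
You can check the identity for any word size with a few lines of C. This is just a sketch; the function names tmax and umax are made up, and w is assumed to be between 1 and 32 so everything fits comfortably in 64 bits.

```c
#include <stdint.h>

/* Largest w-bit two's complement value: 2^(w-1) - 1. Assumes 1 <= w <= 32. */
uint64_t tmax(int w)
{
    return ((uint64_t)1 << (w - 1)) - 1;
}

/* Largest w-bit unsigned value: 2^w - 1. Assumes 1 <= w <= 32. */
uint64_t umax(int w)
{
    return ((uint64_t)1 << w) - 1;
}
```

For w = 8, tmax(8) is 127 and umax(8) is 255, which is 2 * 127 + 1.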


Question:

I am not sure how casting works. For an unsigned integer, is the sign extension bit 0 regardless of the MSB?

Answer:

When extending a number from a short to an int, or from an int to a long, the key is to preserve the value. For unsigned numbers, we just use 0 as the extension bit regardless of the MSB. For signed numbers, however, the extension bit is a copy of the MSB. So for negative numbers, where the MSB is 1, the sign extension bit is 1. Consider what would happen otherwise, if you extended a negative short to an int with 0 as the extension bit: the new MSB would be 0, and we would have converted a negative number into a positive number by extending it.
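
Here is a short sketch in C showing both kinds of extension, plus the mistake described above. The function names are made up for illustration.

```c
#include <stdint.h>

/* Signed 16 -> 32 bits: the compiler sign-extends, preserving the value. */
int32_t extend_signed(int16_t x)
{
    return x;
}

/* Unsigned 16 -> 32 bits: the compiler zero-extends, preserving the value. */
uint32_t extend_unsigned(uint16_t x)
{
    return x;
}

/* The mistake: zero-extending the bit pattern of a signed short.
   A negative input comes out as a positive number. */
int32_t wrong_extend(int16_t x)
{
    return (int32_t)(uint16_t)x;
}
```

extend_signed(-5) is still -5 (bit pattern 0xfffffffb), and extend_unsigned(0xfffb) is 65531 (bit pattern 0x0000fffb). But wrong_extend(-5) is 65531: same low 16 bits, wrong value.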


Saturday, January 19, 2002

Question:

I'm currently registered for one recitation and would like to attend another. I have a course conflict. What should I do?

Answer:

We have a big space crunch right now. All of the recitations are full, with long waiting lists. We are going to try to load balance these and admit as many people as we can during the first recitation -- one week from Monday.

We will also talk about this at our Monday evening staff meeting. Ideally, we'll develop an electronic way for you to communicate your recitation preference to us prior to Monday's recitation.

For now, please contact the instructor of the section that you want to attend.


Tuesday, January 14, 2002

Question:

Can we use last semester's textbook?

Answer:

No. Please buy a new copy at the bookstore. Your textbook is excellent and very nearly final. But, the last unit was recently revised. The revision was not just cosmetic -- the last unit has been expanded and reorganized.

The good news is that since the textbook is currently being beta tested, you get a really, really good price.


Question:

What is the difference between the Program Counter, a Register, and the Register File?

In the textbook, when it says "register", does it mean one of the entries in the register file? How is the PC updated to point to the next instruction?

Answer:

Registers are small, named, pieces of storage within the processor. You can think of them as variables that are implemented in hardware.

Typically, the program manages them using loads and stores. "Load value X from memory into register Y", and "Store the value from register Y into main memory at address Z". When the processor performs operations, it generally does so on values stored within registers, because they operate at the same speed as the processor -- no delay.

The Program Counter (PC) is a special-purpose register. It tells the processor which instruction to execute next. It can be set by the program, much like any other register. This is how loops, etc., are implemented. Instead of setting the value using a load, it is typically done with a "jump" instruction. The other thing that is special about this register is that it is automatically advanced as each instruction executes -- so, unless a jump says otherwise, the processor will execute the following instruction next.

Although each register is conceptually separate, they are usually best implemented as, in effect, one small memory module. This memory module is called the register file. At the most basic level, it is addressed a word at a time, by register number, much as normal memory is addressed by location. Most registers are one word long (some are two).

When you write assembly code, you refer to registers by name. But, at the machine level, these names are offsets into (addresses within) the register file.

But, please don't get bogged down with the details at this point. The important things to remember are these. Registers are small, fast, named places to store things within the processor. Some are general purpose and are used by the compiler to manipulate values, whereas others are special purpose and have very specific, predetermined roles. The register file is nothing more than the name we give to the collection of all of the registers within the processor.
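
If it helps to see these pieces working together, here is a toy sketch in C. It is not IA32 and not any real machine: the instruction set (OP_LOADI, OP_ADD, OP_ADDI, OP_JNZ, OP_HALT), the 8-register file, and the function name run are all invented for illustration. The register file is just an array indexed by register number, and the PC is just an index that advances on its own unless a jump overwrites it.

```c
#include <stdint.h>

enum opcode { OP_LOADI, OP_ADD, OP_ADDI, OP_JNZ, OP_HALT };

struct instr {
    enum opcode op;
    int a;  /* register number: an index into the register file */
    int b;  /* second register number, immediate value, or jump target */
};

#define NUM_REGS 8

/* Execute a program and return the final contents of register 0. */
int32_t run(const struct instr *prog)
{
    int32_t regs[NUM_REGS] = {0};  /* the register file */
    int pc = 0;                    /* program counter */

    for (;;) {
        struct instr in = prog[pc];
        pc++;  /* by default, the next instruction comes next */
        switch (in.op) {
        case OP_LOADI: regs[in.a] = in.b;               break;
        case OP_ADD:   regs[in.a] += regs[in.b];        break;
        case OP_ADDI:  regs[in.a] += in.b;              break;
        case OP_JNZ:   if (regs[in.a] != 0) pc = in.b;  break;  /* a jump sets the PC */
        case OP_HALT:  return regs[0];
        default:       return -1;  /* unknown opcode */
        }
    }
}
```

A six-instruction program that loads 0 into register 0 and 4 into register 1, then repeatedly adds register 1 into register 0, decrements register 1, and jumps back while register 1 is nonzero, leaves 4 + 3 + 2 + 1 = 10 in register 0. The loop exists only because OP_JNZ overwrites the PC.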