Official Solution for Exam 1
CS 15-347
Spring 1997


PROBLEM 1. 

Solution: ROW=8, COL=14

-- on entry: r4=&a, r5=i, r6=j
fetch_aval:
  andi  $5,$5,0x0007	-- r5 = i mod 8 = im8
  sll   $6,$6,2		-- r6 = 4*j
  sll   $2,$5,3		-- r2 = 8*im8
  subu  $2,$2,$5	-- r2 = 8*im8 - i = 7*im8
  sll   $2,$2,3		-- r2 = 7*im8 = 56*im8
  addu  $2,$2,$4	-- r2 = &a + 56*im8
  addu  $6,$6,$2	-- r6 = &a + 56*im8 + 4*j	
  lw    $2,0($6) 	
  j     $31

In general, if an integer array a is defined as a[ROW][COL], then
&a[i][j] = &a + i*(4*COL) + 4*j.  The first statement computes the
value of (i mod ROW), thus ROW=8.  In the last statement, the address
(56*im8 + &a) + (4*j) implies that COL = 56/4 = 14.

-----------------------------------------
PROBLEM 2. 

Solution: N >= delta/A

Let t' be the measured running time of the loop and let t be its real
running time. Our goal is |t'-t|/N <= A, which implies delta/N <= A,
which implies N >= delta/A.

-----------------------------------------
PROBLEM 3.  

a. How many bytes are allocated for callsum's stack frame? 40
b. How many bytes are allocated for sum stack frame? 24
c. Where is local variable result stored in callsum's stack frame? sp+24
d. Where is &result stored in callsum's stack frame? sp+20
e. In callsum, which instruction computes &result}. 0x4c: addiu r2,sp,24
f. Which arguments to sum are stored in a register? 
	a0:r4	a1:r5	a2:r6	a3:r7
g. Which arguments to sum are stored in a stack frame?  
	a4:callsum:16	val:callsum:20	

-----------------------------------------
PROBLEM 4. 

a. k(2):c(1):fill(1):l(2):fill(2):i(4)

b. Variables of type struct1 are allocated 12 bytes.

c. Variables of type struct1 requires word alignment (i.e., the 
starting address must be multiple of 4).

d1. k(2):c(1):fill(1):l(2):fill(2):i(4):fill(4)
d2. d(8):fill(8)   

e. Variables of type union1 are allocated 16 bytes  

f. Variables of type union1 require double word alignment (i.e., the
starting address must be a multiple of 8).

-----------------------------------------
PROBLEM 5.

 A___  0x9c:	add.s	$f2,$f2,$f4
 D___  0xa0:	div.s	$f4,$f0,$f6
 L___  0xa4:	lwc1	$f0,0(r3)
 U___  0xa8:	addiu	r5,r5,1
 T___  0xac:	slt	r2,r5,r6
 T___  0xb0:	BNE	r2,r0,0x9c
 L___  0xb4:	addiu	r3,r3,4

-----------------------------------------
PROBLEM 6.

a. Prologue: LOAD 0, DIVIDE 0, LOAD 1
b. Epilogue: ACCUMULATE n-2, DIVIDE n-1, ACCUMULATE n-1
c. LOAD: i+2
d. DIVIDE: i+1
e. ACCUMULATE: i

Here is the picture you wanted to have in mind for this problem:

L0	 | epilogue	
D0 L1    | epilogue
A0 D1 L2 | ith loop iteration 
   A1 D2 | prologue
      A2 | prologue	

-----------------------------------------
PROBLEM 7.

a. The maximum overall speedup is 25/10 = 2.5, which is achieved when
the outer loop requires 0% of the total running time.

b. By Amdahl's Law
 
	S_overall = 1/[(1-F_enhanced) + (F_enhanced/S_enhanced)]

Given F_enhanced = .9 and S_overall = 2, we want to find S_enhanced. 
By the principle of plug and chug, we have

	2 = 1/(.1 + .9/S_enhanced)

A little algebra then gives the answer: 

	S_enhanced = 9/4 = 2.25.


When dealing with speedup problems like this, keep in mind that
speedup of any kind has a precise unambiguous definition:

Speedup = Time of the unimproved version / Time of the improved version.

-----------------------------------------
PROBLEM 8.

  B______ hashTable[0]->var                NONE__ hashTable[2]->var

  NONE___ hashTable[0]->flags              NONE__ hashTable[i]->next->left

  NONE___ hashTable[i]->next->left->next   D_____ hashTable[i]->right->left

  C______ hashTable[i]->next->next->next   F_____ hashTable[i]->left->right