Official Solution for Exam 1 CS 15-347 Spring 1997 PROBLEM 1. Solution: ROW=8, COL=14 -- on entry: r4=&a, r5=i, r6=j fetch_aval: andi $5,$5,0x0007 -- r5 = i mod 8 = im8 sll $6,$6,2 -- r6 = 4*j sll $2,$5,3 -- r2 = 8*im8 subu $2,$2,$5 -- r2 = 8*im8 - i = 7*im8 sll $2,$2,3 -- r2 = 7*im8 = 56*im8 addu $2,$2,$4 -- r2 = &a + 56*im8 addu $6,$6,$2 -- r6 = &a + 56*im8 + 4*j lw $2,0($6) j $31 In general, if an integer array a is defined as a[ROW][COL], then &a[i][j] = &a + i*(4*COL) + 4*j. The first statement computes the value of (i mod ROW), thus ROW=8. In the last statement, the address (56*im8 + &a) + (4*j) implies that COL = 56/4 = 14. ----------------------------------------- PROBLEM 2. Solution: N >= delta/A Let t' be the measured running time of the loop and let t be its real running time. Our goal is |t'-t|/N <= A, which implies delta/N <= A, which implies N >= delta/A. ----------------------------------------- PROBLEM 3. a. How many bytes are allocated for callsum's stack frame? 40 b. How many bytes are allocated for sum stack frame? 24 c. Where is local variable result stored in callsum's stack frame? sp+24 d. Where is &result stored in callsum's stack frame? sp+20 e. In callsum, which instruction computes &result}. 0x4c: addiu r2,sp,24 f. Which arguments to sum are stored in a register? a0:r4 a1:r5 a2:r6 a3:r7 g. Which arguments to sum are stored in a stack frame? a4:callsum:16 val:callsum:20 ----------------------------------------- PROBLEM 4. a. k(2):c(1):fill(1):l(2):fill(2):i(4) b. Variables of type struct1 are allocated 12 bytes. c. Variables of type struct1 requires word alignment (i.e., the starting address must be multiple of 4). d1. k(2):c(1):fill(1):l(2):fill(2):i(4):fill(4) d2. d(8):fill(8) e. Variables of type union1 are allocated 16 bytes f. Variables of type union1 require double word alignment (i.e., the starting address must be a multiple of 8). ----------------------------------------- PROBLEM 5. A___ 0x9c: add.s $f2,$f2,$f4 D___ 0xa0: div.s $f4,$f0,$f6 L___ 0xa4: lwc1 $f0,0(r3) U___ 0xa8: addiu r5,r5,1 T___ 0xac: slt r2,r5,r6 T___ 0xb0: BNE r2,r0,0x9c L___ 0xb4: addiu r3,r3,4 ----------------------------------------- PROBLEM 6. a. Prologue: LOAD 0, DIVIDE 0, LOAD 1 b. Epilogue: ACCUMULATE n-2, DIVIDE n-1, ACCUMULATE n-1 c. LOAD: i+2 d. DIVIDE: i+1 e. ACCUMULATE: i Here is the picture you wanted to have in mind for this problem: L0 | epilogue D0 L1 | epilogue A0 D1 L2 | ith loop iteration A1 D2 | prologue A2 | prologue ----------------------------------------- PROBLEM 7. a. The maximum overall speedup is 25/10 = 2.5, which is achieved when the outer loop requires 0% of the total running time. b. By Amdahl's Law S_overall = 1/[(1-F_enhanced) + (F_enhanced/S_enhanced)] Given F_enhanced = .9 and S_overall = 2, we want to find S_enhanced. By the principle of plug and chug, we have 2 = 1/(.1 + .9/S_enhanced) A little algebra then gives the answer: S_enhanced = 9/4 = 2.25. When dealing with speedup problems like this, keep in mind that speedup of any kind has a precise unambiguous definition: Speedup = Time of the unimproved version / Time of the improved version. ----------------------------------------- PROBLEM 8. B______ hashTable[0]->var NONE__ hashTable[2]->var NONE___ hashTable[0]->flags NONE__ hashTable[i]->next->left NONE___ hashTable[i]->next->left->next D_____ hashTable[i]->right->left C______ hashTable[i]->next->next->next F_____ hashTable[i]->left->right