15-213 Introduction to Computer Systems
Lecture 11: Out-of-Order Processing

  • Reading: Ch 5.7-5.12; Notes
  • Slides: none
  • Code: 11-outoforder (11-outoforder.tar)
  • Concepts:
    • Superscalar processor
    • Pipelining
    • Latency and issue time
    • Functional units
    • Instructions and processor operations
    • Register renaming
    • Branch prediction
    • Data dependency
    • Timed dataflow diagram
    • Resource limitations
    • Loop splitting
  • Previous lecture: Program Optimization
  • Next lecture: Cache Memories

Notes on Lab Machines

These are a few notes about the characteristics of the processor used for this class, the Intel Nocona Xeon, which is a dual 3.2 GHz IA32-EM64T processor. This information was copied from the Fall 2005 instance of this course.

Functional Units

  • 2 "simple" integer units (e.g., add, bit ops)
  • 1 "complex" integer unit (e.g., multiply, divide)
  • Floating point move unit (all conversions)
  • Floating point/SSE3 unit (all floating point arithmetic)
  • Load (including address computation)
  • Store (including address computation)

Some Performance Numbers

Latency/Issue Times on Various Chips

These were determined experimentally.

Operation   Nocona    Opteron   Pentium M   Pentium III
Int +       0.5/0.5   1/1?      1/1         1/1
Int *       10/1      3/1       4/1         4/1
Int /       36/36     46/46     20/20       36/36
Long /      106/106   76/76
FP +        5/2       4/1       3/1         3/1
FP *        7/2       4/1       5/2         5/2
Float /     32        14        36          36
Double /    46        17        36          36
Load        3/1
Store       3/1
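
To make the difference between latency and issue time concrete, here is a minimal sketch of the usual idealized pipelining arithmetic. It assumes each table entry is latency/issue time in cycles and that a single pipelined functional unit is used. The function names are hypothetical and not part of the course code, and whole-cycle values are assumed (a fractional entry such as 0.5 would need a floating-point variant).

```c
#include <assert.h>

/* Cycles for n operations that form a dependency chain: each one
   needs the previous result, so no two of them can overlap. */
long dependent_cycles(long n, long latency) {
  assert(n >= 0 && latency > 0);
  return n * latency;
}

/* Cycles for n mutually independent operations on one pipelined unit:
   a new operation starts every `issue` cycles, and the last one
   finishes `latency` cycles after it starts. */
long independent_cycles(long n, long latency, long issue) {
  assert(n >= 0 && latency > 0 && issue > 0);
  if (n == 0)
    return 0;
  return (n - 1) * issue + latency;
}
```

With the Nocona Int * entry (10/1), dependent_cycles(100, 10) gives 1000 cycles, while independent_cycles(100, 10, 1) gives 109. Real code falls somewhere in between, depending on its data dependencies and on resource limitations.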

Straightforward combine

Code slightly different from book, shown here using integer addition.

void combine(int* data, int n, int* dest) {
  int i;
  int r = 0;
  for (i = 0; i < n; i++)
    r = r + data[i];
  *dest = r;
}

Results:

CPE	Int +	Int *	FP +	FP *
	2.20	10.00	5.00	7.00

Unroll loop by 2, 2-way parallelism

Code slightly different from book, shown here using integer multiplication.

void combine_step2(int* data, int n, int* dest) {
  int i;
  int r0 = 1;			/* even elements */
  int r1 = 1;			/* odd elements */
  int limit = n-1;		/* new limit for stepping by 2 */
  for (i = 0; i < limit; i += 2) {
    r0 = r0 * data[i];
    r1 = r1 * data[i+1];
  }
  /* multiplying in possibly remaining elements (here at most one) */
  for ( ; i < n; i++)
    r0 = r0 * data[i];
  *dest = r0 * r1;
}
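
Because the unrolled version multiplies the elements in a different order, it is worth checking that it still agrees with a single-accumulator loop, in particular for odd n, where the cleanup loop runs. The following is a small sketch of such a check. product_ref and step2_agrees are hypothetical helpers, not part of the lecture code; combine_step2 is repeated only so that the block is self-contained.

```c
#include <assert.h>

/* Reference: one accumulator, elements multiplied in order. */
static int product_ref(const int *data, int n) {
  int r = 1;
  for (int i = 0; i < n; i++)
    r = r * data[i];
  return r;
}

/* Same code as combine_step2 above. */
void combine_step2(int *data, int n, int *dest) {
  int i;
  int r0 = 1;        /* even elements */
  int r1 = 1;        /* odd elements */
  int limit = n - 1; /* new limit for stepping by 2 */
  for (i = 0; i < limit; i += 2) {
    r0 = r0 * data[i];
    r1 = r1 * data[i + 1];
  }
  for (; i < n; i++) /* at most one remaining element */
    r0 = r0 * data[i];
  *dest = r0 * r1;
}

/* Returns 1 if the unrolled version matches the reference
   on the first n elements of data, and 0 otherwise. */
int step2_agrees(int *data, int n) {
  int got;
  assert(n >= 0);
  combine_step2(data, n, &got);
  return got == product_ref(data, n);
}
```

For integers the two orders agree as long as no product overflows. For floating point the reordering can change the rounding, so the results may differ slightly.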

Results:

CPE	Int +	Int *	FP +	FP *
	1.50	 5.00	2.50	3.50
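
The measured CPEs in both results tables can be compared against a simple idealized bound. With k independent accumulators, each chain advances once per latency, so the latency-limited CPE is latency/k, and it cannot drop below the issue time of the functional unit. The sketch below encodes only this assumption. predicted_cpe is a hypothetical name, and the model ignores loads, loop overhead, and other resource limitations.

```c
#include <assert.h>

/* Idealized cycles per element for a combine loop with k independent
   accumulators: limited by latency / k, but never below the issue time. */
double predicted_cpe(double latency, double issue, int k) {
  assert(latency > 0 && issue > 0 && k >= 1);
  double bound = latency / k;
  return bound > issue ? bound : issue;
}
```

Using the Nocona numbers, this gives 10.0 and 5.0 for Int *, 5.0 and 2.5 for FP +, and 7.0 and 3.5 for FP * at k = 1 and k = 2, matching the measurements. For Int + it gives 0.5 in both cases, well below the measured 2.20 and 1.50, which suggests that something other than the adder (presumably the load and the loop overhead) limits that loop.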


fp@cs
Frank Pfenning