# 15-213 Introduction to Computer Systems Lecture 11: Out-of-Order Processing

• Slides: none
• Code: 11-outoforder (11-outoforder.tar)
• Concepts:
  • Superscalar processor
  • Pipelining
  • Latency and issue time
  • Functional units
  • Instructions and processor operations
  • Register renaming
  • Branch prediction
  • Data dependency
  • Timed dataflow diagram
  • Resource limitations
  • Loop splitting
• Previous lecture: Program Optimization
• Next lecture: Cache Memories

## Notes on Lab Machines

These are a few notes about the characteristics of the processor used for this class, the Intel Nocona Xeon, which is a dual 3.2 GHz IA32-EM64T processor. This information was copied from the Fall 2005 instance of this course.

### Functional Units

• 2 "simple" integer units (e.g., add, bit ops)
• 1 "complex" integer unit (e.g., multiply, divide)
• Floating point move unit (all conversions)
• Floating point/SSE3 unit (all floating point arithmetic)

### Some Performance Numbers

#### Latency/Issue Times on Various Chips

These were determined experimentally. Entries written as a pair give latency/issue time, in clock cycles.

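| | Nocona | Opteron | Pentium M | Pentium III |
|---|---|---|---|---|
| Int + | 0.5/0.5 | 1/1? | 1/1 | 1/1 |
| Int * | 10/1 | 3/1 | 4/1 | 4/1 |
| Int / | 36/36 | 46/46 | 20/20 | 36/36 |
| Long / | 106/106 | 76/76 | | |
| FP + | 5/2 | 4/1 | 3/1 | 3/1 |
| FP * | 7/2 | 4/1 | 5/2 | 5/2 |
| Float / | 32 | 14 | 36 | 36 |
| Double / | 46 | 17 | 36 | 36 |
| Store | 3/1 | | | |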

#### Straightforward combine

Code slightly different from book, shown here using integer addition.

```c
void combine(int* data, int n, int* dest) {
    int i;
    int r = 0;    /* accumulator, starts at the identity for + */
    for (i = 0; i < n; i++)
        r = r + data[i];
    *dest = r;
}
```

Results:

```
        Int +   Int *   FP +    FP *
CPE     2.20    10.00   5.00    7.00
```

#### Unroll loop by 2, 2-way parallelism

Code slightly different from book, shown here using integer multiplication.

```c
void combine_step2(int* data, int n, int* dest) {
    int i;
    int r0 = 1;           /* even elements */
    int r1 = 1;           /* odd elements */
    int limit = n-1;      /* new limit for stepping by 2 */
    for (i = 0; i < limit; i += 2) {
        r0 = r0 * data[i];
        r1 = r1 * data[i+1];
    }
    /* multiplying in possibly remaining elements (here at most one) */
    for ( ; i < n; i++)
        r0 = r0 * data[i];
    *dest = r0 * r1;
}
```

Results:

```
        Int +   Int *   FP +    FP *
CPE     1.50    5.00    2.50    3.50
```

fp@cs
Frank Pfenning