Topics
• Memory Hierarchy Basics
• Static RAM
• Dynamic RAM
• Magnetic Disks
• Access Time Gap
Computer System

Processor (registers, cache) sits on the memory-I/O bus, which also connects:
• Memory
• I/O controller for disk
• I/O controller for display
• I/O controller for network
Levels in Memory Hierarchy

- Register:
  - size: 200 B
  - speed: 3 ns
  - $/Mbyte: $100/MB
  - block size: 8 B

- Cache:
  - size: 32 KB / 4 MB
  - speed: 4 ns
  - $/Mbyte: $100/MB
  - block size: 32 B

- Memory:
  - size: 128 MB
  - speed: 60 ns
  - $/Mbyte: $1.50/MB
  - block size: 8 KB

- Disk Memory:
  - size: 20 GB
  - speed: 8 ms
  - $/Mbyte: $0.05/MB

larger, slower, cheaper
Scaling to 0.1µm

- Semiconductor Industry Association, 1992 Technology Workshop
  - Projected future technology based on past trends

<table>
<thead>
<tr>
<th>Year</th>
<th>Feature size (µm)</th>
<th>DRAM capacity</th>
<th>Chip area (cm²)</th>
</tr>
</thead>
<tbody>
<tr>
<td>1992</td>
<td>0.5</td>
<td>16M</td>
<td>2.5</td>
</tr>
<tr>
<td>1995</td>
<td>0.35</td>
<td>64M</td>
<td>4.0</td>
</tr>
<tr>
<td>1998</td>
<td>0.25</td>
<td>256M</td>
<td>6.0</td>
</tr>
<tr>
<td>2001</td>
<td>0.18</td>
<td>1G</td>
<td>8.0</td>
</tr>
<tr>
<td>2004</td>
<td>0.12</td>
<td>4G</td>
<td>10.0</td>
</tr>
<tr>
<td>2007</td>
<td>0.10</td>
<td>16G</td>
<td>12.5</td>
</tr>
</tbody>
</table>

- Feature size: industry is slightly ahead of projection
- DRAM capacity: doubles every 1.5 years; prediction on track
- Chip area: way off! Chips are staying small
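
The "doubles every 1.5 years" figure can be checked against the first and last rows of the table. This is a minimal sketch; it assumes the capacities are in bits with M = 2^20 and G = 2^30, and the function name is made up for illustration.

```python
import math

# (year, DRAM capacity in bits) taken from the first and last table rows.
# Assumption: "16M" means 16 * 2**20 bits and "16G" means 16 * 2**30 bits.
FIRST = (1992, 16 * 2**20)
LAST = (2007, 16 * 2**30)

def doubling_time_years(first, last):
    """Average number of years per capacity doubling between two points."""
    (year0, cap0), (year1, cap1) = first, last
    doublings = math.log2(cap1 / cap0)
    return (year1 - year0) / doublings

print(doubling_time_years(FIRST, LAST))
```

A 1024-fold increase is ten doublings, spread here over fifteen years.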
Static RAM (SRAM)

Fast
• ~4 nsec access time

Persistent
• as long as power is supplied
• no refresh required

Expensive
• ~$100/MByte
• 6 transistors/bit

Stable
• High immunity to noise and environmental disturbances

Technology for caches
Anatomy of an SRAM Cell

Write:
1. set bit lines to new data value
   • b’ is set to the opposite of b
2. raise word line to “high”
   ⇒ sets cell to new state (may involve flipping relative to old state)

Read:
1. set bit lines high
2. set word line high
3. see which bit line goes low

Terminology:
- bit line: carries data
- word line: used for addressing

Stable Configurations

0 1 1 0

(6 transistors)
SRAM Cell Principle

Inverter Amplifies
- Negative gain
- Slope $< -1$ in middle
- Saturates at ends

Inverter Pair Amplifies
- Positive gain
- Slope $> 1$ in middle
- Saturates at ends
Bistable Element

Stability
- Require \( Vin = V2 \)
- Stable at endpoints
  - recover from perturbation
- Metastable in middle
  - Fall out when perturbed

Ball on Ramp Analogy
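
The stability argument can be illustrated numerically. The sketch below is not a circuit model: it assumes a hypothetical tanh-shaped transfer curve for the inverter pair (voltages normalized to 0..1, slope 2 at the midpoint, saturating at the ends) and feeds the output back to the input repeatedly.

```python
import math

def inverter_pair(v, gain=4.0):
    """Assumed transfer curve of two inverters in series, voltages in [0, 1].

    The slope at the midpoint is gain / 2, so gain=4.0 gives slope 2 (> 1),
    and the curve saturates toward 0 and 1 at the ends.
    """
    return 0.5 * (1.0 + math.tanh(gain * (v - 0.5)))

def settle(v, steps=100):
    """Feed the output back to the input `steps` times and return the result."""
    for _ in range(steps):
        v = inverter_pair(v)
    return v

for start in (0.49, 0.50, 0.51):
    print(start, settle(start))
```

A start slightly above the midpoint is driven toward the high endpoint, a start slightly below toward the low endpoint, and only a start exactly at the midpoint stays put, which is the metastable case.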
Example SRAM Configuration (16 x 8)

Address decoder

A0
A1
A2
A3

W0
W1
W15

b7
b7'

b1
b1'

b0
b0'

memory cells

sense/write amps

Input/output lines

d7
d1
d0

R/W

Dynamic RAM (DRAM)

Slower than SRAM
  • access time ~60 nsec

Nonpersistent
  • every row must be accessed every ~1 ms (refreshed)

Cheaper than SRAM
  • ~$1.50 / MByte
  • 1 transistor/bit

Fragile
  • electrical noise, light, radiation

Workhorse memory technology
Anatomy of a DRAM Cell

Writing:
- Word Line
- Bit Line
- Storage Node

Reading:
- Word Line
- Bit Line
- $\Delta V \sim \frac{C_{\text{node}}}{C_{\text{BL}}}$
Addressing Arrays with Bits

Array Size
- \( R \) rows, \( R = 2^r \)
- \( C \) columns, \( C = 2^c \)
- \( N = R \times C \) bits of memory

Addressing
- Addresses are \( n \) bits, where \( N = 2^n \)
- \( \text{row(address)} = \lfloor \text{address} / C \rfloor \)
  - leftmost \( r \) bits of address
- \( \text{col(address)} = \text{address} \bmod C \)
  - rightmost \( c \) bits of address

Example
- \( R = 2 \)
- \( C = 4 \)
- \( \text{address} = 6 \)

<table>
<thead>
<tr>
<th></th>
<th>0</th>
<th>1</th>
<th>2</th>
<th>3</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>000</td>
<td>001</td>
<td>010</td>
<td>011</td>
</tr>
<tr>
<td>1</td>
<td>100</td>
<td>101</td>
<td>110</td>
<td>111</td>
</tr>
</tbody>
</table>

Address 6 = 110 in binary: row 1 (leftmost bit 1), col 2 (rightmost bits 10)
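
The row/column split can be sketched in a few lines. The function name `split_address` is an illustration, not part of the slides; it takes the bit counts r and c rather than R and C.

```python
def split_address(address, r, c):
    """Split an (r + c)-bit address into (row, col) for a 2**r x 2**c array."""
    rows, cols = 1 << r, 1 << c
    if not 0 <= address < rows * cols:
        raise ValueError("address out of range")
    row = address >> c          # leftmost r bits, same as address // cols
    col = address & (cols - 1)  # rightmost c bits, same as address % cols
    return row, col

# The example above: R = 2 (r = 1), C = 4 (c = 2), address = 6
print(split_address(6, r=1, c=2))
```

Because R and C are powers of two, the division and modulo reduce to a shift and a mask, which is why hardware can simply route the high bits to the row decoder and the low bits to the column decoder.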
Example 2-Level Decode DRAM (64Kx1)

Row address latch

Row decoder

256 Rows

256x256 cell array

Column address latch

Column latch and decoder

256 Columns

Provide 16-bit address in two 8-bit chunks

RAS

Row

A7-A0

col

CAS

Dout Din

R/W'

DRAM Operation

Row Address (~50ns)
- Set Row address on address lines & strobe RAS
- Entire row read & stored in column latches
- Contents of row of memory cells destroyed

Column Address (~10ns)
- Set Column address on address lines & strobe CAS
- Access selected bit
  - READ: transfer from selected column latch to Dout
  - WRITE: Set selected column latch to Din

Rewrite (~30ns)
- Write back entire row
Observations About DRAMs

Timing
- Access time (= 60ns) < cycle time (= 90ns)
- Need to rewrite row

Must Refresh Periodically
- Perform complete memory cycle for each row
- Approximately once every 1ms
- $\sqrt{N}$ cycles, one per row of a square array of $N$ bits
- Handled in background by memory controller

Inefficient Way to Get a Single Bit
- Effectively read entire row of $\sqrt{N}$ bits (for a square array of $N$ bits)
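
The timing figures and the refresh cost can be put together in a short sketch. It uses the three phase times from the DRAM Operation slide and the 256-row array from the 64Kx1 example; the assumption that each row costs one full cycle per 1 ms interval follows the "complete memory cycle for each row" bullet, and the function names are hypothetical.

```python
ROW_NS = 50      # row address phase
COL_NS = 10      # column address phase
REWRITE_NS = 30  # write back entire row

def access_time_ns():
    """Time until the selected bit is available."""
    return ROW_NS + COL_NS

def cycle_time_ns():
    """Time until the next access can start (includes the rewrite)."""
    return ROW_NS + COL_NS + REWRITE_NS

def refresh_overhead(rows, interval_ms=1.0):
    """Fraction of time spent refreshing, one full cycle per row per interval."""
    busy_ns = rows * cycle_time_ns()
    return busy_ns / (interval_ms * 1e6)

print(access_time_ns(), cycle_time_ns(), refresh_overhead(256))
```

Under these assumptions a 256-row chip spends a little over 2% of its time on refresh.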
Enhanced Performance DRAMs

Conventional Access
- Row + Col
- RAS CAS RAS CAS ...

Page Mode
- Row + Series of columns
- RAS CAS CAS CAS ...
- Gives successive bits

Other Acronyms
- EDORAM
  - “Extended data output”
- SDRAM
  - “Synchronous DRAM”

Typical Performance

<table>
<thead>
<tr>
<th></th>
<th>row access time</th>
<th>col access time</th>
<th>cycle time</th>
<th>page mode cycle time</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>50ns</td>
<td>10ns</td>
<td>90ns</td>
<td>25ns</td>
</tr>
</tbody>
</table>
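
A rough comparison of the two access patterns using the table's numbers. The page-mode model here is an assumption (one row access, then one page-mode cycle per bit, ignoring the final rewrite), so treat the result as an estimate rather than a datasheet figure.

```python
def conventional_ns(bits, cycle_ns=90):
    """RAS CAS RAS CAS ...: every bit pays a full cycle."""
    return bits * cycle_ns

def page_mode_ns(bits, row_access_ns=50, page_cycle_ns=25):
    """RAS CAS CAS CAS ...: one row access, then one page-mode cycle per bit.

    Assumed model; the final row rewrite is not counted.
    """
    return row_access_ns + bits * page_cycle_ns

for n in (1, 8, 64):
    print(n, conventional_ns(n), page_mode_ns(n))
```

For long runs within one row the per-bit cost approaches the 25 ns page-mode cycle time instead of the 90 ns full cycle.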
Video RAM

Performance Enhanced for Video / Graphics Operations

- Frame buffer to hold graphics image

Writing

- Random access of bits
- Also supports rectangle fill operations
  - Set all bits in region to 0 or 1

Reading

- Load entire row into shift register
- Shift out at video rates

Required Performance

- 1200 × 1800 pixels / frame
- 24 bits / pixel
- 60 frames / second
- ≈ 3.1 Gbits / second (1200 × 1800 × 24 × 60 ≈ 3.1 × 10⁹)
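
The required rate is the product of the three figures above. A quick check (the function name is illustrative):

```python
def video_bits_per_second(width, height, bits_per_pixel, frames_per_second):
    """Raw frame-buffer read rate needed to refresh the display."""
    return width * height * bits_per_pixel * frames_per_second

rate = video_bits_per_second(1200, 1800, 24, 60)
print(rate)          # bits per second
print(rate / 1e9)    # in units of 10**9 bits
print(rate / 2**30)  # in units of 2**30 bits
```

The product is about 3.1 × 10⁹ bits per second, or about 2.9 when a gigabit is taken as 2^30 bits.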
DRAM Driving Forces

Capacity
- 4X per generation
  - Square array of cells
- Typical scaling
  - Lithography dimensions 0.7X
    » Areal density 2X
  - Cell function packing 1.5X
  - Chip area 1.33X
- Scaling challenge
  - Typically $C_{\text{node}} / C_{\text{BL}} = 0.1–0.2$
  - Must keep $C_{\text{node}}$ high as cell size shrinks
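
The 4X-per-generation figure is roughly the product of the three scaling factors listed under "Typical scaling". A small sketch, with the function name chosen for illustration:

```python
def capacity_gain(litho_scale=0.7, packing=1.5, chip_area=1.33):
    """Capacity multiplier per generation from the three scaling factors."""
    areal_density = 1.0 / litho_scale**2  # 0.7X linear shrink -> about 2X density
    return areal_density * packing * chip_area

print(capacity_gain())
```

With the rounded 2X density the product is 2 × 1.5 × 1.33 ≈ 4; using 1/0.7² directly gives a value slightly above 4.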

Retention Time
- Typically 16–256 ms
- Want higher for low-power applications
DRAM Storage Capacitor

Planar Capacitor
- Up to 1Mb
- $C$ decreases linearly with feature size

Trench Capacitor
- 4–256 Mb
- Lining of hole in substrate

Stacked Cell
- > 1Gb
- On top of substrate
- Use high $\varepsilon$ dielectric

Circuit Diagram:
- $C = \frac{\varepsilon A}{d}$
- Plate Area $A$
- Dielectric Material
- Dielectric Constant $\varepsilon$
- Distance $d$
Trench Capacitor

Process

- Etch deep hole in substrate
  - Becomes reference plate
- Grow oxide on walls
  - Dielectric
- Fill with polysilicon plug
  - Tied to storage node
IBM DRAM Evolution

- IBM J. R&D, Jan/Mar '95
- Evolution from 4 – 256 Mb
- 256 Mb uses cell with area 0.6 µm²
Mitsubishi Stacked Cell DRAM

- IEDM ‘95
- Claim suitable for 1 – 4 Gb

Technology
- 0.14 µm process
  - Synchrotron X-ray source
- 8 nm gate oxide
- 0.29 µm² cell

Storage Capacitor
- Fabricated on top of everything else
- Ruthenium (Ru) electrodes
- High dielectric insulator
  - 50X higher than SiO₂
  - 25 nm thick
- Cell capacitance 25 femtofarads
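
The capacitor figures can be sanity-checked with the parallel-plate formula $C = \varepsilon A / d$ from the DRAM Storage Capacitor slide. The sketch below solves for the plate area. It assumes that "50X higher than SiO₂" refers to the dielectric constant and that SiO₂ has a relative permittivity of about 3.9; both the constant names and the function are illustrative.

```python
EPS0 = 8.854e-12  # vacuum permittivity in F/m
SIO2_K = 3.9      # assumed relative permittivity of SiO2

def plate_area_um2(capacitance_f, rel_permittivity, thickness_m):
    """Plate area A = C * d / (eps0 * k), returned in square microns."""
    area_m2 = capacitance_f * thickness_m / (EPS0 * rel_permittivity)
    return area_m2 * 1e12  # 1 m^2 = 1e12 um^2

# 25 fF through a 25 nm film whose dielectric constant is 50X that of SiO2
area = plate_area_um2(25e-15, 50 * SIO2_K, 25e-9)
print(area)
```

Under these assumptions the required plate area comes out somewhat larger than the 0.29 µm² cell footprint, which fits with building the capacitor as a raised, three-dimensional structure on top of the cell rather than as a flat plate.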
Mitsubishi DRAM Pictures

Fig. 3. SEM cross-sectional photograph of the fabricated 0.29-μm² memory cell with Ru/BST/Ru stacked capacitor. The facet was fabricated by focused ion beam etching.

Fig. 8. SEM photograph of a Ru-metal storage node array with a projection of a height of 0.2 μm.

Fig. 10. SEM cross-sectional view of a Ru/BST/Ru capacitor cell. The facet shown is a cleaved facet.
Magnetic Disks

- Disk surface spins at 3600–10800 RPM
- The surface consists of a set of concentric magnetized rings called tracks
- Each track is divided into sectors
- The read/write head floats over the disk surface and moves back and forth on an arm from track to track.
## Disk Capacity

<table>
<thead>
<tr>
<th>Parameter</th>
<th>18GB Example</th>
</tr>
</thead>
<tbody>
<tr>
<td>Number of platters</td>
<td>12</td>
</tr>
<tr>
<td>Surfaces / Platter</td>
<td>2</td>
</tr>
<tr>
<td>Tracks / surface</td>
<td>6962</td>
</tr>
<tr>
<td>Sectors / track</td>
<td>213</td>
</tr>
<tr>
<td>Bytes / sector</td>
<td>512</td>
</tr>
<tr>
<td><strong>Total Bytes</strong></td>
<td><strong>18,221,948,928</strong></td>
</tr>
</tbody>
</table>
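
The total is the product of the five parameters in the table, reading the track count as tracks per surface. A short check (function and parameter names are illustrative):

```python
def disk_capacity_bytes(platters, surfaces_per_platter, tracks_per_surface,
                        sectors_per_track, bytes_per_sector):
    """Total capacity assuming every track holds the same number of sectors."""
    return (platters * surfaces_per_platter * tracks_per_surface
            * sectors_per_track * bytes_per_sector)

total = disk_capacity_bytes(12, 2, 6962, 213, 512)
print(total)
```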
Disk Operation

Operation

• Read or write complete sector

Seek

• Position head over proper track
• Typically 6-9ms

Rotational Latency

• Wait until desired sector passes under head
• Worst case: complete rotation
  
  \[
  10,025 \text{ RPM} \Rightarrow 6 \text{ ms}
  \]

Read or Write Bits

• Transfer rate depends on # bits per track and rotational speed
• E.g., 213 * 512 bytes @10,025RPM = 18 MB/sec.
• Modern disks have external transfer rates of up to 80 MB/sec
  
  – DRAM caches on disk help sustain these higher rates
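
Both the worst-case rotational latency and the media transfer rate follow from the rotational speed. A minimal sketch using the numbers above (function names are illustrative):

```python
def rotation_time_ms(rpm):
    """Time for one complete rotation, i.e. the worst-case rotational latency."""
    return 60_000.0 / rpm

def track_transfer_rate(sectors_per_track, bytes_per_sector, rpm):
    """Bytes per second when reading one full track per rotation."""
    return sectors_per_track * bytes_per_sector * rpm / 60.0

print(rotation_time_ms(10_025))
print(track_transfer_rate(213, 512, 10_025) / 1e6)  # in MB/s, with MB = 10**6 bytes
```

The average rotational latency would be half the rotation time, since on average the desired sector is half a turn away.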
Disk Performance

Getting First Byte
- Seek + Rotational latency = 7,000 – 19,000 µsec

Getting Successive Bytes
- ~ 0.06 µsec each
  - roughly 100,000 times faster than getting the first byte!

Optimizing Performance:
- Large block transfers are more efficient
- Try to do other things while waiting for first byte
  - switch context to other computing task
  - processor is interrupted when transfer completes
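
Why large block transfers are more efficient can be seen by computing the effective rate for different block sizes. The sketch assumes a first-byte latency of 10,000 µsec (a value picked from within the range above) and 0.06 µsec per successive byte; the function names are hypothetical.

```python
def transfer_time_us(n_bytes, first_byte_us=10_000.0, per_byte_us=0.06):
    """Total time to fetch n_bytes: fixed latency plus streaming time."""
    return first_byte_us + n_bytes * per_byte_us

def effective_mb_per_s(n_bytes, first_byte_us=10_000.0, per_byte_us=0.06):
    """Effective throughput; bytes per microsecond equals 10**6 bytes per second."""
    return n_bytes / transfer_time_us(n_bytes, first_byte_us, per_byte_us)

for size in (512, 8 * 1024, 1_000_000):
    print(size, effective_mb_per_s(size))
```

A single 512-byte sector is dominated by the fixed latency, while a transfer of around a megabyte gets close to the streaming rate.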
Disk / System Interface

1. Processor Signals Controller
   - Read sector X and store starting at memory address Y

2. Read Occurs
   - “Direct Memory Access” (DMA) transfer
   - Under control of I/O controller

3. I/O Controller Signals Completion
   - Interrupts processor
   - Can resume suspended process
Magnetic Disk Technology

Seagate ST-12550N Barracuda 2 Disk

- Linear density 52,187 bits per inch (BPI)
  - Bit spacing 0.5 microns
- Track density 3,047 tracks per inch (TPI)
  - Track spacing 8.3 microns
- Total tracks 2,707
- Rotational speed 7,200 RPM
- Avg linear speed 86.4 kilometers / hour
- Head Floating Height 0.13 microns
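
The bit and track spacings in the list above follow directly from the densities, since one inch is 25,400 microns. A quick check (the function name is illustrative):

```python
MICRONS_PER_INCH = 25_400.0

def spacing_microns(count_per_inch):
    """Center-to-center spacing for a given density per inch."""
    return MICRONS_PER_INCH / count_per_inch

print(spacing_microns(52_187))  # bit spacing along a track
print(spacing_microns(3_047))   # spacing between adjacent tracks
```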

Analogy:
- put the Sears Tower on its side
- fly it around the world, 2.5cm above the ground
- each complete orbit of the earth takes 8 seconds
CD Read Only Memory (CDROM)

Basis
- Optical recording technology developed for audio CDs
  - 74 minutes playing time
  - 44,100 samples / second
  - 2 X 16-bits / sample (Stereo)
  \[ \Rightarrow \text{Raw bit rate} = 172 \text{ KB / second} \]
- Add extra 288 bytes of error correction for every 2048 bytes of data
  - Cannot tolerate any errors in digital data, whereas OK for audio

Bit Rate
- \[ 172 \times 2048 / (288 + 2048) = 150 \text{ KB / second} \]
  - For 1X CDROM
  - \( N\text{X} \) CDROM gives bit rate of \( N \times 150 \) KB / second
  - E.g., 12X CDROM gives 1.76 MB / second

Capacity
- 74 Minutes * 150 KB / second * 60 seconds / minute = 650 MB
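
The CD-ROM figures can be reproduced from the audio parameters. The sketch takes KB = 1024 bytes and MB = 1024 KB, which is an assumption about the slide's units, and uses the 288-bytes-per-2048 overhead exactly as stated above; the function names are illustrative.

```python
def raw_rate_kb_s(samples_per_s=44_100, channels=2, bytes_per_sample=2):
    """Raw audio byte rate in KB/s (KB = 1024 bytes)."""
    return samples_per_s * channels * bytes_per_sample / 1024

def data_rate_kb_s(raw_kb_s, data_bytes=2048, ecc_bytes=288):
    """Usable data rate after setting aside the error-correction bytes."""
    return raw_kb_s * data_bytes / (data_bytes + ecc_bytes)

def capacity_mb(minutes=74, rate_kb_s=150):
    """Capacity in MB (MB = 1024 KB) at the nominal 1X rate."""
    return minutes * 60 * rate_kb_s / 1024

raw = raw_rate_kb_s()
print(raw, data_rate_kb_s(raw), capacity_mb(), 12 * 150 / 1024)
```

With these inputs the data rate comes out at about 151 KB/s, which the slide rounds to the nominal 150 KB/s used for the 12X and capacity figures.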
## Storage Trends

<table>
<thead>
<tr>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th>ratio, first vs. fifth value</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>SRAM</strong></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>$/MB</td>
<td>19,200</td>
<td>2,900</td>
<td>320</td>
<td>256</td>
<td>100</td>
<td><strong>190</strong></td>
</tr>
<tr>
<td>access (ns)</td>
<td>300</td>
<td>150</td>
<td>35</td>
<td>15</td>
<td>3</td>
<td><strong>100</strong></td>
</tr>
<tr>
<td><strong>DRAM</strong></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>$/MB</td>
<td>8,000</td>
<td>880</td>
<td>100</td>
<td>30</td>
<td>1.5</td>
<td><strong>5,300</strong></td>
</tr>
<tr>
<td>access (ns)</td>
<td>375</td>
<td>200</td>
<td>100</td>
<td>70</td>
<td>60</td>
<td><strong>6</strong></td>
</tr>
<tr>
<td>typical size(MB)</td>
<td>0.064</td>
<td>0.256</td>
<td>4</td>
<td>16</td>
<td>64</td>
<td><strong>1,000</strong></td>
</tr>
<tr>
<td><strong>Disk</strong></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>$/MB</td>
<td>500</td>
<td>100</td>
<td>8</td>
<td>0.30</td>
<td>0.05</td>
<td><strong>10,000</strong></td>
</tr>
<tr>
<td>access (ms)</td>
<td>87</td>
<td>75</td>
<td>28</td>
<td>10</td>
<td>8</td>
<td><strong>11</strong></td>
</tr>
<tr>
<td>typical size(MB)</td>
<td>1</td>
<td>10</td>
<td>160</td>
<td>1,000</td>
<td>9,000</td>
<td><strong>9,000</strong></td>
</tr>
</tbody>
</table>

*(Culled from back issues of Byte and PC Magazine)*
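
The bold final column appears to be the improvement ratio between the first and fifth values in each row; that reading is an assumption, since the column headers did not survive. A sketch that recomputes the ratios from the table's own numbers:

```python
# (first value, fifth value) for each row of the table above
TRENDS = {
    "SRAM $/MB": (19_200, 100),
    "SRAM access (ns)": (300, 3),
    "DRAM $/MB": (8_000, 1.5),
    "DRAM access (ns)": (375, 60),
    "DRAM typical size (MB)": (0.064, 64),
    "Disk $/MB": (500, 0.05),
    "Disk access (ms)": (87, 8),
    "Disk typical size (MB)": (1, 9_000),
}

def improvement(first, fifth):
    """Improvement factor regardless of whether the metric grows or shrinks."""
    return max(first, fifth) / min(first, fifth)

for name, (first, fifth) in TRENDS.items():
    print(f"{name}: {improvement(first, fifth):,.1f}x")
```

The contrast is the point of the table: cost per megabyte improves by factors in the thousands for DRAM and disk, while their access times improve by roughly 6x and 11x.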
Storage Access Times (nsec)

[Figure: access times of SRAM, DRAM, and Disk in nsec, on a log scale from 1.0E+00 to 1.0E+08]

## Processor clock rates

### Processors

<table>
<thead>
<tr>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td>typical clock (MHz)</td>
<td>1</td>
<td>6</td>
<td>20</td>
<td>150</td>
<td>400</td>
<td>400</td>
</tr>
<tr>
<td>processor</td>
<td>8080</td>
<td>286</td>
<td>386</td>
<td>Pentium</td>
<td>P-II</td>
<td></td>
</tr>
</tbody>
</table>

culled from back issues of Byte and PC Magazine
The CPU vs. DRAM Latency Gap (ns)
Memory Technology Summary

Cost and Density Improving at Enormous Rates

Speed Lagging Processor Performance

Memory Hierarchies Help Narrow the Gap:

• Small fast SRAMs (cache) at upper levels
• Large slow DRAMs (main memory) at lower levels
• Incredibly large & slow disks to back it all up

Locality of Reference Makes It All Work

• Keep most frequently accessed data in fastest memory