Virtual Memory
October 15, 2009

Topics
- Address spaces
- Motivations for virtual memory
- Address translation
- Accelerating translation with TLBs

Programs Refer to Virtual Memory Addresses
- Conceptually very large array of bytes
- Actually implemented with hierarchy of different memory types
- System provides address space private to particular “process”
  - Program being executed
  - Program can clobber its own data, but not that of others

Compiler + Run-Time System Control Allocation
- Where different program objects should be stored
- All allocation within single virtual address space

Simple Addressing Modes
- Normal (R) Mem[Reg[R]]
  - Register R specifies memory address
    movl (%ecx),%eax
- Displacement D(R) Mem[Reg[R]+D]
  - Register R specifies start of memory region
  - Constant displacement D specifies offset
    movl 8(%ebp),%edx

How does everything fit?
- 32-bit addresses: ~4,000,000,000 (4 billion) bytes
- 64-bit addresses: ~16,000,000,000,000,000,000 (16 quintillion) bytes

How to decide which memory to use in your program?
- How about after a fork()?

What if another process stores data into your memory?
- How could you debug your program?

Let’s think on this: physical memory?

Address Spaces
- A linear address space is an ordered set of contiguous nonnegative integer addresses:
  \[ \{0, 1, 2, 3, \ldots\} \]
- A virtual address space is a set of \( N = 2^n \) virtual addresses:
  \[ \{0, 1, 2, \ldots, N-1\} \]
- A physical address space is a set of \( M = 2^m \) (for convenience)
  physical addresses:
  \[ \{0, 1, 2, \ldots, M-1\} \]

In a system based on virtual addressing, each byte of main memory has a physical address and a virtual address (or more)

So, we add a level of indirection

One simple trick solves all three problems
- Each process gets its own private image of memory
  - appears to be a full-sized private memory range
- This fixes “how to choose” and “others shouldn’t mess w/your’s”
  - surprisingly, it also fixes “making everything fit”
- Implementation: translate addresses transparently
  - add a mapping function
  - to map private addresses to physical addresses
  - do the mapping on every load or store

This mapping trick is the heart of virtual memory
A System Using Physical Addressing

Used by many embedded microcontrollers in devices like cars, elevators, and digital picture frames

A System Using Virtual Addressing

One of the great ideas in computer science
*used by all modern desktop and laptop microprocessors

Why Virtual Memory?

1. VM allows efficient use of limited main memory (RAM)
   - Use RAM as a cache for the parts of a virtual address space
   - Some non-cached parts stored on disk
   - Some non-cached parts stored nowhere
   - Keep only active areas of virtual address space in memory
   - Transfer data back and forth as needed

2. VM simplifies memory management for programmers
   - Each process gets a full, private linear address space

3. VM isolates address spaces
   - One process can’t interfere with another’s memory
   - Because they operate in different address spaces
   - User process cannot access privileged information
   - Different sections of address spaces have different permissions

(1) VM as a Tool for Caching

Virtual memory is an array of N contiguous bytes
*Think of the array as being stored on disk
The contents of the array on disk are cached in physical memory (DRAM cache)

DRAM Cache Organization

DRAM cache organization driven by the enormous miss penalty
- DRAM is about 10x slower than SRAM
- Disk is about 100,000x slower than a DRAM
- To get first byte, though fast for next byte

DRAM cache properties
- Large page (block) size (typically 4-8 KB)
- Fully associative
- Any virtual page can be placed in any physical page
- Requires a “large” mapping function—different from CPU caches
- Highly sophisticated replacement algorithms
- Too complicated and open-ended to be implemented in hardware
- Write-back rather than write-through

Reminder: using virtual addressing

One of the great ideas in computer science
*used by all modern desktop and laptop microprocessors
How? Page Tables

A page table is an array of page table entries (PTEs) that maps virtual pages to physical pages.

- Per-process kernel data structure in DRAM

Address Translation with a Page Table

VIRTUAL ADDRESS

PHYSICAL ADDRESS

Valid
Physical page number (PPN)
Virtual page number (VPN)
Virtual page offset (VPO)
Physical page offset (PPO)

Page Hits

A page hit is a reference to a VM word that is in physical (main) memory.

Handling a Page Fault

The kernel’s page fault handler selects VP 4 as the victim and replaces it with a copy of VP 3 from disk (demand paging).

Why does it work? Locality

Virtual memory works because of locality.

At any point in time, programs tend to access a set of active virtual pages called the working set.

- Programs with better temporal locality will have smaller working sets

If (working set size < main memory size)

- Good performance for one process after compulsory misses

If (SUM(working set sizes) > main memory size)

- Thrashing: Performance meltdown where pages are swapped (copied) in and out continuously
(2) VM as a Tool for Memory Mgmt
Key idea: each process has its own virtual address space
- It can view memory as a simple linear array
- Mapping function scatters addresses through physical memory
- Well chosen mappings simplify memory allocation and management

Virtual Address Space for Process 1:
- VP 1
- VP 2
- PP 2
  Address Translation
  0
  M-1
  N-1
  N-1 M-1
  (e.g., read-only library code)

Virtual Address Space for Process 2:
- VP 1
- VP 2
- PP 7

Simplifying Sharing and Allocation
Memory allocation
- Each virtual page can be mapped to any physical page
- A virtual page can be stored in different physical pages at different times – the program never knows

Sharing code and data among processes
- Map virtual pages to the same physical page (PP 7)

Virtual Address Space for Process 1:
- VP 1
- VP 2
- PP 2
  Address Translation
  0
  M-1
  N-1
  N-1 M-1
  (e.g., read-only library code)

Virtual Address Space for Process 2:
- VP 1
- VP 2
- PP 7

Simplifying Linking and Loading
Linking
- Each program has similar virtual address space
- Code, stack, and shared libraries always start at the same address

Loading
- execve() maps PTEs to the appropriate location in the executable binary file
- The .text and .data sections are copied, page by page, on demand by the virtual memory system.

(3) VM as a Tool for Memory Protection
Extend PTEs with permission bits
Page fault handler checks these before remapping
- If violated, send process SIGSEGV (segmentation fault)

Reminder: using virtual addressing
One of the great ideas in computer science
- used by all modern desktop and laptop microprocessors
Address Translation: Page Hit

1) Processor sends virtual address to MMU
2-3) MMU fetches PTE from page table in memory
4) MMU sends physical address to cache/memory
5) Cache/memory sends data word to processor

Address Translation: Page Fault

1) Processor sends virtual address to MMU
2-3) MMU fetches PTE from page table in memory
4) Valid bit is zero, so MMU triggers page fault exception
5) Handler identifies victim (and, if dirty, pages it out to disk)
6) Handler pages in new page and updates PTE in memory
7) Handler returns to original process, restarting faulting instruction

Speeding up Translation with a TLB

Page table entries (PTEs) are cached in L1 like any other memory word
- PTEs may be evicted by other data references
- PTE hit still requires a 1-cycle delay

Solution: Translation Lookaside Buffer (TLB)
- Small hardware cache in MMU
- Maps virtual page numbers to physical page numbers
- Contains complete page table entries for small number of pages

A TLB hit eliminates a memory access

A TLB miss incurs an add’l memory access (the PTE)
- Fortunately, TLB misses are rare

Simple Memory System Example

Addressing
- 14-bit virtual addresses
- 12-bit physical address
- Page size = 64 bytes
Simple Memory System Page Table

- Only show first 16 entries (out of 256)

<table>
<thead>
<tr>
<th>VPN</th>
<th>PPN</th>
<th>Valid</th>
<th>VPN</th>
<th>PPN</th>
<th>Valid</th>
</tr>
</thead>
<tbody>
<tr>
<td>00</td>
<td>28</td>
<td>1</td>
<td>06</td>
<td>13</td>
<td>1</td>
</tr>
<tr>
<td>01</td>
<td>0</td>
<td>09</td>
<td>17</td>
<td>1</td>
<td></td>
</tr>
<tr>
<td>02</td>
<td>33</td>
<td>1</td>
<td>09</td>
<td>1</td>
<td></td>
</tr>
<tr>
<td>03</td>
<td>02</td>
<td>1</td>
<td>08</td>
<td>–</td>
<td>0</td>
</tr>
<tr>
<td>04</td>
<td>–</td>
<td>0</td>
<td>0C</td>
<td>–</td>
<td>0</td>
</tr>
<tr>
<td>05</td>
<td>16</td>
<td>1</td>
<td>0D</td>
<td>2D</td>
<td>1</td>
</tr>
<tr>
<td>06</td>
<td>–</td>
<td>0</td>
<td>0E</td>
<td>11</td>
<td>1</td>
</tr>
<tr>
<td>07</td>
<td>–</td>
<td>0</td>
<td>0F</td>
<td>0D</td>
<td>1</td>
</tr>
</tbody>
</table>

Simple Memory System TLB

- 16 entries
- 4-way associative

<table>
<thead>
<tr>
<th>Set</th>
<th>Tag</th>
<th>PPN</th>
<th>Valid</th>
<th>Tag</th>
<th>PPN</th>
<th>Valid</th>
<th>Tag</th>
<th>PPN</th>
<th>Valid</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>03</td>
<td>09</td>
<td>1</td>
<td>0D</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>02</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>02</td>
<td>1</td>
<td>2D</td>
<td>0</td>
<td>04</td>
<td>–</td>
<td>0</td>
<td>0A</td>
<td>–</td>
</tr>
<tr>
<td>2</td>
<td>02</td>
<td>–</td>
<td>08</td>
<td>–</td>
<td>05</td>
<td>–</td>
<td>0</td>
<td>03</td>
<td>–</td>
</tr>
<tr>
<td>3</td>
<td>07</td>
<td>–</td>
<td>03</td>
<td>1</td>
<td>0A</td>
<td>34</td>
<td>1</td>
<td>02</td>
<td>–</td>
</tr>
</tbody>
</table>

Address Translation Example #1

Virtual Address 0x03D4

<table>
<thead>
<tr>
<th>VPN</th>
<th>PPN</th>
<th>Valid</th>
<th>TLB</th>
<th>TLBI</th>
<th>TLBT</th>
<th>TLB Hit</th>
<th>Page Fault</th>
<th>NO</th>
<th>PPN</th>
<th>TBD</th>
</tr>
</thead>
<tbody>
<tr>
<td>0x0F</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>NO</td>
<td>NO</td>
<td>0x0D</td>
<td></td>
</tr>
</tbody>
</table>

Address Translation Example #2

Virtual Address 0x0BFF

<table>
<thead>
<tr>
<th>VPN</th>
<th>PPN</th>
<th>Valid</th>
<th>TLB</th>
<th>TLBI</th>
<th>TLBT</th>
<th>TLB Hit</th>
<th>Page Fault</th>
<th>YES</th>
<th>PPN</th>
<th>TBD</th>
</tr>
</thead>
<tbody>
<tr>
<td>0x2E</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>YES</td>
<td></td>
<td>0x28</td>
<td></td>
</tr>
</tbody>
</table>

Address Translation Example #3

Virtual Address 0x0020

<table>
<thead>
<tr>
<th>VPN</th>
<th>PPN</th>
<th>Valid</th>
<th>TLB</th>
<th>TLBI</th>
<th>TLBT</th>
<th>TLB Hit</th>
<th>Page Fault</th>
<th>TBD</th>
<th>PPN</th>
<th>TBD</th>
</tr>
</thead>
<tbody>
<tr>
<td>0x00</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td></td>
<td></td>
<td>0x28</td>
<td></td>
</tr>
</tbody>
</table>
Summary

Programmer’s View of Virtual Memory
- Each process has its own private linear address space
- Cannot be corrupted by other processes

System View of Virtual Memory
- Uses memory efficiently by caching virtual memory pages
- Efficient only because of locality
- Simplifies memory management and programming
- Simplifies protection by providing a convenient interpositioning point to check permissions

Cache
- 16 lines
- 4-byte line size
- Direct mapped

Simple Memory System Cache
- 16 lines
- 4-byte line size
- Direct mapped

Address Translation Example #1
Virtual Address 0x0D4

<table>
<thead>
<tr>
<th>VPN</th>
<th>TLB</th>
<th>TLBI</th>
<th>TLBT</th>
<th>CT</th>
<th>CI</th>
<th>CO</th>
<th>PPN</th>
<th>PPO</th>
</tr>
</thead>
<tbody>
<tr>
<td>0x0F</td>
<td>3</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

Physical Address

<table>
<thead>
<tr>
<th>Offset</th>
<th>CT</th>
<th>HI</th>
<th>Byte:</th>
</tr>
</thead>
<tbody>
<tr>
<td>0x03</td>
<td>0</td>
<td>Y</td>
<td>0x36</td>
</tr>
</tbody>
</table>

Address Translation Example #2
Virtual Address 0x0B8F

<table>
<thead>
<tr>
<th>VPN</th>
<th>TLB</th>
<th>TLBI</th>
<th>TLBT</th>
<th>CT</th>
<th>CI</th>
<th>CO</th>
<th>PPN</th>
<th>PPO</th>
</tr>
</thead>
<tbody>
<tr>
<td>0x2E</td>
<td>2</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

Physical Address

<table>
<thead>
<tr>
<th>Offset</th>
<th>CT</th>
<th>HI</th>
<th>Byte:</th>
</tr>
</thead>
<tbody>
<tr>
<td>0x0B</td>
<td>2</td>
<td>0</td>
<td>0x2C</td>
</tr>
</tbody>
</table>

Address Translation Example #3
Virtual Address 0x0020

<table>
<thead>
<tr>
<th>VPN</th>
<th>TLB</th>
<th>TLBI</th>
<th>TLBT</th>
<th>CT</th>
<th>CI</th>
<th>CO</th>
<th>PPN</th>
<th>PPO</th>
</tr>
</thead>
<tbody>
<tr>
<td>0x00</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

Physical Address

<table>
<thead>
<tr>
<th>Offset</th>
<th>HI</th>
<th>Byte:</th>
</tr>
</thead>
<tbody>
<tr>
<td>0x00</td>
<td>NO</td>
<td>MEM</td>
</tr>
</tbody>
</table>

Allocating Virtual Pages

Example: Allocating new virtual page VP5
- Kernel allocates VP 5 on disk and points PTE 5 to it
**Multi-Level Page Tables**

**Given:**
- 4KB ($2^{12}$) page size
- 48-bit address space
- 4-byte PTE

**Problem:**
- Would need a 256 GB page table!

**48-bit * 2^{12} * 2^{2} = 2^{38} bytes**

**Common solution**
- Multi-level page tables
  - Example: 2-level page table
    - Level 1 table: each PTE points to a page table (memory resident)
    - Level 2 table: Each PTE points to a page (paged in and out like other data)
  - Level 1 table stays in memory
  - Level 2 tables paged in and out

---

**A Two-Level Page Table Hierarchy**

- Level 1 page table
- Level 2 page tables
- Virtual memory

---

**Translating with a k-level Page Table**

**VIRTUAL ADDRESS**

- Level 1 page table
- Level 2 page table
- Levels' page table

**PHYSICAL ADDRESS**

- PPN
- PPO

---

**Servicing a Page Fault**

1. Processor signals disk controller
   - Read block of length P starting at disk address X and store starting at memory address Y

2. Read occurs
   - Direct Memory Access (DMA)
   - Under control of I/O controller

3. Controller signals completion
   - Interrupts processor
   - OS resumes suspended process

---

1 Initiate Block Read
2 DMA Transfer
3 Read Done
4 Initiate Block Read
5 Read Done