# Can Increasing the Hit Ratio Hurt Cache Throughput?

Ziyue Qiu

Juncheng Yang

**Mor Harchol-Balter** 



Carnegie Mellon University Computer Science Dept.



# What is a Cache?



#### **Hit Ratio**:

Fraction of requests found in cache

#### Cache Eviction Policies



# Why LRU is the most popular



**Peter Denning** 

Data access patterns show temporal locality:
Recently accessed data is more likely to be accessed again.
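The sketch below makes LRU and the hit ratio concrete. It assumes a plain single-threaded doubly linked list plus a hash index. The class and method names are illustrative and are not taken from the HHVM prototype. On a hit, the entry is delinked and moved to the head. On a miss in a full cache, the tail entry is evicted. These are the same Delink, Head update, and Tail update steps that appear as servers in the queueing model.

```python
class _Node:
    __slots__ = ("key", "prev", "next")

    def __init__(self, key):
        self.key = key
        self.prev = None
        self.next = None


class LRUCache:
    """Minimal LRU cache (hypothetical sketch): head = most recently used."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.index = {}              # key -> node
        self.head = _Node(None)      # sentinel before the most recent entry
        self.tail = _Node(None)      # sentinel after the least recent entry
        self.head.next, self.tail.prev = self.tail, self.head
        self.hits = 0
        self.requests = 0

    def _delink(self, node):
        # Unlink node from its neighbours ("Delink").
        node.prev.next = node.next
        node.next.prev = node.prev

    def _push_head(self, node):
        # Insert node right after the head sentinel ("Head update").
        node.prev, node.next = self.head, self.head.next
        self.head.next.prev = node
        self.head.next = node

    def get(self, key):
        """Look up key; return True on a hit, False on a miss."""
        self.requests += 1
        node = self.index.get(key)
        if node is not None:                    # hit: delink, then move to head
            self.hits += 1
            self._delink(node)
            self._push_head(node)
            return True
        if len(self.index) >= self.capacity:    # full: evict the tail ("Tail update")
            victim = self.tail.prev
            self._delink(victim)
            del self.index[victim.key]
        node = _Node(key)                       # insert the missed block at the head
        self.index[key] = node
        self._push_head(node)
        return False

    def hit_ratio(self):
        """Fraction of requests found in cache."""
        return self.hits / self.requests if self.requests else 0.0


if __name__ == "__main__":
    cache = LRUCache(capacity=2)
    for block in [1, 2, 1, 3, 1]:
        cache.get(block)
    print(cache.hit_ratio())  # 0.4 (2 hits out of 5 requests)
```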





#### **Common Wisdom**



Beckman, Berg, Berger, Bunt, Carrig, Chen, Cheng, Cho, Cidon, Ciucu, Crooks, Eager, Feng, Gandhi, Ganger, Grosof, Gunasekar, Harchol-Balter, Hellerstein, Henningsen, Kozuch, Lakew, Li, Lu, McAllister, Sabnis, Schmitt, Sitaraman, Stoica, Sunderrajan, Tran, Vinayak, Willick, Yang, Yu, Yue, Zhu ...



Ziyue Qiu



Seems no one has actually studied the relationship between hit ratio and throughput/latency...



#### Thesis of Talk

#### For today's LRU-based caching systems,



# Caching System Implementation

- Prototype of Meta's HHVM cache
- Runs on the CloudLab platform
- Requests are for 4 KB blocks drawn from a Zipfian ($\theta = 0.99$) popularity distribution
- The cache runs on an Intel Xeon Platinum CPU with 72 cores

#### <u>KEY POINTS</u>:

- DRAM-based cache
  - Very fast (0.51 $\mu s$) & highly concurrent (72 cores)
- SSD-based disk
  - ~100 $\mu s$, but we emulate a range from 5 $\mu s$ to 500 $\mu s$
  - Highly concurrent (72 concurrent requests)
- Each request is handled by a single core.
  The total number of requests in the system is limited by the number of cores → MPL (multiprogramming level) = 72
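The next sketch generates a request stream of this kind using only the standard library. It assumes the standard Zipf form, where the block of popularity rank $r$ is requested with probability proportional to $1/r^{\theta}$. The function name, block count, and seed are illustrative; this is not the prototype's actual request generator.

```python
import random
from collections import Counter


def zipf_trace(num_blocks, num_requests, theta=0.99, seed=None):
    """Return num_requests block IDs in [0, num_blocks).

    The block with popularity rank r (1-based) is requested with probability
    proportional to 1 / r**theta, so block 0 is the most popular one.
    """
    rng = random.Random(seed)
    weights = [1.0 / rank ** theta for rank in range(1, num_blocks + 1)]
    # random.choices builds the cumulative weights once, then bisects per sample.
    return rng.choices(range(num_blocks), weights=weights, k=num_requests)


if __name__ == "__main__":
    trace = zipf_trace(num_blocks=10_000, num_requests=100_000, seed=1)
    counts = Counter(trace)
    top_100 = sum(c for _, c in counts.most_common(100))
    print(f"share of requests to the 100 most popular blocks: {top_100 / len(trace):.2f}")
```

One way to sweep $p_{hit}$ in a simulator is to feed such a trace into a cache model and vary the cache size.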

# Queueing model for LRU caching system



# Q-theory: "Find the bottleneck"





#### <u>STEP 1</u>:

Think time: $E[Z] = E[Z_{cache}] + (1 - p_{hit})E[Z_{disk}]$



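#### Model parameters (MPL = 72)

- Delink (hit path, taken with probability $p_{hit}$): $E[S_{delink}] = 0.7\,\mu s$
- Head update (every request): $E[S_{head}] = 0.59\,\mu s$
- Tail update (miss path, taken with probability $p_{miss} = 1 - p_{hit}$): $E[S_{tail}] = 0.59\,\mu s$
- Cache lookup: $E[Z_{cache}] = 0.51\,\mu s$
- Disk (miss path): $E[Z_{disk}] = 100\,\mu s$

Throughput (requests per $\mu s$):

$$X \le \min\left(\frac{\text{MPL}}{D + E[Z]},\ \frac{1}{D_{max}}\right)$$

where

$$E[Z] = E[Z_{cache}] + p_{miss}E[Z_{disk}]$$

$$D_{delink} = p_{hit} \cdot (0.7)$$

$$D_{tail} = (1 - p_{hit}) \cdot (0.59)$$

$$D_{head} = 0.59$$

$$D = D_{delink} + D_{tail} + D_{head}$$

$$D_{max} = \begin{cases} D_{head} & \text{if } p_{hit} < 0.84 \\ D_{delink} & \text{if } p_{hit} \ge 0.84 \end{cases}$$

Plugging in the numbers:

$$X \le \min\left(\frac{72}{101.69 - 99.89\,p_{hit}},\ \frac{1}{\max(0.59,\ 0.7\,p_{hit})}\right) \le \min\left(\frac{72}{101.1 - 99.3\,p_{hit}},\ \frac{1}{\max(0.59,\ 0.7\,p_{hit})}\right)$$

The right-hand form drops the small $D_{tail}$ term. This makes it slightly looser, but it is still a valid upper bound.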
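The bound can be evaluated numerically. The following minimal sketch takes its parameter defaults from the measured values above; the function name is hypothetical. It sums all three demands, including $D_{tail}$, and takes the bottleneck as the largest of them.

```python
def throughput_upper_bound(p_hit, mpl=72, s_delink=0.7, s_head=0.59,
                           s_tail=0.59, z_cache=0.51, z_disk=100.0):
    """Asymptotic upper bound on throughput X (requests/us) for the LRU network."""
    p_miss = 1.0 - p_hit
    d_delink = p_hit * s_delink          # only hits pay for Delink
    d_head = s_head                      # every request pays for Head update
    d_tail = p_miss * s_tail             # only misses pay for Tail update (eviction)
    d_total = d_delink + d_head + d_tail
    d_max = max(d_delink, d_head, d_tail)       # bottleneck demand
    think = z_cache + p_miss * z_disk           # E[Z]: cache lookup + disk on a miss
    return min(mpl / (d_total + think), 1.0 / d_max)


if __name__ == "__main__":
    for p in (0.3, 0.5, 0.59, 0.7, 0.84, 0.9, 0.99):
        print(f"p_hit={p:.2f}  X <= {throughput_upper_bound(p):.3f} req/us")
```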


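#### 3 Regimes

- $p_{hit} < 0.59$
  - $\rightarrow X = \text{Left term}$
  - $\rightarrow$ X increases with $p_{hit}$
- $0.59 < p_{hit} < 0.84$
  - $\rightarrow X = 1/D_{head} = 1/0.59$ (Head update is the bottleneck)
  - $\rightarrow$ X is flat in $p_{hit}$
- $p_{hit} > 0.84$
  - $\rightarrow X = 1/D_{delink} = 1/(0.7\,p_{hit})$ (Delink is the bottleneck)
  - $\rightarrow$ X decreases with $p_{hit}$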
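Both boundaries follow directly from the bound. The first is where the left term reaches the plateau $1/D_{head} = 1/0.59$ of the right term. The second is where $D_{delink}$ overtakes $D_{head}$. Worked out with the slide's form of the left term:

$$
\begin{aligned}
\frac{72}{101.1 - 99.3\,p_{hit}} = \frac{1}{0.59}
  &\iff 101.1 - 99.3\,p_{hit} = 72 \cdot 0.59 = 42.48 \\
  &\iff p_{hit} = \frac{101.1 - 42.48}{99.3} \approx 0.59 \\
D_{delink} = D_{head}
  &\iff 0.7\,p_{hit} = 0.59 \iff p_{hit} = \frac{0.59}{0.7} \approx 0.84
\end{aligned}
$$

Between the two boundaries, the bound stays at $1/0.59 \approx 1.69$ requests per $\mu s$. Above $0.84$, it falls as $1/(0.7\,p_{hit})$, reaching about $1.44$ at $p_{hit} = 0.99$.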




Figure: throughput vs. $p_{hit}$ from the queueing network upper bound, a queueing network simulation, and the implementation.


# Summary



#### When $p_{hit}$ is high:

- The Delink server becomes the bottleneck.
- Increasing $p_{hit}$ increases the demand on the Delink server, making its queue even longer.
  - → Request latency ↑
  - → Throughput ↓
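The bound only describes asymptotes. To see latency rise and throughput fall together at high $p_{hit}$, the closed network can be solved exactly with Mean Value Analysis (MVA). This technique is swapped in here for illustration and is not necessarily how the authors' simulation works. The sketch assumes a product-form network: Delink, Head update, and Tail update are single-server FCFS queues with exponential service, while cache lookup and disk are pure delay stations. The means are the measured values above. The function name is hypothetical.

```python
def mva_lru(p_hit, mpl=72, s_delink=0.7, s_head=0.59, s_tail=0.59,
            z_cache=0.51, z_disk=100.0):
    """Exact MVA for the closed LRU network (product-form assumption).

    Returns (throughput in requests/us, mean end-to-end time per request in us,
    including cache lookup and disk).
    """
    demands = [
        p_hit * s_delink,            # D_delink: only hits delink
        s_head,                      # D_head: every request updates the head
        (1.0 - p_hit) * s_tail,      # D_tail: only misses evict from the tail
    ]
    think = z_cache + (1.0 - p_hit) * z_disk    # E[Z]: delay stations, no queueing
    queue_len = [0.0] * len(demands)            # mean queue lengths with 0 requests
    throughput = 0.0
    for n in range(1, mpl + 1):
        # Arrival theorem: an arriving request sees the queues of an (n-1)-request system.
        resid = [d * (1.0 + q) for d, q in zip(demands, queue_len)]
        throughput = n / (think + sum(resid))
        queue_len = [throughput * r for r in resid]   # Little's law per station
    latency = mpl / throughput                        # Little's law for the whole loop
    return throughput, latency


if __name__ == "__main__":
    for p in (0.5, 0.7, 0.85, 0.95, 0.99):
        x, t = mva_lru(p)
        print(f"p_hit={p:.2f}  X={x:.3f} req/us  latency={t:.1f} us")
```

Under these assumptions, throughput can never exceed $1/D_{delink}$ once Delink is the largest demand. Every extra hit adds work to that one queue, which is why latency and throughput move in opposite directions at high $p_{hit}$.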

# Same story holds for all LRU variants



# Future trends

1. Disks will get faster.
2. Concurrency levels will increase for both cache and disk.



#### Faster Disk Speed



# For FIFO-based caches, X-put only rises





# Breakdown of Cache Eviction policies

#### **LRU-like behavior**

LRU, LeCaR, SLRU, CACHEUS, ARC, LIRS, TinyLFU, LFU

#### FIFO-like behavior

FIFO, LHD, CLOCK, LRB, S3-FIFO, Random, **SIEVE**, QDLP, Hyperbolic



# Improving future Caching Systems

#### The problem with LRU:



Q: Why not just forgo LRU altogether & do FIFO?

A: FIFO is less efficient in its use of cache space!

#### What we really need is some combination of LRU & FIFO!

- Naïve mixture: Probabilistic-LRU
- Better idea: as $p_{hit}$ gets high, if X-put starts dropping, skip the Delink step (as in FIFO).
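The following minimal sketch shows the naïve mixture. It assumes one common reading of Probabilistic-LRU: on a hit, the entry is moved to the head (the Delink + Head update work) only with probability `promote_prob`. The class and parameter names are illustrative. Setting `promote_prob = 1.0` gives LRU. Setting `promote_prob = 0.0` gives FIFO, where hits never touch the queue order.

```python
import random
from collections import OrderedDict


class ProbabilisticLRU:
    """Hypothetical sketch: promote a hit to the head with probability promote_prob."""

    def __init__(self, capacity, promote_prob, seed=None):
        self.capacity = capacity
        self.promote_prob = promote_prob
        self.rng = random.Random(seed)
        self.items = OrderedDict()   # last entry = head, first entry = tail
        self.hits = 0
        self.requests = 0
        self.promotions = 0          # hits that paid for Delink + Head update

    def get(self, key):
        """Look up key; return True on a hit, False on a miss."""
        self.requests += 1
        if key in self.items:
            self.hits += 1
            if self.rng.random() < self.promote_prob:   # always for 1.0, never for 0.0
                self.items.move_to_end(key)             # delink + move to head
                self.promotions += 1
            return True
        if len(self.items) >= self.capacity:            # full: evict from the tail
            self.items.popitem(last=False)
        self.items[key] = None                          # insert at the head
        return False

    def hit_ratio(self):
        return self.hits / self.requests if self.requests else 0.0


if __name__ == "__main__":
    for q in (1.0, 0.5, 0.0):
        cache = ProbabilisticLRU(capacity=2, promote_prob=q, seed=0)
        for block in [1, 2, 1, 3, 1]:
            cache.get(block)
        print(f"promote_prob={q}: hits={cache.hits}, promotions={cache.promotions}")
```

On the toy trace `[1, 2, 1, 3, 1]` with capacity 2, the LRU setting gets 2 hits and the FIFO setting gets 1. This is a tiny illustration of FIFO using cache space less efficiently. Intermediate values of `promote_prob` trade some hit ratio for less work on the hit path.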

# Conclusion








- Olden days: slower disk + lower MPL → "top left corner": higher hit ratio helps.
- Also in olden days: lower disk concurrency → queueing at disk → disk is the bottleneck.
- But today, with concurrent disks, the bottleneck has shifted to cache operations.
- Operations on the hit path (Delink) become the bottleneck when $p_{hit}$ is high.
- When this happens, throughput will drop. One solution: mix LRU & FIFO.