15-213 Recitation 11: Malloc
Discussion/board notes
Author: Sol Boucher

Goal: Understand how allocator design decisions affect performance/functionality.

Think-pair-share activity: Why use the heap? And what constraints does this place on allocator implementations?
====================================================================================
Adjacency: (Need a contiguous block of addresses, as guaranteed by the C standard)
    (This makes the consequences of fragmentation severe: we might have to tell the application we failed!)
    [External fragmentation]
Flexibility: (Might use it to store any type of data)
    (This mandates using the widest possible alignment.)
    [Internal fragmentation]
Lifetime: (Need to keep data around for a long time)
    (This makes our decisions really important: each placement could haunt us for the rest of the run.)
Memory: (The application isn't required to remember how large the space is)
    (This means the application won't remind us at deallocation time, so we need to track the size as part of our metadata.)
    [Internal fragmentation]
Ordering: (Might deallocate in any order)
    (This means we need to think about how space might [or might not] be reclaimed on deallocation, rather than simply how it might be affected at allocation time.)
    [External fragmentation]
Size: (Need to store a lot of data)
    (This basically means that blocks can't be of a fixed size: we need to allow really big allocations without compromising on all allocations.)
    [External fragmentation]
Speed: (Need the space immediately)
    (The inability to perform batching means we always have to predicate decisions on incomplete knowledge, and that we want to avoid syscalls.)
    [External fragmentation]
Stability: (Need stable addresses)
    (This forces us to use a greedy approach, so there's always guaranteed to be a sequence for which we're suboptimal.)
    [External fragmentation]

Define types of fragmentation. Which of the implementation constraints contribute to each?
==========================================================================================
Internal fragmentation: w/in an allocation, due to:
    (Alignment)
    (Overhead)
    In the worst case, we have a ton of *small allocations* and accounting information begins to dominate.
External fragmentation: across allocations, due to:
    (Adjacency)
    (Ordering)
    (Size)
    (Speed)
    (Stability)
    In the worst case, we either fail to accommodate a *large allocation* or have to expand the heap.
Notice that application behavior can cause both types of fragmentation; all the implementation can do is try to minimize it.

Begin tracethrough/discussion of needed structures and design decisions
=======================================================================
Keep this on the board:
    Map of process image (text, rodata, data, bss, heap, free space, stack)
    Instance data
    Globals
    Application calls
    Linear heap layout
Use a different color for design decisions!

3 main ways to perform global block tracking
============================================
Implicit free list
Explicit free list
Segregated list (multiple explicit lists)

Develop implicit list
=====================
What's the first thing we have to do to handle malloc(8)?
1. Know whether a location is free.
    -> [Global data] Root node
    -> [Instance data] Header w/ valid bit
    ! It's the *payload* (not the header) that must be aligned, but we're going to disregard alignment altogether for now.
    * Initialize root block to invalid!
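One common way to encode such a header (in the style of the CS:APP textbook allocator) is to pack the block size together with the valid/allocated bit in a single word: block sizes are multiples of the alignment, so the low bit is always zero and free to reuse. A minimal sketch, anticipating the size field introduced in step 2 below; the names and types are illustrative, not the lab's required layout.

    #include <stddef.h>

    typedef size_t header_t;       /* one word per block header          */

    #define ALLOCATED 0x1u         /* low bit = valid/allocated flag     */

    /* Build a header from a block size and a valid bit. */
    static inline header_t pack(size_t size, int allocated) {
        return size | (allocated ? ALLOCATED : 0);
    }

    /* Recover the pieces from a header. */
    static inline size_t get_size(header_t h)     { return h & ~(header_t)ALLOCATED; }
    static inline int    is_allocated(header_t h) { return (int)(h & ALLOCATED); }

    /* e.g. initialize the root block as one big invalid (free) block. */
    static inline void init_free_block(header_t *hdr, size_t size) {
        *hdr = pack(size, 0);
    }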
How initialization happens
--------------------------
mm_init() in the lab
Normal C programs don't have to do anything at the start of main(), though...
    Kernel calls _start()---NOT main()---on new processes!

2. Know where to put the next allocation.
    -> [Instance data] Size of this allocation
        Decide whether to include the header and alignment in the size, then be consistent or bug out!
    ! I'm assuming sizes can fit in a single byte for simplicity/laziness. (Note: initial free size was 27.)
    * Initialize next block to invalid!

DESIGN: {First,Next,Best} fit
-----------------------------
First is easy to implement
Next requires remembering to wrap back to the beginning if nothing is found in the latter part
    -> [Global data] Fit start (for next fit or efficient first fit)
Next might cause more fragmentation
Best is hard to do with an implicit list
    -> [Global data] Some kind of list for efficient best fit (we'll come back to it)

Have people do malloc(3), then just add malloc(2) and malloc(5).

3. Know how to dealloc.
    * Reset valid bit
    Anything else we need to do?
Run through dealloc() of 2 then 3.

4. Know how to coalesce.
    * What happens when we dealloc() 3 then 2 instead?
    -> [Instance data] Footers w/ sizes
    (UPDATE THEM WITH FOOTERS and change their usable data sizes.)

DESIGN: When to coalesce
------------------------
Disadvantage of doing it at dealloc time? (Causes cache misses when we wouldn't have to traverse anyway.)

What happens when we try to do a malloc(6)?

5. Know when we need to expand the heap.
    -> [Global data] Current heap size (to avoid extra sbrk() calls, which are expensive due to the poor interface)

DESIGN: How much to sbrk()
--------------------------
More: Faster due to fewer syscalls
Less: Faster due to reduced space usage/swapping avoidance

Implicit pros/cons
==================
+ LEAST BOOKKEEPING/less internal fragmentation
- SEARCHES are expensive, especially for large allocs.
- Hard to implement BEST FIT efficiently/more external fragmentation
- Numerous CACHE MISSES on every allocation search
+ Makes COALESCING EASY

Changes for explicit list
=========================
+ Want to prevent the *number of active allocations* from hurting search time.
Keep a list of just the free blocks, rather than commingling free and alloc'd.
    Repurpose the start pointer to indicate the free list head node
Do we still need to store sizes? (Yes: for coalescing)
Can keep all instance data within the allocation.
    Do we need to expand the header? (No: only free nodes need pointers, so we can store them within the payload, BUT we will increase the minimum block size. The pointer layout is sketched after the pros/cons below.)
1. Changes to allocation: search the free list instead of all nodes; remember to remove from the list!
2. Changes to freeing: re-add to the list

DESIGN: Where to place?
-----------------------
FIFO (deallocated blocks aren't reused for a while)
LIFO:
    + Finding the front is constant-time regardless of double linkage
    + Fewer cache misses due to block reuse
    - Worse fragmentation
Address-ordered:
    Allows you to do away with the size footer if you keep a doubly-linked list
    Sacrifice payload space/minimum block size to reduce instance metadata
3. Changes to splitting/coalescing: remember to remove the original entry and add the new one!

Explicit pros/cons
==================
+ From above: active allocations don't affect SEARCH TIME
- Still hard to do BEST FIT
+ Fewer CACHE MISSES because searches traverse fewer nodes
- Easier to introduce bugs/reach an inconsistent state
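From the explicit-list changes above, here is a minimal sketch of where the links might live: the prev/next pointers sit inside the payload of *free* blocks only, and freed blocks are pushed LIFO onto the head of the list. The names are illustrative, and real code must also keep the size header/footer from the implicit design up to date.

    #include <stddef.h>

    typedef struct free_node {
        struct free_node *prev;    /* previous free block in the list */
        struct free_node *next;    /* next free block in the list     */
    } free_node;                   /* overlaid on a free block's payload */

    static free_node *free_list_head;   /* [Global data] repurposed start pointer */

    /* LIFO placement: push a newly freed block onto the front of the list. */
    static void free_list_push(free_node *blk) {
        blk->prev = NULL;
        blk->next = free_list_head;
        if (free_list_head != NULL)
            free_list_head->prev = blk;
        free_list_head = blk;
    }

    /* Unlink a block when it is allocated, split, or coalesced away. */
    static void free_list_remove(free_node *blk) {
        if (blk->prev != NULL)
            blk->prev->next = blk->next;
        else
            free_list_head = blk->next;
        if (blk->next != NULL)
            blk->next->prev = blk->prev;
    }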
Changes for segregated list
===========================
+ Want to efficiently implement *best fit* without driving up search time.
Keep multiple free lists, one for each size (range).
What if we run out of a particular size? (Need to search larger sizes and split.)
Better to assign size buckets linearly or logarithmically? (Determines the asymptotic time of small allocations)
    Can do a combination: linear for small sizes, then logarithmic, then a catch-all

Segregated pros/cons
====================
- Most GLOBAL OVERHEAD
+ SEARCHES can be cheaper, especially for large allocs.
- SMALL ALLOCATIONS disadvantaged as lists run out
+ Can finally reasonably implement BEST FIT
- Coalescing requires a DOUBLY-LINKED LIST to avoid searching other lists
- Most COMPLICATED to implement

Heap checker
============
Implement mm_heapcheck() early on and keep it updated. (A skeleton checker is sketched at the end of these notes.)
What are some invariants to check? (Annotate existing stuff with a new color.)
Block: (Header/footer consistent), (*Payload* aligned)
List: (Prev/next consistent between blocks?), (Blocks in list iff free), (No contiguous free blocks?), (No cycles in list?), (Segregated sizes obeyed)
Heap: (Nothing outside heap boundaries)
Any others? (e.g. Address order)

Safety and debugging
====================
Look at the slides for a pointer arithmetic reminder
Write reusable helper functions and macros to avoid duplication bugs
Include conditionally-compiled debugging code
Check invariants with assertions
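A skeleton of such a checker, covering a few of the list invariants above (no cycles, prev/next consistency, nothing outside the heap). The free_node type and the heap_lo/heap_hi globals are assumptions standing in for whatever the allocator actually tracks; a full checker would also walk every block to verify header/footer consistency, payload alignment, and that no two free blocks are contiguous.

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct free_node {
        struct free_node *prev;
        struct free_node *next;
    } free_node;

    static free_node *free_list_head;   /* head of the explicit free list */
    static void *heap_lo, *heap_hi;     /* current heap boundaries        */

    static bool in_heap(const void *p) {
        return (const char *)p >= (const char *)heap_lo
            && (const char *)p <  (const char *)heap_hi;
    }

    /* Returns true iff the free-list invariants hold; call it on entry to
     * and exit from malloc()/dealloc() while debugging. */
    static bool check_free_list(void) {
        /* 1. No cycles (Floyd's tortoise and hare), checked first so the
         *    walk below is guaranteed to terminate. */
        const free_node *slow = free_list_head, *fast = free_list_head;
        while (fast != NULL && fast->next != NULL) {
            slow = slow->next;
            fast = fast->next->next;
            if (slow == fast)
                return false;            /* cycle in the free list */
        }

        /* 2. Every node lies within the heap and the prev/next links agree. */
        const free_node *prev = NULL;
        for (const free_node *cur = free_list_head; cur != NULL; cur = cur->next) {
            if (!in_heap(cur))
                return false;            /* node outside heap boundaries */
            if (cur->prev != prev)
                return false;            /* inconsistent back link       */
            prev = cur;
        }
        return true;
    }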
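And a sketch of the conditionally-compiled debugging style mentioned above: build with -DDEBUG and the tracing and invariant checks run; build without it and they compile away, so the fast build isn't slowed down. dbg_printf/dbg_assert are common conventions here, not required names.

    #include <assert.h>
    #include <stdio.h>

    #ifdef DEBUG
    #define dbg_printf(...)  fprintf(stderr, __VA_ARGS__)
    #define dbg_assert(expr) assert(expr)
    #else
    #define dbg_printf(...)  ((void)0)
    #define dbg_assert(expr) ((void)0)
    #endif

    /* Example use inside a dealloc routine:
     *     dbg_printf("free(%p)\n", ptr);
     *     dbg_assert(check_free_list());
     */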