Memory Subsystem in the Linux Kernel
Merlin Koglin
Universitaet Hamburg
December 8, 2015

Overview
- Memory Management
  - Physical and virtual memory
  - Zones
- Kernel Memory Allocation
  - Page Allocator
  - Slab
  - kmalloc
  - vmalloc
  - large buffers
- Picking an allocation
- End

Kinds of memory
- Physical addresses
  - addresses used between the processor and the system's memory
- (Kernel) logical addresses
  - normal address space of the kernel
  - almost 1-to-1 mapping to physical memory
  - on most architectures, a logical address and its associated physical address differ only by an offset
- (Kernel) virtual addresses
  - also a mapping from a kernel-space address to a physical address
  - not necessarily a 1-to-1 mapping
  - able to allocate physical memory that has no logical address
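The "differ only by an offset" relationship for logical addresses can be sketched in plain userspace C. This is a minimal illustration, not kernel code: the names are hypothetical, and the offset value is an assumption (it matches the classic 3 GB/1 GB split on 32-bit x86, but the real value is architecture- and configuration-dependent).

```c
#include <stdint.h>

/* Hypothetical constant offset, for illustration only.
 * The real value depends on architecture and kernel configuration. */
#define EXAMPLE_KERNEL_OFFSET UINT64_C(0xC0000000)

/* Logical -> physical: subtract the constant offset. */
uint64_t example_logical_to_phys(uint64_t logical)
{
    return logical - EXAMPLE_KERNEL_OFFSET;
}

/* Physical -> logical: add the constant offset. */
uint64_t example_phys_to_logical(uint64_t phys)
{
    return phys + EXAMPLE_KERNEL_OFFSET;
}
```

Kernel virtual addresses (for example those returned by vmalloc()) do not follow this rule, so a simple subtraction like this would give a wrong result for them.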

[Figure: virtual memory and physical memory. Source (path truncated): ...kernel/linux-kernel-slides.pdf]

Pages
- physical memory is divided into parts of equal size called pages
- the page is the basic unit of memory management
- the size is architecture-dependent, but typically 4096 bytes (check with: getconf PAGE_SIZE)
- in the kernel, every page is represented by a struct page; this structure is defined in <linux/mm_types.h>
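Because pages have a power-of-two size, an address splits into a page frame number and an offset within the page using a shift and a mask. The following userspace sketch assumes 4096-byte pages; the macro and function names are hypothetical and chosen only for this example.

```c
#include <stdint.h>

#define EXAMPLE_PAGE_SHIFT 12                                  /* assumes 4096-byte pages */
#define EXAMPLE_PAGE_SIZE  (UINT64_C(1) << EXAMPLE_PAGE_SHIFT)

/* Index of the page that contains the given address. */
uint64_t example_page_frame_number(uint64_t addr)
{
    return addr >> EXAMPLE_PAGE_SHIFT;
}

/* Byte offset of the address inside its page. */
uint64_t example_offset_in_page(uint64_t addr)
{
    return addr & (EXAMPLE_PAGE_SIZE - 1);
}
```

For the address 0x12345 this yields page frame number 0x12 and offset 0x345.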

[Figure: relationship between virtual address space and physical address space. Source (path truncated): ...ons/3/32/Virtual address space and physical address space relationship.svg]

Zones (1)
- because of hardware limitations, the kernel cannot treat all pages as identical
- some hardware can perform direct memory access (DMA) only to certain memory addresses
- some architectures can physically address more memory than they can virtually address, so part of this memory is not permanently mapped into the kernel address space
- physical memory is therefore divided into (more or less) three zones

Zones (2)
- DMA
  - low 16 MB of memory
  - exists for historical reasons: older hardware could only do DMA in this area
- DMA32
  - only on 64-bit Linux
  - low 4 GB of memory
  - today, there is hardware that can do DMA to the low 4 GB

Zones (3)
- Normal
  - different on 32-bit and 64-bit machines
  - 32-bit: memory from 16 MB to 896 MB
  - 64-bit: memory above 4 GB
- HighMem
  - only on 32-bit Linux
  - all memory above 896 MB
  - is not permanently or automatically mapped into the kernel's address space
- cat /proc/pagetypeinfo

Memory zones for 8 GB RAM
[Figure: zone layout on 64-bit (DMA, DMA32, NORMAL) and on 32-bit (DMA, NORMAL, HIGHMEM). Size labels in the figure: 16 MB, 800 MB, 4 GB, 7 GB. Note in the figure: pages in HIGHMEM must be mapped into NORMAL.]
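The zone boundaries from the slides above (16 MB, 896 MB, 4 GB) can be written down as a small classifier. This is a userspace sketch under the assumption that those boundaries apply exactly as stated. The enum and function names are hypothetical, and real kernels derive the limits per architecture.

```c
#include <stdint.h>

enum example_zone {
    EXAMPLE_ZONE_DMA,
    EXAMPLE_ZONE_DMA32,
    EXAMPLE_ZONE_NORMAL,
    EXAMPLE_ZONE_HIGHMEM
};

#define EXAMPLE_MB (UINT64_C(1) << 20)
#define EXAMPLE_GB (UINT64_C(1) << 30)

/* 32-bit layout from the slides: DMA < 16 MB, NORMAL < 896 MB, HIGHMEM above. */
enum example_zone example_zone_32bit(uint64_t phys)
{
    if (phys < 16 * EXAMPLE_MB)
        return EXAMPLE_ZONE_DMA;
    if (phys < 896 * EXAMPLE_MB)
        return EXAMPLE_ZONE_NORMAL;
    return EXAMPLE_ZONE_HIGHMEM;
}

/* 64-bit layout from the slides: DMA < 16 MB, DMA32 < 4 GB, NORMAL above. */
enum example_zone example_zone_64bit(uint64_t phys)
{
    if (phys < 16 * EXAMPLE_MB)
        return EXAMPLE_ZONE_DMA;
    if (phys < 4 * EXAMPLE_GB)
        return EXAMPLE_ZONE_DMA32;
    return EXAMPLE_ZONE_NORMAL;
}
```

With these rules, a physical address at 2 GB falls into HIGHMEM on 32-bit but into DMA32 on 64-bit.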

Kernel Memory Allocation
[Figure. Source (path truncated): ...ux-kernel/linux-kernel-slides.pdf]

Buddy system
- the kernel uses a buddy allocator strategy, so only allocations of a power-of-two number of pages are possible: 1 page, 2 pages, 4 pages, 8 pages, 16 pages, etc.
- if a small area is needed and only a larger area is available, the larger area is split into two halves (buddies), possibly repeatedly
- when an area is freed, the allocator checks whether its buddy is free as well, so that the two can be merged
- the number of free areas can be seen in /proc/buddyinfo
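How an area finds its buddy can be sketched with simple bit arithmetic. An area of order k starts at a page frame number that is a multiple of 2^k, and its buddy is the neighbouring area of the same size, so the two start addresses differ only in bit k. The function names below are hypothetical. This is a userspace sketch of the general buddy technique, not the kernel's implementation.

```c
/* Start (as a page frame number) of the buddy of the order-`order` area
 * that starts at `pfn`. Assumes pfn is aligned to 2^order pages. */
unsigned long example_buddy_pfn(unsigned long pfn, unsigned int order)
{
    return pfn ^ (1UL << order);
}

/* Start of the merged area of order (order + 1) that is formed when the
 * area at `pfn` and its buddy are both free. */
unsigned long example_merged_pfn(unsigned long pfn, unsigned int order)
{
    return pfn & ~(1UL << order);
}
```

For example, the order-2 area (4 pages) at page frame 8 has its buddy at page frame 12, and merging them yields an order-3 area starting at page frame 8.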

Getting a page
- unsigned long __get_free_page(int flags)
  - returns the virtual address of a free page
- unsigned long get_zeroed_page(int flags)
  - returns the virtual address of a free page, initialized to zero
- unsigned long __get_free_pages(int flags, unsigned int order)
  - returns the starting virtual address of an area of contiguous free pages, where order is log2(number of pages)
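Since the order argument is log2 of the number of pages, a byte count has to be rounded up to the next power-of-two number of pages. The kernel provides get_order() for this purpose. The helper below is only a userspace sketch of the same calculation, with a hypothetical name and an assumed page size of 4096 bytes.

```c
#include <stddef.h>

#define SKETCH_PAGE_BYTES ((size_t)4096)   /* assumed page size */

/* Smallest order such that (SKETCH_PAGE_BYTES << order) >= bytes.
 * Does not guard against sizes close to SIZE_MAX. */
unsigned int example_order_for_size(size_t bytes)
{
    /* Round the byte count up to whole pages. */
    size_t pages = (bytes + SKETCH_PAGE_BYTES - 1) / SKETCH_PAGE_BYTES;
    unsigned int order = 0;

    /* Grow the order until 2^order pages cover the request. */
    while (((size_t)1 << order) < pages)
        order++;
    return order;
}
```

A request for three pages (12288 bytes) therefore needs order 2, which means four pages are allocated and one is unused.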

Flag categories
The flags are broken up into three categories:
- action modifiers
  - specify how the kernel is supposed to allocate memory
- zone modifiers
  - specify where the kernel is supposed to allocate memory
- types
  - type flags specify a combination of action and zone modifiers as needed by a certain type of memory allocation
  - these are the ones mostly used

frequently used flags
- GFP_KERNEL
  - standard kernel memory allocation
  - the allocation may block in order to find enough free memory
  - fine for most needs, except in interrupt handler context
  - this flag should be your default choice
- GFP_ATOMIC
  - the allocation is high priority and is not allowed to sleep
  - never blocks, is allowed to access emergency pools
  - can fail if no free memory is readily available
- GFP_DMA
  - allocates memory in an area of the DMA zone
  - device drivers that need DMA-able memory use this flag
- for all flags see include/linux/gfp.h
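The idea that a type flag is a combination of action and zone modifiers can be modelled with ordinary bit masks. Everything in this sketch is hypothetical: the names and bit values are invented for illustration and do not match the real definitions, which live in include/linux/gfp.h and change between kernel versions.

```c
/* Hypothetical modifier bits (NOT the real kernel values). */
#define EX_ACTION_MAY_SLEEP  0x01u   /* action modifier: allocator may block */
#define EX_ACTION_EMERGENCY  0x02u   /* action modifier: may use emergency pools */
#define EX_ZONE_DMA          0x10u   /* zone modifier: allocate from the DMA zone */

/* Hypothetical type flags: ready-made combinations of modifiers. */
#define EX_TYPE_KERNEL  (EX_ACTION_MAY_SLEEP)
#define EX_TYPE_ATOMIC  (EX_ACTION_EMERGENCY)

/* Returns 1 if an allocation with these flags is allowed to block. */
int example_may_sleep(unsigned int flags)
{
    return (flags & EX_ACTION_MAY_SLEEP) != 0;
}

/* Returns 1 if the allocation is restricted to the DMA zone. */
int example_wants_dma(unsigned int flags)
{
    return (flags & EX_ZONE_DMA) != 0;
}
```

A caller would combine a type with an extra zone modifier using bitwise OR, for instance EX_TYPE_KERNEL | EX_ZONE_DMA in this model.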

free pages
- void free_page(unsigned long addr)
  - frees one page
- void free_pages(unsigned long addr, unsigned int order)
  - frees multiple pages
  - order has to be the same as in the allocation; passing the wrong order can result in corruption

Usage
- the low-level page functions are useful when you need page-sized chunks of physically contiguous memory, especially if you need exactly a single page or two
- it is also possible to use: struct page *alloc_pages(int flags, unsigned int order)
  - returns a pointer to the first page's struct page; on error it returns NULL

slab allocator
- allows the creation of caches, each of which contains a set of objects of the same size
- it uses the page allocator
- principal aims
  - caching of commonly used objects, so that the system does not waste time allocating, initialising and destroying objects
  - allocation of small blocks of memory, which helps eliminate the internal fragmentation that would otherwise be caused by the buddy system

Different SLAB allocators
- there are three different implementations of a SLAB allocator in the Linux kernel
- you can choose one when configuring the kernel
- SLAB
  - legacy
- SLUB
  - default, simpler, better scaling, less fragmentation
- SLOB
  - simpler, more space-efficient, but doesn't scale well

kmalloc allocator
- kmalloc() is the normal method of allocating memory in the kernel
- for small sizes it relies on SLAB caches (see /proc/slabinfo)
- for larger sizes it relies on the page allocator
- kmalloc() guarantees that the allocated memory is physically contiguous (and virtually contiguous)
- same flags as for the page allocator: GFP_KERNEL, GFP_ATOMIC, GFP_DMA, etc.

kmalloc sizes
- the maximum amount of space that can be allocated by kmalloc() depends on the architecture
- maximum sizes on x86 and ARM
  - per allocation: 4 MB
- maximum sizes on 64-bit
  - we will test this later
- for completely portable code, do not allocate anything larger than 128 KB

kmalloc API
- #include <linux/slab.h>
- void *kmalloc(size_t size, int flags);
  - allocates size bytes and returns a pointer to the area (virtual address)
  - size: number of bytes to allocate
  - flags: same flags as the page allocator
- void kfree(const void *addr);
  - frees a block of memory previously allocated with kmalloc()

kmalloc API 2
- void *kzalloc(size_t size, int flags);
  - allocates zero-initialized memory
- void *kmalloc_array(size_t n, size_t size, gfp_t flags);
  - allocates memory for an array of n elements of size size
- void *kcalloc(size_t n, size_t size, int flags);
  - allocates memory for an array of n elements of size size and sets the memory to zero
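One reason to prefer an array allocator over a hand-written n * size is that the multiplication can overflow, and kmalloc_array() checks for this. The userspace sketch below shows that check on top of malloc(). The function names are hypothetical, and this is an analogy to kmalloc_array() and kcalloc(), not their actual implementation.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Allocate room for n elements of `size` bytes each.
 * Returns NULL if n * size would overflow size_t or if malloc() fails. */
void *example_alloc_array(size_t n, size_t size)
{
    if (size != 0 && n > SIZE_MAX / size)
        return NULL;            /* n * size does not fit into size_t */
    return malloc(n * size);
}

/* Same as above, but the memory is set to zero (kcalloc-like). */
void *example_zalloc_array(size_t n, size_t size)
{
    void *p = example_alloc_array(n, size);

    if (p)
        memset(p, 0, n * size); /* safe: overflow was ruled out above */
    return p;
}
```

Without the check, a huge n could wrap around to a small byte count, and the caller would then write past the end of a buffer that is far too small.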

kmalloc example
- similar to malloc()
- if not enough memory is available, kmalloc() can return NULL, so check after every call to kmalloc() and handle the error appropriately

    struct cat *p;

    /* allocate one struct cat; the call may sleep (GFP_KERNEL) */
    p = kmalloc(sizeof(struct cat), GFP_KERNEL);
    if (!p) {
        /* handle error ... */
    }

    /* free the memory */
    kfree(p);

devm_kmalloc
- devm_kmalloc() is a resource-managed kmalloc()
- automatically frees the allocated buffers when the corresponding device is detached
- void *devm_kmalloc(struct device *dev, size_t size, int flags);
  - dev: device to allocate memory for
- fewer errors and memory leaks

vmalloc()
- vmalloc() allocates memory that is only virtually contiguous, but not physically contiguous
- pages obtained via vmalloc() must be mapped page by page (because they are not physically contiguous)
- is used only when absolutely necessary
- typically, to obtain large regions of memory

vmalloc API
- #include <linux/vmalloc.h>
- void *vmalloc(unsigned long size);
  - returns a pointer to at least size bytes
- void vfree(const void *addr);
  - frees an allocation obtained via vmalloc()

large buffers
- what if you want to allocate a lot of (physically contiguous) memory?
- allocate at boot time
- only drivers directly linked into the kernel can do that
- to install, rebuild the kernel and reboot
- freed memory is possibly not reusable!

bootmem
- bootmem is for allocating memory at boot time
- #include <linux/bootmem.h>
- void *alloc_bootmem_pages(unsigned long size);
  void *alloc_bootmem_low_pages(unsigned long size);
  - allocated memory may be high memory unless the low variant is used
  - unsigned long size: size of memory
  - page-aligned memory areas
- void free_bootmem(unsigned long addr, unsigned long size);
  - but not all pages are returned to the system

Picking an allocation
- kmalloc()
  - general-purpose memory allocator for the kernel
  - contiguous physical pages
  - should be used as the primary allocator
  - can allocate DMA memory
- vmalloc()
  - only virtually contiguous
  - slower than kmalloc()
  - allocations of fairly large areas are possible
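The guidance above can be condensed into a small decision helper. This is only an illustrative heuristic in userspace C. The names are hypothetical, and the 128 KB threshold is taken from the portability advice on the kmalloc sizes slide, not from any kernel rule.

```c
#include <stddef.h>

enum example_allocator {
    EXAMPLE_USE_KMALLOC,
    EXAMPLE_USE_VMALLOC
};

/* Portable upper bound for kmalloc() suggested on the kmalloc sizes slide. */
#define EXAMPLE_PORTABLE_KMALLOC_LIMIT ((size_t)128 * 1024)

/* Pick an allocator for a request of `bytes` bytes.
 * need_phys_contiguous: non-zero if the buffer must be physically
 * contiguous (for example for DMA), which vmalloc() cannot provide. */
enum example_allocator example_pick_allocator(size_t bytes,
                                              int need_phys_contiguous)
{
    if (need_phys_contiguous)
        return EXAMPLE_USE_KMALLOC;
    if (bytes <= EXAMPLE_PORTABLE_KMALLOC_LIMIT)
        return EXAMPLE_USE_KMALLOC;   /* primary allocator for normal sizes */
    return EXAMPLE_USE_VMALLOC;       /* large, only virtually contiguous */
}
```

The helper does not cover very large physically contiguous buffers. For those, kmalloc() may fail, and the boot-time allocation described on the large buffers slide is the remaining option.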

Summary
Thank you :)
Any questions?

- ... - Chapter 15 - Constantine Shulyupin
- Linux Device Drivers, 3rd Edition - O'Reilly
- The Linux Kernel - Chapter 3 - David A Rusling
- Linux Kernel Development - Robert Love (pdf)
- Linux Kernel and Driver Development Training - free electrons (pdf)
- Memory Subsystem and Data Types in the Linux Kernel - Bjoern Broenmstrup and Alexander Koglin (pdf)
