Memory management unit

MMU, short for Memory Management Unit, is a class of computer hardware components responsible for handling memory accesses requested by the CPU. Among the functions of such devices are the translation of virtual addresses to physical addresses (i.e., virtual memory management), memory protection, cache control, bus arbitration, and, in simpler computer architectures (especially 8-bit systems), bank switching.

Modern MMUs typically divide the virtual address space (the range of addresses used by the processor) into pages, whose size is 2^n bytes, usually a few kilobytes. The bottom n bits of the address (the offset within a page) are left unchanged. The upper address bits are the (virtual) page number. The MMU normally translates virtual page numbers to physical page numbers via an associative cache called a Translation Lookaside Buffer (TLB). When the TLB lacks a translation, a slower mechanism involving hardware-specific data structures or software assistance is used. The data items found in such data structures are typically called page table entries (PTEs), and the data structure itself is typically called a page table. The physical page number is combined with the page offset to give the complete physical address.
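
The translation described above can be sketched in a few lines of Python. This is an illustrative model only, not how any particular MMU is built: the 4 KiB page size (n = 12), the names translate and PageFault, and the use of plain dictionaries for the TLB and page table are all assumptions made for the example.

```python
PAGE_BITS = 12                      # assumed: 4 KiB pages, so n = 12
OFFSET_MASK = (1 << PAGE_BITS) - 1


class PageFault(Exception):
    """Raised when no translation exists for a virtual page (hypothetical)."""


def translate(vaddr, tlb, page_table):
    """Translate a virtual address to a physical address.

    tlb and page_table are dicts mapping virtual page number to
    physical page number; both are stand-ins for real hardware.
    """
    vpn = vaddr >> PAGE_BITS        # upper bits: virtual page number
    offset = vaddr & OFFSET_MASK    # bottom n bits pass through unchanged
    if vpn in tlb:                  # fast path: TLB hit
        ppn = tlb[vpn]
    elif vpn in page_table:         # slow path: consult the page table
        ppn = page_table[vpn]
        tlb[vpn] = ppn              # remember the translation
    else:
        raise PageFault(hex(vaddr))
    return (ppn << PAGE_BITS) | offset


tlb = {}
page_table = {0x12345: 0x00042}
paddr = translate(0x12345ABC, tlb, page_table)
```

After the first call the translation is held in the tlb dictionary, so a second access to the same page would take the fast path.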

A PTE or TLB entry may also include information about whether the page has been written to (the dirty bit), when it was last used (the accessed bit, for a least recently used page replacement algorithm), what kind of processes (user mode, supervisor mode) may read and write it, and whether it should be cached.
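
As a rough illustration of such an entry, the sketch below packs a physical page number and a handful of flag bits into one integer. The bit positions, the 12-bit flag field and every name here are invented for the example; real architectures each define their own layout.

```python
# Hypothetical flag bits; real PTE layouts differ per architecture.
PRESENT, WRITABLE, USER, UNCACHED, ACCESSED, DIRTY = (1 << i for i in range(6))
FLAG_BITS = 12                      # assumed width of the flag field


def make_pte(frame, flags):
    """Pack a physical page number and flag bits into one integer."""
    return (frame << FLAG_BITS) | flags


def check_access(pte, user_mode, is_write):
    """Return True if the access is permitted by the entry."""
    if not pte & PRESENT:
        return False
    if user_mode and not pte & USER:
        return False                # supervisor-only page
    if is_write and not pte & WRITABLE:
        return False                # read-only page
    return True


def record_access(pte, is_write):
    """Return the entry with the accessed (and maybe dirty) bit set."""
    pte |= ACCESSED
    if is_write:
        pte |= DIRTY
    return pte


pte = make_pte(0x42, PRESENT | USER)    # user-readable, not writable
```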

It is possible that a TLB entry or PTE prohibits access to a virtual page, perhaps because no physical memory (RAM) has been allocated to that virtual page. In this case the MMU signals a page fault to the CPU. The operating system then handles the situation appropriately, perhaps by finding a spare page of RAM and setting up a new PTE to map it to the requested virtual address. If no RAM is free, it may be necessary to choose an existing page, using some replacement algorithm, and save it to disk (this is known as "paging"). With some MMUs there can also be a shortage of PTEs or TLB entries, in which case the OS has to free one for the new mapping.
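
The fault-handling sequence above might look roughly like the following sketch. It assumes a least recently used replacement policy and models "saving to disk" as appending to a list; the Pager class and its methods are hypothetical names, not taken from any real operating system.

```python
from collections import OrderedDict


class Pager:
    """Toy model of an OS handling page faults with LRU replacement."""

    def __init__(self, num_frames):
        self.free_frames = list(range(num_frames))
        self.resident = OrderedDict()   # vpn -> frame, least recent first
        self.swapped_out = []           # stand-in for pages saved to disk

    def access(self, vpn):
        """Return the frame backing vpn, faulting it in if needed."""
        if vpn in self.resident:
            self.resident.move_to_end(vpn)      # mark as recently used
            return self.resident[vpn]
        return self.handle_page_fault(vpn)

    def handle_page_fault(self, vpn):
        if self.free_frames:                    # a spare page of RAM exists
            frame = self.free_frames.pop(0)
        else:                                   # evict the LRU page
            victim, frame = self.resident.popitem(last=False)
            self.swapped_out.append(victim)
        self.resident[vpn] = frame              # set up the new mapping
        return frame


pager = Pager(num_frames=2)
frames = [pager.access(vpn) for vpn in (10, 11, 10, 12)]
```

With two frames, the access to page 12 finds no free RAM and evicts page 11, the least recently used page, because page 10 was touched again in between.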

In some cases a "page fault" may indicate a software bug. A key benefit of an MMU is memory protection: an operating system can use it to protect against errant programs, by disallowing access to memory that a particular program should not have access to. Typically, an operating system assigns each program its own virtual address space.

An MMU also reduces the problem of fragmentation of memory. After blocks of memory have been allocated and freed, the free memory may become fragmented (discontinuous) so that the largest contiguous block of free memory may be much smaller than the total amount. With virtual memory, a contiguous range of virtual addresses can be mapped to several non-contiguous blocks of physical memory.
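
A small sketch may make this concrete. Here the free physical frames are scattered, with no run longer than two, yet a contiguous range of three virtual pages can still be mapped. The function name and the dictionary-based page table are assumptions made for the example.

```python
def map_contiguous(page_table, first_vpn, free_frames, num_pages):
    """Map num_pages consecutive virtual pages onto whatever frames are free."""
    if num_pages > len(free_frames):
        raise MemoryError("not enough free frames")
    for i in range(num_pages):
        page_table[first_vpn + i] = free_frames.pop(0)


free_frames = [3, 7, 8, 20]     # fragmented: longest contiguous run is 2
page_table = {}
map_contiguous(page_table, 0x100, free_frames, 3)
```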

In early designs, memory management was performed by a separate integrated circuit, such as the MC68851 used with the Motorola 68020 CPU in the Macintosh II, or the Z8015 used with the Zilog Z8000 family of processors. Later CPUs such as the Motorola 68030 and the Zilog Z280 have MMUs on the same IC as the CPU.

While this article concentrates on modern MMUs, which almost invariably use paging, other systems such as segmentation and base-limit addressing (of which the former is a development) have been used in MMUs and are occasionally still present on modern architectures; perhaps most notably, the x86 ISA provides for segmentation in addition to paging.

Examples

Most modern systems divide memory into pages that are 4 KiB to 64 KiB in size, often with the possibility to use huge pages from 2 MiB to 512 MiB in size. Page translations are cached in a TLB. Some systems, mainly older RISC designs, trap into the OS when a page translation is not found in the TLB. Most systems use a hardware-based tree walker. Most systems allow the MMU to be disabled; many will disable the MMU when trapping into OS code.

DEC Alpha
The Alpha processor divides memory into 8192-byte pages. After a TLB miss, microcode (here called PALcode) walks a 3-level tree-structured page table. Addresses are broken down as follows: 21 bits unused, 10 bits to index the root level of the tree, 10 bits to index the middle level of the tree, 10 bits to index the leaf level of the tree, and 13 bits that pass through to the physical address without modification. Full read/write/execute permission bits are supported.
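
The address breakdown above can be sketched as follows. Nested dictionaries stand in for the three table levels; the function names are invented for the example, and the real PALcode walk is of course not written in Python.

```python
LEVEL_BITS = 10     # bits per tree level, from the breakdown above
PAGE_BITS = 13      # 8192-byte pages


def alpha_indices(vaddr):
    """Split an address into (root, middle, leaf, offset) fields."""
    mask = (1 << LEVEL_BITS) - 1
    offset = vaddr & ((1 << PAGE_BITS) - 1)
    leaf = (vaddr >> PAGE_BITS) & mask
    middle = (vaddr >> (PAGE_BITS + LEVEL_BITS)) & mask
    root = (vaddr >> (PAGE_BITS + 2 * LEVEL_BITS)) & mask
    return root, middle, leaf, offset


def walk(root_table, vaddr):
    """Walk the 3-level tree; return a physical address or None on a miss."""
    root, middle, leaf, offset = alpha_indices(vaddr)
    try:
        frame = root_table[root][middle][leaf]
    except KeyError:
        return None
    return (frame << PAGE_BITS) | offset


root_table = {1: {2: {3: 0x55}}}
vaddr = (1 << 33) | (2 << 23) | (3 << 13) | 0x123
paddr = walk(root_table, vaddr)
```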
PowerPC G1, G2, G3, and G4
Pages are normally 4 KiB. After a TLB miss, the standard PowerPC MMU begins two simultaneous lookups. One lookup attempts to match the address with one of 4 or 8 data BAT registers, or 4 or 8 code BAT registers as appropriate. The BAT registers can map linear chunks of memory as large as 256 MiB, and are normally used by an OS to map large portions of the address space for the OS kernel's own use. If the BAT lookup succeeds, the other lookup will be halted and ignored. The other lookup, not directly supported by all processors in this family, is via a so-called "inverted page table" which acts as a hashed off-chip extension of the TLB. First, the top 4 bits of the address are used to select one of 16 segment registers. 24 bits from the segment register replace those 4 bits, producing a 52-bit address. The use of segment registers allows multiple processes to share the same hash table. The 52-bit address is hashed, then used as an index into the off-chip table. There, a group of 8 page table entries will be scanned for one that matches. If none match due to excessive hash collisions, the processor will try again with a slightly different hash function. If this too fails, the CPU will trap into the OS (with MMU disabled) so that the problem may be resolved. The OS will need to discard an entry from the hash table to make room for a new entry. The OS may generate the new entry from a more-normal tree-like page table or from per-mapping data structures which are likely to be slower and more space-efficient. Support for no-execute control is in the segment registers, leading to 256-MiB granularity. One of the major problems with this design is poor cache locality caused by the hash function. Tree-based designs avoid this problem by placing the page table entries for adjacent pages in adjacent locations. An operating system running on the PowerPC may minimize the size of the hash table to reduce this problem. 
It is also somewhat slow to remove the page table entries of a process; the OS may avoid reusing segment values to delay facing this, or it may elect to suffer the waste of memory associated with per-process hash tables. G1 chips do not search for page table entries, but they do generate the hash with the expectation that an OS will search the standard hash table via software (the OS can write to the TLB). G2, G3, and early G4 chips use hardware to search the hash table. The latest chips allow the OS to choose either method. On chips that make this optional or do not support it at all, the OS may choose to use a tree-based page table exclusively.
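
The hashed lookup can be sketched as below. Only the 52-bit address formation and the groups of 8 entries follow the description above; the hash function is a deliberately simple stand-in and is not the real PowerPC hash, and all function names are invented for the example.

```python
GROUP_SIZE = 8      # entries scanned per group, as described above


def to_52_bit(eaddr, segment_registers):
    """Replace the top 4 bits of a 32-bit address with 24 segment bits."""
    index = (eaddr >> 28) & 0xF                     # 1 of 16 registers
    segment_bits = segment_registers[index] & 0xFFFFFF
    return (segment_bits << 28) | (eaddr & 0x0FFFFFFF)


def toy_hash(vpage, num_groups):
    """Stand-in hash; NOT the real PowerPC hash function."""
    return (vpage ^ (vpage >> 16)) % num_groups


def candidate_groups(vpage, num_groups):
    primary = toy_hash(vpage, num_groups)
    secondary = num_groups - 1 - primary            # second, different hash
    return primary, secondary


def lookup(table, vpage):
    """Return the frame for vpage, or None where the CPU would trap."""
    for group in candidate_groups(vpage, len(table)):
        for entry_vpage, frame in table[group]:
            if entry_vpage == vpage:
                return frame
    return None


def insert(table, vpage, frame):
    """Add an entry; False means both groups are full and the OS must evict."""
    for group in candidate_groups(vpage, len(table)):
        if len(table[group]) < GROUP_SIZE:
            table[group].append((vpage, frame))
            return True
    return False


segment_registers = [0] * 16
segment_registers[0xA] = 0x123456
vaddr = to_52_bit(0xA0001234, segment_registers)
table = [[] for _ in range(4)]
vpage = vaddr >> 12                                 # 4 KiB pages
insert(table, vpage, 0x77)
```

When insert returns False, the situation corresponds to the case in which the OS has to discard an existing entry to make room.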
VAX
Pages are 512 bytes, which is very small. An OS may treat multiple pages as if they were a single larger page. Linux groups 8 pages together so that the system can be viewed as having 4 KiB pages. The VAX divides memory into 4 fixed-purpose regions, each 1 GiB in size. They are: paged memory for apps, paged memory for the kernel, unpaged memory for the kernel, and unused. Page tables are big linear arrays. Normally this would be very wasteful when addresses are used at both ends of the possible range, but the page table for apps is itself stored in the kernel's paged memory. Thus there is effectively a 2-level tree, allowing apps to have sparse memory layout without wasting lots of space on unused page table entries. The VAX MMU is notable for lacking an accessed bit. OSes which implement paging must find some way to emulate the accessed bit if they are to operate efficiently. Typically, the OS will periodically unmap pages so that page-not-present faults can be used to let the OS set an accessed bit.
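
The arithmetic implied above can be sketched briefly. The code assumes that the four 1 GiB regions are selected by the top two bits of a 32-bit address and numbers them 0 to 3 without assigning them purposes; the function names are invented for the example.

```python
HW_PAGE_BITS = 9        # 512-byte hardware pages
HW_PAGE_SIZE = 1 << HW_PAGE_BITS
REGION_BITS = 30        # each region is 1 GiB
PAGES_PER_GROUP = 8     # hardware pages treated as one larger page


def split_vax(vaddr):
    """Split a 32-bit address into (region, page within region, offset)."""
    region = (vaddr >> REGION_BITS) & 0x3
    page = (vaddr >> HW_PAGE_BITS) & ((1 << (REGION_BITS - HW_PAGE_BITS)) - 1)
    offset = vaddr & (HW_PAGE_SIZE - 1)
    return region, page, offset


def hardware_pages(logical_page):
    """List the hardware pages that make up one grouped logical page."""
    first = logical_page * PAGES_PER_GROUP
    return list(range(first, first + PAGES_PER_GROUP))


fields = split_vax(0x80000A05)
group = hardware_pages(3)
```

Each region holds 2^21 hardware pages, which is why a single linear page table per region would be large if it could not itself be paged.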
x86
The x86 architecture has evolved over a long period of time while maintaining full software compatibility, even for OS code. The MMU is thus extremely complex, with many different possible operating modes. Normal operation of the traditional 80386 CPU is described here. The CPU primarily divides memory into 4 KiB pages. Segment registers, fundamental to the older 8088 and 80286 MMU designs, are avoided as much as possible by modern OSes. There is one major exception to this: access to thread-specific data for apps or CPU-specific data for OS kernels, which involves explicit use of a segment register named FS or GS. All memory access involves a segment register, chosen according to the code being executed. The segment register acts as an index into a table, which provides an offset to be added to the virtual address. Except when using FS or GS as described above, the OS ensures that the offset will be zero. After the offset is added, the address is masked to be no larger than 32 bits. The result may be looked up via a tree-structured page table, with the bits of the address being split as follows: 10 bits for the root of the tree, 10 bits for the leaves of the tree, and the 12 lowest bits being directly copied to the result. No-execute support is only provided on a per-segment basis, making it very awkward to use. PaX is one way to emulate per-page no-execute support via the segments, with minor performance loss and the loss of half of the available address space. Minor revisions of the MMU introduced with the Pentium have allowed huge 2 MiB or 4 MiB pages by skipping the bottom level of the tree. Minor revisions of the MMU introduced with the Pentium Pro have allowed 36-bit physical addresses and specification of cacheability by looking up a few high bits in a small on-CPU table.
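
The segment offset and the 10/10/12 split described above can be sketched as follows. The function names are invented for the example, and the segment table lookup itself is left out; the code simply takes the offset the table would provide.

```python
def linear_address(segment_offset, vaddr):
    """Add the segment's offset, then mask the result to 32 bits."""
    return (segment_offset + vaddr) & 0xFFFFFFFF


def split_linear(linear):
    """Split a 32-bit address into (root index, leaf index, page offset)."""
    root_index = (linear >> 22) & 0x3FF     # top 10 bits
    leaf_index = (linear >> 12) & 0x3FF     # next 10 bits
    page_offset = linear & 0xFFF            # lowest 12 bits, copied through
    return root_index, leaf_index, page_offset


flat = linear_address(0, 0xC0101234)        # usual case: offset of zero
fields = split_linear(flat)
wrapped = linear_address(0xFFFFF000, 0x2000)
```

The last line shows the masking step: the sum exceeds 32 bits and wraps around to a low address.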
AMD64
AMD64 is a 64-bit extension of x86, and thus is very similar. 64-bit usage, called long mode, is described here. Excepting FS and GS, all segment offsets are ignored. The page table tree has four levels. The virtual addresses are divided up as follows: 16 bits unused, 9 bits each for 4 tree levels (total: 36 bits), and the 12 lowest bits unmodified. The 16 highest bits are required to match the next highest bit; that is, the low 48 bits are sign-extended to fill the high 16 bits. A per-page no-execute bit, called the NX bit, can be used to block execution of individual pages.
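
The sign-extension rule and the four 9-bit indices can be sketched as below. The function names are invented for the example; addresses are handled as unsigned 64-bit values held in Python integers.

```python
def is_canonical(vaddr):
    """True if the 16 highest bits all match bit 47."""
    top17 = (vaddr >> 47) & 0x1FFFF         # bit 47 plus the 16 above it
    return top17 == 0 or top17 == 0x1FFFF


def sign_extend_48(low48):
    """Fill the high 16 bits with copies of bit 47."""
    low48 &= (1 << 48) - 1
    if low48 & (1 << 47):
        return low48 | (0xFFFF << 48)
    return low48


def split_long_mode(vaddr):
    """Return the four 9-bit tree indices (root first) and the 12-bit offset."""
    indices = tuple((vaddr >> shift) & 0x1FF for shift in (39, 30, 21, 12))
    return indices + (vaddr & 0xFFF,)


high_half = sign_extend_48(0x800000000000)
fields = split_long_mode(high_half)
```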
S/390
The S/390 has the unusual feature of storing accessed and dirty bits outside of the page table. They refer to physical memory rather than virtual memory. They are accessed by special-purpose instructions. These unusual features make virtualization easier. They also reduce overhead for the OS, which would otherwise need to propagate accessed and dirty bits from the page tables to a more physically-oriented data structure.

This article was originally based on material from the Free On-line Dictionary of Computing, which is licensed under the GFDL.