Virtual memory took decades of real problems and clever tricks to arrive at. Instead of explaining what it is, let's trace why we needed it.
In the earliest computers, there was no operating system. No multitasking. You loaded one program into memory, it ran, it finished, you loaded the next one. The program owned everything — every byte of RAM, every address.
The memory map was dead simple. The program started at some known address, say 0x0000,
and used as much RAM as it needed. Whatever was left over just sat there, unused.
Addresses in the code were physical addresses. When the program said
"store this value at address 0x4000," it meant physical address 0x4000 in
the actual RAM chip. No translation. No indirection. The program saw reality as it was.
This worked fine. One program, one machine, no conflicts.
But machines were expensive. Really expensive. And most of the time, the program was waiting — waiting for a punch card to be read, waiting for a tape to spin to the right position, waiting for output to print. The CPU sat idle. That expensive CPU, doing nothing.
The question became obvious: what if we ran more than one program at a time?
Let's say we have 64 KB of RAM and we want to run two programs. The simplest idea:
just load both into memory at the same time. Program A starts at 0x0000,
program B starts at 0x8000.
Sounds straightforward. But this creates three serious problems.
When program B was compiled, the compiler didn't know it would be loaded at
0x8000. It assumed it starts at 0x0000, like every other
program. So all its internal addresses — jump targets, data references, function
pointers — are wrong.
You could fix this by relocating the program at load time: scanning every address in the binary and adding an offset. Early systems actually did this. It worked, but it was slow and fragile.
Program A can read and write any physical address. Including the memory that belongs to program B. A bug in A — a wild pointer, an out-of-bounds array access — can silently corrupt B's data. Or worse, corrupt the OS itself.
This isn't hypothetical. In early multiprogramming systems, a buggy program could bring down every other program on the machine. There was no isolation. Programs had to be trusted, and that only works when there are very few of them.
If program A needs 20 KB and we give it a 32 KB partition, 12 KB is wasted. If program B needs 40 KB and the remaining partition is only 32 KB, it can't run at all — even though there's 12 KB of unused memory sitting in A's partition.
We need a system that solves all three: let programs use any address they want, protect them from each other, and use memory efficiently. The search for that system is what led to virtual memory.
The first real attempt at sharing memory was straightforward: divide physical memory into fixed-size partitions at boot time. Each partition gets one program. The OS takes one partition for itself.
What about the address problem? Each partition has a known starting address, so the hardware can add a simple base register that offsets every address by the partition's start. The OS loads the right base on each context switch. Programs compile assuming they start at zero — the hardware fixes it transparently.
For protection, the hardware adds a bounds check: before every memory access, it verifies that the address falls within the current program's assigned partition. If not, the hardware traps to the OS, and the offending program is killed. We finally have isolation.
But this scheme has obvious weaknesses.
The fundamental issue: fixed partitions force you to guess the future. How big should partitions be? How many? Get it wrong and you waste memory or can't run certain programs. We need something more flexible.
Here's a better idea. Instead of fixed partitions, give each program a variable-size region of memory. The hardware tracks two values for the running program:
Every memory access gets translated by the hardware:
// Hardware does this on every memory access: if (address >= bound) trap("segmentation fault"); // kill the process physical_address = base + address;
The program thinks it starts at address 0x0000. It uses addresses
0x0000 through whatever it needs. The hardware transparently adds the
base to every address, mapping it to the right spot in physical RAM. The program
never sees physical addresses. It can't access memory outside its region because the
bound check catches it.
This is a real improvement. Programs are compiled assuming they start at address zero. No relocation needed at load time — the hardware does it on every access. Programs are isolated. And partitions can be any size.
When the OS switches from running program A to program B, it saves A's base and bound registers and loads B's. Each program gets its own view of memory. This is called dynamic relocation.
As programs start and exit, memory becomes full of small gaps. Suppose four programs are running, and B and D exit:
There's enough total free memory, but it's split into small gaps. A new program that needs a contiguous block might not fit anywhere. This is external fragmentation — free memory exists but can't be used.
You could compact memory: stop everything, slide programs together to consolidate free space, update all the base registers. But compaction is expensive. Every byte of every running program might need to be copied. The system freezes while this happens. Not great.
We need a scheme where a program's memory doesn't have to be contiguous in physical RAM.
Base and bound gave each program one contiguous region. Segmentation extends this idea: give each program multiple base-and-bound pairs, one for each logical segment of its address space.
A typical program has at least three logical regions:
Each segment gets its own base, bound, and permissions (read, write, execute). The virtual address now has two parts: the segment number (which segment?) and the offset (where in that segment?).
// Hardware translation with segmentation: segment = address >> 14; // top 2 bits = segment number offset = address & 0x3FFF; // bottom 14 bits = offset if (offset >= segments[segment].bound) trap("segmentation fault"); if (!check_permissions(segments[segment], access_type)) trap("protection fault"); physical = segments[segment].base + offset;
This is a big step forward. A program's code, data, and stack can now live in different places in physical memory. They don't need to be next to each other. The hardware connects them through the segment table so the program sees a clean, continuous address space.
Even better, segments enable sharing. Two programs running the same executable can share a single physical copy of the code segment. Each has its own data and stack segments, but the read-only code is shared. Less memory used, more programs running.
Segmentation made things better, but it didn't eliminate the core problem. Variable-size chunks of memory always lead to fragmentation. Free space breaks into oddly-shaped holes. Managing those holes is a headache.
What if we made every chunk the same size?
Paging is the approach that every modern OS uses today. It takes a completely different approach to memory management.
Instead of dividing memory into variable-sized segments, we chop everything into small, equal-sized pieces:
The mapping between pages and frames is stored in a page table — a data structure maintained by the OS and read by the hardware (the MMU, the Memory Management Unit) on every memory access.
// Hardware translation with paging: page_number = address >> 12; // top bits = page number offset = address & 0xFFF; // bottom 12 bits = offset within page frame = page_table[page_number]; if (!frame.present) trap("page fault"); // page not in physical memory if (!check_permissions(frame, access_type)) trap("protection fault"); physical = frame.number * 4096 + offset;
This is the key insight: the program sees contiguous memory, but physical memory can be completely scattered. Page 0 might be in frame 3. Page 1 might be in frame 6. Page 255 might be in frame 4. It doesn't matter. The page table keeps track of the mapping, and the hardware translates transparently.
Because every page and every frame is the same size (4 KB), there's no such thing as "this free region is too small." Any free frame can hold any page. If there are 10 free frames scattered across physical memory, you can use all 10 — they don't need to be adjacent.
No external fragmentation. Gone. The only waste is internal fragmentation within the last page of a segment (if a program uses 5000 bytes, it needs two 4 KB pages, wasting ~3 KB). With 4 KB pages, the maximum waste per segment is 4095 bytes — negligible when machines have gigabytes of RAM.
Each entry in the page table isn't just a frame number. It also contains several important bits:
These bits give the OS fine-grained control over every single page of every process's memory. And they enable two critical features that base-and-bound and segmentation couldn't do well.
The present bit enables a powerful trick. When a program asks for memory, the OS can create a page table entry with the present bit cleared. No physical RAM is allocated. The program's virtual address space grows, but no real resources are consumed.
Only when the program actually touches that page does the hardware discover the present bit is zero. It triggers a page fault — the CPU traps to the OS, which allocates a physical frame, fills it with the right data (or zeros), updates the page table entry, and lets the program continue. The program never knows anything happened.
This is why a program can have gigabytes of virtual address space while using only megabytes of physical RAM. The OS only allocates real memory for pages the program actually uses.
What happens when physical RAM fills up? The OS picks a page that hasn't been used recently, writes it to disk, clears its present bit, and frees the frame. If the program accesses that page again, a page fault brings it back from disk.
This means programs can collectively use more memory than physically exists. The OS juggles pages between RAM and disk, keeping the hot pages in RAM and the cold pages on disk. From the program's perspective, nothing changes — it just gets slower when its pages get evicted.
The cons are minor and well-solved. Multi-level page tables compress the table by only allocating entries for the address ranges actually in use. The TLB (Translation Lookaside Buffer) caches recent translations so most lookups take a single CPU cycle.
Paging is so effective that it's the memory management scheme used by every modern OS — Linux, Windows, macOS, Android, iOS. It's the foundation of virtual memory as we know it.
The virtual memory you're using right now is paging, plus several optimizations that were added over the decades. Here's what a modern system (like x86-64 Linux) actually does:
A flat page table for a 48-bit address space would need 236 entries — 512 GB just for the table. Obviously unworkable. Instead, modern CPUs use a hierarchical page table. On x86-64, it's four levels deep:
Each level is a table with 512 entries. The MMU walks the tree: PML4 entry points to a PDPT, which points to a PD, which points to a PT, which gives the physical frame. Four memory accesses to resolve one virtual address.
The genius of this structure: most entries don't exist. If a process only uses a few regions of its address space, only the relevant branches of the tree are allocated. A typical process might need a few hundred KB for its page tables, not 512 GB.
Four memory accesses per translation would be catastrophic for performance. The TLB (Translation Lookaside Buffer) fixes this. It's a small, extremely fast cache inside the CPU that stores recent virtual-to-physical mappings.
A TLB hit resolves the translation in ~1 CPU cycle. A TLB miss triggers the full 4-level page table walk. On modern workloads, the TLB hit rate is typically above 99%. The 1% of misses are expensive, but rare enough to be manageable.
When the OS switches from process A to process B, it loads B's page table root address
into the CR3 register (on x86-64). The same virtual address
0x400000 now maps to completely different physical memory. Process A can't
see B's pages. Process B can't see A's. Complete isolation.
When a process calls fork(), the OS doesn't copy all its memory. Instead,
both parent and child share the same physical frames, with all pages marked
read-only. When either process tries to write to a shared
page, the hardware triggers a page fault. The OS then makes a private copy of just
that page. Pages that are never written are never copied.
This is why fork() is fast: it copies the page table, not the actual
memory. Most pages are never modified by the child, so they're never duplicated.
The OS can map a file directly into a process's address space. Reading from those
virtual addresses reads from the file. The kernel uses the page fault mechanism to
load file contents into physical frames on demand. This is how shared libraries work:
libc.so is mapped into every process, but only one copy exists in
physical RAM.
Let's look at how far we've come:
Each step solved a real problem but introduced new ones. The pattern is the same pattern you see everywhere in systems design: add a layer of indirection. Physical addresses were too rigid, so we added virtual addresses. A single mapping was too coarse, so we added per-segment mappings. Variable-sized segments caused fragmentation, so we switched to fixed-size pages.
The result is the virtual memory system running on your machine right now: an elaborate illusion that gives every process a private, contiguous, 256 TiB address space — all backed by maybe 8 or 16 GB of physical RAM, managed transparently by the hardware and kernel.