How Does Virtual Memory Work?

One Program, the Whole Machine

In the earliest computers, there was no operating system. No multitasking. You loaded one program into memory, it ran, it finished, you loaded the next one. The program owned everything — every byte of RAM, every address.

The memory map was dead simple. The program started at some known address, say 0x0000, and used as much RAM as it needed. Whatever was left over just sat there, unused.

Addresses in the code were physical addresses. When the program said "store this value at address 0x4000," it meant physical address 0x4000 in the actual RAM chip. No translation. No indirection. The program saw reality as it was.

This worked fine. One program, one machine, no conflicts.

But machines were expensive. Really expensive. And most of the time, the program was waiting — waiting for a punch card to be read, waiting for a tape to spin to the right position, waiting for output to print. The CPU sat idle. That expensive CPU, doing nothing.

The question became obvious: what if we ran more than one program at a time?

Two Programs, One Problem

Let's say we have 64 KB of RAM and we want to run two programs. The simplest idea: just load both into memory at the same time. Program A starts at 0x0000, program B starts at 0x8000.

Sounds straightforward. But this creates three serious problems.

Problem 1: Addresses Are Baked In

When program B was compiled, the compiler didn't know it would be loaded at 0x8000. It assumed it starts at 0x0000, like every other program. So all its internal addresses — jump targets, data references, function pointers — are wrong.

You could fix this by relocating the program at load time: scanning every address in the binary and adding an offset. Early systems actually did this. It worked, but it was slow and fragile.

Problem 2: No Protection

Program A can read and write any physical address. Including the memory that belongs to program B. A bug in A — a wild pointer, an out-of-bounds array access — can silently corrupt B's data. Or worse, corrupt the OS itself.

This isn't hypothetical. In early multiprogramming systems, a buggy program could bring down every other program on the machine. There was no isolation. Programs had to be trusted, and that only works when there are very few of them.

Problem 3: Memory Is Wasted

If program A needs 20 KB and we give it a 32 KB partition, 12 KB is wasted. If program B needs 40 KB and the remaining partition is only 32 KB, it can't run at all — even though there's 12 KB of unused memory sitting in A's partition.

We need a system that solves all three: let programs use any address they want, protect them from each other, and use memory efficiently. The search for that system is what led to virtual memory.

Fixed Partitions

The first real attempt at sharing memory was straightforward: divide physical memory into fixed-size partitions at boot time. Each partition gets one program. The OS takes one partition for itself.

What about the address problem? Each partition has a known starting address, so the hardware can add a simple base register that offsets every address by the partition's start. The OS loads the right base on each context switch. Programs compile assuming they start at zero — the hardware fixes it transparently.

For protection, the hardware adds a bounds check: before every memory access, it verifies that the address falls within the current program's assigned partition. If not, the hardware traps to the OS, and the offending program is killed. We finally have isolation.

But this scheme has obvious weaknesses.

Pros

Simple to implement
Hardware protection between programs
Fast — almost no overhead

Cons

Internal fragmentation — programs rarely fill their partition exactly
Fixed number of programs — partition count is decided at boot
A program can't be bigger than a single partition
Partition sizes must accommodate the largest expected program

The fundamental issue: fixed partitions force you to guess the future. How big should partitions be? How many? Get it wrong and you waste memory or can't run certain programs. We need something more flexible.

Base and Bound

Here's a better idea. Instead of fixed partitions, give each program a variable-size region of memory. The hardware tracks two values for the running program:

The base register — where this program's memory starts in physical RAM
The bound register (or limit) — how much memory it's allowed to use

Every memory access gets translated by the hardware:

// Hardware does this on every memory access:
if (address >= bound)
    trap("segmentation fault");   // kill the process
physical_address = base + address;

The program thinks it starts at address 0x0000. It uses addresses 0x0000 through whatever it needs. The hardware transparently adds the base to every address, mapping it to the right spot in physical RAM. The program never sees physical addresses. It can't access memory outside its region because the bound check catches it.

The base register relocates every address; the bound register enforces the limit

This is a real improvement. Programs are compiled assuming they start at address zero. No relocation needed at load time — the hardware does it on every access. Programs are isolated. And partitions can be any size.

When the OS switches from running program A to program B, it saves A's base and bound registers and loads B's. Each program gets its own view of memory. This is called dynamic relocation.

Pros

Programs don't need to know their physical address
Variable-size regions — no guessing
Hardware-enforced protection
The OS can relocate a program by just changing its base register

Cons

External fragmentation — as programs come and go, physical memory becomes a patchwork of holes
Each program must be contiguous in physical memory
No easy way for two programs to share the same code (e.g., a shared library)
Growing a program is painful — what if there's another program right after it?

The Fragmentation Problem

As programs start and exit, memory becomes full of small gaps. Suppose four programs are running, and B and D exit:

There's enough total free memory, but it's split into small gaps. A new program that needs a contiguous block might not fit anywhere. This is external fragmentation — free memory exists but can't be used.

You could compact memory: stop everything, slide programs together to consolidate free space, update all the base registers. But compaction is expensive. Every byte of every running program might need to be copied. The system freezes while this happens. Not great.

We need a scheme where a program's memory doesn't have to be contiguous in physical RAM.

Segmentation

Base and bound gave each program one contiguous region. Segmentation extends this idea: give each program multiple base-and-bound pairs, one for each logical segment of its address space.

A typical program has at least three logical regions:

The code segment — the program's instructions (read-only)
The data segment — global variables, the heap (read-write)
The stack segment — local variables, function call frames (grows/shrinks)

Each segment gets its own base, bound, and permissions (read, write, execute). The virtual address now has two parts: the segment number (which segment?) and the offset (where in that segment?).

// Hardware translation with segmentation:
segment = address >> 14;          // top 2 bits = segment number
offset  = address & 0x3FFF;       // bottom 14 bits = offset

if (offset >= segments[segment].bound)
    trap("segmentation fault");

if (!check_permissions(segments[segment], access_type))
    trap("protection fault");

physical = segments[segment].base + offset;

Segments can live in different parts of physical memory — they don't need to be adjacent

This is a big step forward. A program's code, data, and stack can now live in different places in physical memory. They don't need to be next to each other. The hardware connects them through the segment table so the program sees a clean, continuous address space.

Even better, segments enable sharing. Two programs running the same executable can share a single physical copy of the code segment. Each has its own data and stack segments, but the read-only code is shared. Less memory used, more programs running.

Pros

Segments match how programs actually work (code, data, stack)
Enables code sharing between processes
Per-segment permissions (code is read+execute, data is read+write)
Sparse address spaces — no need to allocate memory between segments

Cons

External fragmentation is still a problem — segments are variable-sized, so free memory still breaks into scattered gaps
Growing a segment (e.g., the heap) may require moving it
The OS has to manage variable-sized free regions, which is complex

Segmentation made things better, but it didn't eliminate the core problem. Variable-size chunks of memory always lead to fragmentation. Free space breaks into oddly-shaped holes. Managing those holes is a headache.

What if we made every chunk the same size?

Paging — The Breakthrough

Paging is the approach that every modern OS uses today. It takes a completely different approach to memory management.

Instead of dividing memory into variable-sized segments, we chop everything into small, equal-sized pieces:

Virtual memory is divided into pages (typically 4 KB each)
Physical memory is divided into frames (same size: 4 KB)
Any page can map to any frame

The mapping between pages and frames is stored in a page table — a data structure maintained by the OS and read by the hardware (the MMU, the Memory Management Unit) on every memory access.

// Hardware translation with paging:
page_number = address >> 12;           // top bits = page number
offset      = address & 0xFFF;         // bottom 12 bits = offset within page

frame = page_table[page_number];

if (!frame.present)
    trap("page fault");              // page not in physical memory

if (!check_permissions(frame, access_type))
    trap("protection fault");

physical = frame.number * 4096 + offset;

Pages map to frames in any order — contiguous virtual, scattered physical

This is the key insight: the program sees contiguous memory, but physical memory can be completely scattered. Page 0 might be in frame 3. Page 1 might be in frame 6. Page 255 might be in frame 4. It doesn't matter. The page table keeps track of the mapping, and the hardware translates transparently.

Why This Kills Fragmentation

Because every page and every frame is the same size (4 KB), there's no such thing as "this free region is too small." Any free frame can hold any page. If there are 10 free frames scattered across physical memory, you can use all 10 — they don't need to be adjacent.

No external fragmentation. Gone. The only waste is internal fragmentation within the last page of a segment (if a program uses 5000 bytes, it needs two 4 KB pages, wasting ~3 KB). With 4 KB pages, the maximum waste per segment is 4095 bytes — negligible when machines have gigabytes of RAM.

What the Page Table Entry Stores

Each entry in the page table isn't just a frame number. It also contains several important bits:

Present bit — is this page actually backed by physical RAM right now?
Read/Write bit — can the program write to this page?
User/Supervisor bit — can user-mode code access this?
Dirty bit — has this page been written to?
Accessed bit — has this page been read or written recently?

These bits give the OS fine-grained control over every single page of every process's memory. And they enable two critical features that base-and-bound and segmentation couldn't do well.

Demand Paging: Memory That Doesn't Exist Yet

The present bit enables a powerful trick. When a program asks for memory, the OS can create a page table entry with the present bit cleared. No physical RAM is allocated. The program's virtual address space grows, but no real resources are consumed.

Only when the program actually touches that page does the hardware discover the present bit is zero. It triggers a page fault — the CPU traps to the OS, which allocates a physical frame, fills it with the right data (or zeros), updates the page table entry, and lets the program continue. The program never knows anything happened.

This is why a program can have gigabytes of virtual address space while using only megabytes of physical RAM. The OS only allocates real memory for pages the program actually uses.

Swapping: Using Disk as Overflow

What happens when physical RAM fills up? The OS picks a page that hasn't been used recently, writes it to disk, clears its present bit, and frees the frame. If the program accesses that page again, a page fault brings it back from disk.

This means programs can collectively use more memory than physically exists. The OS juggles pages between RAM and disk, keeping the hot pages in RAM and the cold pages on disk. From the program's perspective, nothing changes — it just gets slower when its pages get evicted.

Pros

No external fragmentation — any free frame fits any page
Demand paging — only allocate RAM for pages that are actually used
Swapping — use disk as overflow, run more programs than RAM allows
Fine-grained permissions per page
Easy code sharing — two processes can map the same frame

Cons

Page tables themselves use memory (a full 48-bit address space needs a lot of entries)
Every memory access needs a page table lookup — solved by the TLB cache
Small internal fragmentation (up to one page per segment)

The cons are minor and well-solved. Multi-level page tables compress the table by only allocating entries for the address ranges actually in use. The TLB (Translation Lookaside Buffer) caches recent translations so most lookups take a single CPU cycle.

Paging is so effective that it's the memory management scheme used by every modern OS — Linux, Windows, macOS, Android, iOS. It's the foundation of virtual memory as we know it.

Modern Virtual Memory

The virtual memory you're using right now is paging, plus several optimizations that were added over the decades. Here's what a modern system (like x86-64 Linux) actually does:

Multi-Level Page Tables

A flat page table for a 48-bit address space would need 2³⁶ entries — 512 GB just for the table. Obviously unworkable. Instead, modern CPUs use a hierarchical page table. On x86-64, it's four levels deep:

Each level is a table with 512 entries. The MMU walks the tree: PML4 entry points to a PDPT, which points to a PD, which points to a PT, which gives the physical frame. Four memory accesses to resolve one virtual address.

The genius of this structure: most entries don't exist. If a process only uses a few regions of its address space, only the relevant branches of the tree are allocated. A typical process might need a few hundred KB for its page tables, not 512 GB.

The TLB — Making It Fast

Four memory accesses per translation would be catastrophic for performance. The TLB (Translation Lookaside Buffer) fixes this. It's a small, extremely fast cache inside the CPU that stores recent virtual-to-physical mappings.

A TLB hit resolves the translation in ~1 CPU cycle. A TLB miss triggers the full 4-level page table walk. On modern workloads, the TLB hit rate is typically above 99%. The 1% of misses are expensive, but rare enough to be manageable.

Each Process Gets Its Own Page Table

When the OS switches from process A to process B, it loads B's page table root address into the CR3 register (on x86-64). The same virtual address 0x400000 now maps to completely different physical memory. Process A can't see B's pages. Process B can't see A's. Complete isolation.

Copy-on-Write

When a process calls fork(), the OS doesn't copy all its memory. Instead, both parent and child share the same physical frames, with all pages marked read-only. When either process tries to write to a shared page, the hardware triggers a page fault. The OS then makes a private copy of just that page. Pages that are never written are never copied.

This is why fork() is fast: it copies the page table, not the actual memory. Most pages are never modified by the child, so they're never duplicated.

Memory-Mapped Files

The OS can map a file directly into a process's address space. Reading from those virtual addresses reads from the file. The kernel uses the page fault mechanism to load file contents into physical frames on demand. This is how shared libraries work: libc.so is mapped into every process, but only one copy exists in physical RAM.

The Journey

Let's look at how far we've come:

Bare metal — one program, physical addresses, no protection
Fixed partitions — multiple programs, but rigid and wasteful
Base and bound — flexible sizes, but external fragmentation and no sharing
Segmentation — logical regions, sharing possible, but fragmentation persists
Paging — fixed-size pages eliminate fragmentation, enable demand paging and swapping
Modern VM — multi-level page tables, TLB, copy-on-write, memory-mapped files

Each step solved a real problem but introduced new ones. The pattern is the same pattern you see everywhere in systems design: add a layer of indirection. Physical addresses were too rigid, so we added virtual addresses. A single mapping was too coarse, so we added per-segment mappings. Variable-sized segments caused fragmentation, so we switched to fixed-size pages.

The result is the virtual memory system running on your machine right now: an elaborate illusion that gives every process a private, contiguous, 256 TiB address space — all backed by maybe 8 or 16 GB of physical RAM, managed transparently by the hardware and kernel.

Next up: I'm writing about the memory layout of a running process — text, data, heap, stack, page faults, and how it all fits together. Coming soon.

Resources

Operating Systems: Three Easy Pieces — Chapters 13–23 (free)