Zones

Because of hardware limitations, the kernel cannot treat all pages as identical. Some pages, because of their physical address in memory, cannot be used for certain tasks. Because of this limitation, the kernel divides pages into different zones. The kernel uses the zones to group pages of similar properties. In particular, Linux has to deal with two shortcomings of hardware with respect to memory addressing:

Some hardware devices are capable of performing DMA (direct memory access) to only certain memory addresses.
Some architectures are capable of physically addressing larger amounts of memory than they can virtually address. Consequently, some memory is not permanently mapped into the kernel address space.

Because of these constraints, there are three memory zones in Linux:

ZONE_DMA This zone contains pages that are capable of undergoing DMA.
ZONE_NORMAL This zone contains normal, regularly mapped, pages.
ZONE_HIGHMEM This zone contains "high memory," which are pages not permanently mapped into the kernel's address space.

These zones are defined in <linux/mmzone.h>.

The actual use and layout of the memory zones is architecture independent. For example, some architectures have no problem performing DMA into any memory address. In those architectures, ZONE_DMA is empty and ZONE_NORMAL is used for allocations regardless of their use. As a counterexample, on the x86 architecture, ISA devices^[1] cannot perform DMA into the full 32-bit address space, because ISA devices can access only the first 16MB of physical memory. Consequently, ZONE_DMA on x86 consists of all memory in the range 016MB.

^[1] Some broken PCI devices can perform DMA into only a 24-bit address space. But they are broken

ZONE_HIGHMEM works in the same regard. What an architecture can and cannot directly map varies. On x86, ZONE_HIGHMEM is all memory above the physical 896MB mark. On other architectures, ZONE_HIGHMEM is empty because all memory is directly mapped. The memory contained in ZONE_HIGHMEM is called high memory^[2]. The rest of the system's memory is called low memory.

^[2] This has nothing to do with high memory in DOS.

ZONE_NORMAL tends to be whatever is left over after the previous two zones claim their requisite shares. On x86, for example, ZONE_NORMAL is all physical memory from 16MB to 896MB. On other (more fortunate) architectures, ZONE_NORMAL is all available memory. Table 11.1 is a listing of each zone and its consumed pages on x86.

Table 11.1. Zones on x86
Zone
Description
Physical Memory
ZONE_DMA
DMA-able pages
< 16MB
ZONE_NORMAL
Normally addressable pages
16896MB
ZONE_HIGHMEM
Dynamically mapped pages
> 896MB

Linux partitions the system's pages into zones to have a pooling in place to satisfy allocations as needed. For example, having a ZONE_DMA pool gives the kernel the capability to satisfy memory allocations needed for DMA. If such memory is needed, the kernel can simply pull the required number of pages from ZONE_DMA. Note that the zones do not have any physical relevance; they are simply logical groupings used by the kernel to keep track of pages.

Although some allocations may require pages from a particular zone, the zones are not hard requirements in both directions. Although an allocation for DMA-able memory must originate from ZONE_DMA, a normal allocation can come from ZONE_DMA or ZONE_NORMAL. The kernel prefers to satisfy normal allocations from the normal zone, of course, to save the pages in ZONE_DMA for allocations that need it. But if push comes to shove (say, if memory should get low), the kernel can dip its fingers in whatever zone is available and suitable.

Each zone is represented by struct zone, which is defined in <linux/mmzone.h>:

struct zone {
        spinlock_t              lock;
        unsigned long           free_pages;
        unsigned long           pages_min;
        unsigned long           pages_low;
        unsigned long           pages_high;
        unsigned long           protection[MAX_NR_ZONES];
        spinlock_t              lru_lock;
        struct list_head        active_list;
        struct list_head        inactive_list;
        unsigned long           nr_scan_active;
        unsigned long           nr_scan_inactive;
        unsigned long           nr_active;
        unsigned long           nr_inactive;
        int                     all_unreclaimable;
        unsigned long           pages_scanned;
        int                     temp_priority;
        int                     prev_priority;
        struct free_area        free_area[MAX_ORDER];
        wait_queue_head_t       *wait_table;
        unsigned long           wait_table_size;
        unsigned long           wait_table_bits;
        struct per_cpu_pageset  pageset[NR_CPUS];
        struct pglist_data      *zone_pgdat;
        struct page             *zone_mem_map;
        unsigned long           zone_start_pfn;
        char                    *name;
        unsigned long           spanned_pages;
        unsigned long           present_pages;
};

The structure is big, but there are only three zones in the system and, thus, only three of these structures. Let's look at the more important fields.

The lock field is a spin lock that protects the structure from concurrent access. Note that it protects just the structure, and not all the pages that reside in the zone. A specific lock does not protect individual pages, although parts of the kernel may lock the data that happens to reside in said pages.

The free_pages field is the number of free pages in this zone. The kernel tries to keep at least pages_min pages free (through swapping), if possible.

The name field is, unsurprisingly, a NULL-terminated string representing the name of this zone. The kernel initializes this value during boot in mm/page_alloc.c and the three zones are given the names "DMA," "Normal," and "HighMem."