ZonesBecause of hardware limitations, the kernel cannot treat all pages as identical. Some pages, because of their physical address in memory, cannot be used for certain tasks. Because of this limitation, the kernel divides pages into different zones. The kernel uses the zones to group pages of similar properties. In particular, Linux has to deal with two shortcomings of hardware with respect to memory addressing:
Because of these constraints, there are three memory zones in Linux:
These zones are defined in <linux/mmzone.h>. The actual use and layout of the memory zones is architecture independent. For example, some architectures have no problem performing DMA into any memory address. In those architectures, ZONE_DMA is empty and ZONE_NORMAL is used for allocations regardless of their use. As a counterexample, on the x86 architecture, ISA devices[1] cannot perform DMA into the full 32-bit address space, because ISA devices can access only the first 16MB of physical memory. Consequently, ZONE_DMA on x86 consists of all memory in the range 016MB.
ZONE_HIGHMEM works in the same regard. What an architecture can and cannot directly map varies. On x86, ZONE_HIGHMEM is all memory above the physical 896MB mark. On other architectures, ZONE_HIGHMEM is empty because all memory is directly mapped. The memory contained in ZONE_HIGHMEM is called high memory[2]. The rest of the system's memory is called low memory.
ZONE_NORMAL tends to be whatever is left over after the previous two zones claim their requisite shares. On x86, for example, ZONE_NORMAL is all physical memory from 16MB to 896MB. On other (more fortunate) architectures, ZONE_NORMAL is all available memory. Table 11.1 is a listing of each zone and its consumed pages on x86.
Linux partitions the system's pages into zones to have a pooling in place to satisfy allocations as needed. For example, having a ZONE_DMA pool gives the kernel the capability to satisfy memory allocations needed for DMA. If such memory is needed, the kernel can simply pull the required number of pages from ZONE_DMA. Note that the zones do not have any physical relevance; they are simply logical groupings used by the kernel to keep track of pages. Although some allocations may require pages from a particular zone, the zones are not hard requirements in both directions. Although an allocation for DMA-able memory must originate from ZONE_DMA, a normal allocation can come from ZONE_DMA or ZONE_NORMAL. The kernel prefers to satisfy normal allocations from the normal zone, of course, to save the pages in ZONE_DMA for allocations that need it. But if push comes to shove (say, if memory should get low), the kernel can dip its fingers in whatever zone is available and suitable. Each zone is represented by struct zone, which is defined in <linux/mmzone.h>: struct zone { spinlock_t lock; unsigned long free_pages; unsigned long pages_min; unsigned long pages_low; unsigned long pages_high; unsigned long protection[MAX_NR_ZONES]; spinlock_t lru_lock; struct list_head active_list; struct list_head inactive_list; unsigned long nr_scan_active; unsigned long nr_scan_inactive; unsigned long nr_active; unsigned long nr_inactive; int all_unreclaimable; unsigned long pages_scanned; int temp_priority; int prev_priority; struct free_area free_area[MAX_ORDER]; wait_queue_head_t *wait_table; unsigned long wait_table_size; unsigned long wait_table_bits; struct per_cpu_pageset pageset[NR_CPUS]; struct pglist_data *zone_pgdat; struct page *zone_mem_map; unsigned long zone_start_pfn; char *name; unsigned long spanned_pages; unsigned long present_pages; }; The structure is big, but there are only three zones in the system and, thus, only three of these structures. Let's look at the more important fields. The lock field is a spin lock that protects the structure from concurrent access. Note that it protects just the structure, and not all the pages that reside in the zone. A specific lock does not protect individual pages, although parts of the kernel may lock the data that happens to reside in said pages. The free_pages field is the number of free pages in this zone. The kernel tries to keep at least pages_min pages free (through swapping), if possible. The name field is, unsurprisingly, a NULL-terminated string representing the name of this zone. The kernel initializes this value during boot in mm/page_alloc.c and the three zones are given the names "DMA," "Normal," and "HighMem." |