`mmap()` and `do_mmap()`: Creating an Address Interval

The do_mmap() function is used by the kernel to create a new linear address interval. Saying that this function creates a new VMA is not technically correct, because if the created address interval is adjacent to an existing address interval and the two share the same permissions, they are merged into one. If a merge is not possible, a new VMA is created. In either case, do_mmap() is the function used to add an address interval to a process's address space, whether that means expanding an existing memory area or creating a new one.

The do_mmap() function is declared in <linux/mm.h>:

unsigned long do_mmap(struct file *file, unsigned long addr,
                      unsigned long len, unsigned long prot,
                      unsigned long flag, unsigned long offset)

This function maps the file specified by file at offset offset for length len. The file parameter can be NULL and offset can be zero, in which case the mapping is not backed by a file; this is called an anonymous mapping. If a file and offset are provided, the mapping is called a file-backed mapping.
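The distinction between the two kinds of mapping can be seen from user-space. The following is a minimal sketch, not kernel code: it goes through the C library's mmap() rather than do_mmap(), and the helper names demo_anonymous() and demo_file_backed() are hypothetical, introduced only for illustration.

```c
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Create an anonymous mapping: no file (fd = -1) and offset 0.
 * Returns 0 on success, -1 on failure. */
int demo_anonymous(size_t len)
{
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    int ok = (p[0] == 0);      /* anonymous pages start out zero-filled */
    p[0] = 'x';                /* the mapping is writable */
    ok = ok && (p[0] == 'x');

    munmap(p, len);
    return ok ? 0 : -1;
}

/* Create a file-backed mapping of a temporary file and read the
 * file's contents through the mapping. Returns 0 on success. */
int demo_file_backed(void)
{
    const char msg[] = "hello";
    FILE *f = tmpfile();
    if (!f)
        return -1;
    if (fwrite(msg, 1, sizeof msg, f) != sizeof msg || fflush(f) != 0) {
        fclose(f);
        return -1;
    }

    /* Map the file starting at offset 0, read-only. */
    char *p = mmap(NULL, sizeof msg, PROT_READ, MAP_PRIVATE, fileno(f), 0);
    if (p == MAP_FAILED) {
        fclose(f);
        return -1;
    }

    int ok = (memcmp(p, msg, sizeof msg) == 0);
    munmap(p, sizeof msg);
    fclose(f);
    return ok ? 0 : -1;
}
```

Calling demo_anonymous(4096) and demo_file_backed() from a test program should both return 0 on a typical Linux system.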

The addr parameter optionally specifies the initial address from which to start the search for a free interval.

The prot parameter specifies the access permissions for pages in the memory area. The possible permission flags are defined in <asm/mman.h> and are unique to each supported architecture, although in practice each architecture defines the flags listed in Table 14.2.

Table 14.2. Page Protection Flags

| Flag | Effect on the Pages in the New Interval |
| --- | --- |
| PROT_READ | Corresponds to VM_READ |
| PROT_WRITE | Corresponds to VM_WRITE |
| PROT_EXEC | Corresponds to VM_EXEC |
| PROT_NONE | Page cannot be accessed |
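The correspondence in Table 14.2 can be sketched as a small translation function. This is an illustration under stated assumptions rather than the kernel's own code: the SKETCH_VM_* constants are hypothetical stand-ins for VM_READ, VM_WRITE, and VM_EXEC, and prot_to_vm_flags() is an invented helper name.

```c
#include <sys/mman.h>

/* Hypothetical stand-ins for the kernel's VMA permission bits. */
#define SKETCH_VM_READ  0x1UL
#define SKETCH_VM_WRITE 0x2UL
#define SKETCH_VM_EXEC  0x4UL

/* Translate PROT_* bits into the corresponding VMA permission bits.
 * PROT_NONE has no bits set, so it yields no permissions at all. */
unsigned long prot_to_vm_flags(int prot)
{
    unsigned long vm_flags = 0;

    if (prot & PROT_READ)
        vm_flags |= SKETCH_VM_READ;
    if (prot & PROT_WRITE)
        vm_flags |= SKETCH_VM_WRITE;
    if (prot & PROT_EXEC)
        vm_flags |= SKETCH_VM_EXEC;

    return vm_flags;
}
```

Each PROT_* bit is tested independently, so combinations such as PROT_READ | PROT_WRITE translate to the union of the matching permission bits.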

The flag parameter specifies the type of the mapping and sets the remaining VMA flags. These flags are also defined in <asm/mman.h>. See Table 14.3.

Table 14.3. Map Type Flags

| Flag | Effect on the New Interval |
| --- | --- |
| MAP_SHARED | The mapping can be shared |
| MAP_PRIVATE | The mapping cannot be shared |
| MAP_FIXED | The new interval must start at the given address addr |
| MAP_ANONYMOUS | The mapping is not file-backed, but is anonymous |
| MAP_GROWSDOWN | Corresponds to VM_GROWSDOWN |
| MAP_DENYWRITE | Corresponds to VM_DENYWRITE |
| MAP_EXECUTABLE | Corresponds to VM_EXECUTABLE |
| MAP_LOCKED | Corresponds to VM_LOCKED |
| MAP_NORESERVE | No need to reserve space for the mapping |
| MAP_POPULATE | Populate (prefault) page tables |
| MAP_NONBLOCK | Do not block on I/O |
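The practical difference between MAP_SHARED and MAP_PRIVATE shows up when writing through a file-backed mapping. The sketch below is a user-space illustration only; the helper name byte_in_file_after_write() is hypothetical. It writes one byte through a mapping of a temporary file and then reads the file itself back with pread().

```c
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map a one-byte temporary file with the given map_flags (MAP_SHARED
 * or MAP_PRIVATE), store 'A' through the mapping, and return the byte
 * that the file itself then contains, or -1 on error. */
int byte_in_file_after_write(int map_flags)
{
    int result = -1;
    FILE *f = tmpfile();
    if (!f)
        return -1;
    int fd = fileno(f);

    if (ftruncate(fd, 1) == 0) {            /* file holds one zero byte */
        char *p = mmap(NULL, 1, PROT_READ | PROT_WRITE, map_flags, fd, 0);
        if (p != MAP_FAILED) {
            char c;

            p[0] = 'A';                     /* write through the mapping */
            msync(p, 1, MS_SYNC);           /* flush shared changes, if any */
            if (pread(fd, &c, 1, 0) == 1)   /* read the file, not the mapping */
                result = (unsigned char)c;
            munmap(p, 1);
        }
    }

    fclose(f);
    return result;
}
```

With MAP_SHARED the store reaches the file, so the function returns 'A'. With MAP_PRIVATE the store goes to a private copy of the page, and the file still contains the zero byte.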

If any of the parameters are invalid, do_mmap() returns a negative error code (cast to its unsigned long return type). Otherwise, a suitable interval in virtual memory is located. If possible, the interval is merged with an adjacent memory area. Otherwise, a new vm_area_struct structure is allocated from the vm_area_cachep slab cache, and the new memory area is added to the address space's linked list and red-black tree of memory areas via the vma_link() function. Next, the total_vm field in the memory descriptor is updated. Finally, the function returns the initial address of the newly created address interval.
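The merge-or-create decision can be modeled in a few lines. What follows is a toy user-space model, not the kernel's implementation: it keeps areas in a small unsorted array instead of a linked list and red-black tree, it does not check for overlaps, and every name in it (toy_area, toy_space, toy_add) is hypothetical.

```c
#include <stddef.h>

#define TOY_MAX 16

struct toy_area {
    unsigned long start;   /* first address in the interval */
    unsigned long end;     /* first address past the interval */
    unsigned long flags;   /* permissions; must match for a merge */
};

struct toy_space {
    struct toy_area a[TOY_MAX];
    size_t n;              /* number of areas in use */
};

/* Add [start, start + len) to the space. Extend an adjacent area with
 * identical flags when one exists; otherwise create a new area.
 * Returns start on success, or (unsigned long)-1 if the table is full. */
unsigned long toy_add(struct toy_space *s, unsigned long start,
                      unsigned long len, unsigned long flags)
{
    unsigned long end = start + len;
    struct toy_area *prev = NULL, *next = NULL;

    /* Look for same-flag neighbors directly below and above. */
    for (size_t i = 0; i < s->n; i++) {
        struct toy_area *area = &s->a[i];
        if (area->flags != flags)
            continue;
        if (area->end == start)
            prev = area;
        else if (area->start == end)
            next = area;
    }

    if (prev && next) {
        /* The new interval bridges two areas: fold all three into prev. */
        prev->end = next->end;
        *next = s->a[--s->n];          /* drop next by moving the last in */
    } else if (prev) {
        prev->end = end;               /* grow the lower neighbor upward */
    } else if (next) {
        next->start = start;           /* grow the upper neighbor downward */
    } else {
        if (s->n == TOY_MAX)
            return (unsigned long)-1;
        s->a[s->n++] = (struct toy_area){ start, end, flags };
    }
    return start;
}
```

Adding two adjacent intervals with the same flags leaves a single area in the table, whereas an adjacent interval with different flags produces a second area.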

The `mmap()` System Call

The do_mmap() functionality is exported to user-space via the mmap() system call. The mmap() system call is defined as

void *mmap2(void *start,
            size_t length,
            int prot,
            int flags,
            int fd,
            off_t pgoff)

This system call is named mmap2() because it is the second variant of mmap(). The original mmap() took an offset in bytes as the last parameter; the current mmap2() receives the offset in pages. This enables larger files with larger offsets to be mapped. The original mmap(), as specified by POSIX, is available from the C library as mmap(), but is no longer implemented in the kernel proper, whereas the new version is available as mmap2(). Both library calls use the mmap2() system call, with the original mmap() converting the offset from bytes to pages.
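The byte-to-page conversion performed by the library can be sketched as follows. This is an assumption-laden illustration, not the C library's actual source: the helper name byte_offset_to_pgoff() is hypothetical, and the sketch assumes the offset unit is 4096 bytes (a shift of 12).

```c
#include <errno.h>

/* Assumption for this sketch: offsets are counted in 4096-byte units. */
#define SKETCH_MMAP2_SHIFT 12

/* Convert a byte offset into a page offset suitable for mmap2().
 * Returns 0 and stores the result in *pgoff, or -EINVAL if the byte
 * offset is not a multiple of the page-offset unit. */
int byte_offset_to_pgoff(unsigned long long off, unsigned long *pgoff)
{
    if (off & ((1ULL << SKETCH_MMAP2_SHIFT) - 1))
        return -EINVAL;

    *pgoff = (unsigned long)(off >> SKETCH_MMAP2_SHIFT);
    return 0;
}
```

Under the same assumption, a 32-bit byte offset can address only the first 4GB of a file, whereas a 32-bit page offset with 4096-byte units can reach 16TB, which is why the page-based interface supports larger files.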