mmap() and do_mmap(): Creating an Address Interval

The do_mmap() function is used by the kernel to create a new linear address interval. Saying that this function creates a new VMA is not technically correct: if the created address interval is adjacent to an existing address interval and the two share the same permissions, they are merged into one. If this is not possible, a new VMA is created. In any case, do_mmap() is the function used to add an address interval to a process's address space, whether that means expanding an existing memory area or creating a new one.
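
The merge-or-create decision described above can be modeled with a small user-space sketch. This is not the kernel's code: struct interval and try_merge() are hypothetical names invented for illustration, and a real VMA carries far more state than a start address, an end address, and a flags word.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a memory area: [start, end) plus flags. */
struct interval {
    unsigned long start;
    unsigned long end;    /* exclusive */
    unsigned long flags;  /* stands in for the permissions of the area */
};

/*
 * Try to extend 'prev' so that it also covers 'next'. This succeeds only
 * if the two intervals are adjacent and carry identical flags; otherwise
 * 'prev' is left untouched and the caller would have to create a new area.
 */
static bool try_merge(struct interval *prev, const struct interval *next)
{
    if (prev->end != next->start)
        return false;             /* not adjacent */
    if (prev->flags != next->flags)
        return false;             /* permissions differ */
    prev->end = next->end;        /* grow the existing area */
    return true;
}
```

When try_merge() returns false, the caller in this toy model would allocate a separate interval, which mirrors the "otherwise, a new VMA is created" case.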

The do_mmap() function is declared in <linux/mm.h>:

unsigned long do_mmap(struct file *file, unsigned long addr,
                      unsigned long len, unsigned long prot,
                      unsigned long flag, unsigned long offset)

This function maps the file specified by file at offset offset for length len. The file parameter can be NULL and offset can be zero, in which case the mapping is not backed by a file and is called an anonymous mapping. If a file and offset are provided, the mapping is called a file-backed mapping.

The addr parameter optionally specifies the initial address from which to start the search for a free interval.

The prot parameter specifies the access permissions for pages in the memory area. The possible permission flags are defined in <asm/mman.h> and are unique to each supported architecture, although in practice each architecture defines the flags listed in Table 14.2.

Table 14.2. Page Protection Flags

Flag          Effect on the Pages in the New Interval

PROT_READ     Corresponds to VM_READ
PROT_WRITE    Corresponds to VM_WRITE
PROT_EXEC     Corresponds to VM_EXEC
PROT_NONE     Page cannot be accessed
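
The correspondence in Table 14.2 can be pictured as a simple bit translation. The following user-space sketch is an illustration only: the DEMO_VM_* constants and demo_prot_to_vm() are hypothetical stand-ins, not the kernel's own definitions or helper, and the numeric values are assumptions chosen for the example.

```c
#include <sys/mman.h>

/* Illustrative stand-ins for the kernel's VM_* permission bits (assumed values). */
#define DEMO_VM_READ   0x1UL
#define DEMO_VM_WRITE  0x2UL
#define DEMO_VM_EXEC   0x4UL

/*
 * Translate user-visible PROT_* bits into the demo VM_* bits.
 * PROT_NONE sets no bits, so the result is 0: no access is permitted.
 */
static unsigned long demo_prot_to_vm(unsigned long prot)
{
    unsigned long vm = 0;

    if (prot & PROT_READ)
        vm |= DEMO_VM_READ;
    if (prot & PROT_WRITE)
        vm |= DEMO_VM_WRITE;
    if (prot & PROT_EXEC)
        vm |= DEMO_VM_EXEC;
    return vm;
}
```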


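The flags parameter (declared as flag in the prototype above) specifies flags that correspond to the remaining VMA flags; they set the type of the mapping and change its behavior. These flags are also defined in <asm/mman.h>. See Table 14.3.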

Table 14.3. Map Type Flags

Flag              Effect on the New Interval

MAP_SHARED        The mapping can be shared
MAP_PRIVATE       The mapping cannot be shared
MAP_FIXED         The new interval must start at the given address addr
MAP_ANONYMOUS     The mapping is not file-backed, but is anonymous
MAP_GROWSDOWN     Corresponds to VM_GROWSDOWN
MAP_DENYWRITE     Corresponds to VM_DENYWRITE
MAP_EXECUTABLE    Corresponds to VM_EXECUTABLE
MAP_LOCKED        Corresponds to VM_LOCKED
MAP_NORESERVE     No need to reserve space for the mapping
MAP_POPULATE      Populate (prefault) page tables
MAP_NONBLOCK      Do not block on I/O
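
From user space, the same protection and mapping flags are passed to the C library's mmap(). The sketch below requests a private anonymous mapping, writes to it, and unmaps it. The helper name anon_mapping_demo() is hypothetical, and the code assumes a Linux system with glibc, where MAP_ANONYMOUS is available and the file descriptor is conventionally passed as -1 for anonymous mappings.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Create a private anonymous mapping of 'len' bytes, fill it with a
 * byte pattern, check the last byte, and unmap it again.
 * Returns 0 on success and -1 on any failure.
 */
static int anon_mapping_demo(size_t len)
{
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    int ok;

    if (p == MAP_FAILED)
        return -1;                /* for example, len == 0 is rejected */

    memset(p, 0xAB, len);         /* touching the pages faults them in */
    ok = (p[len - 1] == 0xAB);

    if (munmap(p, len) != 0)
        return -1;
    return ok ? 0 : -1;
}
```

Passing NULL as the first argument leaves the choice of address to the kernel; MAP_FIXED would instead force the interval to start at the given address.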


If any of the parameters are invalid, do_mmap() returns a negative error code; because the return type is unsigned long, the caller sees it as a very large unsigned value. Otherwise, a suitable interval in virtual memory is located. If possible, the interval is merged with an adjacent memory area. Otherwise, a new vm_area_struct structure is allocated from the vm_area_cachep slab cache, and the new memory area is added to the address space's linked list and red-black tree of memory areas via the vma_link() function. Next, the total_vm field in the memory descriptor is updated. Finally, the function returns the initial address of the newly created address interval.
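
Because do_mmap() returns an unsigned long, a caller cannot test for failure with a plain "less than zero" comparison. One common approach is to treat the topmost few thousand values of the unsigned range as error codes. The user-space sketch below imitates that idea; demo_is_err_value() and DEMO_MAX_ERRNO are hypothetical names, and the limit of 4095 is an assumption made for the example.

```c
#include <errno.h>
#include <stdbool.h>

/* Assumed upper bound on errno values for this illustration. */
#define DEMO_MAX_ERRNO 4095UL

/*
 * Return true if 'ret' looks like a negative error code stored in an
 * unsigned long, that is, if it lies in the top DEMO_MAX_ERRNO values
 * of the unsigned range. Ordinary addresses fall below that window.
 */
static bool demo_is_err_value(unsigned long ret)
{
    return ret >= (unsigned long)-DEMO_MAX_ERRNO;
}
```

A caller in this model would check demo_is_err_value(ret) first and only then treat ret as the start address of the new interval.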

The mmap() System Call

The do_mmap() functionality is exported to user-space via the mmap() system call. In its current form, the system call is defined as

void *mmap2(void *start,
              size_t length,
              int prot,
              int flags,
              int fd,
              off_t pgoff)

This system call is named mmap2() because it is the second variant of mmap(). The original mmap() took an offset in bytes as the last parameter; the current mmap2() receives the offset in pages. This enables larger files with larger offsets to be mapped. The original mmap(), as specified by POSIX, is available from the C library as mmap(), but is no longer implemented in the kernel proper, whereas the new version is available as mmap2(). Both library calls use the mmap2() system call, with the original mmap() converting the offset from bytes to pages.
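
The byte-to-page conversion mentioned above amounts to a shift by the page size. The following sketch shows the arithmetic under the assumption of 4 KB pages; DEMO_PAGE_SHIFT and demo_bytes_to_pgoff() are hypothetical names, and this is not the C library's actual code. With a 32-bit page offset and 4 KB pages, offsets up to 2^32 * 2^12 = 2^44 bytes become expressible, compared with 2^32 bytes for a 32-bit byte offset.

```c
#include <stdint.h>

/* Assumes 4 KB pages; the real unit depends on the architecture. */
#define DEMO_PAGE_SHIFT 12

/*
 * Convert a byte offset into a page offset.
 * Stores the result in *pgoff and returns 0 on success, or returns -1
 * if the byte offset is not a multiple of the assumed page size.
 */
static int demo_bytes_to_pgoff(uint64_t byte_off, unsigned long *pgoff)
{
    const uint64_t page_mask = (UINT64_C(1) << DEMO_PAGE_SHIFT) - 1;

    if (byte_off & page_mask)
        return -1;                /* offset must be page aligned */

    *pgoff = (unsigned long)(byte_off >> DEMO_PAGE_SHIFT);
    return 0;
}
```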
