Process TerminationIt is sad, but eventually processes must die. When a process terminates, the kernel releases the resources owned by the process and notifies the child's parent of its unfortunate demise. Typically, process destruction occurs when the process calls the exit() system call, either explicitly when it is ready to terminate or implicitly on return from the main subroutine of any program (that is, the C compiler places a call to exit() after main() returns). A process can also terminate involuntarily. This occurs when the process receives a signal or exception it cannot handle or ignore. Regardless of how a process terminates, the bulk of the work is handled by do_exit(), which completes a number of chores:
The code for do_exit() is defined in kernel/exit.c. At this point, all objects associated with the task (assuming the task was the sole user) are freed. The task is not runnable (and in fact no longer has an address space in which to run) and is in the TASK_ZOMBIE state. The only memory it occupies is its kernel stack, the thread_info structure, and the task_struct structure. The task exists solely to provide information to its parent. After the parent retrieves the information, or notifies the kernel that it is uninterested, the remaining memory held by the process is freed and returned to the system for use. Removal of the Process DescriptorAfter do_exit() completes, the process descriptor for the terminated process still exists but the process is a zombie and is unable to run. As discussed, this allows the system to obtain information about a child process after it has terminated. Consequently, the acts of cleaning up after a process and removing its process descriptor are separate. After the parent has obtained information on its terminated child, or signified to the kernel that it does not care, the child's task_struct is deallocated. The wait() family of functions are implemented via a single (and complicated) system call, wait4(). The standard behavior is to suspend execution of the calling task until one of its children exits, at which time the function returns with the PID of the exited child. Additionally, a pointer is provided to the function that on return holds the exit code of the terminated child. When it is time to finally deallocate the process descriptor, release_task() is invoked. It does the following:
At this point, the process descriptor and all resources belonging solely to the process have been freed. The Dilemma of the Parentless TaskIf a parent exits before its children, some mechanism must exist to reparent the child tasks to a new process, or else parentless terminated processes would forever remain zombies, wasting system memory. The solution, hinted upon previously, is to reparent a task's children on exit to either another process in the current thread group or, if that fails, the init process. In do_exit(), notify_parent() is invoked, which calls forget_original_parent() to perform the reparenting: struct task_struct *p, *reaper = father; struct list_head *list; if (father->exit_signal != -1) reaper = prev_thread(reaper); else reaper = child_reaper; if (reaper == father) reaper = child_reaper; This code sets reaper to another task in the process's thread group. If there is not another task in the thread group, it sets reaper to child_reaper, which is the init process. Now that a suitable new parent for the children is found, each child needs to be located and reparented to reaper: list_for_each(list, &father->children) { p = list_entry(list, struct task_struct, sibling); reparent_thread(p, reaper, child_reaper); } list_for_each(list, &father->ptrace_children) { p = list_entry(list, struct task_struct, ptrace_list); reparent_thread(p, reaper, child_reaper); } This code iterates over two lists: the child list and the ptraced child list, reparenting each child. The rationale behind having both lists is interesting; it is a new feature in the 2.6 kernel. When a task is ptraced, it is temporarily reparented to the debugging process. When the task's parent exits, however, it must be reparented along with its other siblings. In previous kernels, this resulted in a loop over every process in the system looking for children. The solution, as noted previously, is simply to keep a separate list of a process's children that are being ptracedreducing the search for one's children from every process to just two relatively small lists. With the process successfully reparented, there is no risk of stray zombie processes. The init process routinely calls wait() on its children, cleaning up any zombies assigned to it. |