Critical Regions and Race Conditions

Code paths that access and manipulate shared data are called critical regions. It is usually unsafe for multiple threads of execution to access the same resource simultaneously. To prevent concurrent access during critical regions, the programmer must ensure that the code executes atomicallythat is, the code completes without interruption as if the entire critical region were one indivisible instruction. It is a bug if it is possible for two threads of execution to be simultaneously in the same critical region. When this actually occurs, we call it a race condition (named because the threads raced to get there). Note how rare this could bedebugging race conditions is often very hard because they are not easily reproducible. Ensuring that unsafe concurrency is prevented and that race conditions do not occur is called synchronization.

Why Do We Need Protection?

To best identify race conditions, let's look at just how ubiquitous critical regions are. For a first example, let's consider a real world case: an ATM (Automated Teller Machine, called a cash machine, cashpoint, or ABM outside of the United States).

One of the most common functions performed by cash machines is withdrawing money from a person's personal bank account. A person walks up to the machine, inserts an ATM card, types in a PIN providing a modicum of authentication, selects Withdrawal, inputs a pecuniary amount, hits OK, takes the money, and mails it to me.

After the user has asked for a specific amount of money, the cash machine needs to ensure that the money actually exists in that user's account. If the money exists, it then needs to deduct the withdrawal from the total funds available. The code to implement this would look something like

int total = get_total_from_account();      /* total funds in account */
int withdrawal = get_withdrawal_amount();  /* amount user asked to withdrawal */

/* check whether the user has enough funds in her account */
if (total < withdrawal)
        error("You do not have that much money!")

/* OK, the user has enough money: deduct the withdrawal amount from her total */
total -= withdrawal;
update_total_funds(total);

/* give the user their money */
spit_out_money(withdrawal);

Now, let's presume that another deduction in the user's funds is happening at the same time. It does not matter how the simultaneous deduction is happening: Assume that the user's spouse is initiating another withdrawal at another ATM, or electronically transferring funds out of the account, or the bank is deducting a fee from the account (as banks these days are so wont to do), or whatever.

Both systems performing the withdrawal would have code similar to what we just looked at: first check whether the deduction is possible, then compute the new total funds, and finally execute the physical deduction. Now let's make up some numbers. Presume that the first deduction is a withdrawal from an ATM for $100 and that the second deduction is the bank applying a fee of $10 because the customer walked into the bank, (which is not allowed, you must use the ATM, we do not want to see you). Assume the customer has a total of $105 in the bank. Obviously, one of these transactions cannot correctly complete without sending the account into the red.

What you would expect is something like this: The fee transaction happens first. Ten dollars is less than $105, so ten is subtracted from 105 to get a new total of 95, and $10 is pocketed by the bank. Then the ATM withdrawal comes along and fails because $95 is less than $100.

Life can be much more interesting. Assume that the two transactions are initiated at roughly the same time. Both transactions verify that sufficient funds exist: $105 is more than both $100 and $10, so all is good. Then the withdrawal process subtracts $100 from $105, yielding $5. The fee transaction then does the same, subtracting $10 from $105 and getting $95. The withdrawal process then updates the user's new total available funds to $5. Now the fee transaction also updates the new total, resulting in $95. Free money!

Clearly, financial institutions must ensure that this can never happen. They must lock the account during certain operations, making each transaction atomic with respect to any other transaction. Such transactions must occur in their entirety, without interruption, or not occur at all.

The Single Variable

Now, let's look at a specific computing example. Consider a very simple shared resource, a single global integer, and a very simple critical region, the operation of merely incrementing it:

i++;

This might translate into machine instructions to the computer's processor that resemble the following:

get the current value of i and copy it into a register
add one to the value stored in the register
write back to memory the new value of i

Now, assume that there are two threads of execution, both enter this critical region, and the initial value of i is seven. The desired outcome is then similar to the following (with each row representing a unit of time):

Thread 1
Thread 2
get i (7)
-
increment i (7 -> 8)
-
write back i (8)
-
-
get i (8)
-
increment i (8 -> 9)
-
write back i (9)

As expected, seven incremented twice is nine.A possible outcome, however, is the following:

Thread 1
Thread 2
get i (7)
get i (7)
increment i (7 -> 8)
-
-
increment i (7 -> 8)
write back i (8)
-
-
write back i (8)

If both threads of execution read the initial value of i before it is incremented, both threads will increment and save the same value. As a result, the variable i contains the value eight when, in fact, it should now contain nine. This is one of the simplest examples of a critical region. Thankfully, the solution is equally as simple: We merely need a way to perform these operations in one indivisible step. Most processors provide an instruction to atomically read, increment, and write back a single variable. Using this atomic instruction, the only possible outcome is

Thread 1
Thread 2
increment i (7 -> 8)
-
-
increment i (8 -> 9)

Or:

Thread 1
Thread 2
-
increment i (7 -> 8)
increment i (8 -> 9)
-

It would never be possible for the two atomic operations to interleave. The processor would physically ensure that it was impossible. Using such an instruction would alleviate the problem. The kernel provides a set of interfaces that implement these atomic instructions; they are discussed in the next chapter.