
26.8 Condition Variables

A mutex is fine for preventing simultaneous access to a shared variable, but we need something else to let us go to sleep waiting for some condition to occur. Let's demonstrate this with an example. We return to our Web client in Section 26.6 and replace the Solaris thr_join with pthread_join. But we cannot call pthread_join until we know that a thread has terminated, because pthread_join must be given a specific thread ID and blocks until that one thread terminates. We first declare a global variable that counts the number of terminated threads and protect it with a mutex.


     int             ndone;        /* number of terminated threads */
     pthread_mutex_t ndone_mutex = PTHREAD_MUTEX_INITIALIZER;

We then require that each thread increment this counter when it terminates, being careful to use the associated mutex.


     void *
     do_get_read (void *vptr)
     {
         ...

         Pthread_mutex_lock(&ndone_mutex);
         ndone++;
         Pthread_mutex_unlock(&ndone_mutex);

         return(fptr);       /* terminate thread */
     }

This is fine, but how do we code the main loop? It needs to lock the mutex continually and check if any threads have terminated.


          while (nlefttoread > 0) {
              while (nconn < maxnconn && nlefttoconn > 0) {
                      /* find a file to read */
                  ...
              }
                  /* See if one of the threads is done */
              Pthread_mutex_lock(&ndone_mutex);
              if (ndone > 0) {
                  for (i = 0; i < nfiles; i++) {
                      if (file[i].f_flags & F_DONE) {
                          Pthread_join(file[i].f_tid, (void **) &fptr);

                          /* update file[i] for terminated thread */
                          ...
                      }
                  }
              }
              Pthread_mutex_unlock(&ndone_mutex);
          }

While this is okay, it means the main loop never goes to sleep; it just loops, checking ndone every time around the loop. This is called polling and is considered a waste of CPU time.

We want a method for the main loop to go to sleep until one of its threads notifies it that something is ready. A condition variable, in conjunction with a mutex, provides this facility. The mutex provides mutual exclusion and the condition variable provides a signaling mechanism.

In terms of Pthreads, a condition variable is a variable of type pthread_cond_t. Condition variables are used with the following two functions:

#include <pthread.h>

int pthread_cond_wait(pthread_cond_t *cptr, pthread_mutex_t *mptr);

int pthread_cond_signal(pthread_cond_t *cptr);

Both return: 0 if OK, positive Exxx value on error

The term "signal" in the second function's name does not refer to a Unix SIGxxx signal.

An example is the easiest way to explain these functions. Returning to our Web client example, the counter ndone is now associated with both a condition variable and a mutex.


     int             ndone;
     pthread_mutex_t ndone_mutex = PTHREAD_MUTEX_INITIALIZER;
     pthread_cond_t  ndone_cond  = PTHREAD_COND_INITIALIZER;

A thread notifies the main loop that it is terminating by incrementing the counter while its mutex lock is held and by signaling the condition variable.


          Pthread_mutex_lock(&ndone_mutex);
          ndone++;
          Pthread_cond_signal(&ndone_cond);
          Pthread_mutex_unlock(&ndone_mutex);

The main loop then blocks in a call to pthread_cond_wait, waiting to be signaled by a terminating thread.


          while (nlefttoread > 0) {
              while (nconn < maxnconn && nlefttoconn > 0) {
                      /* find file to read */
                  ...
              }

                  /* Wait for thread to terminate */
              Pthread_mutex_lock(&ndone_mutex);
              while (ndone == 0)
                  Pthread_cond_wait(&ndone_cond, &ndone_mutex);

              for (i = 0; i < nfiles; i++) {
                  if (file[i].f_flags & F_DONE) {
                      Pthread_join(file[i].f_tid, (void **) &fptr);

                      /* update file[i] for terminated thread */
                      ...
                  }
              }
              Pthread_mutex_unlock(&ndone_mutex);
          }

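Notice that the variable ndone is still checked only while the mutex is held. Then, if there is nothing to do, pthread_cond_wait is called. This puts the calling thread to sleep and releases the mutex lock it holds. Furthermore, when the thread returns from pthread_cond_wait (after some other thread has signaled it), the thread again holds the mutex. The test is made in a while loop, not with an if, because POSIX allows pthread_cond_wait to return even though the condition is not true (a spurious wakeup), so the condition must be tested again after every return.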
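The fragments above can be put together into something that compiles on its own. What follows is only a minimal sketch under stated assumptions, not the Web client: the names worker, run_workers, and NWORKERS are hypothetical, the threads do no real work, and the pthread_ functions are called directly instead of through our capitalized wrapper functions.

```c
#include <assert.h>
#include <pthread.h>

#define NWORKERS 4                 /* hypothetical number of threads */

static int             ndone;      /* number of terminated threads */
static pthread_mutex_t ndone_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  ndone_cond  = PTHREAD_COND_INITIALIZER;

/* Hypothetical worker: does nothing useful, then reports termination. */
static void *
worker(void *arg)
{
    (void) arg;
    pthread_mutex_lock(&ndone_mutex);
    ndone++;                            /* change the condition ... */
    pthread_cond_signal(&ndone_cond);   /* ... and wake the main loop */
    pthread_mutex_unlock(&ndone_mutex);
    return NULL;
}

/* Start NWORKERS threads and sleep until all have reported.
   Returns the number of reports collected, or -1 on error. */
int
run_workers(void)
{
    pthread_t tid[NWORKERS];
    int       i, nreaped = 0;

    for (i = 0; i < NWORKERS; i++)
        if (pthread_create(&tid[i], NULL, worker, NULL) != 0)
            return -1;

    while (nreaped < NWORKERS) {
        pthread_mutex_lock(&ndone_mutex);
        while (ndone == 0)              /* retest after every wakeup */
            pthread_cond_wait(&ndone_cond, &ndone_mutex);
        assert(ndone > 0);              /* mutex is held again here */
        nreaped += ndone;               /* consume all pending reports */
        ndone = 0;
        pthread_mutex_unlock(&ndone_mutex);
    }

    for (i = 0; i < NWORKERS; i++)      /* none of these joins waits long */
        pthread_join(tid[i], NULL);
    return nreaped;
}
```

A call to run_workers returns NWORKERS once every thread has reported. The main loop in this sketch never polls: each time it finds ndone equal to 0, it sleeps in pthread_cond_wait until a worker signals.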

Why is a mutex always associated with a condition variable? The "condition" is normally the value of some variable that is shared between the threads. The mutex is required to allow this variable to be set and tested by the different threads. For example, if we did not lock the mutex in the example code just shown, the main loop would test ndone as follows:


              /* Wait for thread to terminate */
          while (ndone == 0)
              Pthread_cond_wait(&ndone_cond, &ndone_mutex);

But, there is a possibility that the last of the threads increments ndone after the test of ndone == 0, but before the call to pthread_cond_wait. If this happens, this last "signal" is lost and the main loop would block forever, waiting for something that will never occur again.

This is the same reason that pthread_cond_wait must be called with the associated mutex locked, and why this function unlocks the mutex and puts the calling thread to sleep as a single, atomic operation. If this function did not unlock the mutex itself and then lock it again when it returns, the thread would have to unlock and relock the mutex around the call, and the code would look like the following:


              /* Wait for thread to terminate */
          Pthread_mutex_lock(&ndone_mutex);
          while (ndone == 0) {
              Pthread_mutex_unlock(&ndone_mutex);
              Pthread_cond_wait(&ndone_cond, &ndone_mutex);
              Pthread_mutex_lock(&ndone_mutex);
          }

But again, there is a possibility that the final thread could terminate and increment the value of ndone between the call to pthread_mutex_unlock and pthread_cond_wait.

Normally, pthread_cond_signal awakens one thread that is waiting on the condition variable. There are instances when a thread knows that multiple threads should be awakened, in which case, pthread_cond_broadcast will wake up all threads that are blocked on the condition variable.

#include <pthread.h>

int pthread_cond_broadcast(pthread_cond_t *cptr);

int pthread_cond_timedwait(pthread_cond_t *cptr, pthread_mutex_t *mptr, const struct timespec *abstime);

Both return: 0 if OK, positive Exxx value on error
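To illustrate pthread_cond_broadcast, here is a minimal sketch in which several threads wait for the same condition and one broadcast releases all of them. The names ready, nwoken, waiter, release_all, and NWAITERS are hypothetical and are not part of the Web client.

```c
#include <assert.h>
#include <pthread.h>

#define NWAITERS 3                 /* hypothetical number of waiting threads */

static int             ready;      /* the shared condition */
static int             nwoken;     /* how many waiters have proceeded */
static pthread_mutex_t ready_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  ready_cond  = PTHREAD_COND_INITIALIZER;

/* Each waiter sleeps until ready becomes nonzero. */
static void *
waiter(void *arg)
{
    (void) arg;
    pthread_mutex_lock(&ready_mutex);
    while (!ready)
        pthread_cond_wait(&ready_cond, &ready_mutex);
    nwoken++;                      /* still protected by ready_mutex */
    pthread_mutex_unlock(&ready_mutex);
    return NULL;
}

/* Start the waiters, release them all at once, and return how many ran.
   Returns -1 if a thread cannot be created. */
int
release_all(void)
{
    pthread_t tid[NWAITERS];
    int       i;

    ready = 0;                     /* no other threads exist yet */
    nwoken = 0;
    for (i = 0; i < NWAITERS; i++)
        if (pthread_create(&tid[i], NULL, waiter, NULL) != 0)
            return -1;

    pthread_mutex_lock(&ready_mutex);
    ready = 1;
    pthread_cond_broadcast(&ready_cond);   /* wake every blocked waiter */
    pthread_mutex_unlock(&ready_mutex);

    for (i = 0; i < NWAITERS; i++)
        pthread_join(tid[i], NULL);
    assert(nwoken <= NWAITERS);
    return nwoken;
}
```

With pthread_cond_signal in place of the broadcast, only one of the waiters blocked at that moment would be guaranteed to wake up, and the others could sleep forever. A waiter that has not yet reached pthread_cond_wait when the broadcast is sent is not harmed, because it tests ready under the mutex before it ever sleeps.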

pthread_cond_timedwait lets a thread place a limit on how long it will block. abstime is a timespec structure (as we defined with the pselect function, Section 6.9) that specifies the system time when the function must return, even if the condition variable has not been signaled yet. If this timeout occurs, ETIMEDOUT is returned.

This time value is an absolute time; it is not a time delta. That is, abstime is the system time—the number of seconds and nanoseconds past January 1, 1970, UTC—when the function should return. This differs from both select and pselect, which specify the number of seconds and microseconds (nanoseconds for pselect) until some time in the future when the function should return. The normal procedure is to call gettimeofday to obtain the current time (as a timeval structure!), and copy this into a timespec structure, adding in the desired time limit. For example,


     struct timeval tv;
     struct timespec ts;
     if (gettimeofday(&tv, NULL) < 0)
         err_sys("gettimeofday error");
     ts.tv_sec = tv.tv_sec + 5;     /* 5 seconds in future */
     ts.tv_nsec = tv.tv_usec * 1000; /* microsec to nanosec */

     pthread_cond_timedwait( ..., &ts);

The advantage in using an absolute time instead of a delta time is that if the function returns prematurely (perhaps because of a caught signal), it can be called again without having to change the contents of the timespec structure. The disadvantage, however, is having to call gettimeofday before the function can be called the first time.

The POSIX specification defines a clock_gettime function that returns the current time as a timespec structure.
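With clock_gettime, the conversion from a timeval structure disappears. The following is a minimal sketch, assuming the condition variable uses its default clock (CLOCK_REALTIME) and reusing the ndone variables from our earlier example; the function name wait_ndone is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L    /* ask for clock_gettime and Pthreads */

#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

static int             ndone;
static pthread_mutex_t ndone_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  ndone_cond  = PTHREAD_COND_INITIALIZER;

/* Hypothetical helper: wait at most secs seconds for ndone to become
   nonzero.  Returns 0 if it did, ETIMEDOUT if the deadline passed first,
   or another positive Exxx value on error. */
int
wait_ndone(int secs)
{
    struct timespec ts;
    int             rc = 0;

    if (clock_gettime(CLOCK_REALTIME, &ts) != 0)
        return errno;
    ts.tv_sec += secs;             /* absolute deadline, computed once */

    pthread_mutex_lock(&ndone_mutex);
    while (ndone == 0 && rc == 0)  /* same ts is reused on every retry */
        rc = pthread_cond_timedwait(&ndone_cond, &ndone_mutex, &ts);
    if (ndone > 0)
        rc = 0;                    /* condition is true, timed out or not */
    pthread_mutex_unlock(&ndone_mutex);

    assert(rc >= 0);               /* Pthread errors are positive values */
    return rc;
}
```

If no thread ever increments ndone, a call such as wait_ndone(1) returns ETIMEDOUT after about one second. Because ts holds an absolute time, a premature return from pthread_cond_timedwait just goes around the loop again with the deadline unchanged.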
