Guide to DECthreads

Previous | Contents

This section discusses the cases in which the stack size is insufficient (resulting in stack overflow) and how to determine the optimal size of the stack.

Most compilers on VAX systems do not probe the stack. Portable code that supports threads should use as little stack memory as practical.

Most compilers on Alpha systems generate code in the procedure prologue that probes the stack, ensuring there is enough space for the procedure to run.

3.4.1 Sizing the Stack

To determine the optimal size of a thread's stack, multiply the largest number of nested subroutine calls by the size of the call frames and local variables. Add to that number an extra amount of memory to accommodate interrupts. Determining this figure is difficult because stack frames vary in size and because it might not be possible to estimate the depth of library routine call frames.

You can also run your program using a profiling tool that measures actual stack use. This is commonly done by "poisoning" the stack before it is used by writing a distinctive pattern, and then checking for that pattern after the thread completes. Remember: Use of profiling monitoring tools typically increases the amount of stack memory that your program uses.

3.4.2 Using a Stack Guard Area

By default, at the overflow end of each thread's stack DECthreads allocates a guard area, or a region of no-access memory. A guard area can help a multithreaded program detect overflow of a thread's stack. When the thread attempts to access a memory location within this region, a memory addressing violation occurs.

For a thread that allocates large data structures on the stack, create that thread using a thread attributes object in which a large guardsize attribute value has been set. A large stack guard region can help to prevent one thread from overflowing into another thread's stack region.

The low-level memory regions that form a stack guard region are also known as guard pages.

3.4.3 Handling Stack Overflow

A process can produce a memory access violation (or bus error or segmentation fault) when it overflows its stack. As a first step in debugging this behavior, it is often necessary to run the program under the control of your system's debugger to determine which routine's stack has overflowed. However, if the debugger shares resources with the target process (as under OpenVMS), perhaps allocating its own data objects on the target process's stack, the debugger might not operate properly when the stack overflows. In this case, you might be required to analyze the target process by means other than the debugger.

To set the stacksize attribute in a thread attributes object, use the pthread_attr_setstacksize() routine. (See Section 2.3.2.4 for more information.)

If a thread receives a memory access exception during a routine call or when accessing a local variable, increase the size of the thread's stack. Of course, not all memory access violations indicate a stack overflow.

For programs that you cannot run under a debugger, determining a stack overflow is more difficult. This is especially true if the program continues to run after receiving a memory access exception. For example, if a stack overflow occurs while a mutex is locked, the mutex might not be released as the thread recovers or terminates. When the program attempts to lock that mutex again, it hangs.

3.5 Scheduling Issues

There are programming issues that are unique to the scheduling attributes of threads.

3.5.1 Real-Time Scheduling

Use care when writing code that uses real-time scheduling to control the priority of threads:

Review Section 3.1. Scheduling of threads is not the same as synchronizing of threads.
Giving threads higher priority does not necessarily make your code run faster. Real-time priority adds overhead that can slow a program down, especially when interfacing with other libraries. For example, a higher-priority thread that polls for keyboard input may block work being done by other threads.
Watch for pitfalls like priority inversion. It is best to avoid relying on real-time scheduling, except where necessary to meet design goals. On the other hand, most systems that interact with external devices have some real-time aspect.

3.5.2 Priority Inversion

Priority inversion occurs when the interaction among a group of three or more threads causes that group's highest-priority thread to be blocked from executing. For example, a higher-priority thread waits for a resource locked by a low-priority thread, and the low-priority thread waits while a middle-priority thread executes. The higher-priority thread is made to wait while a thread of lower priority (the middle-priority thread) executes.

You can address the phenomenon of priority inversion as follows:

To avoid priority inversion, associate a priority (at least as high as the highest-priority thread that will use it) with each resource and force any thread using that object to first increase its priority to that associated with the object.
To minimize the chance that an occurrence of priority inversion will cause a complete blockage of higher-priority threads, use the (default) throughput scheduling policy. The throughput scheduling policy allows even low-priority threads to execute eventually and to release the resources they hold. The FIFO and RR scheduling policies do not provide for resumption of the low-priority thread if the middle-priority thread executes indefinitely.

3.5.3 Dependencies Among Scheduling Attributes and Contention Scope

On DIGITAL UNIX systems, to use high (real-time) thread scheduling priorities, a thread with system contention scope must run in a process with sufficient real-time scheduling privileges. On the other hand, a thread with process contention scope has access to all levels of priority without requiring special real-time scheduling privileges.

Due to this, for a process that is not privileged, when a thread with a high priority and with process contention scope attempts to create another thread with system contention scope, the creation will fail if the created thread's attributes object specifies to inherit the creating thread's scheduling policy and priority.

3.6 Using Synchronization Objects

The following sections discuss how to determine when to use a mutex versus a condition variable, and how to use mutexes to prevent two erroneous behaviors that are common in multithreaded programs: race conditions and deadlocks.

Also discussed is why you should signal a condition variable with the associated mutex locked.

3.6.1 Distinguishing Proper Usage of Mutexes and Condition Variables

Use a mutex for tasks with fine granularity. Examples of a "fine-grained" task are those that serialize access to shared memory or make simple modifications to shared memory. This typically corresponds to a critical section of a few program statements or less.

Mutex waits are not interruptible. Threads waiting to acquire a mutex cannot be alerted or canceled.

Do not use a condition variable to protect access to data. Rather, use it to wait for data to assume a desired state. Always use a condition variable with a mutex that protects the shared data. Condition variable waits are interruptible.

See Section 2.4.1 and Section 2.4.2 for more information about mutexes and condition variables.

3.6.2 Avoiding Race Conditions

A race condition occurs when two or more threads perform an operation, and the result of the operation depends on unpredictable timing factors; specifically, when each thread executes and waits and when each thread completes the operation.

For example, if two threads execute routines and each increments the same variable (such as x = x + 1), the variable could be incremented twice and one of the threads could use the wrong value. For example:

Thread A increments variable x.
Thread A is interrupted (or blocked, or scheduled off), and thread B is started.
Thread B starts and increments variable x.
Thread B is interrupted (or blocked, or scheduled off), and thread A is started.
Thread A checks the value of x and performs an action based on that value.
The value of x differs from when thread A incremented it, and the program's behavior is incorrect.

Race conditions result from lack of (or ineffectual) synchronization. To avoid race conditions, ensure that any variable modified by more than one thread has only one mutex associated with it, and ensure that all accesses to the variable are made after acquiring that mutex.

See Section 3.6.4 for another example of a race condition.

3.6.3 Avoiding Deadlocks

A deadlock occurs when a thread holding a resource is waiting for a resource held by another thread, while that thread is also waiting for the first thread's resource. Any number of threads can be involved in a deadlock if there is at least one resource per thread. A thread can deadlock on itself. Other threads can also become blocked waiting for resources involved in the deadlock.

Following are two techniques you can use to avoid deadlocks:

Use sequence numbers with fast mutexes. Associate a sequence number with each mutex and acquire mutexes in sequence. Never attempt to acquire a mutex with a sequence number lower than that of a mutex the thread already holds.
If a thread needs to acquire a mutex with a lower sequence number, it must first release all mutexes with a higher sequence number (after ensuring that the protected data is in a consistent state).
Use a recursive mutex. This technique is useful when a thread needs to acquire the same mutex more than once before releasing it. This technique can help prevent a thread from deadlocking on itself.

3.6.4 Signaling a Condition Variable

When you are signaling a condition variable and that signal might cause the condition variable to be deleted, signal or broadcast the condition variable with the mutex locked.

The following C code fragment is executed by a releasing thread (Thread A) to wake a blocked thread:

 
   pthread_mutex_lock (m); 
 
   ... /* Change shared variables to allow another thread to proceed */ 
 
   predicate = TRUE; 
   pthread_mutex_unlock (m); 
                               (1)
   pthread_cond_signal (cv);   (2)

The following C code fragment is executed by a potentially blocking thread (thread B):

 
   pthread_mutex_lock (m); 
   while (!predicate ) 
         pthread_cond_wait (cv, m); 
 
   pthread_mutex_unlock (m); 
   pthread_cond_destroy (cv);

If thread B is allowed to run while thread A is at this point, it finds the predicate true and continues without waiting on the condition variable. Thread B might then delete the condition variable with the pthread_cond_destroy() routine before thread A resumes execution.
When thread A executes this statement, the condition variable does not exist and the program fails.

These code fragments also demonstrate a race condition; that is, the routine, as coded, depends on a sequence of events among multiple threads, but does not enforce the desired sequence. Signaling the condition variable while still holding the associated mutex eliminates the race condition. Doing so prevents thread B from deleting the condition variable until after thread A has signaled it.

This problem can occur when the releasing thread is a worker thread and the waiting thread is a boss thread, and the last worker thread tells the boss thread to delete the variables that are being shared by boss and worker.

Code the signaling of a condition variable with the mutex locked as follows:

   pthread_mutex_lock (m); 
   ... 
 
   /* Change shared variables to allow some other thread to proceed */ 
   pthread_cond_signal (cv);   
   pthread_mutex_unlock (m);

3.7 One-Time Initialization

Your program might have one or more routines that must be executed before any thread executes code in your facility, but that must be executed only once, regardless of the sequence in which threads start executing. For example, your program can initialize mutexes, condition variables, or thread-specific data keys---each of which must be created only once---in a one-time initialization routine.

Use the pthread_once() routine to ensure that your program's initialization routine executes only once---that is, by the first thread that attempts to initialize your program's resources. Multiple threads can call the pthread_once() routine, and DECthreads ensures that the specified routine is called only once.

On the other hand, rather than use the pthread_once() routine, your program can statically initialize a mutex and a flag, then simply lock the mutex and test the flag. In many cases, this technique might be more straightforward to implement.

Finally, you can use implicit (and nonportable) initialization mechanisms, such as OpenVMS LIB$INITIALIZE, DIGITAL UNIX dynamic loader __init_ code, or Win32 DLL initialization handlers for Windows NT and Windows 95.

3.8 Managing Dependencies Upon Other Libraries

Because multithreaded programming has become common only recently, many existing code libraries are incompatible with multithreaded routines. For example, many of the traditional C run-time library routines maintain state across multiple calls using static storage. This storage can become corrupted if routines are called from multiple threads at the same time. Even if the calls from multiple threads are serialized, code that depends upon a sequence of return values might not work.

For example, the UNIX getpwent(2) routine returns the entries in the password file in sequence. If multiple threads call getpwent(2) repeatedly, even if the calls are serialized, no thread can obtain all entries in the password file.

Library routines might be compatible with multithreaded programming to different extents. The important distinctions are thread reentrancy and thread safety.

3.8.1 Thread Reentrancy

A routine is thread reentrant if it performs correctly despite being called simultaneously or sequentially by different threads. For example, the standard C run-time library routine strtok() can be made thread reentrant most efficiently by adding an argument that specifies a context for the sequence of tokens. Thus, multiple threads can simultaneously parse different strings without interfering with each other.

The ideal thread-reentrant routine has no dependency on static data. Because static data must be synchronized using mutexes and condition variables, there is always a performance penalty due to the time required to lock and unlock the mutex and also in the loss of potential parallelism throughout the program. A routine that does not use any data that is shared between threads can proceed without locking.

If you are developing new interfaces, make sure that any persistent context information (like the last-token-returned pointer in strtok()) is passed explicitly so that multiple threads can process independent streams of information independently. Return information to the caller through routine values, output parameters (where the caller passes the address and length of a buffer), or by allocating dynamic memory and requiring the caller to free that memory when finished. Try to avoid using errno for returning error or diagnostic information; use routine return values instead.

3.8.2 Thread Safety

A routine is thread safe if it can be called simultaneously from multiple threads without risk of corruption. Generally this means that it does some simple level of locking (perhaps using the DECthreads global lock) to prevent simultaneously active calls in different threads. See Section 3.8.3.3 for information about the DECthreads global lock.

Thread-safe routines might be inefficient. For example, a UNIX stdio package that is thread safe might still block all threads in the process while waiting to read or write data to a file.

Routines such as localtime(3) or strtok(), which traditionally rely on static storage, can be made thread safe by using thread-specific data instead of static variables. This prevents corruption and avoids the overhead of synchronization. However, using thread-specific data is not without its own cost, and it is not always the best solution. Using an alternate, reentrant version of the routine, such as the POSIX strtok_r() interface, is preferable.

3.8.3 Lacking Thread Safety

When your program must call a routine that is not thread safe, your program must ensure serialization and exclusivity of the unsafe routine across all threads in the program.

If a routine is not specifically documented as thread reentrant or thread safe, you are most safe to assume that it is not safe to use as-is with your multithreaded program. Never assume that a routine is fully thread reentrant unless it is expressly documented as such; a routine can use static data in ways that are not obvious from its interface. A routine carefully written to be thread reentrant but that calls some other routine that is not thread safe without proper protection, is itself not thread safe.

3.8.3.1 Using Mutex Around Call to Unsafe Code

Holding a mutex while calling any unsafe code accomplishes this. All threads and libraries using the routine should use the same mutex. Note that even if two libraries carefully lock a mutex around every call to a given routine, if each library uses a different mutex, the routine is not protected against multiple simultaneous calls from different libraries.

Note that your program might be required to protect a series of calls, rather than just a single call, to routines that are not thread safe.

3.8.3.2 Using or Copying Static Data Before Releasing the Mutex

In many cases your program must protect more than just the call itself to a routine that is not thread safe. Your program must use or copy any static return values before releasing the mutex that is being held.

3.8.3.3 Using the DECthreads Global Lock

To ensure serialization and exclusivity of the unsafe code, DECthreads provides one global lock that can be used by all threads in a program when calling routines or code that is not thread safe. The global lock allows a thread to acquire the lock recursively, so that you do not need to be concerned if you call a routine that also may acquire the global lock.

Acquire the global lock by calling pthread_lock_global_np(); release the global lock by calling pthread_unlock_global_np().

Because there is only one global lock, you do not need to fully analyze all of the dependencies in unsafe code that your program calls. For example, with private locks to protect unsafe code, one lock might protect calls to the stdio routine, while another protects calls to math routines. However, if stdio next calls a math routine without acquiring the math routine lock, the call is just as unsafe as if no locks were used.

Use the global lock whenever calling unsafe routines. If you are unsure, assume that a routine is not thread safe unless it is expressly documented otherwise. All DECthreads routines are thread safe.

3.8.4 Use of Multiple Threads Libraries Not Supported

DECthreads performs user-mode execution context-switching within a process (OpenVMS VAX) or virtual processor (DIGITAL UNIX and OpenVMS Alpha) by exchanging register sets, including the program counter and stack pointer. If any other code within the process also performs this sort of context switch, neither DECthreads nor that other code can ever know which context is active at any time. This can result in, at best, unpredictable behavior---and, at worst, severe errors.

For example, under OpenVMS VAX, the VAX Ada run-time library provides its own tasking package that does not use DECthreads scheduling. Therefore, VAX Ada tasking cannot be used within a process that also uses DECthreads. (This restriction does not exist for DEC Ada for DIGITAL UNIX or for OpenVMS Alpha, because it uses DECthreads.)

This potential confusion might not exist for platforms that offer kernel thread-only packages. For example, DECthreads for Windows NT coexists smoothly with that platform's native Win32 threads.

3.9 Detecting DECthreads Error Conditions

DECthreads can detect some of the following types of errors:

Application programming interface (API) errors can occur when the program specifies an invalid parameter or attempts an inappropriate operation on some DECthreads object.
Internal errors can occur when DECthreads determines that internal information has become corrupted to the point where it cannot continue operation.

API errors are reported in different ways by the various DECthreads interfaces:

The DECthreads pthread interface returns an integer value indicating the type of error.
The DECthreads cma interface raises exceptions to indicate error conditions.

DECthreads internal errors result in a bugcheck. DECthreads writes a message that summarizes the problem to the process's current error device, and (on OpenVMS and Windows NT platforms) writes a file that contains more detailed information.

By default, the file is named pthread_dump.log and is created in the process's current (or default) directory. To cause DECthreads to write the bugcheck information into a different file, define PTHREAD_CONFIG and set its dump= major keyword. (See Section D.1 for more information about using PTHREAD_CONFIG.)

If DECthreads cannot create the specified file when it performs the bugcheck, it will try to create the default file. If it cannot create the default file, it will write the detailed information to the error device.

Note
On DIGITAL UNIX systems:
DECthreads no longer creates a dump file, because a core file is sufficient for analysis of the process using the Ladebug debugger.

3.9.1 Contents of a DECthreads Bugcheck Dump File

The header message written to the error device starts with a line reporting that DECthreads has detected an internal problem and that it is terminating execution. It also includes the version of the DECthreads library. The message resembles this:

 
   %DECthreads bugcheck (version V3.13-180), terminating execution.

The next line states the reason for the failure. On DIGITAL UNIX, this is followed by process termination with SIGABRT (SIGIOT), which causes writing of a core dump file. On other platforms, a final line on the error device specifies the location of the file that contains detailed state information produced by DECthreads, as in the following example:

 
   % Dumping to pthread_dump.log

The detailed information file contains information that is usually necessary to track down the problem. If you encounter a DECthreads bugcheck, please contact your DIGITAL support representative and include this information file (or the DIGITAL UNIX core file) along with sample code and output. Always include the full name and version of the operating system, and any patches that have been installed. If complete version information is lacking, useful core file analysis might not be possible.

3.9.2 Interpreting a DECthreads Bugcheck

The fact that DECthreads terminated the process with a bugcheck can mean that some subtle problem in DECthreads has been uncovered. However, DECthreads does not check for all possible API errors, and there are a number of ways in which incorrect code in your program can lead to a DECthreads bugcheck.

A common example is the use of any mutex operation or of certain condition variable operations from within an interrupt routine (that is, a DIGITAL UNIX signal handler or OpenVMS AST routine). This type of programming error most commonly results in a bugcheck that reports an "krnSpinLockPrm: deadlock detected" message or a "Can't find null thread" message. To prevent this type of error, avoid using DECthreads routines other than pthread_cond_signal_int_np() from an interrupt routine (or the equivalent routines in other APIs).

In addition, DECthreads maintains a variety of state information in memory which can be overwritten by your own code. Therefore, it is possible for an application to accidentally modify DECthreads state by writing through invalid pointers, which can result in a bugcheck or other undesirable behavior.

Chapter 4
Writing Thread-Safe Libraries

A thread-safe library typically consists of routines that do not themselves create or use threads. However, the routines in a thread-safe library must be coded so that they are safe to be called from applications that use threads. DECthreads provides the thread-independent services (or tis) interface to support writing efficient, thread-safe code that does not itself use threads.

When called by a single-threaded program, the tis interface provides thread-independent synchronization services that are very efficient. For instance, tis routines avoid the use of interlocked instructions and memory barriers.

When called by a multithreaded program, the tis routines also provide full support for DECthreads synchronization, such as synchronization objects and thread joining.

The guidelines for using the DECthreads pthread interface routines also apply to using the corresponding tis interface routine in a multithreaded environment.

4.1 Features of the tis Interface

Among the key features of the DECthreads tis interface are:

Distinct tis library, with some unique routines and some routines that correspond to those in the DECthreads pthread interface
Common synchronization objects (such as mutexes and condition variables) with the pthread interface
Unique tis synchronization objects (such as the read-write lock)
Support for thread-specific data objects

Previous | Next | Contents