C#’s Support for Volatile Fields

Making sure that programmers call the Volatile.Read and Volatile.Write methods correctly is a lot to ask. It’s hard for programmers to keep all of this in their minds and to start imagining what other threads might be doing to shared data in the background. To simplify this, the C# compiler has the volatile keyword, which can be applied to static or instance fields of any of these types: Boolean, (S)Byte, (U)Int16, (U)Int32, (U)IntPtr, Single, or Char. You can also apply the volatile keyword to reference types and any enum field as long as the enumerated type has an underlying type of (S)Byte, (U)Int16, or (U)Int32. The JIT compiler ensures that all accesses to a volatile field are performed as volatile reads and writes, so it is not necessary to explicitly call Volatile’s static Read or Write methods. Furthermore, the volatile keyword tells the C# and JIT compilers not to cache the field in a CPU register, ensuring that all reads of and writes to the field actually access memory.

Using the volatile keyword, we can rewrite the ThreadsSharingData class as follows.

 

internal sealed class ThreadsSharingData {
   private volatile Int32 m_flag = 0;
   private          Int32 m_value = 0;

   // This method is executed by one thread
   public void Thread1() {
      // Note: 5 must be written to m_value before 1 is written to m_flag
      m_value = 5;
      m_flag = 1;
   }

   // This method is executed by another thread
   public void Thread2() {
      // Note: m_value must be read after m_flag is read
      if (m_flag == 1)
         Console.WriteLine(m_value);
   }
}

 

There are some developers (and I am one of them) who do not like C#’s volatile keyword, and they think that the language should not provide it.6 Our thinking is that most algorithms require few volatile read or write accesses to a field and that most other accesses to the field can occur normally, improving performance; seldom is it required that all accesses to a field be volatile. For example, it is difficult to interpret how to apply volatile read operations to algorithms like this one.

 

m_amount = m_amount + m_amount; // Assume m_amount is a volatile field defined in a class

 

Normally, an integer number can be doubled simply by shifting all bits left by 1 bit, and many compilers can examine the preceding code and perform this optimization. However, if m_amount is a volatile field, then this optimization is not allowed. The compiler must produce code to read m_amount into a register, read it again into another register, add the two registers together, and then write the result back out to the m_amount field. The unoptimized code is certainly bigger and slower; it would be unfortunate if it were contained inside a loop.

6 By the way, it is good to see that Microsoft Visual Basic does not offer a volatile semantic built into its language.
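The argument above suggests a common idiom: if only one access to a field actually needs volatile semantics, perform that one access explicitly with Volatile.Read or Volatile.Write and let all other accesses occur normally. Here is a minimal sketch of that idiom; the AmountHolder class and its member names are mine, invented for illustration.

```csharp
using System;
using System.Threading;

// Sketch (my names, not the book's): only the first read of the field
// needs volatile semantics, so perform one explicit Volatile.Read into
// a local and let the compiler optimize everything done with the local.
internal sealed class AmountHolder {
   private Int32 m_amount; // deliberately NOT declared volatile

   public void Set(Int32 value) { Volatile.Write(ref m_amount, value); }

   public Int32 Double() {
      Int32 amount = Volatile.Read(ref m_amount); // the only volatile access
      return amount + amount; // free to be compiled as a shift left by 1
   }
}
```

Because m_amount is not marked volatile, the compiler remains free to optimize the doubling of the local variable, which the volatile keyword would have prohibited.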

 

Furthermore, C# does not support passing a volatile field by reference to a method. For example, if m_amount is defined as a volatile Int32, attempting to call Int32’s TryParse method causes the compiler to generate a warning as shown here.



 

Boolean success = Int32.TryParse("123", out m_amount);

// The preceding line causes the C# compiler to generate a warning:

// CS0420: a reference to a volatile field will not be treated as volatile

 

Finally, volatile fields are not Common Language Specification (CLS) compliant because many languages (including Visual Basic) do not support them.

 

Interlocked Constructs

Volatile’s Read method performs an atomic read operation, and its Write method performs an atomic write operation; that is, each method performs either an atomic read or an atomic write. In this section, we look at the static System.Threading.Interlocked class’s methods. Each of the methods in the Interlocked class performs an atomic read and write operation. In addition, all the Interlocked methods are full memory fences. That is, any variable writes before the call to an Interlocked method execute before the Interlocked method, and any variable reads after the call execute after the call.

The static methods that operate on Int32 variables are by far the most commonly used methods. I show them here.

 

public static class Interlocked {
   // return (++location)
   public static Int32 Increment(ref Int32 location);

   // return (--location)
   public static Int32 Decrement(ref Int32 location);

   // return (location += value)
   // Note: value can be a negative number allowing subtraction
   public static Int32 Add(ref Int32 location, Int32 value);

   // Int32 old = location; location = value; return old;
   public static Int32 Exchange(ref Int32 location, Int32 value);

   // Int32 old = location;
   // if (location == comparand) location = value;
   // return old;
   public static Int32 CompareExchange(ref Int32 location, Int32 value, Int32 comparand);
   ...
}


There are also overloads of the preceding methods that operate on Int64 values. Furthermore, the Interlocked class offers Exchange and CompareExchange methods that take Object, IntPtr, Single, and Double, and there is also a generic version in which the generic type is constrained to class (any reference type).
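Before the larger example below, here is a minimal sketch (my code, not the book's) of the single most common use of these methods: several threads bumping a shared counter with Interlocked.Increment so that no update is ever lost.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Sketch: four tasks each increment a shared counter 100,000 times.
// Interlocked.Increment makes each increment an atomic
// read-modify-write, so the final count is always exact.
public static class InterlockedCounterDemo {
   private static Int32 s_count = 0;

   public static Int32 Run() {
      Task[] tasks = new Task[4];
      for (Int32 t = 0; t < tasks.Length; t++) {
         tasks[t] = Task.Run(() => {
            for (Int32 i = 0; i < 100000; i++)
               Interlocked.Increment(ref s_count); // atomic; s_count++ here could lose updates
         });
      }
      Task.WaitAll(tasks);
      return s_count;
   }
}
```

Replacing the Interlocked.Increment call with an ordinary s_count++ would make the result nondeterministic, because the read, add, and write would no longer happen as one atomic operation.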

Personally, I love the Interlocked methods, because they are relatively fast and you can do so much with them. Let me show you some code that uses the Interlocked methods to asynchronously query several web servers and concurrently process the returned data. This code is pretty short, never blocks any threads, and uses thread pool threads to scale automatically, consuming up to the number of CPUs available if its workload could benefit from it. In addition, the code, as is, supports accessing up to 2,147,483,647 (Int32.MaxValue) web servers. In other words, this code is a great model to follow for your own scenarios.

 

internal sealed class MultiWebRequests {
   // This helper class coordinates all the asynchronous operations
   private AsyncCoordinator m_ac = new AsyncCoordinator();

   // Set of web servers we want to query & their responses (Exception or Int32)
   // NOTE: Even though multiple threads could access this dictionary
   // simultaneously, there is no need to synchronize access to it
   // because the keys are read-only after construction
   private Dictionary<String, Object> m_servers = new Dictionary<String, Object> {
      { "http://Wintellect.com/", null },
      { "http://Microsoft.com/", null },
      { "http://1.1.1.1/", null }
   };

   public MultiWebRequests(Int32 timeout = Timeout.Infinite) {
      // Asynchronously initiate all the requests all at once
      var httpClient = new HttpClient();
      foreach (var server in m_servers.Keys) {
         m_ac.AboutToBegin(1);
         httpClient.GetByteArrayAsync(server)
            .ContinueWith(task => ComputeResult(server, task));
      }

      // Tell AsyncCoordinator that all operations have been initiated and to call
      // AllDone when all operations complete, Cancel is called, or the timeout occurs
      m_ac.AllBegun(AllDone, timeout);
   }

   private void ComputeResult(String server, Task<Byte[]> task) {
      Object result;
      if (task.Exception != null) {
         result = task.Exception.InnerException;
      } else {
         // Process I/O completion here on thread pool thread(s)
         // Put your own compute-intensive algorithm here...
         result = task.Result.Length; // This example just returns the length
      }

      // Save result (exception/sum) and indicate that 1 operation completed
      m_servers[server] = result;
      m_ac.JustEnded();
   }

   // Calling this method indicates that the results don't matter anymore
   public void Cancel() { m_ac.Cancel(); }

   // This method is called after all web servers respond,
   // Cancel is called, or the timeout occurs
   private void AllDone(CoordinationStatus status) {
      switch (status) {
         case CoordinationStatus.Cancel:
            Console.WriteLine("Operation canceled.");
            break;

         case CoordinationStatus.Timeout:
            Console.WriteLine("Operation timed-out.");
            break;

         case CoordinationStatus.AllDone:
            Console.WriteLine("Operation completed; results below:");
            foreach (var server in m_servers) {
               Console.Write("{0} ", server.Key);
               Object result = server.Value;
               if (result is Exception) {
                  Console.WriteLine("failed due to {0}.", result.GetType().Name);
               } else {
                  Console.WriteLine("returned {0:N0} bytes.", result);
               }
            }
            break;
      }
   }
}

 

OK, the preceding code doesn’t actually use any Interlocked methods directly, because I encapsulated all the coordination code in a reusable class called AsyncCoordinator, which I’ll explain shortly. Let me first explain what this class is doing. When the MultiWebRequests class is constructed, it initializes an AsyncCoordinator and a dictionary containing the set of server URIs (and their future result). It then issues all the web requests asynchronously one right after the other. It does this by first calling AsyncCoordinator’s AboutToBegin method, passing it the number of requests about to be issued.7 Then it initiates the request by calling HttpClient’s GetByteArrayAsync. This returns a Task, and I then call ContinueWith on this Task so that when the server replies with the bytes, they can be processed by my ComputeResult method concurrently via many thread pool threads. After all the web servers’ requests have been made, the AsyncCoordinator’s AllBegun method is called, passing it the name of the method (AllDone) that should execute when all the operations complete and a timeout value. As each web server responds, various thread pool threads will call the MultiWebRequests’s ComputeResult method. This method processes the bytes returned from the server (or any error that may have occurred) and saves the result in the dictionary collection. After storing each result, AsyncCoordinator’s JustEnded method is called to let the AsyncCoordinator object know that an operation completed.

7 The code would still work correctly if it was rewritten to call m_ac.AboutToBegin(m_servers.Count) just once before the loop instead of calling AboutToBegin inside the loop.

If all the operations have completed, then the AsyncCoordinator will invoke the AllDone method to process the results from all the web servers. The code executing the AllDone method will be the thread pool thread that just happened to get the last web server response. If timeout or cancellation occurs, then AllDone will be invoked via whatever thread pool thread notifies the AsyncCoordinator of timeout or using whatever thread happened to call the Cancel method. There is also a chance that the thread issuing the web server requests could invoke AllDone itself if the last request completes before AllBegun is called.

Note that there is a race because it is possible that all web server requests complete, AllBegun is called, timeout occurs, and Cancel is called all at the exact same time. If this happens, then the AsyncCoordinator will select a winner and three losers, ensuring that the AllDone method is never called more than once. The winner is identified by the status argument passed into AllDone, which can be one of the symbols defined by the CoordinationStatus type.

 

internal enum CoordinationStatus { AllDone, Timeout, Cancel };

 

Now that you get a sense of what happens, let’s take a look at how it works. The AsyncCoordinator class encapsulates all the thread coordination logic in it. It uses Interlocked methods for everything to ensure that the code runs extremely fast and that no threads ever block. Here is the code for this class.

 

internal sealed class AsyncCoordinator {
   private Int32 m_opCount = 1;        // Decremented when AllBegun calls JustEnded
   private Int32 m_statusReported = 0; // 0=false, 1=true
   private Action<CoordinationStatus> m_callback;
   private Timer m_timer;

   // This method MUST be called BEFORE initiating an operation
   public void AboutToBegin(Int32 opsToAdd = 1) {
      Interlocked.Add(ref m_opCount, opsToAdd);
   }

   // This method MUST be called AFTER an operation’s result has been processed
   public void JustEnded() {
      if (Interlocked.Decrement(ref m_opCount) == 0)
         ReportStatus(CoordinationStatus.AllDone);
   }

   // This method MUST be called AFTER initiating ALL operations
   public void AllBegun(Action<CoordinationStatus> callback,
      Int32 timeout = Timeout.Infinite) {
      m_callback = callback;
      if (timeout != Timeout.Infinite)
         m_timer = new Timer(TimeExpired, null, timeout, Timeout.Infinite);
      JustEnded();
   }

   private void TimeExpired(Object o) { ReportStatus(CoordinationStatus.Timeout); }
   public  void Cancel()              { ReportStatus(CoordinationStatus.Cancel); }

   private void ReportStatus(CoordinationStatus status) {
      // If status has never been reported, report it; else ignore it
      if (Interlocked.Exchange(ref m_statusReported, 1) == 0)
         m_callback(status);
   }
}

 

The most important field in this class is the m_opCount field. This field keeps track of the number of asynchronous operations that are still outstanding. Just before each asynchronous operation is started, AboutToBegin is called. This method calls Interlocked.Add to add the number passed to it to the m_opCount field in an atomic way. Adding to m_opCount must be performed atomically because web servers could be processing responses on thread pool threads as more operations are being started. As web server responses are processed, JustEnded is called. This method calls Interlocked.Decrement to atomically subtract 1 from m_opCount. Whichever thread happens to set m_opCount to 0 calls ReportStatus.

The ReportStatus method arbitrates the race that can occur among all the operations completing, the timeout occurring, and Cancel being called. ReportStatus must make sure that only one of these conditions is considered the winner so that the m_callback method is invoked only once. Arbitrating the winner is done by calling Interlocked.Exchange, passing it a reference to the m_statusReported field. This field is really treated as a Boolean variable; however, it can’t actually be a Boolean variable because there are no Interlocked methods that accept a Boolean variable. So I use an Int32 variable instead, where 0 means false and 1 means true.

Inside ReportStatus, the Interlocked.Exchange call will change m_statusReported to 1. But only the first thread to do this will see Interlocked.Exchange return a 0, and only this thread will invoke the callback method. Any other threads that call Interlocked.Exchange will get a return value of 1, effectively notifying these threads that the callback method has already been invoked and therefore it should not be invoked again.
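This once-only arbitration is useful far beyond AsyncCoordinator, so here is the same trick distilled into a minimal sketch of my own (the RunOnce name is mine): whichever thread flips the Int32 flag from 0 to 1 first runs the callback; every later caller sees Exchange return 1 and does nothing.

```csharp
using System;
using System.Threading;

// Sketch of the arbitration ReportStatus performs: the first thread to
// flip the flag from 0 to 1 wins and runs the callback; all later
// callers see Exchange return 1 and skip it.
public sealed class RunOnce {
   private Int32 m_ran = 0; // 0=false, 1=true (no Interlocked overloads take Boolean)

   public Boolean TryRun(Action callback) {
      if (Interlocked.Exchange(ref m_ran, 1) == 0) {
         callback(); // only the winning caller gets here
         return true;
      }
      return false;
   }
}
```

No matter how many threads call TryRun concurrently, exactly one Exchange call observes 0, so the callback can never execute twice.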


Implementing a Simple Spin Lock

The Interlocked methods are great, but they mostly operate on Int32 values. What if you need to manipulate a bunch of fields in a class object atomically? In this case, we need a way to stop all threads but one from entering the region of code that manipulates the fields. Using Interlocked methods, we can build a thread synchronization lock.

 

internal struct SimpleSpinLock {
   private Int32 m_ResourceInUse; // 0=false (default), 1=true

   public void Enter() {
      while (true) {
         // Always set resource to in-use
         // When this thread changes it from not in-use, return
         if (Interlocked.Exchange(ref m_ResourceInUse, 1) == 0) return;
         // Black magic goes here...
      }
   }

   public void Leave() {
      // Set resource to not in-use
      Volatile.Write(ref m_ResourceInUse, 0);
   }
}

 

And here is a class that shows how to use the SimpleSpinLock.

 

public sealed class SomeResource {
   private SimpleSpinLock m_sl = new SimpleSpinLock();

   public void AccessResource() {
      m_sl.Enter();
      // Only one thread at a time can get in here to access the resource...
      m_sl.Leave();
   }
}

 

The SimpleSpinLock implementation is very simple. If two threads call Enter at the same time, Interlocked.Exchange ensures that one thread changes m_ResourceInUse from 0 to 1 and sees that m_ResourceInUse was 0. This thread then returns from Enter so that it can continue executing the code in the AccessResource method. The other thread will change m_ResourceInUse from a 1 to a 1. This thread will see that it did not change m_ResourceInUse from a 0, and this thread will now start spinning continuously, calling Exchange until the first thread calls Leave.

When the first thread is done manipulating the fields of the SomeResource object, it calls Leave, which internally calls Volatile.Write and changes m_ResourceInUse back to a 0. This causes the spinning thread to then change m_ResourceInUse from a 0 to a 1, and this thread now gets to return from Enter so that it can access SomeResource object’s fields.

There you have it. This is a simple implementation of a thread synchronization lock. The big potential problem with this lock is that it causes threads to spin when there is contention for the lock. This spinning wastes precious CPU time, preventing the CPU from doing other, more useful work. As a result, spin locks should only ever be used to guard regions of code that execute very quickly.

Spin locks should not typically be used on single-CPU machines, because the thread that holds the lock can’t quickly release it if the thread that wants the lock is spinning. The situation becomes much worse if the thread holding the lock is at a lower priority than the thread wanting to get the lock, because now the thread holding the lock may not get a chance to run at all, resulting in a livelock situation. Windows sometimes boosts a thread’s priority dynamically for short periods of time. Therefore, boosting should be disabled for threads that are using spin locks; see the PriorityBoostEnabled properties of System.Diagnostics.Process and System.Diagnostics.ProcessThread. There are issues related to using spin locks on hyperthreaded machines, too. In an attempt to circumvent these kinds of problems, many spin locks have some additional logic in them; I refer to the additional logic as Black Magic. I’d rather not go into the details of Black Magic because it changes over time as more people study locks and their performance. However, I will say this: The FCL ships with a structure, System.Threading.SpinWait, which encapsulates the state-of-the-art thinking around this Black Magic.
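To make the SpinWait structure concrete, here is a small sketch of my own (the class and member names are mine) showing its typical usage pattern: spin on a flag, calling SpinOnce each time around so the structure can escalate from busy spinning toward yielding the processor as iterations mount.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Sketch (my names): waiting on a flag with the FCL's SpinWait
// structure instead of spinning naked on Interlocked/Volatile calls.
public static class SpinWaitSketch {
   private static Int32 s_flag = 0;

   public static Int32 FlagValue { get { return Volatile.Read(ref s_flag); } }

   public static void SetFlag() { Volatile.Write(ref s_flag, 1); }

   public static void WaitForFlag() {
      SpinWait spinner = new SpinWait();
      while (Volatile.Read(ref s_flag) == 0)
         spinner.SpinOnce(); // the "Black Magic" escalation lives in here
   }
}
```

The waiting thread stays cheap under short waits and degrades gracefully under long ones, which is exactly the behavior a hand-rolled spin loop lacks.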

 

 

Putting a Delay in the Thread’s Processing

The Black Magic is all about having a thread that wants a resource to pause its execution temporarily so that the thread that currently has the resource can execute its code and relinquish the resource. To do this, the SpinWait struct internally calls Thread’s static Sleep, Yield, and SpinWait methods. I’ll briefly describe these methods in this sidebar.

A thread can tell the system that it does not want to be schedulable for a certain amount of time. This is accomplished by calling Thread’s static Sleep method.

public static void Sleep(Int32 millisecondsTimeout);
public static void Sleep(TimeSpan timeout);

This method causes the thread to suspend itself until the specified amount of time has elapsed. Calling Sleep allows the thread to voluntarily give up the remainder of its time-slice. The system makes the thread not schedulable for approximately the amount of time specified. That’s right—if you tell the system you want a thread to sleep for 100 milliseconds, the thread will sleep approximately that long, but possibly several seconds or even minutes more. Remember that Windows is not a real-time operating system. Your thread will probably wake up at the right time, but whether it does depends on what else is going on in the system.

You can call Sleep and pass the value in System.Threading.Timeout.Infinite (defined as -1) for the millisecondsTimeout parameter. This tells the system to never schedule the thread, and it is not a useful thing to do. It is much better to have the thread exit and then recover its stack and kernel object. You can pass 0 to Sleep. This tells the system that the calling thread relinquishes the remainder of its current time-slice, and it forces the system to schedule another thread. However, the system can reschedule the thread that just called Sleep. This will happen if there are no more schedulable threads at the same priority or higher.


 

A thread can ask Windows to schedule another thread on the current CPU by calling Thread’s Yield method.

 

public static Boolean Yield();

 

If Windows has another thread ready to run on the current processor, then Yield returns true; the thread that called Yield ends its time-slice early, the selected thread gets to run for one time-slice, and then the thread that called Yield is scheduled again and starts running with a fresh new time-slice. If Windows does not have another thread to run on the current processor, then Yield returns false and the thread continues its time-slice.

The Yield method exists in order to give a thread of equal or lower priority that is starving for CPU time a chance to run. A thread calls this method if it wants a resource that is currently owned by another thread. The hope is that Windows will schedule the thread that currently owns the resource and that this thread will relinquish the resource. Then, when the thread that called Yield runs again, this thread can have the resource.

Yield is a cross between calling Thread.Sleep(0) and Thread.Sleep(1). Thread.Sleep(0) will not let a lower-priority thread run, whereas Thread.Sleep(1) will always force a context switch, and Windows will force the thread to sleep longer than one millisecond due to the resolution of the internal system timer.

Hyperthreaded CPUs really let only one thread run at a time. So, when executing spin loops on these CPUs, you need to force the current thread to pause so that the CPU switches to the other thread, allowing it to run. A thread can force itself to pause, allowing a hyperthreaded CPU to switch to its other thread by calling Thread’s SpinWait method.

public static void SpinWait(Int32 iterations);

 

Calling this method actually executes a special CPU instruction; it does not tell Windows to do anything (because Windows already thinks that it has scheduled two threads on the CPU). On a non-hyperthreaded CPU, this special CPU instruction is simply ignored.

 
 

 

The FCL also includes a System.Threading.SpinLock structure that is similar to my SimpleSpinLock class shown earlier, except that it uses the SpinWait structure to improve performance. The SpinLock structure also offers timeout support. By the way, it is interesting to note that my SimpleSpinLock and the FCL’s SpinLock are both value types. This means that they are lightweight, memory-friendly objects. A SpinLock is a good choice if you need to associate a lock with each item in a collection, for example. However, you must make sure that you do not pass SpinLock instances around, because they are copied and you will lose any and all synchronization. And although you can define instance SpinLock fields, do not mark the field as readonly, because its internal state must change as the lock is manipulated.
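Here is a small sketch of my own (the Counter class is mine) showing how the FCL’s SpinLock is used in practice. Note the Enter(ref lockTaken)/Exit pattern the structure requires, and that the SpinLock field is not marked readonly for the reason just given.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Sketch: guarding a tiny critical region with the FCL's SpinLock.
public sealed class Counter {
   private SpinLock m_lock = new SpinLock(enableThreadOwnerTracking: false);
   private Int32 m_value;

   public void Increment() {
      Boolean lockTaken = false;
      try {
         m_lock.Enter(ref lockTaken); // sets lockTaken to true on success
         m_value++;                   // guarded region: keep it very short
      }
      finally {
         if (lockTaken) m_lock.Exit();
      }
   }

   public Int32 Value { get { return m_value; } }
}
```

The try/finally with the lockTaken flag guarantees the lock is released even if an exception is thrown between Enter and the guarded code.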

 

The Interlocked Anything Pattern

Many people look at the Interlocked methods and wonder why Microsoft doesn’t create a richer set of interlocked methods that can be used in a wider range of scenarios. For example, it would be nice if the Interlocked class offered Multiply, Divide, Minimum, Maximum, And, Or, Xor, and a bunch of other methods. Although the Interlocked class doesn’t offer these methods, there is a well-known pattern that allows you to perform any operation on an Int32 in an atomic way by using Interlocked.CompareExchange. In fact, because Interlocked.CompareExchange has additional overloads that operate on Int64, Single, Double, Object, and a generic reference type, this pattern will actually work for all these types, too.

This pattern is similar to optimistic concurrency patterns used for modifying database records.

Here is an example of the pattern that is being used to create an atomic Maximum method.

 

public static Int32 Maximum(ref Int32 target, Int32 value) {
   Int32 currentVal = target, startVal, desiredVal;

   // Don't access target in the loop except in an attempt
   // to change it because another thread may be touching it
   do {
      // Record this iteration's starting value
      startVal = currentVal;

      // Calculate the desired value in terms of startVal and value
      desiredVal = Math.Max(startVal, value);

      // NOTE: the thread could be preempted here!

      // if (target == startVal) target = desiredVal
      // Value prior to potential change is returned
      currentVal = Interlocked.CompareExchange(ref target, desiredVal, startVal);

      // If the starting value changed during this iteration, repeat
   } while (startVal != currentVal);

   // Return the maximum value when this thread tried to set it
   return desiredVal;
}


Now let me explain exactly what is going on here. Upon entering the method, currentVal is initialized to the value in target at the moment the method starts executing. Then, inside the loop, startVal is initialized to this same value. Using startVal, you can perform any operation you want. This operation can be extremely complex, consisting of thousands of lines of code. But, ultimately, you must end up with a result that is placed into desiredVal. In my example, I simply determine whether startVal or value contains the larger value.

Now, while this operation is running, another thread could change the value in target. It is unlikely that this will happen, but it is possible. If this does happen, then the value in desiredVal is based off an old value in startVal, not the current value in target, and therefore, we should not change the value in target. To ensure that the value in target is changed to desiredVal if no thread has changed target behind our thread’s back, we use Interlocked.CompareExchange. This method checks whether the value in target matches the value in startVal (which identifies the value that we thought was in target before starting to perform the operation). If the value in target didn’t change, then CompareExchange changes it to the new value in desiredVal. If the value in target did change, then CompareExchange does not alter the value in target at all.

CompareExchange returns the value that is in target at the time when CompareExchange is called, which I then place in currentVal. Then, a check is made comparing startVal with the new value in currentVal. If these values are the same, then a thread did not change target behind our thread’s back, target now contains the value in desiredVal, the while loop does not loop around, and the method returns. If startVal is not equal to currentVal, then a thread did change the value in target behind our thread’s back, target did not get changed to our value in desiredVal, and the while loop will loop around and try the operation again, this time using the new value in currentVal that reflects the other thread’s change.

Personally, I have used this pattern in a lot of my own code and, in fact, I made a generic method, Morph, which encapsulates this pattern.8

 

delegate Int32 Morpher<TResult, TArgument>(Int32 startValue, TArgument argument,
   out TResult morphResult);

static TResult Morph<TResult, TArgument>(ref Int32 target, TArgument argument,
   Morpher<TResult, TArgument> morpher) {

   TResult morphResult;
   Int32 currentVal = target, startVal, desiredVal;
   do {
      startVal = currentVal;
      desiredVal = morpher(startVal, argument, out morphResult);
      currentVal = Interlocked.CompareExchange(ref target, desiredVal, startVal);
   } while (startVal != currentVal);
   return morphResult;
}

 

 

 
 

8 Obviously, the Morph method incurs a performance penalty due to invoking the morpher callback method. For best performance, execute the operation inline, as in the Maximum example.
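As a hypothetical illustration of calling Morph (the AtomicMultiply name is mine, not part of the FCL or the text), here is an atomic multiply built from a morpher callback. The delegate and the Morph method from the text are repeated inside the sketch only so that it compiles on its own.

```csharp
using System;
using System.Threading;

public static class MorphDemo {
   // The delegate and Morph method from the text, repeated here so
   // that this sketch is self-contained.
   public delegate Int32 Morpher<TResult, TArgument>(Int32 startValue, TArgument argument,
      out TResult morphResult);

   public static TResult Morph<TResult, TArgument>(ref Int32 target, TArgument argument,
      Morpher<TResult, TArgument> morpher) {
      TResult morphResult;
      Int32 currentVal = target, startVal, desiredVal;
      do {
         startVal = currentVal;
         desiredVal = morpher(startVal, argument, out morphResult);
         currentVal = Interlocked.CompareExchange(ref target, desiredVal, startVal);
      } while (startVal != currentVal);
      return morphResult;
   }

   // Hypothetical: an atomic multiply expressed as a morpher callback.
   public static Int32 AtomicMultiply(ref Int32 target, Int32 factor) {
      return Morph<Int32, Int32>(ref target, factor,
         (Int32 startValue, Int32 argument, out Int32 morphResult) => {
            morphResult = startValue * argument; // value after the multiply
            return morphResult;                  // desired value for target
         });
   }
}
```

As the footnote notes, the morpher callback costs a delegate invocation per loop iteration, so for hot paths you would inline the operation as in the Maximum example.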


