Saturday, February 8, 2014

C# Volatile Constructs

Volatile Constructs
Back in the early days of computing, software was written using assembly language. Assembly
language is very tedious, because programmers must explicitly state everything—use this CPU register
for this, branch to that, call indirect through this other thing, and so on. To simplify programming,
higher-level languages were introduced. These higher-level languages introduced common useful
constructs, like if/else, switch/case, various loops, local variables, arguments, virtual method calls,
operator overloads, and much more. Ultimately, these language compilers must convert the high-level
constructs down to the low-level constructs so that the computer can actually do what you want it to
do.
In other words, the C# compiler translates your C# constructs into Intermediate Language (IL),
which is then converted by the just-in-time (JIT) compiler into native CPU instructions, which must then
be processed by the CPU itself. In addition, the C# compiler, the JIT compiler, and even the CPU itself
can optimize your code. For example, the following ridiculous method can ultimately be compiled into
nothing:
private static void OptimizedAway() {
    // Constant expression is computed at compile time resulting in zero
    Int32 value = (1 * 100) - (50 * 2);
    // If value is 0, the loop never executes
    for (Int32 x = 0; x < value; x++) {
        // There is no need to compile the code in the loop since it can never execute
        Console.WriteLine("Jeff");
    }
}
In this code, the compiler can see that value will always be 0; therefore, the loop will never execute
and consequently, there is no need to compile the code inside the loop. This method could be
compiled down to nothing. In fact, when JITting a method that calls OptimizedAway, the JITter will try
to inline the OptimizedAway method’s code. Since there is no code, the JITter will even remove the
code that tries to call OptimizedAway. We love this feature of compilers. As developers, we get to
write the code in the way that makes the most sense to us. The code should be easy to write, read, and
maintain. Then compilers translate our intentions into machine-understandable code. We want our
compilers to do the best job possible for us.
When the C# compiler, JIT compiler, and CPU optimize our code, they guarantee us that the
intention of the code is preserved. That is, from a single-threaded perspective, the method does what
we want it to do, although it may not do it exactly the way we described in our source code. However,
the intention might not be preserved from a multithreaded perspective. Here is an example where the
optimizations make the program not work as expected:
using System;
using System.Threading;

internal static class StrangeBehavior {
    // As you'll see later, mark this field as volatile to fix the problem
    private static Boolean s_stopWorker = false;

    public static void Main() {
        Console.WriteLine("Main: letting worker run for 5 seconds");
        Thread t = new Thread(Worker);
        t.Start();
        Thread.Sleep(5000);
        s_stopWorker = true;
        Console.WriteLine("Main: waiting for worker to stop");
        t.Join();
    }

    private static void Worker(Object o) {
        Int32 x = 0;
        // The JIT compiler may hoist this field read out of the loop
        while (!s_stopWorker) x++;
        Console.WriteLine("Worker: stopped when x={0}", x);
    }
}
In this code, the Main method creates a new thread that executes the Worker method. This Worker
method counts as high as it can before being told to stop. The Main method allows the Worker thread
to run for 5 seconds before telling it to stop by setting the static Boolean field to true. At this
point, the Worker thread should display what it counted up to, and then the thread will terminate. The
Main thread waits for the Worker thread to terminate by calling Join, and then the Main thread
returns, causing the whole process to terminate.
Looks simple enough, right? Well, the program has a potential problem due to all the optimizations
that could happen to it. You see, when the Worker method is compiled, the compiler sees that
s_stopWorker is either true or false, and it also sees that this value never changes inside the
Worker method itself. So the compiler could produce code that checks s_stopWorker first. If
s_stopWorker is true, then Worker: stopped when x=0 will be displayed. If s_stopWorker is
false, then the compiler produces code that enters an infinite loop that increments x forever. You
see, the optimizations cause the loop to run very fast because checking s_stopWorker only occurs
once before the loop; it does not get checked with each iteration of the loop.
If you actually want to see this in action, put this code in a .cs file and compile the code using C#’s
/platform:x86 and /optimize+ switches. Then run the resulting EXE file, and you’ll see that the
program runs forever. Note that you have to compile for x86 to ensure that the x86 JIT compiler is used
at runtime. The x86 JIT compiler is more mature than the x64 JIT compiler, so it performs more
aggressive optimizations. The x64 JIT compiler does not perform this particular optimization, and
therefore the program runs to completion. This highlights another interesting point about all of this.
Whether your program behaves as expected depends on a lot of factors, such as which compiler
version and compiler switches are used, which JIT compiler is used, and which CPU your code is
running on. In addition, to see the program above run forever, you must not run the program under a
debugger because the debugger causes the JIT compiler to produce unoptimized code that is easier to
step through.
Let’s look at another example, which has two threads that are both accessing two fields:
internal sealed class ThreadsSharingData {
    private Int32 m_flag = 0;
    private Int32 m_value = 0;

    // This method is executed by one thread
    public void Thread1() {
        // Note: These could execute in reverse order
        m_value = 5;
        m_flag = 1;
    }

    // This method is executed by another thread
    public void Thread2() {
        // Note: m_value could be read before m_flag
        if (m_flag == 1)
            Console.WriteLine(m_value);
    }
}
The problem with this code is that the compilers/CPU could translate the code in such a way as to
reverse the two lines of code in the Thread1 method. After all, reversing the two lines of code does
not change the intention of the method. The method needs to get a 5 in m_value and a 1 in m_flag.
From a single-threaded application’s perspective, the order of executing this code is unimportant. If
these two lines do execute in reverse order, then another thread executing the Thread2 method could
see that m_flag is 1 and then display 0.
Let’s look at this code another way. Let’s say that the code in the Thread1 method executes in
program order (the way it was written). When compiling the code in the Thread2 method, the
compiler must generate code to read m_flag and m_value from RAM into CPU registers. It is possible
that RAM will deliver m_value first, while it still contains 0. Then the Thread1 method
could execute, changing m_value to 5 and m_flag to 1. Thread2's CPU register still holds the stale 0
for m_value, because it was loaded before the other thread's write. When m_flag is then read
from RAM into a CPU register, it contains 1, so Thread2 again
displays 0.
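The first scenario above (Thread1's two writes taking effect in reverse order) is hard to reproduce on demand, so here is a minimal single-threaded sketch that writes out the two schedules by hand. The class and method names are hypothetical and not part of the original sample, and nothing in this code makes a real compiler or CPU reorder anything; it only shows what Thread2 would display if it ran between Thread1's two writes.

```csharp
using System;

// Hypothetical model: the interleavings are spelled out manually on one thread.
public static class ReorderingSimulation {
    private static Int32 s_flag;
    private static Int32 s_value;

    // What Thread2 would display, or null if it would display nothing
    private static Int32? Thread2Observes() {
        return (s_flag == 1) ? s_value : (Int32?) null;
    }

    // Thread1's writes happen in program order; Thread2 runs between them
    public static Int32? ProgramOrder() {
        s_flag = 0; s_value = 0;
        s_value = 5;
        Int32? seen = Thread2Observes(); // flag is still 0, nothing displayed
        s_flag = 1;
        return seen;
    }

    // Thread1's writes are swapped; Thread2 runs between them
    public static Int32? ReversedOrder() {
        s_flag = 0; s_value = 0;
        s_flag = 1;
        Int32? seen = Thread2Observes(); // flag is 1 but value is still 0
        s_value = 5;
        return seen;
    }
}
```

With the writes in program order, Thread2 either displays nothing or displays 5. Only the swapped schedule lets it see the flag set while m_value still holds 0.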
This is all very scary stuff and is more likely to cause problems in a release build of your program
than in a debug build of your program, making it particularly tricky to detect these problems and
correct your code. Now, let’s talk about how to correct your code.
The static System.Threading.Volatile class offers two static methods that look like this:[68]
public static class Volatile {
    public static void Write(ref Int32 location, Int32 value);
    public static Int32 Read(ref Int32 location);
}
These methods are special. In effect, these methods disable some optimizations usually performed
by the C# compiler, the JIT compiler, and the CPU itself. Here’s how the methods work:
The Volatile.Write method forces the value in location to be written to at the point of
the call. In addition, any earlier program-order loads and stores must occur before the call to
Volatile.Write.
The Volatile.Read method forces the value in location to be read from at the point of the
call. In addition, any later program-order loads and stores must occur after the call to
Volatile.Read.
Important: I know that this can be very confusing, so let me summarize it as a simple rule. When
threads are communicating with each other via shared memory, write the last value by calling
Volatile.Write and read the first value by calling Volatile.Read.
So now we can fix the ThreadsSharingData class using these methods:
internal sealed class ThreadsSharingData {
    private Int32 m_flag = 0;
    private Int32 m_value = 0;

    // This method is executed by one thread
    public void Thread1() {
        // Note: 5 must be written to m_value before 1 is written to m_flag
        m_value = 5;
        Volatile.Write(ref m_flag, 1);
    }

    // This method is executed by another thread
    public void Thread2() {
        // Note: m_value must be read after m_flag is read
        if (Volatile.Read(ref m_flag) == 1)
            Console.WriteLine(m_value);
    }
}
First, notice that we are following the rule. The Thread1 method writes two values out to fields that
are shared by multiple threads. The last write (setting m_flag to 1) is performed
by calling Volatile.Write. The Thread2 method reads two values from fields shared by multiple
threads, and the first read (of m_flag) is performed by calling Volatile.Read.
[68] There are also overloads of Read and Write that operate on the following types: Boolean, (S)Byte, (U)Int16,
UInt32, (U)Int64, (U)IntPtr, Single, Double, and T where T is a generic type constrained to 'class'
(reference types).
But what is really happening here? Well, for the Thread1 method, the Volatile.Write call
ensures that all the writes above it are completed before a 1 is written to m_flag. Since m_value = 5 is
before the call to Volatile.Write, it must complete first. In fact, if there were many variables being
modified before the call to Volatile.Write, they would all have to complete before 1 is written to
m_flag. Note that the writes before the call to Volatile.Write can be optimized to execute in any
order; it’s just that all the writes have to complete before the call to Volatile.Write.
For the Thread2 method, the Volatile.Read call ensures that all variable reads after it start after
the value in m_flag has been read. Since reading m_value is after the call to Volatile.Read, the
value must be read after having read the value in m_flag. If there were many reads after the call to
Volatile.Read, they would all have to start after the value in m_flag has been read. Note that the
reads after the call to Volatile.Read can be optimized to execute in any order; it’s just that the reads
can’t start happening until after the call to Volatile.Read.
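To make the "many variables" case concrete, here is a minimal sketch in which several ordinary fields are written before a single Volatile.Write and read after a single Volatile.Read. The PublishedPoint class, its members, and the Demo helper are hypothetical names introduced only for illustration. It uses the Boolean overloads of Read and Write, and assumes a spin-wait is acceptable for the reader.

```csharp
using System;
using System.Threading;

// Hypothetical example type, not part of the original sample.
public sealed class PublishedPoint {
    private Int32 m_x, m_y, m_z; // ordinary fields; may be written in any order
    private Boolean m_ready;     // written last, via Volatile.Write

    // Executed by the writing thread
    public void Publish(Int32 x, Int32 y, Int32 z) {
        m_x = x; m_y = y; m_z = z;
        // All three writes above must complete before m_ready becomes true
        Volatile.Write(ref m_ready, true);
    }

    // Executed by the reading thread
    public Int32 WaitAndSum() {
        SpinWait spinner = new SpinWait();
        // The reads of m_x, m_y, and m_z below cannot start before this read
        while (!Volatile.Read(ref m_ready)) spinner.SpinOnce();
        return m_x + m_y + m_z;
    }

    // Runs one reader and one writer and returns what the reader saw
    public static Int32 Demo() {
        PublishedPoint p = new PublishedPoint();
        Int32 sum = 0;
        Thread reader = new Thread(() => sum = p.WaitAndSum());
        reader.Start();
        p.Publish(1, 2, 3);
        reader.Join();
        return sum;
    }
}
```

Only m_ready needs a volatile access here. The three data fields ride along on the ordering guarantees of that one write and that one read.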
C#’s Support for Volatile Fields
Making sure that programmers call the Volatile.Read and Volatile.Write methods correctly is a
lot to ask. It’s hard for programmers to keep all of this in their minds and to start imagining what other
threads might be doing to shared data in the background. To simplify this, the C# compiler has the
volatile keyword, which can be applied to static or instance fields of any of these types: Boolean,
(S)Byte, (U)Int16, (U)Int32, (U)IntPtr, Single, or Char. You can also apply the volatile
keyword to reference types and any enum field so long as the enumerated type has an underlying type
of (S)Byte, (U)Int16, or (U)Int32. The JIT compiler ensures that all accesses to a volatile field are
performed as volatile reads and writes, so that it is not necessary to explicitly call Volatile's static
Read or Write methods. Furthermore, the volatile keyword tells the C# and JIT compilers not to
cache the field in a CPU register, ensuring that all reads from and writes to the field actually
access memory.
Using the volatile keyword, we can rewrite the ThreadsSharingData class as follows:
internal sealed class ThreadsSharingData {
    private volatile Int32 m_flag = 0;
    private Int32 m_value = 0;

    // This method is executed by one thread
    public void Thread1() {
        // Note: 5 must be written to m_value before 1 is written to m_flag
        m_value = 5;
        m_flag = 1;
    }

    // This method is executed by another thread
    public void Thread2() {
        // Note: m_value must be read after m_flag is read
        if (m_flag == 1)
            Console.WriteLine(m_value);
    }
}
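The same keyword also addresses the earlier StrangeBehavior example, as the comment in that sample hints. The following is a minimal sketch under that assumption. The StoppableWorker class and its RunFor method are hypothetical names, and the short run time and 5-second join timeout are arbitrary choices for illustration.

```csharp
using System;
using System.Threading;

// Hypothetical variant of the StrangeBehavior sample with a volatile stop flag.
public static class StoppableWorker {
    // volatile: every loop iteration must re-read the field
    private static volatile Boolean s_stopWorker = false;

    private static void Worker() {
        Int64 x = 0;
        while (!s_stopWorker) x++;
        Console.WriteLine("Worker: stopped when x={0}", x);
    }

    // Lets the worker run for the given time, then asks it to stop.
    // Returns true if the worker terminated within 5 seconds of the request.
    public static Boolean RunFor(Int32 milliseconds) {
        s_stopWorker = false;
        Thread t = new Thread(Worker);
        t.IsBackground = true; // a hung worker cannot keep the process alive
        t.Start();
        Thread.Sleep(milliseconds);
        s_stopWorker = true;
        return t.Join(5000);
    }
}
```

Calling StoppableWorker.RunFor(50) should return true, because the worker re-reads s_stopWorker on each iteration and sees the update.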
There are some developers (and I am one of them) who do not like C#’s volatile keyword, and
they think that the language should not provide it. Our thinking is that most algorithms require few
volatile read or write accesses to a field and that most other accesses to the field can occur normally,
improving performance; seldom is it required that all accesses to a field be volatile. For example, it is
difficult to interpret how to apply volatile read operations to algorithms like this one:
m_amount = m_amount + m_amount; // Assume m_amount is a volatile field defined in a class
Normally, an integer number can be doubled simply by shifting all bits left by 1 bit, and many
compilers can examine the code above and perform this optimization. However, if m_amount is a
volatile field, then this optimization is not allowed. The compiler must produce code to read
m_amount into a register and then read it again into another register, add the two registers together,
and then write the result back out to the m_amount field. The unoptimized code is certainly bigger and
slower; it would be unfortunate if it were contained inside a loop.
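One way to get the few-volatile-accesses behavior described above is to leave the field non-volatile and call Volatile.Read and Volatile.Write only where ordering matters. This is a minimal sketch, not a recommendation from the original text. The Account class and its members are hypothetical, and the read-modify-write shown is not atomic, so it assumes only one thread updates the field.

```csharp
using System;
using System.Threading;

// Hypothetical example type; assumes a single writer thread.
public sealed class Account {
    private Int32 m_amount; // deliberately NOT declared volatile

    public Account(Int32 amount) { m_amount = amount; }

    public void DoubleAmount() {
        Int32 amount = Volatile.Read(ref m_amount); // one volatile read
        amount = amount + amount;                   // plain local arithmetic, free to be optimized
        Volatile.Write(ref m_amount, amount);       // one volatile write
    }

    public Int32 Amount {
        get { return Volatile.Read(ref m_amount); }
    }
}
```

Because the doubling happens on a local variable, the compiler is free to optimize it however it likes, and the field is touched exactly twice.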
Furthermore, C# does not preserve volatile semantics when a volatile field is passed by reference to a
method. For example, if m_amount is defined as a volatile Int32, attempting to call Int32’s TryParse
method causes the compiler to generate a warning as shown here:
Boolean success = Int32.TryParse("123", out m_amount);
// The above line causes the C# compiler to generate a warning:
// CS0420: a reference to a volatile field will not be treated as volatile
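A common way to avoid this warning is to pass a local variable to the method and then assign the result to the volatile field. The sketch below assumes that approach. The ParsedSetting class and TrySet method are hypothetical names. Unlike passing the field directly, this version leaves the field unchanged when parsing fails.

```csharp
using System;

// Hypothetical example type, not part of the original sample.
public sealed class ParsedSetting {
    private volatile Int32 m_amount;

    public Boolean TrySet(String text) {
        Int32 parsed;
        // Pass a local by reference instead of the volatile field
        Boolean success = Int32.TryParse(text, out parsed);
        // An ordinary assignment to a volatile field is a volatile write
        if (success) m_amount = parsed;
        return success;
    }

    public Int32 Amount {
        get { return m_amount; }
    }
}
```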
Finally, volatile fields are not Common Language Specification (CLS) compliant because many
languages (including Visual Basic) do not support them.
