Volatile Constructs
Back in the early days of computing, software was written using assembly language. Assembly language is very tedious, because programmers must explicitly state everything: use this CPU register for this, branch to that, call indirect through this other thing, and so on. To simplify programming, higher-level languages were introduced. These higher-level languages introduced common useful constructs, like if/else, switch/case, various loops, local variables, arguments, virtual method calls, operator overloads, and much more. Ultimately, these language compilers must convert the high-level constructs down to the low-level constructs so that the computer can actually do what you want it to do. In other words, the C# compiler translates your C# constructs into Intermediate Language (IL), which is then converted by the just-in-time (JIT) compiler into native CPU instructions, which must then be processed by the CPU itself. In addition, the C# compiler, the JIT compiler, and even the CPU itself can optimize your code. For example, the following ridiculous method can ultimately be compiled into nothing:
    private static void OptimizedAway() {
        // Constant expression is computed at compile time resulting in zero
        Int32 value = (1 * 100) - (50 * 2);
        // If value is 0, the loop never executes
        for (Int32 x = 0; x < value; x++) {
            // There is no need to compile the code in the loop since it can never execute
            Console.WriteLine("Jeff");
        }
    }
In this code, the compiler can see that value will always be 0; therefore, the loop will never execute and consequently, there is no need to compile the code inside the loop. This method could be compiled down to nothing. In fact, when JITting a method that calls OptimizedAway, the JITter will try to inline the OptimizedAway method’s code. Since there is no code, the JITter will even remove the code that tries to call OptimizedAway. We love this feature of compilers. As developers, we get to write the code in the way that makes the most sense to us. The code should be easy to write, read, and maintain. Then compilers translate our intentions into machine-understandable code. We want our compilers to do the best job possible for us.
When the C# compiler, JIT compiler, and CPU optimize our code, they guarantee us that the intention of the code is preserved. That is, from a single-threaded perspective, the method does what we want it to do, although it may not do it exactly the way we described in our source code. However, the intention might not be preserved from a multithreaded perspective. Here is an example where the optimizations make the program not work as expected:
    using System;
    using System.Threading;

    internal static class StrangeBehavior {
        // As you'll see later, mark this field as volatile to fix the problem
        private static Boolean s_stopWorker = false;

        public static void Main() {
            Console.WriteLine("Main: letting worker run for 5 seconds");
            Thread t = new Thread(Worker);
            t.Start();
            Thread.Sleep(5000);
            s_stopWorker = true;
            Console.WriteLine("Main: waiting for worker to stop");
            t.Join();
        }

        private static void Worker(Object o) {
            Int32 x = 0;
            while (!s_stopWorker) x++;
            Console.WriteLine("Worker: stopped when x={0}", x);
        }
    }
In this code, the Main method creates a new thread that executes the Worker method. This Worker method counts as high as it can before being told to stop. The Main method allows the Worker thread to run for 5 seconds before telling it to stop by setting the static Boolean field to true. At this point, the Worker thread should display what it counted up to, and then the thread will terminate. The Main thread waits for the Worker thread to terminate by calling Join, and then the Main thread returns, causing the whole process to terminate.
Looks simple enough, right? Well, the program has a potential problem due to all the optimizations that could happen to it. You see, when the Worker method is compiled, the compiler sees that s_stopWorker is either true or false, and it also sees that this value never changes inside the Worker method itself. So the compiler could produce code that checks s_stopWorker first. If s_stopWorker is true, then “Worker: stopped when x=0” will be displayed. If s_stopWorker is false, then the compiler produces code that enters an infinite loop that increments x forever. You see, the optimizations cause the loop to run very fast because checking s_stopWorker only occurs once before the loop; it does not get checked with each iteration of the loop.
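To make the transformation concrete, here is a rough sketch of what the optimized Worker method behaves like once the check has been hoisted out of the loop. This is only an illustration in C#, not actual JIT output. The class name HoistedWorkerSketch and the maxIterations parameter are hypothetical; the cap exists only so that the sketch terminates, whereas the real optimized loop has no exit condition.

```csharp
using System;

internal static class HoistedWorkerSketch {
    public static Boolean s_stopWorker = false;

    // Roughly how the optimized Worker behaves: the flag is read once, before the loop.
    // maxIterations is a hypothetical cap added so that this sketch can terminate.
    public static Int32 Worker(Int32 maxIterations) {
        Int32 x = 0;
        if (s_stopWorker) return x;        // The only read of the flag
        while (x < maxIterations) x++;     // Stands in for: while (true) x++;
        return x;
    }
}
```

Setting s_stopWorker to true after the method has passed the initial check has no effect on the loop, which is exactly the behavior described above.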
If you actually want to see this in action, put this code in a .cs file and compile the code using C#’s /platform:x86 and /optimize+ switches. Then run the resulting EXE file, and you’ll see that the program runs forever. Note that you have to compile for x86 to ensure that the x86 JIT compiler is used at runtime. At the time of writing, the x86 JIT compiler is more mature than the x64 JIT compiler, so it performs more aggressive optimizations. The x64 JIT compiler does not perform this particular optimization, and therefore the program runs to completion. This highlights another interesting point about all of this. Whether your program behaves as expected depends on a lot of factors, such as which compiler version and compiler switches are used, which JIT compiler is used, and which CPU your code is running on. In addition, to see the program above run forever, you must not run the program under a debugger because the debugger causes the JIT compiler to produce unoptimized code that is easier to step through.
Let’s look at another example, which has two threads that are both accessing two fields:
    internal sealed class ThreadsSharingData {
        private Int32 m_flag = 0;
        private Int32 m_value = 0;

        // This method is executed by one thread
        public void Thread1() {
            // Note: These could execute in reverse order
            m_value = 5;
            m_flag = 1;
        }

        // This method is executed by another thread
        public void Thread2() {
            // Note: m_value could be read before m_flag
            if (m_flag == 1)
                Console.WriteLine(m_value);
        }
    }
The problem with this code is that the compilers/CPU could translate the code in such a way as to reverse the two lines of code in the Thread1 method. After all, reversing the two lines of code does not change the intention of the method. The method needs to get a 5 in m_value and a 1 in m_flag. From a single-threaded application’s perspective, the order of executing this code is unimportant. If these two lines do execute in reverse order, then another thread executing the Thread2 method could see that m_flag is 1 and then display 0.
Let’s look at this code another way. Let’s say that the code in the Thread1 method executes in program order (the way it was written). When compiling the code in the Thread2 method, the compiler must generate code to read m_flag and m_value from RAM into CPU registers. It is possible that RAM will deliver the value of m_value first, which would contain a 0. Then the Thread1 method could execute, changing m_value to 5 and m_flag to 1. But the CPU register used by Thread2 still holds the stale 0; it does not see that the other thread has changed m_value to 5. The value of m_flag could then be read from RAM into a CPU register, where it is now 1, causing Thread2 to again display 0.
This is all very scary stuff and is more likely to cause problems in a release build of your program than in a debug build of your program, making it particularly tricky to detect these problems and correct your code. Now, let’s talk about how to correct your code.
The static System.Threading.Volatile class offers two static methods that look like this:[68]
    public static class Volatile {
        public static void Write(ref Int32 location, Int32 value);
        public static Int32 Read(ref Int32 location);
    }
These methods are special. In effect, these methods disable some optimizations usually performed by the C# compiler, the JIT compiler, and the CPU itself. Here’s how the methods work:
The Volatile.Write method forces the value in location to be written to at the point of the call. In addition, any earlier program-order loads and stores must occur before the call to Volatile.Write.
The Volatile.Read method forces the value in location to be read from at the point of the call. In addition, any later program-order loads and stores must occur after the call to Volatile.Read.
Important
I know that this can be very confusing, so let me summarize it as a simple rule. When threads are communicating with each other via shared memory, write the last value by calling Volatile.Write and read the first value by calling Volatile.Read.
So now we can fix the ThreadsSharingData class using these methods:
    internal sealed class ThreadsSharingData {
        private Int32 m_flag = 0;
        private Int32 m_value = 0;

        // This method is executed by one thread
        public void Thread1() {
            // Note: 5 must be written to m_value before 1 is written to m_flag
            m_value = 5;
            Volatile.Write(ref m_flag, 1);
        }

        // This method is executed by another thread
        public void Thread2() {
            // Note: m_value must be read after m_flag is read
            if (Volatile.Read(ref m_flag) == 1)
                Console.WriteLine(m_value);
        }
    }
First, notice that we are following the rule. The Thread1 method writes two values out to fields that are shared by multiple threads. The last value that we want written (setting m_flag to 1) is written by calling Volatile.Write. The Thread2 method reads two values from fields shared by multiple threads, and the first value being read (m_flag) is read by calling Volatile.Read.
[68] There are also overloads of Read and Write that operate on the following types: Boolean, (S)Byte, (U)Int16, UInt32, (U)Int64, (U)IntPtr, Single, Double, and T where T is a generic type constrained to ‘class’ (reference types).
But what is really happening here? Well, for the Thread1 method, the Volatile.Write call ensures that all the writes above it are completed before a 1 is written to m_flag. Since m_value = 5 is before the call to Volatile.Write, it must complete first. In fact, if there were many variables being modified before the call to Volatile.Write, they would all have to complete before 1 is written to m_flag. Note that the writes before the call to Volatile.Write can be optimized to execute in any order; it’s just that all the writes have to complete before the call to Volatile.Write.
For the Thread2 method, the Volatile.Read call ensures that all variable reads after it start after the value in m_flag has been read. Since reading m_value is after the call to Volatile.Read, the value must be read after having read the value in m_flag. If there were many reads after the call to Volatile.Read, they would all have to start after the value in m_flag has been read. Note that the reads after the call to Volatile.Read can be optimized to execute in any order; it’s just that the reads can’t start happening until after the call to Volatile.Read.
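The same rule carries over to the other overloads of Read and Write, including the generic one for reference types. The following is a minimal sketch, assuming a hypothetical SharedNames class that is not part of the original text: one thread fills in an array with ordinary writes and then publishes the reference as its last write by calling Volatile.Write, while other threads read the reference first by calling Volatile.Read.

```csharp
using System;
using System.Threading;

// Hypothetical class used only for illustration
internal sealed class SharedNames {
    private String[] m_names;   // Stays null until Publish is called

    // This method is executed by one thread
    public void Publish() {
        String[] names = new String[2];
        names[0] = "first";     // Ordinary writes that fill in the array
        names[1] = "second";
        // Last write: publish the reference; the writes above must complete first
        Volatile.Write(ref m_names, names);
    }

    // This method is executed by other threads
    public Int32 CountIfPublished() {
        // First read: get the reference; the reads below must start after it
        String[] names = Volatile.Read(ref m_names);
        return (names == null) ? -1 : names.Length;
    }
}
```

A thread that sees a non-null reference from CountIfPublished is therefore also expected to see the fully initialized array elements behind it.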
C#’s Support for Volatile Fields
Making sure that programmers call the Volatile.Read and Volatile.Write methods correctly is a lot to ask. It’s hard for programmers to keep all of this in their minds and to start imagining what other threads might be doing to shared data in the background. To simplify this, the C# compiler has the volatile keyword, which can be applied to static or instance fields of any of these types: Boolean, (S)Byte, (U)Int16, (U)Int32, (U)IntPtr, Single, or Char. You can also apply the volatile keyword to reference types and any enum field so long as the enumerated type has an underlying type of (S)Byte, (U)Int16, or (U)Int32. The JIT compiler ensures that all accesses to a volatile field are performed as volatile reads and writes, so that it is not necessary to explicitly call Volatile's static Read or Write methods. Furthermore, the volatile keyword tells the C# and JIT compilers not to cache the field in a CPU register, ensuring that every read of the field actually causes the value to be read from memory.
Using the volatile keyword, we can rewrite the ThreadsSharingData class as follows:
    internal sealed class ThreadsSharingData {
        private volatile Int32 m_flag = 0;
        private Int32 m_value = 0;

        // This method is executed by one thread
        public void Thread1() {
            // Note: 5 must be written to m_value before 1 is written to m_flag
            m_value = 5;
            m_flag = 1;
        }

        // This method is executed by another thread
        public void Thread2() {
            // Note: m_value must be read after m_flag is read
            if (m_flag == 1)
                Console.WriteLine(m_value);
        }
    }
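The StrangeBehavior program shown earlier can be fixed the same way, as its own comment suggests, by marking s_stopWorker as volatile. The sketch below is a variation rather than the original program: the class name StrangeBehaviorFixed and the RunFor helper are hypothetical, introduced so that the wait time can be shortened and the outcome checked.

```csharp
using System;
using System.Threading;

internal static class StrangeBehaviorFixed {
    // volatile forces the flag to be re-read on every loop iteration
    private static volatile Boolean s_stopWorker = false;

    // Hypothetical helper: lets the worker run for the given time, tells it to stop,
    // and returns true if the worker thread terminated within 5 seconds
    public static Boolean RunFor(Int32 milliseconds) {
        s_stopWorker = false;
        Thread t = new Thread(Worker);
        t.Start();
        Thread.Sleep(milliseconds);
        s_stopWorker = true;          // Volatile write
        return t.Join(5000);
    }

    private static void Worker(Object o) {
        Int32 x = 0;
        while (!s_stopWorker) x++;    // Volatile read on each iteration
        Console.WriteLine("Worker: stopped when x={0}", x);
    }
}
```

Because the read inside the loop is now a volatile read, the check can no longer be hoisted out of the loop, so the worker should stop shortly after the flag is set.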
There are some developers (and I am one of them) who do not like C#’s volatile keyword, and they think that the language should not provide it.[69] Our thinking is that most algorithms require few volatile read or write accesses to a field and that most other accesses to the field can occur normally, improving performance; seldom is it required that all accesses to a field be volatile. For example, it is difficult to interpret how to apply volatile read operations to algorithms like this one:
    m_amount = m_amount + m_amount; // Assume m_amount is a volatile field defined in a class
Normally, an integer number can be doubled simply by shifting all bits left by 1 bit, and many compilers can examine the code above and perform this optimization. However, if m_amount is a volatile field, then this optimization is not allowed. The compiler must produce code to read m_amount into a register and then read it again into another register, add the two registers together, and then write the result back out to the m_amount field. The unoptimized code is certainly bigger and slower; it would be unfortunate if it were contained inside a loop.
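One way to act on this reasoning is to leave the field non-volatile and make only the accesses that matter volatile. The following is a minimal sketch, assuming a hypothetical Account class that is not part of the original text: it performs a single volatile read into a local variable, does the arithmetic on the local, and finishes with a single volatile write.

```csharp
using System;
using System.Threading;

// Hypothetical class used only for illustration
internal sealed class Account {
    private Int32 m_amount;   // Deliberately not marked volatile

    public Account(Int32 amount) { m_amount = amount; }

    public Int32 DoubleAmount() {
        Int32 amount = Volatile.Read(ref m_amount);   // One volatile read
        Int32 doubled = amount + amount;              // Ordinary arithmetic on a local
        Volatile.Write(ref m_amount, doubled);        // One volatile write
        return doubled;
    }
}
```

Note that this read-modify-write sequence is not atomic: if two threads call DoubleAmount at the same time, one update can be lost. The sketch only shows how to limit the number of volatile accesses.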
Furthermore, C# does not support passing a volatile field by reference to a method. For example, if m_amount is defined as a volatile Int32, attempting to call Int32’s TryParse method causes the compiler to generate a warning as shown here:
    Boolean success = Int32.TryParse("123", out m_amount);
    // The above line causes the C# compiler to generate a warning:
    // CS0420: a reference to a volatile field will not be treated as volatile
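A common way to avoid this warning is to pass a local variable by reference and then assign the result to the volatile field. The sketch below assumes a hypothetical AmountParser class with TryUpdate and Amount members, none of which appear in the original text.

```csharp
using System;

// Hypothetical class used only for illustration
internal sealed class AmountParser {
    private volatile Int32 m_amount;

    public Boolean TryUpdate(String text) {
        Int32 parsed;   // The local is passed by reference, not the volatile field
        Boolean success = Int32.TryParse(text, out parsed);
        if (success) m_amount = parsed;   // Ordinary assignment performs the volatile write
        return success;
    }

    public Int32 Amount { get { return m_amount; } }   // Volatile read
}
```

Unlike passing the field directly as the out argument, this version leaves m_amount unchanged when parsing fails; whether that difference matters depends on the calling code.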
Finally, volatile fields are not Common Language Specification (CLS) compliant because many languages (including Visual Basic) do not support them.