Memory Barriers

I can’t seem to keep the details of memory ordering and memory barriers fresh in my mind, so this time I am taking notes.


  1. Memory Ordering in Modern Microprocessors by Paul E. McKenney
  2. Reordering Constraints for Pthread-Style Locks by Hans-J. Boehm
  3. Documentation/memory-barriers.txt in the Linux kernel source


Processors used to be slow, in-order, and single-core. Now they are fast, out-of-order, and multi-core.

It’s the out-of-order memory accesses, combined with multi-core, that really cause the problems. Where do these out-of-order memory accesses come from?

  • The compiler may reorder your code, at least between sequence points.
  • The CPU may execute the instructions out-of-order. The visible effect is that memory references occur out of order. Processors do this to varying degrees, as summarized on Memory Ordering at Wikipedia.
  • CPUs may see other CPUs’ writes out-of-order, due to per-CPU cache design. Section “Why Reorder Memory Accesses?” of [1] discusses this. In summary, CPUs are now so fast that each needs multiple independent cache banks. One bank may be busy, while the other may be idle. A write to the idle bank may become globally visible before an earlier write to the busy bank.

Types of Barriers (Fences)

There are two distinct kinds: a compiler memory barrier (which emits nothing into the generated code, but prevents the compiler from reordering memory accesses across it) versus a hardware memory barrier (an actual instruction that constrains the CPU).

Of course, in practice, macros that emit hardware memory barriers also act as compiler memory barriers.


Acquire/release semantics are half barriers: an acquire prevents later memory accesses from being reordered before it, and a release prevents earlier memory accesses from being reordered after it.

Barriers in Locking Primitives

Locking primitives typically act as some sort of memory barrier, but usually not a full one.

Linux Kernel

The Linux kernel clearly documents that lock/unlock critical sections are not fully fenced. Loosely speaking, memory accesses outside can “leak” into the critical section, but accesses inside cannot leak out. See [3].


I used to think that using pthreads in userspace was simple and safe. My understanding (assumption?) was that each pthread mutex call was, effectively, a full memory barrier. That is, memory accesses before the lock were completed strictly before; memory accesses between the lock and unlock were completed between, and memory accesses after the unlock remained after.

It’s not clear that’s a safe assumption. Quoting [2]:

It is clearly not generally acceptable to move memory operations out of critical sections, i.e.
regions in which a lock is held; doing so would introduce a data race. But it is far less clear
whether it is acceptable to move memory operations into critical sections, which is the question we
pursue here.


...a memory operation immediately following a pthread_mutex_unlock operation may always be moved to
just before the pthread_mutex_unlock. But the corresponding reordering of a pthread_mutex_lock
with a preceding memory operation is not generally safe. More general movement into critical
sections is possible in the absence of the pthread_mutex_trylock call.

As we point out in section 3, many current implementations either do not follow these rules, or add
extra fences. Nor is it completely apparent that they should, since these semantics appear to have
been largely accidental.

To me, this sounds like the only safe, portable expectation of pthreads mutexes is to treat them like Linux kernel locks: the enclosed memory references cannot leak out, but other memory references can leak in.

In fact, Table 1 of [2] shows this through code inspection.

To state it clearly: pthreads locks are not full memory barriers.



For example, kernel-style reference counting relies on the implicit full barrier in atomic_dec_and_test(), so that all accesses to the object complete before it can be freed (stubs filled in with the usual pattern):

struct T { atomic_t refs; };

struct T *alloc_t(void)
{
    struct T *t = kmalloc(sizeof(*t), GFP_KERNEL);
    atomic_set(&t->refs, 1);
    return t;
}

static void free_t(struct T *t) { kfree(t); }

void get_t(struct T *t) { atomic_inc(&t->refs); }

void put_t(struct T *t)
{
    /* atomic_dec_and_test() implies a full memory barrier, so all
     * prior accesses to *t complete before the object is freed. */
    if (atomic_dec_and_test(&t->refs))
        free_t(t);
}

October 4, 2013




