Core CS Concept

Concurrency for CS Students

Threads, mutexes, race conditions, deadlocks, atomic operations, message passing, and async/await explained from first principles. Includes a worked deadlock walkthrough (thread A holds lock 1 and needs lock 2 while thread B holds lock 2 and needs lock 1) and per-language primitives in Java, C++, Python, Go, and JavaScript. Verified CS graduates from CMU, MIT, and Berkeley, starting at $20 per task.


What it means

A working definition of Concurrency

Concurrency lets multiple computations make progress during overlapping time periods, either by interleaving on one CPU or by running in parallel on many. Coordinating them requires synchronization primitives (mutex, semaphore, condition variable, atomic) or message passing.

Java AtomicInteger, C++ std::atomic, Python threading.Lock, Go channels: each language exposes the same underlying ideas through a different API. In CPython, the GIL makes threads useful for I/O concurrency but not for CPU parallelism; for that you reach for multiprocessing.

Primary example

The canonical race condition

          
// Race condition: 4 threads each increment 250000 times.
// Expected final value: 1000000. Actual: usually less, and different on each run.
public class Race {
    static int counter = 0;  // shared and unsynchronized

    public static void main(String[] args) throws Exception {
        Thread[] threads = new Thread[4];
        for (int i = 0; i < 4; i++) {
            threads[i] = new Thread(() -> {
                // counter++ is a read, an add, and a write: not one indivisible step
                for (int j = 0; j < 250000; j++) counter++;
            });
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        System.out.println(counter);  // usually < 1000000 due to lost updates
    }
}

// Fix: use AtomicInteger.incrementAndGet() or a synchronized block.

Three synchronization primitives

Common concurrency patterns

Mutex: mutual exclusion

A mutex lets exactly one thread hold the lock at a time, protecting a critical section from simultaneous read-modify-write access. Every mutable variable that is shared between threads, with at least one of them writing, needs a mutex or an atomic equivalent.
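As a minimal sketch of that rule, here is a Python version of the shared-counter example with the increment guarded by a threading.Lock. The names run_counter, n_threads, and n_increments are made up for this illustration.

```python
import threading

def run_counter(n_threads=4, n_increments=250_000):
    """Increment one shared counter from several threads under a single lock."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(n_increments):
            # Holding the lock makes the read-modify-write of `counter`
            # indivisible with respect to the other workers.
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

Because every access path goes through the same lock, run_counter() returns n_threads * n_increments on every run. The cost is that the increments are fully serialized, which is why atomics are preferred for a plain counter in languages that have them.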

Condition variable

A condition variable lets a thread release a mutex and sleep until another thread signals a state change. Standard for producer-consumer queues: consumers wait on "not empty"; producers wait on "not full". Always wait in a while loop to handle spurious wakeups.
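The producer-consumer pattern described above could look roughly like this in Python. This is a sketch under stated assumptions: BoundedQueue and transfer are hypothetical names, and in real code the standard library's queue.Queue already provides the same behavior.

```python
import threading
from collections import deque

class BoundedQueue:
    """A fixed-capacity FIFO built from one mutex and two condition variables."""

    def __init__(self, capacity):
        self._items = deque()
        self._capacity = capacity
        lock = threading.Lock()
        # Both conditions share the same mutex, so they guard the same state.
        self._not_empty = threading.Condition(lock)
        self._not_full = threading.Condition(lock)

    def put(self, item):
        with self._not_full:
            # A while loop (not an if) re-checks the predicate after every wakeup.
            while len(self._items) >= self._capacity:
                self._not_full.wait()   # releases the mutex while sleeping
            self._items.append(item)
            self._not_empty.notify()    # wake one waiting consumer

    def get(self):
        with self._not_empty:
            while not self._items:
                self._not_empty.wait()
            item = self._items.popleft()
            self._not_full.notify()     # wake one waiting producer
            return item

def transfer(n_items=100, capacity=4):
    """Move n_items integers from one producer thread to one consumer thread."""
    q = BoundedQueue(capacity)
    received = []

    def producer():
        for i in range(n_items):
            q.put(i)

    def consumer():
        for _ in range(n_items):
            received.append(q.get())

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received
```

With one producer and one consumer, transfer() returns the items in the order they were produced, even though the queue never holds more than capacity of them at once.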

Atomic operations

Hardware-supported instructions like compare-and-swap and fetch-and-add update a memory location indivisibly, no mutex required. Use atomics for counters, flags, and lock-free data structures. Memory ordering (acquire, release, seq_cst) controls how the operation interleaves with surrounding code.

Wrong way vs right way

Fix this concurrency bug

Deadlock from inconsistent lock order (Python)
import threading

lock1 = threading.Lock()
lock2 = threading.Lock()

def thread_a():
    with lock1:
        # ... do work ...
        with lock2:  # waits for B to release lock2
            pass

def thread_b():
    with lock2:
        # ... do work ...
        with lock1:  # waits for A to release lock1
            pass
# If A takes lock1 and B takes lock2 before either takes its second lock,
# both threads block forever.
Global lock-ordering breaks the cycle (Python)
import threading

lock1 = threading.Lock()
lock2 = threading.Lock()

def thread_a():
    with lock1:
        with lock2:
            pass

# Always acquire lock1 first, then lock2.
def thread_b_fixed():
    with lock1:
        with lock2:
            pass
# No cycle in the wait-for graph; deadlock impossible.
Thread A holds lock1 and then waits for lock2; thread B holds lock2 and then waits for lock1. The wait-for graph has a cycle, so both threads block forever. The fix is global lock-ordering: when every code path acquires locks in the same total order, no cycle can form.
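One way to enforce a global order without relying on every caller to remember it is a small helper that sorts the locks before acquiring them. The sketch below is an assumption-laden illustration: acquire_in_order and run_both are hypothetical names, and ordering by id() is just a convenient stand-in for an explicit rank you would assign to each lock in a real system.

```python
import threading
from contextlib import ExitStack, contextmanager

@contextmanager
def acquire_in_order(*locks):
    """Acquire all locks in one global order (here: by id()); release in reverse."""
    with ExitStack() as stack:
        for lock in sorted(locks, key=id):
            stack.enter_context(lock)
        yield

def run_both(rounds=1000):
    """Two threads name the locks in opposite orders but cannot deadlock."""
    lock1, lock2 = threading.Lock(), threading.Lock()
    done = []

    def thread_a():
        for _ in range(rounds):
            with acquire_in_order(lock1, lock2):
                pass
        done.append("a")

    def thread_b():
        for _ in range(rounds):
            with acquire_in_order(lock2, lock1):  # opposite argument order
                pass
        done.append("b")

    threads = [
        threading.Thread(target=thread_a, daemon=True),
        threading.Thread(target=thread_b, daemon=True),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=10)
    return sorted(done)
```

Both threads end up taking the same lock first regardless of how the caller listed them, so run_both() returns ["a", "b"].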

Cross-language

Same concept across languages

JavaScript: parallel vs sequential async
// Async/await for parallel HTTP requests in Node.js
async function fetchAll(urls) {
    // Promise.all runs requests in parallel, awaits all results
    const responses = await Promise.all(
        urls.map(url => fetch(url))
    );
    const bodies = await Promise.all(
        responses.map(r => r.text())
    );
    return bodies;
}

// Sequential version (slow): each await blocks the next
async function fetchAllSlow(urls) {
    const results = [];
    for (const url of urls) {
        const r = await fetch(url);
        results.push(await r.text());
    }
    return results;
}

// Difference for 10 URLs at 100ms each:
//   fetchAll: ~100ms (all parallel)
//   fetchAllSlow: ~1000ms (sequential)

FAQ

Concurrency FAQ

What is the difference between a thread and a process?
A process has an isolated address space; threads share the address space within a process. Process isolation is enforced by the operating system using hardware memory protection (one process cannot read or write another's memory unless both opt in to sharing); threads implicitly share all global and heap memory. Processes are comparatively expensive to create (roughly milliseconds, via a kernel fork); threads are cheaper (roughly microseconds, via a kernel clone with a shared address space). Communication: processes use pipes, sockets, and shared memory regions; threads share memory directly and coordinate with synchronization primitives. Web servers traditionally used a process or thread per connection; modern designs use a few threads with async I/O. In CPython, the GIL makes threads useful for I/O but not for CPU parallelism.
How do I prevent race conditions?
Three options. First: protect every shared variable with a mutex. Lock before read or write, release after. Make sure all access paths use the same mutex. Second: use atomic operations for single-variable updates. AtomicInteger.incrementAndGet in Java, std::atomic in C++, sync/atomic in Go. Third: avoid sharing mutable state. Use message passing (Go channels, Erlang actors) or immutable data structures so there is nothing to race on. Run ThreadSanitizer (-fsanitize=thread) or Helgrind to catch races at runtime; both report the conflicting accesses with stack traces.
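The third option can be sketched in Python with queue.Queue as the message channel. In this illustration (sum_via_messages is a hypothetical name), workers never touch the total; they only send messages, and a single owner loop applies them.

```python
import queue
import threading

def sum_via_messages(n_workers=4, n_increments=1000):
    """Workers send increments as messages; only the owner loop mutates `total`."""
    inbox = queue.Queue()   # thread-safe channel, analogous to a Go channel
    total = 0               # touched only by the loop below, so no lock is needed

    def worker():
        for _ in range(n_increments):
            inbox.put(1)
        inbox.put(None)     # sentinel: this worker is finished

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()

    finished = 0
    while finished < n_workers:
        msg = inbox.get()   # blocks until a message arrives
        if msg is None:
            finished += 1
        else:
            total += msg

    for t in threads:
        t.join()
    return total
```

The synchronization still exists, but it lives inside the queue rather than around your own variables, so there is no shared mutable state left for application code to race on.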
What is a deadlock and how do I prevent it?
A deadlock is when 2 or more threads wait for each other in a cycle and no thread can proceed. The 4 Coffman conditions are jointly necessary: mutual exclusion, hold-and-wait, no preemption, circular wait. Breaking any one prevents deadlock. The standard prevention is global lock-ordering: if every code path acquires locks in the same total order, no cycle can form. Other techniques: try_lock with timeout and back-off (introduces livelock risk), lock-free data structures, 2-phase locking with retry on conflict. Detect deadlocks by examining stack traces: jstack on Java, gdb on C and C++, py-spy on Python.
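The try_lock-with-back-off technique mentioned above breaks the hold-and-wait condition. A rough Python sketch follows; acquire_both and run_opposite_orders are hypothetical names, and the timeout, back-off range, and attempt limit are arbitrary values chosen for the example.

```python
import random
import threading
import time

def acquire_both(first, second, timeout=0.01, max_attempts=100):
    """Take both locks, or release everything, back off, and retry."""
    for _ in range(max_attempts):
        first.acquire()
        if second.acquire(timeout=timeout):
            return True                      # caller holds both and must release both
        first.release()                      # give up the lock instead of holding and waiting
        time.sleep(random.uniform(0, 0.01))  # random back-off lowers the livelock risk
    return False

def run_opposite_orders(rounds=50):
    """Two threads take the locks in opposite orders yet still finish."""
    lock1, lock2 = threading.Lock(), threading.Lock()
    completed = []

    def worker(name, first, second):
        for _ in range(rounds):
            if not acquire_both(first, second):
                return                       # gave up; this worker is not recorded
            try:
                pass                         # critical section would go here
            finally:
                second.release()
                first.release()
        completed.append(name)

    threads = [
        threading.Thread(target=worker, args=("a", lock1, lock2), daemon=True),
        threading.Thread(target=worker, args=("b", lock2, lock1), daemon=True),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=30)
    return sorted(completed)
```

Unlike global lock-ordering, this approach does not make deadlock structurally impossible; it only guarantees that a thread never waits indefinitely while holding a lock. Prefer lock-ordering when you control all the acquisition sites.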
When do I use a mutex vs a condition variable vs a semaphore?
Mutex: protect a critical section so only one thread executes it at a time. Use for read-modify-write on shared data. Condition variable: a thread waits until a condition becomes true, releasing a mutex while waiting. Use for producer-consumer queues where consumers wait for items. Always pair with a mutex and wait in a while loop to handle spurious wakeups. Semaphore: counted resource with N permits. Use for thread pool worker slots, database connection pools, rate limiting. RWLock: multiple concurrent readers or a single writer. Use when reads vastly outnumber writes (cache lookups, configuration objects).
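To make the semaphore case concrete, the sketch below caps how many tasks may use a pooled resource at once. run_with_limit is a hypothetical name, and the short sleep stands in for real work such as a database query.

```python
import threading
import time

def run_with_limit(n_tasks=10, permits=3):
    """Run n_tasks threads but let at most `permits` be inside the section at once."""
    slots = threading.Semaphore(permits)  # N permits, e.g. N pooled connections
    state_lock = threading.Lock()         # mutex protecting the two counters below
    active = 0
    peak = 0

    def task():
        nonlocal active, peak
        with slots:                       # blocks while all permits are taken
            with state_lock:
                active += 1
                peak = max(peak, active)
            time.sleep(0.01)              # stand-in for using the pooled resource
            with state_lock:
                active -= 1

    threads = [threading.Thread(target=task) for _ in range(n_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak
```

The returned peak never exceeds permits. Note the two primitives doing different jobs: the semaphore limits how many threads are in the section, while the mutex protects the counters they update.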
What are atomic operations and when do I need them?
Atomic operations are hardware-supported instructions that access a single memory location indivisibly. Common: atomic load and store, plus read-modify-write operations such as increment, exchange, and compare-and-swap (CAS). Use atomics for single-variable updates (counter increment, flag setting) where a mutex would be overkill. Use atomics to implement lock-free data structures (such as the Michael-Scott queue, often paired with hazard pointers for safe memory reclamation) where mutex contention is the bottleneck. C++ std::atomic, Java AtomicInteger and AtomicReference, Go sync/atomic. Memory ordering matters: seq_cst is the default in C++ and the easiest to reason about; relaxed is the fastest but orders nothing else around the operation, so happens-before guarantees must come from explicit fences or acquire/release operations.
How does async/await work in JavaScript and Python?
async/await is single-threaded cooperative scheduling. An async function returns a promise (JavaScript) or coroutine (Python); await pauses execution until the awaited future resolves, yielding control back to the runtime. The runtime polls ready futures and runs them until the next await point. Suitable for I/O-bound workloads (HTTP requests, file I/O, database queries) where the program spends most time waiting. Not suitable for CPU-bound work because there is no parallelism. For CPU work in Python: use multiprocessing or concurrent.futures.ProcessPoolExecutor to bypass the GIL. For CPU work in Node.js: use worker_threads.
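On the Python side, the same idea can be sketched with asyncio. To keep the example self-contained, fake_fetch is a hypothetical stand-in that sleeps instead of making a real HTTP request; fetch_all, fetch_all_slow, and timed are likewise illustrative names.

```python
import asyncio
import time

async def fake_fetch(url, delay=0.1):
    """Hypothetical stand-in for an HTTP request: waits, then returns a body."""
    await asyncio.sleep(delay)   # yields to the event loop while "waiting on I/O"
    return f"body of {url}"

async def fetch_all(urls):
    # gather schedules every coroutine up front; all the waits overlap.
    return await asyncio.gather(*(fake_fetch(u) for u in urls))

async def fetch_all_slow(urls):
    # Each await finishes before the next request even starts.
    results = []
    for u in urls:
        results.append(await fake_fetch(u))
    return results

def timed(coro_fn, urls):
    """Run one of the coroutines above and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro_fn(urls))
    return result, time.perf_counter() - start
```

Both versions run on a single thread and return the same list in the same order. The concurrent one takes about as long as the slowest request, while the sequential one takes about the sum of all of them.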
What is the Python GIL and how does it affect my code?
The Global Interpreter Lock (GIL) is a process-wide lock in CPython that prevents two threads from executing Python bytecode simultaneously. Practical impact: threads are useful for I/O concurrency (the GIL is released during blocking I/O calls), but provide no CPU parallelism for pure Python code. CPU-bound code with 4 threads runs at roughly the speed of 1 thread on a 4-core machine. Workarounds: use multiprocessing for true parallelism (each process has its own GIL), use C extensions that release the GIL (NumPy, TensorFlow), use Cython with nogil blocks, or use the free-threaded CPython build (PEP 703), available as an experimental option since Python 3.13. Switching to PyPy does not help here, because PyPy also has a GIL.
What is the difference between concurrency and parallelism?
Concurrency is the property of a program that allows multiple computations to make progress at the same time, whether by interleaving on a single CPU or running on multiple CPUs. Parallelism is the simultaneous execution of multiple computations on multiple CPUs. Concurrency is a structuring concept (how the program is organized); parallelism is an execution property (whether the hardware runs things simultaneously). A single-threaded async program is concurrent (multiple I/O operations in flight) but not parallel (one core does all the work). A multi-threaded program on a multi-core machine is both concurrent and parallel.
How do I debug a race condition that only appears 1 in 10000 times?
Three tools. First: ThreadSanitizer (-fsanitize=thread in gcc and clang) catches data races at runtime by tracking the happens-before relation; reports the 2 conflicting accesses with stack traces, even if the race did not actually manifest in this run. Second: Helgrind (Valgrind tool) catches lock-order violations and missing synchronization. Third: stress tests with pthread_yield, sched_yield, or sleep(0) at every potential interleaving point to force the rare schedule. For Java: jstack on a hung process shows the wait-for graph; jcstress is a stress-testing framework specifically for concurrency. For Go: go test -race enables the race detector.
Can you help with CMU 15-410, CS162 Pintos, or 6.S081 concurrency labs?
Yes. CMU 15-410 Pebbles kernel project 3 implements a scheduler with futex-based synchronization. CS162 Pintos project 1 covers priority scheduling, MLFQ, and priority donation through chained locks. MIT 6.S081 lab 8 (lock) tunes the kernel allocator and buffer cache for parallel scaling. Standard work for verified CS graduates with OS systems experience. Our deliverables include the kernel patch with explicit happens-before reasoning per synchronization change, the make grade autograder pass, and a 1-page design memo on the kernel state diagram before and after the change.

Stuck on concurrency?

Submit your assignment and get expert, pedagogical help within 12 hours. Every solution ships with line-by-line comments, complexity analysis, and unlimited revisions.

Get Concurrency Help