- Overview
- Fundamentals
- Basic Synchronization Primitives
- Atomic Operations
- Advanced Lock Algorithms
- Barrier Synchronization
- Performance Considerations
- Implementation Examples
- Related Topics
- References
Synchronization is a fundamental mechanism in concurrent and parallel programming that coordinates the execution of multiple threads or processes to ensure proper order of execution, prevent race conditions, and avoid data corruption. As modern systems increasingly rely on parallelism across multiple cores and processors, effective synchronization becomes critical for correctness and performance.
Synchronization mechanisms range from basic primitives like mutexes and semaphores used in single-machine multithreading to advanced algorithms like queueing locks and distributed barriers designed for scalable parallel systems. The choice of synchronization mechanism depends on factors including contention levels, architectural characteristics (cache-coherent vs. non-cache-coherent), and scalability requirements.
Correctness:
- Preventing race conditions when multiple threads access shared data
- Ensuring atomicity of compound operations
- Maintaining data consistency across execution units
Performance:
- Minimizing latency when acquiring free locks
- Reducing contention when multiple threads compete for resources
- Avoiding excessive cache coherence traffic
Scalability:
- Ensuring performance improves with additional processors
- Managing overhead that grows with system size
- Balancing fairness with efficiency
A race condition occurs when multiple threads access shared resources simultaneously, and the final result depends on the unpredictable timing or ordering of thread execution. Race conditions lead to non-deterministic behavior and program errors.
Example Scenario:
// Shared variable
int counter = 0;
// Two threads executing simultaneously
Thread 1: counter = counter + 1; // Read: 0, Write: 1
Thread 2: counter = counter + 1; // Read: 0, Write: 1
// Expected result: 2, Actual result: 1Without synchronization, both threads may read the same initial value, perform their increment, and write back the same incremented value, losing one update.
A critical section is a segment of code that accesses shared resources and must not be executed by more than one thread at a time. Synchronization mechanisms protect critical sections to ensure mutual exclusion.
Properties of Correct Critical Section Implementation:
- Mutual Exclusion: Only one thread can execute in the critical section at a time
- Progress: If no thread is in the critical section and threads wish to enter, selection of the next thread cannot be postponed indefinitely
- Bounded Waiting: A limit exists on the number of times other threads can enter the critical section after a thread requests entry
- Performance: Overhead of entering and exiting the critical section should be minimal
Thread safety refers to code that behaves correctly when accessed by multiple threads simultaneously. Thread-safe code avoids data races and maintains invariants even under concurrent access.
Approaches to Thread Safety:
- Synchronization: Use locks, semaphores, or other primitives to serialize access
- Immutability: Use immutable data structures that cannot be modified after creation
- Thread-Local Storage: Give each thread its own copy of data
- Lock-Free Programming: Use atomic operations and careful algorithm design
Deadlock is a situation where two or more threads are permanently blocked, each waiting for a resource held by another thread in the cycle. Deadlocks result from improper synchronization and can freeze program execution.
Four Necessary Conditions for Deadlock:
- Mutual Exclusion: Resources cannot be shared
- Hold and Wait: Threads hold resources while waiting for others
- No Preemption: Resources cannot be forcibly taken from threads
- Circular Wait: A circular chain of threads exists, each waiting for a resource held by the next
Prevention Strategies:
- Lock ordering: Always acquire locks in the same order
- Timeout mechanisms: Release locks if acquisition takes too long
- Deadlock detection: Periodically check for cycles and break them
A mutex (mutual exclusion lock) is the most fundamental synchronization primitive. It allows threads to acquire exclusive access to shared resources, ensuring only one thread executes in the critical section at a time.
An exclusive or mutual exclusion lock can be held by only one thread at a time. While one thread holds the lock, all other threads attempting to acquire it must wait.
Operations:
lock(): Acquire the lock, blocking if already heldunlock(): Release the locktrylock(): Attempt to acquire without blocking, returns success/failure
Use Cases:
- Protecting shared data structures during modification
- Ensuring atomicity of compound operations
- Preventing race conditions
A shared lock allows multiple threads to hold the lock simultaneously for read-only access while providing exclusive access for writes. This enables concurrent readers while still providing mutual exclusion with writers.
Operations:
read_lock(): Acquire shared read access (multiple readers allowed)write_lock(): Acquire exclusive write access (only one writer)unlock(): Release the lock
Use Cases:
- Data structures with frequent reads and infrequent writes
- Database systems with read-heavy workloads
- Configuration data accessed by many threads
#include <pthread.h>
// Shared data
int sharedValue = 0;
pthread_mutex_t mutex;
void* incrementSharedValue(void* arg) {
for (int i = 0; i < 100000; i++) {
// Lock the mutex to protect the critical section
pthread_mutex_lock(&mutex);
// Critical section: increment the shared value
sharedValue++;
// Unlock the mutex to release the critical section
pthread_mutex_unlock(&mutex);
}
return NULL;
}
int main() {
// Initialize the mutex
pthread_mutex_init(&mutex, NULL);
// Create two threads
pthread_t thread1, thread2;
pthread_create(&thread1, NULL, incrementSharedValue, NULL);
pthread_create(&thread2, NULL, incrementSharedValue, NULL);
// Wait for threads to finish
pthread_join(thread1, NULL);
pthread_join(thread2, NULL);
// Destroy the mutex
pthread_mutex_destroy(&mutex);
printf("Final Value: %d\n", sharedValue);
return 0;
}#include <pthread.h>
int sharedData = 0;
pthread_rwlock_t rwlock;
void* reader(void* arg) {
pthread_rwlock_rdlock(&rwlock);
printf("Reader %ld: Read data: %d\n", (long)arg, sharedData);
pthread_rwlock_unlock(&rwlock);
return NULL;
}
void* writer(void* arg) {
pthread_rwlock_wrlock(&rwlock);
sharedData++;
printf("Writer %ld: Wrote data: %d\n", (long)arg, sharedData);
pthread_rwlock_unlock(&rwlock);
return NULL;
}
int main() {
pthread_rwlock_init(&rwlock, NULL);
// Create reader and writer threads
pthread_t readerThreads[3], writerThreads[2];
for (long i = 0; i < 3; i++) {
pthread_create(&readerThreads[i], NULL, reader, (void*)i);
}
for (long i = 0; i < 2; i++) {
pthread_create(&writerThreads[i], NULL, writer, (void*)i);
}
// Wait for threads to finish
for (long i = 0; i < 3; i++) {
pthread_join(readerThreads[i], NULL);
}
for (long i = 0; i < 2; i++) {
pthread_join(writerThreads[i], NULL);
}
pthread_rwlock_destroy(&rwlock);
return 0;
}Semaphores are synchronization primitives that maintain a count and allow a specified number of threads to access a shared resource concurrently. They are more general than mutexes and can coordinate multiple threads.
Types:
- Binary Semaphore: Count is either 0 or 1, similar to a mutex
- Counting Semaphore: Count can be any non-negative integer
Operations:
wait()orP(): Decrement the count, blocking if count is 0signal()orV(): Increment the count, potentially waking a waiting thread
Use Cases:
- Limiting concurrent access to a resource pool
- Producer-consumer synchronization
- Signaling between threads
#include <semaphore.h>
int sharedData = 0;
sem_t semaphore;
void* producer(void* arg) {
for (int i = 1; i <= 10; i++) {
// Produce data
sharedData = i;
printf("Produced: %d\n", sharedData);
// Signal the semaphore to indicate new data is available
sem_post(&semaphore);
sleep(1);
}
return NULL;
}
void* consumer(void* arg) {
for (int i = 1; i <= 10; i++) {
// Wait for new data to be available
sem_wait(&semaphore);
// Consume the data
printf("Consumed: %d\n", sharedData);
sharedData = 0;
sleep(1);
}
return NULL;
}
int main() {
// Initialize the semaphore with initial value 0
sem_init(&semaphore, 0, 0);
pthread_t producerThread, consumerThread;
pthread_create(&producerThread, NULL, producer, NULL);
pthread_create(&consumerThread, NULL, consumer, NULL);
pthread_join(producerThread, NULL);
pthread_join(consumerThread, NULL);
sem_destroy(&semaphore);
return 0;
}Condition variables provide a mechanism for threads to wait until a specific condition is met before proceeding. They are always used in conjunction with a mutex to protect the condition check.
Operations:
wait(mutex): Atomically release mutex and wait for signalsignal(): Wake up one waiting threadbroadcast(): Wake up all waiting threads
Use Cases:
- Producer-consumer patterns with complex conditions
- Implementing monitors
- Thread coordination based on state changes
#include <pthread.h>
int sharedData = 0;
pthread_mutex_t mutex;
pthread_cond_t condition;
void* producer(void* arg) {
for (int i = 1; i <= 10; i++) {
pthread_mutex_lock(&mutex);
sharedData = i;
printf("Produced: %d\n", sharedData);
// Signal the consumer that new data is available
pthread_cond_signal(&condition);
pthread_mutex_unlock(&mutex);
sleep(1);
}
return NULL;
}
void* consumer(void* arg) {
for (int i = 1; i <= 10; i++) {
pthread_mutex_lock(&mutex);
// Wait for new data to be available
while (sharedData == 0) {
pthread_cond_wait(&condition, &mutex);
}
printf("Consumed: %d\n", sharedData);
sharedData = 0;
pthread_mutex_unlock(&mutex);
}
return NULL;
}
int main() {
pthread_mutex_init(&mutex, NULL);
pthread_cond_init(&condition, NULL);
pthread_t producerThread, consumerThread;
pthread_create(&producerThread, NULL, producer, NULL);
pthread_create(&consumerThread, NULL, consumer, NULL);
pthread_join(producerThread, NULL);
pthread_join(consumerThread, NULL);
pthread_mutex_destroy(&mutex);
pthread_cond_destroy(&condition);
return 0;
}Individual load and store instructions are inherently atomic, but implementing synchronization primitives requires atomicity across multiple operations. This necessitates specialized Read-Modify-Write (RMW) instructions that execute as indivisible units.
RMW operations are sometimes generically called Fetch and φ instructions, where φ represents some operation. These instructions read from memory, modify the value in a register, and write it back atomically.
Returns the current value of a memory location and atomically sets it to one.
int TestAndSet(int *L) {
int old = *L;
*L = 1;
return old;
}Use Case: Basic spinlock implementation
Properties:
- Single atomic operation
- Returns previous state
- Sets to locked state (1)
Fetches the current value and atomically increments the memory location.
int FetchAndIncrement(int *L) {
int old = *L;
*L = old + 1;
return old;
}Use Case:
- Ticket locks for fair ordering
- Generating unique sequence numbers
- Atomic counters
Properties:
- Returns old value
- Increments by 1 (or specified amount)
- Useful for FIFO ordering
Atomically returns the old value and stores a new value.
int FetchAndStore(int *L, int new_value) {
int old = *L;
*L = new_value;
return old;
}Use Case:
- MCS lock implementation
- Atomic swaps
- Building lock-free data structures
Properties:
- Atomic swap operation
- Returns previous value
- Stores arbitrary new value
Conditionally updates a memory location if its current value matches an expected value.
bool CompareAndSwap(int *L, int expected, int new_value) {
if (*L == expected) {
*L = new_value;
return true;
}
return false;
}Use Case:
- Lock-free algorithms
- Handling race conditions in complex scenarios
- Building sophisticated synchronization primitives
Properties:
- Conditional atomic update
- Returns success/failure
- Enables optimistic concurrency
- Foundation for lock-free programming
Atomically fetches the current value and decrements it.
int FetchAndDecrement(int *L) {
int old = *L;
*L = old - 1;
return old;
}Use Case:
- Counting barriers
- Reference counting
- Resource allocation
Implementing scalable synchronization primitives requires careful algorithm design to minimize latency, contention, and network traffic in parallel systems.
Latency: Time required for a thread to acquire a free lock. This should be minimized for optimal performance in the uncontended case.
Waiting Time: Time a thread waits when the lock is held by another thread. Largely application-dependent but affected by implementation fairness.
Contention: Time required for one thread to acquire a lock when multiple threads simultaneously attempt acquisition after a release. Critical for scalability in high-contention scenarios.
The simplest spinlock implementation where threads repeatedly execute TestAndSet until successful.
void Lock(int *L) {
while (TestAndSet(L) == 1) {
// spin (busy wait)
}
}
void Unlock(int *L) {
*L = 0;
}Problems:
- Heavy network contention from continuous TestAndSet operations
- Cannot exploit caches (TestAndSet bypasses cache, accesses memory directly)
- Disrupts useful work on other processors
- O(N) bus transactions per lock acquisition under contention
When to Use:
- Very low contention scenarios
- When advanced RMW instructions are unavailable
- Short critical sections
Threads spin on a cached copy of the lock variable, only attempting TestAndSet when they observe the lock becoming free through cache coherence.
void Lock(int *L) {
while (true) {
// Spin on cached value
while (*L == 1) {
// read from cache
}
// Lock appears free, try to acquire
if (TestAndSet(L) == 0) {
return; // acquired
}
}
}
void Unlock(int *L) {
*L = 0;
}Improvement:
- Reduces contention while lock is held
- Exploits cache coherence mechanisms
- Less network traffic than native spinlock
Remaining Problem:
- Thundering herd when lock is released
- All waiters simultaneously attempt TestAndSet
- O(N²) bus transactions for write-invalidate coherence
- Still generates significant contention on release
After failing to acquire a lock, threads delay for a period before retrying. The delay increases exponentially if contention persists, dynamically adapting to load.
void Lock(int *L) {
int delay = BASE_DELAY;
while (TestAndSet(L) == 1) {
sleep(delay);
delay = min(delay * 2, MAX_DELAY);
}
}
void Unlock(int *L) {
*L = 0;
}Advantages:
- Works on non-cache-coherent (NCC) multiprocessors
- Adapts to contention dynamically
- Reduces network traffic significantly
- Simple to implement
Disadvantages:
- Non-deterministic acquisition order (unfair)
- Delay tuning affects performance
- May introduce unnecessary latency
- No guarantee of bounded waiting
When to Use:
- Systems with only TestAndSet available
- Variable contention scenarios
- When fairness is not critical
Uses two counters to ensure FIFO ordering: next_ticket (incremented for each requester) and now_serving (incremented on release).
typedef struct {
int next_ticket;
int now_serving;
} TicketLock;
void Lock(TicketLock *L) {
int my_ticket = FetchAndIncrement(&L->next_ticket);
while (L->now_serving != my_ticket) {
// spin
}
}
void Unlock(TicketLock *L) {
L->now_serving++;
}Advantages:
- Fair: first-come, first-served ordering
- Single RMW operation per acquisition
- Bounded waiting time
- Simple implementation
Disadvantages:
- Still noisy: all waiters observe
now_servingupdates - O(N) cache coherence traffic per release
- Not scalable for large N
- All threads spin on same variable
When to Use:
- Fairness is important
- Moderate number of threads
- Cache-coherent systems
Queueing locks address fairness and noisiness by giving each waiting thread a unique spin location and signaling only the next thread when the lock is released.
Uses an array of flags sized to the number of processors, functioning as a circular queue.
typedef struct {
int flags[N]; // has_lock or must_wait
int queue_last;
} AndersonLock;
void Lock(AndersonLock *L, int *my_slot) {
*my_slot = FetchAndIncrement(&L->queue_last) % N;
while (L->flags[*my_slot] == MUST_WAIT) {
// spin on unique location
}
L->flags[*my_slot] = MUST_WAIT;
}
void Unlock(AndersonLock *L, int my_slot) {
int next_slot = (my_slot + 1) % N;
L->flags[next_slot] = HAS_LOCK;
}Advantages:
- Fair: FIFO ordering
- Each thread spins on unique location
- Exactly one thread signaled on release
- Single RMW per critical section
- Predictable performance
Disadvantages:
- O(N) space per lock
- Wasteful for large N or many locks
- Array size must accommodate worst-case contention
- Poor cache utilization if N is large
When to Use:
- Cache-coherent systems
- Known maximum thread count
- High contention scenarios
- When fairness is critical
Uses a linked list of queue nodes with dynamic space allocation, developed by Mellor-Crummey and Scott.
typedef struct QNode {
bool got_it;
struct QNode *next;
} QNode;
typedef QNode* MCSLock;
void Lock(MCSLock *L, QNode *my_node) {
my_node->next = NULL;
my_node->got_it = false;
QNode *predecessor = FetchAndStore(L, my_node);
if (predecessor != NULL) {
predecessor->next = my_node;
while (!my_node->got_it) {
// spin on local variable
}
}
}
void Unlock(MCSLock *L, QNode *my_node) {
if (my_node->next == NULL) {
if (CompareAndSwap(L, my_node, NULL)) {
return; // no successor
}
// Wait for successor to link in
while (my_node->next == NULL) {
// spin briefly
}
}
my_node->next->got_it = true;
}Advantages:
- Fair: FIFO ordering
- Space proportional to actual contention (dynamic)
- Each thread spins on local variable
- Typically one RMW per critical section
- Excellent scalability
Disadvantages:
- More complex implementation
- Unlock requires CompareAndSwap for corner cases
- Two RMW operations in rare scenarios
- Link list maintenance overhead
When to Use:
- High scalability requirements
- Unknown or variable thread counts
- Cache-coherent systems
- Long critical sections where fairness matters
| Scenario | Recommended Algorithm | Rationale |
|---|---|---|
| Low contention, short critical sections | Exponential backoff | Simple, adapts well, low overhead |
| High contention, fairness required | Anderson or MCS | Fair ordering, scalable |
| Only TestAndSet available | Exponential backoff | Best option without advanced RMW |
| Variable contention | Exponential backoff | Adapts dynamically to load |
| Large-scale parallel systems | MCS Lock | Best scalability, dynamic space |
| Cache-coherent with known N | Anderson Lock | Excellent performance, simple |
| Non-cache-coherent systems | Exponential backoff | Doesn't rely on cache coherence |
Barrier synchronization ensures all participating threads reach a designated point before any proceed. This is essential for coordinating phases in parallel algorithms.
All threads must arrive at the barrier and wait until all threads have arrived before any can proceed to the next computation phase. This establishes a global synchronization point.
Uses a shared counter initialized to N (number of threads). Each arriving thread atomically decrements the counter and spins until it reaches zero.
typedef struct {
int count;
int N;
} CountingBarrier;
void Barrier(CountingBarrier *B) {
FetchAndDecrement(&B->count);
// Wait for all to arrive
while (B->count > 0) {
// spin
}
// Wait for reset (avoid race on re-entry)
while (B->count != B->N) {
// spin
}
// Last thread resets (not shown in simplified version)
}Problem: Race condition if threads proceed before count is reset. Requires two spinning episodes for safety.
Advantages:
- Simple implementation
- Easy to understand
Disadvantages:
- Two spin loops required
- Centralized counter creates hot spot
- Not scalable
Eliminates one spin loop by using a sense flag that reverses when all threads arrive.
typedef struct {
int count;
int N;
bool sense;
} SenseBarrier;
// Per-thread variable
thread_local bool local_sense = true;
void Barrier(SenseBarrier *B) {
local_sense = !local_sense;
if (FetchAndDecrement(&B->count) == 1) {
// Last thread: reset and release all
B->count = B->N;
B->sense = local_sense;
} else {
// Wait for sense to change
while (B->sense != local_sense) {
// spin
}
}
}Advantages:
- Single spinning episode
- Simpler than counting barrier
- Correct re-entry handling
Disadvantages:
- Centralized count and sense create hot spots
- Limited scalability for large N
- High cache coherence traffic
When to Use:
- Small to medium number of threads
- Cache-coherent systems
- Simple barrier needs
Hierarchical barrier using divide-and-conquer to limit sharing to small groups.
Structure:
- N processors grouped into clusters of K processors
- Forms a tree with log_K(N) levels
- Each cluster has local count and sense variables
Arrival Phase:
- Thread decrements local group counter
- Last thread in group propagates arrival to parent level
- Continues up to root
Wake-up Phase:
- Root (last arriving thread) reverses sense
- Signal propagates down tree level by level
- Each parent releases its children
Advantages:
- Reduces contention through hierarchical grouping
- O(log N) communication complexity
- More scalable than centralized barriers
Disadvantages:
- Spin locations dynamically determined (problematic for NCC NUMA)
- Arity K affects contention levels
- Complex implementation
Modified tree barrier with statically determined spin locations, using a 4-ary arrival tree and binary wake-up tree.
Arrival Tree (4-ary):
- Each child signals parent at unique, pre-assigned location
- Parent checks ChildNotReady vector
- Static assignment enables efficient NUMA implementation
Wake-up Tree (Binary):
- Parent uses ChildPointer structures to signal children
- Children spin on statically determined locations
- Binary structure based on theoretical performance results
Advantages:
- Static spin locations optimize for NUMA
- Reduces cache coherence traffic
- Suitable for various architectures
- Excellent scalability
Disadvantages:
- More complex implementation
- Less spatial locality than single cache line per group
- Requires careful initialization
Organizes barrier as a tournament with predetermined winners advancing each round.
Arrival Phase:
for round = 0 to log(N)-1:
if I am winner in this round:
wait for signal from opponent
advance to next round
else:
signal opponent
break
Wake-up Phase:
for round = log(N)-1 down to 0:
if I was winner:
signal opponent in this round
Advantages:
- Statically determined spin locations (excellent for NCC NUMA)
- Uses only atomic read and write (no RMW required)
- O(log N) communication complexity
- Exploits network parallelism
- Works with message passing (clusters)
Disadvantages:
- Cannot pack multiple spins in one cache line
- Less spatial locality than MCS
- Fixed tournament structure
Information diffusion where in round k, processor P_i sends to P_{(i + 2^k) mod N}.
for round = 0 to ceiling(log N)-1:
target = (my_id + 2^round) mod N
send message to target
wait for message from source
Advantages:
- No hierarchical structure
- Works on NCC NUMA and clusters
- Independent, distributed decisions
- N need not be a power of 2
- Simple implementation
Disadvantages:
- O(N log N) total messages
- Higher communication complexity than tree barriers
- More network traffic
| Algorithm | Communication | Spin Location | RMW Required | Best Architecture |
|---|---|---|---|---|
| Counting | O(N) | Dynamic | Yes | Small CC systems |
| Sense Reversal | O(N) | Dynamic | Yes | Small CC systems |
| Tree | O(log N) | Dynamic | Yes | CC systems |
| MCS | O(log N) | Static | Yes | NUMA, CC |
| Tournament | O(log N) | Static | No | NCC NUMA, Clusters |
| Dissemination | O(N log N) | Static | No | NCC NUMA, Clusters |
As the number of threads or processors increases, synchronization overhead can limit scalability. Key factors include:
Cache Coherence Traffic:
- Write operations to shared lock variables trigger invalidations
- More threads mean more cache coherence messages
- Can saturate interconnection network
Memory Latency:
- Access to shared synchronization variables may miss cache
- NUMA effects make remote access expensive
- Deep memory hierarchies increase latency variance
Fairness vs. Performance:
- Fair algorithms (FIFO ordering) may have higher overhead
- Unfair algorithms can lead to starvation
- Trade-off depends on application requirements
False sharing occurs when independent variables reside in the same cache line, causing unnecessary cache coherence traffic.
Example:
struct {
int counter_A; // Updated by Thread 1
int counter_B; // Updated by Thread 2
} shared_data; // Both in same cache lineSolutions:
- Pad structures to cache line boundaries (typically 64 bytes)
- Align frequently updated variables separately
- Use thread-local storage when possible
Lock-free programming designs algorithms that guarantee system-wide progress without using locks, typically using atomic operations like CAS.
Advantages:
- No deadlock possible
- Better scalability in some scenarios
- Eliminates lock contention overhead
Disadvantages:
- Complex algorithm design
- Difficult to verify correctness
- May have higher overhead in low-contention scenarios
- ABA problem requires careful handling
Use Cases:
- High-contention data structures
- Real-time systems requiring bounded latency
- Systems where lock overhead is prohibitive
Wait-free programming is stronger than lock-free: every operation completes in a bounded number of steps regardless of other threads' actions.
Properties:
- Strongest progress guarantee
- No thread can be indefinitely delayed
- Ideal for real-time systems
Challenges:
- Very difficult to implement
- Often requires helping mechanisms
- Higher overhead than lock-free
#include <semaphore.h>
#define BUFFER_SIZE 10
int buffer[BUFFER_SIZE];
int in = 0, out = 0;
sem_t empty, full;
pthread_mutex_t mutex;
void* producer(void* arg) {
for (int i = 0; i < 100; i++) {
int item = produce_item();
sem_wait(&empty);
pthread_mutex_lock(&mutex);
buffer[in] = item;
in = (in + 1) % BUFFER_SIZE;
pthread_mutex_unlock(&mutex);
sem_post(&full);
}
return NULL;
}
void* consumer(void* arg) {
for (int i = 0; i < 100; i++) {
sem_wait(&full);
pthread_mutex_lock(&mutex);
int item = buffer[out];
out = (out + 1) % BUFFER_SIZE;
pthread_mutex_unlock(&mutex);
sem_post(&empty);
consume_item(item);
}
return NULL;
}
int main() {
sem_init(&empty, 0, BUFFER_SIZE);
sem_init(&full, 0, 0);
pthread_mutex_init(&mutex, NULL);
pthread_t prod, cons;
pthread_create(&prod, NULL, producer, NULL);
pthread_create(&cons, NULL, consumer, NULL);
pthread_join(prod, NULL);
pthread_join(cons, NULL);
sem_destroy(&empty);
sem_destroy(&full);
pthread_mutex_destroy(&mutex);
return 0;
}#include <pthread.h>
int readers = 0;
pthread_mutex_t mutex;
pthread_mutex_t write_lock;
void* reader(void* arg) {
pthread_mutex_lock(&mutex);
readers++;
if (readers == 1) {
pthread_mutex_lock(&write_lock); // First reader locks writers
}
pthread_mutex_unlock(&mutex);
// Read shared data
read_data();
pthread_mutex_lock(&mutex);
readers--;
if (readers == 0) {
pthread_mutex_unlock(&write_lock); // Last reader unlocks writers
}
pthread_mutex_unlock(&mutex);
return NULL;
}
void* writer(void* arg) {
pthread_mutex_lock(&write_lock);
// Write to shared data
write_data();
pthread_mutex_unlock(&write_lock);
return NULL;
}#define N 5
pthread_mutex_t forks[N];
void* philosopher(void* arg) {
int id = *(int*)arg;
int left = id;
int right = (id + 1) % N;
while (1) {
think();
// Avoid deadlock: acquire lower-numbered fork first
if (left < right) {
pthread_mutex_lock(&forks[left]);
pthread_mutex_lock(&forks[right]);
} else {
pthread_mutex_lock(&forks[right]);
pthread_mutex_lock(&forks[left]);
}
eat();
pthread_mutex_unlock(&forks[left]);
pthread_mutex_unlock(&forks[right]);
}
return NULL;
}Course Materials:
- CS 6210: Advanced Operating Systems - Georgia Tech OMSCS
- CS 6200: Introduction to Operating Systems - Georgia Tech OMSCS
- COMS W4118: Operating Systems I - Columbia University