Readers-Writers Problem

Consider reading previous posts about threads in python and locks and previous post about condition variables for a necessary background if you are not familiar with python threading and concurrency control.

Introduction

The readers-writers problem is one of the classic synchronization problems in concurrency (together with mutual exclusion and the producer-consumer problem) that every computer scientist should know about. We start with the following observation: Operations on shared data should be done within critical sections, to exclude the possibility of two threads modifying the data simultaneously. On the other hand, multiple threads can simultaneously read the same data without any negative effects. On the still other hand, although a read concurrent with a write in another thread will not corrupt the data, the data read may still come out as inconsistent. Think of listing account data from several bank accounts concurrent with transfer of money between accounts: you may observe either missing on excess money even though the completed write operation does not change the total amount of money in the accounts. These considerations lead to the following general problem of readers and writers:

We have two types of threads: readers and writers. Both types of threads have a generalized critical section S for which the following conditions are met:

Two writers cannot be present in S at the same time

The writer and the reader cannot be present in S at the same time

However, we allow any number of readers to be in S at the same time

Usually, as the naming suggests, writers modify the shared data, whereas readers just read it. However, the problem is more general, and, in fact, in our nonsensical examples we will not modify nor read any shared data, aside from shared data used for synchronization.

The following program, and subsequent fixes to it, will illustrate the problem of readers and writers in this post:

import threading
import time
import random

class Reader(threading.Thread):
    def __init__(self, m):
        threading.Thread.__init__(self)
        self.m = m
    def run(self):
        for i in range(5):
            time.sleep(random.random())
            print('Reader', self.m, 'starts reading,', i)
            time.sleep(random.random())
            print('Reader', self.m, 'stops reading,', i)

class Writer(threading.Thread):
    def __init__(self, m):
        threading.Thread.__init__(self)
        self.m = m
    def run(self):
        for i in range(5):
            time.sleep(random.random())
            print('Writer', self.m, 'starts writing,', i)
            time.sleep(random.random())
            print('Writer', self.m, 'stops writing,', i)

def factory(m):
    if random.random() < 0.5:
        return Reader(m)
    else:
        return Writer(m)

threads = [factory(m) for m in range(10)]

for t in threads:
    t.start()

Of course, the above code does not contain any synchronization instructions and thus does not satisfy the synchronization constraints of the readers and writers problem. The role of the generalized critical section is fulfilled by a pair of messages starts writing/reading … stops writing/reading. Also note the function factory(m) which returns a reader or writer object (numbered m) depending on a random number from 0 to 1. For the code above, the cutoff is 0.5, so we randomly draw an equal number of readers and writers on average. By increasing the threshold value to 0.8 we will ensure that there are on average four times as many readers as writers.

There are many solutions to the problem of readers and writers using different techniques. In the following subsections, we will discuss two such solutions, one in its entirety, the other will be left as an exercise to do on your own. In both cases the alternate description can be found on readers-writers lock article on wiki First, however, we will formulate the skeleton of the solution using as yet unimplemented readers-writers lock (or RW lock, for brevity) object. Subsequent solutions will simply implement this lock object differently, with rest of the program remaining unchanged.

The RWLock class of RW locks must implement the following two pairs of methods:

rLock() and rUnLock() executed by reader threads at the beginning and the end of the critical section, respectively
wLock() and wUnLock() executed by writer threads at the beginning and the end of the critical section, respectively

Using the rw object of the RWLock class, we can modify the code of the reader and writer threads as follows:

rw = RWLock()

class Reader(threading.Thread):
    def __init__(self, m):
        threading.Thread.__init__(self)
        self.m = m
    def run(self):
        for i in range(20):
            time.sleep(random.random())
            rw.rLock()
            print('Reader', self.m, 'starts reading,', i)
            time.sleep(random.random())
            print('Reader', self.m, 'stops reading,', i)
            rw.rUnLock()

class Writer(threading.Thread):
    def __init__(self, m):
        threading.Thread.__init__(self)
        self.m = m
    def run(self):
        for i in range(5):
            time.sleep(random.random())
            rw.wLock()
            print('Writer', self.m, 'starts writing,', i)
            time.sleep(random.random())
            print('Writer', self.m, 'stops writing,', i)
            rw.wUnLock()

All that remains is to implement the RWLock class

Solution using two locks

In this solution, we use two locks (let’s call them rmut and wmut), but no condition variables. However, at least one of them (wmut) must permit release by a different thread than the one that acquired it. So, we finally get to use threading.Lock class instead of threading.RLock class we used previously in this series. You can recall that, locks of class Lock can be released by any thread.

In addition to both locks, we also use a shared counter of currently reading readers. The idea is as follows: Since the writer must be alone in the critical section, mutual exclusion of writers is implemented in the usual way: before entering the critical section the writer acquires `wmut’ lock, and releases it when leaving critical section.

In the case of readers it is more complicated: The shared counter is used to store information about the number of readers currently in the critical section. The first reader in the section must acquire wmut to exclude writers, but as long as there is at least one other reader in the critical section already, subsequent readers do not try to acquire the lock. Also, only the last reader to leave the critical section releases wmut. Note that the last reader does not need to be the same as the first reader, hence the requirement that wmut can be released by other thread than the one which acquired it. Of course, all the operations on the readers’ counter and wmut by readers must be executed atomically without any interleaving, so we need a separate rmut lock. The full implementation is shown in the listing below:

class RWLock:
    def __init__(self):
        self.rmut = threading.Lock()
        self.wmut = threading.Lock()
        self.readers = 0
    def rLock(self):
        with self.rmut:
            self.readers += 1
            if self.readers == 1:
                self.wmut.acquire()
    def rUnLock(self):
        with self.rmut:
            self.readers -= 1
            if self.readers == 0:
                self.wmut.release()
    def wLock(self):
        self.wmut.acquire()
    def wUnLock(self):
        self.wmut.release()

The above solution can lead to starvation of writers: it may happen that readers will enter and exit the critical section in such a way that the critical section never becomes empty and readers never release the wmut lock once they acquire it. This can be seen by running the modified program, especially if we change the probability of creating a reader thread to 0.8 in the factory(m) function. Then we’ll see that in some runs only after all or most reader threads exit, the writers can execute. In this case there is no actual starvation of writers only because readers execute only for a finite amount of time. Even in this case, there may be many reasons why “all readers first, then writers” might be undesirable.

Solution using condition variables

The idea is this: we use a conditional variable and an associated lock to wait for certain conditions to be met. The state of the system related to synchronization is stored in three shared variables:

readers stores the number of readers currently in the critical section.
waiting_writers stores the number of writers waiting to enter the critical section
writer_active is a boolean, True when there is a writer in the critical section, False otherwise (there can be no more than one writer in the critical section so boolean value suffices).

According to the algorithm, the reader can enter critical section only if

No writer is in critical section (not writer_active)
and no writer waits to enter the critical section (waiting_writers == 0)

The writer can enter the critical section only when

no reader is in the critical section (readers == 0)
and no writer is in the critical section (not writer_active)

Of course, readers and writers entering and leaving the critical section modify the readers, waiting_writers and writer_active variables accordingly. In addition, the last reader leaving the critical section signals a conditional variable waking up both the writers waiting outside the section and the late readers. The conditional variable is also signaled by the writer leaving the critical section.

It is easy to see that the above algorithm prefers writers. In particular, it can lead to starvation of readers when there is always a writer waiting outside the critical section (and another one waits before he enters the section).

TASK FOR YOURSELF Implement the RW lock in accordance with the given algorithm. The lock should have the same methods as the one in the previous subsection (only implemented differently.) Use the implemented lock in the sample program.

Timestamp: March 9, 2023
Tags: #python , #concurrency