Bartosz Zieliński

Threads in Python

The threading module in Python 3 contains functions and classes supporting the creation, use and synchronization of threads. Importantly, this module is portable: scripts using threading should work the same way under Linux, Windows, macOS, and other operating systems.

One limitation of the threading module (at least in the CPython implementation) is the global interpreter lock (GIL), which must be held by any thread executing Python bytecode. This effectively makes it impossible for pure Python threads within the same process to execute concurrently on multiple processor cores. However, I/O operations release the interpreter lock, so using threading still makes sense if your program is I/O-bound. Incidentally, asyncio does not use true asynchronous operations for file I/O. It turns out that while such operations are commonly supported by operating systems, they are not supported in a portable way, and so asyncio simulates asynchronous file input/output using threads anyway. Also, if your script is so computationally intensive that you want it parallelized, you are probably not using pure Python but extensions written in C, and many of those extensions (e.g., numpy) are polite enough to release the interpreter lock during computation. Finally, if you really need CPU-bound pure Python code to execute concurrently, you can use processes, since the global interpreter lock applies to a single process only.
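A minimal sketch of why threading still pays off for I/O-bound work: here sleep() stands in for a blocking I/O call, and since blocking calls release the interpreter lock, the waits of all the threads overlap instead of adding up.

```python
from threading import Thread
from time import sleep, perf_counter

def fake_io():
    sleep(0.2)  # stands in for a blocking I/O call; releases the GIL

threads = [Thread(target=fake_io) for _ in range(4)]

start = perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = perf_counter() - start

# The four 0.2 s waits overlap, so the total is close to 0.2 s, not 0.8 s.
print(f'elapsed: {elapsed:.2f}s')
```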

Class Thread

In the threading module threads are represented as instances of classes inheriting from the class Thread, or as instances of the class Thread itself. Note that the creation of a Thread object does not actually create the associated thread. We will use the following (argumentless) methods of Thread: start(), which creates and starts the actual thread; run(), which contains the code executed in that thread (we redefine it in subclasses); and join(), which blocks the calling thread until the thread whose method it is finishes.

As a demonstration we will now write a simple script which creates a number of threads and waits until they finish. Each thread will display a sequence of messages on the terminal, and because we actually want to see some interleaving, we will introduce artificial random delays between messages. Thus, in addition to the Thread class from the threading module, we will also need sleep and random:

from threading import Thread
from time import sleep
from random import random

Now we are ready to create a class CountingThread. Instances of this class will represent our message printing threads:

class CountingThread(Thread):
    def __init__(self, n):
        Thread.__init__(self)
        self.n = n
    def run(self):
        print(f'Starting thread {self.n}')
        for m in range(10):
            sleep(random())
            print(f'Thread {self.n}, {m}\'th iteration')
        print(f'Thread {self.n} exiting')

The class inherits from Thread, defines its own one-argument constructor and redefines the method run(). Note that the constructor and all the methods (even argumentless ones) are defined with an additional argument self, through which a reference to the class instance is passed. Thus, if we have the class defined with

class Test:
    def __init__(self, v):
        self.v=v 
    def f(a,b,c):
        print(f'{a.v}, {b}, {c}')

and we create an object of this class with x = Test(10), then x.f(1,2) is syntactic sugar for Test.f(x,1,2). When executing Test(10), the Python interpreter first creates an empty object y and then calls Test.__init__(y, 10).
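The equivalence can be checked directly (reusing the Test class from above): both call forms print the same line, and the bound method x.f is just a wrapper around the plain function Test.f.

```python
class Test:
    def __init__(self, v):
        self.v = v
    def f(a, b, c):
        print(f'{a.v}, {b}, {c}')

x = Test(10)
x.f(1, 2)        # prints: 10, 1, 2
Test.f(x, 1, 2)  # prints the same: 10, 1, 2

# x.f is a bound method whose underlying function is exactly Test.f:
assert x.f.__func__ is Test.f
```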

Let us return to our definition of CountingThread. The constructor’s argument (i.e., n) is saved as a field of the instance (where it can be accessed by the associated thread). We also call the constructor of the parent class (explicitly passing the self argument). In the run() method we simply print 10 messages with a random time delay between them. Each message contains the thread and iteration numbers.

Next, we create a list of 5 instances of CountingThread using list comprehension syntax and save it in variable threads:

threads = [CountingThread(n) for n in range(5)]

Note that no actual threads are active at this moment (aside from the main thread). To start the threads we have to execute start() on each thread object:

for t in threads:
    t.start()

The whole script is in the listing below:

from threading import Thread
from time import sleep
from random import random

class CountingThread(Thread):
    def __init__(self, n):
        Thread.__init__(self)
        self.n = n
    def run(self):
        print(f'Starting thread {self.n}')
        for m in range(10):
            sleep(random())
            print(f'Thread {self.n}, {m}\'th iteration')
        print(f'Thread {self.n} exiting')

threads = [CountingThread(n) for n in range(5)]

for t in threads:
    t.start()

If you run the script with the Python 3 interpreter (which under Windows is called python, but under Linux or Mac python3) you will find that the execution of the threads is interleaved. Note also that even though the execution of the main thread ended right after starting the counting threads, the execution of the script does not end until the last thread finishes (more precisely, until the last non-daemonic thread finishes, but threads are non-daemonic by default).
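Whether a thread counts as daemonic is controlled by its daemon flag, which must be set before start(). A small sketch: the background worker below loops forever, yet the script can still exit, because daemonic threads are killed abruptly when the last non-daemonic thread (here, the main thread) ends.

```python
from threading import Thread
from time import sleep

def background():
    while True:
        sleep(0.1)  # runs forever

t = Thread(target=background, daemon=True)  # mark as daemonic before start()
t.start()
assert t.daemon

# The script may now end even though t never finishes: daemonic threads
# do not keep the process alive.
print('main thread done')
```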

Now, if you want to wait for all the counting threads to finish and then do some more work, you can append something like this to the script:

for t in threads:
    t.join()

print('All the threads finished')
# some more work, perhaps using the 
# results of threads execution...

I noticed that in their own projects some students try to avoid code duplication by collapsing the last two loops into one:

for t in threads:
    t.start()
    t.join()

If you do that you will notice that the threads now execute sequentially, because we wait for each thread to finish before we start the next one.

Critical sections

Threads in the previous section did not communicate or share resources. Thus, all interleavings of thread operations were acceptable to us. However, when threads modify shared data, we might want to ensure that certain interleavings do not happen. Consider the following script which runs 20 threads. Each thread increases the value of a global counter, and the script reports the final counter value after the last thread finishes:

from threading import Thread

counter = 0

def count():
    global counter 
    counter = counter + 1

threads = [Thread(target=count) for i in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f'Threads finished, counter={counter}')

In this case our threads are so simple that we do not bother creating a class inheriting from Thread. Instead, we use the Thread class directly. Its constructor accepts a keyword argument target which, if given, is expected to be a function that will be executed in the associated thread. In our case the constructor receives the count function, which increases the value of the global variable counter. In Python, by default, an assignment to a variable in the body of a function creates a new local variable. To prevent that, and to assign to the global variable instead, we need to declare it global.

If you run the above script it will most probably work as expected, i.e., it will report 20 as the final value of counter. However, there is still a very small probability that the reported value will be smaller. Why? Because counter = counter + 1 is not an atomic operation: it consists of reading the value of the variable, adding one to it, and writing the new value back. These operations in different threads can interleave, and hence it is possible (if not very probable) that two threads read the same value and then write the same incremented value. This leads to lost increments. The fact that it is not very probable is very bad, because it means that such errors may be missed by testing and lie undiscovered for years. Until they hit hard.
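The non-atomicity is visible directly in the bytecode. Disassembling count() with the standard dis module shows that the increment compiles to a separate load of counter, an addition, and a STORE_GLOBAL instruction; a thread switch can occur between any two of them, which is exactly how increments get lost.

```python
import dis

counter = 0

def count():
    global counter
    counter = counter + 1

# Print the bytecode of count(); note that reading counter, adding one,
# and storing the result are distinct instructions, not one atomic step.
dis.dis(count)
```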

To increase the probability of lost increments, let us introduce random time delays between reading a global variable and writing its new value (we also introduce a random delay between start of a thread and reading the counter):

from threading import Thread
from time import sleep
from random import random

counter = 0

def count():
    global counter 
    sleep(5*random())
    x = counter + 1
    sleep(random())
    counter = x 
    print(x)

threads = [Thread(target=count) for i in range(20)]
for t in threads:
    t.start()

for t in threads:
    t.join()

print(f'Threads finished, counter={counter}')

Now, instead of 20, the final value of the counter is usually around 10. It is now clear that something is wrong, and we need to ensure that each counter increment happens without interleaving with counter increments in other threads. Usually this is framed in terms of critical sections.

A critical section can be understood as a section of code which no more than one thread or process may execute simultaneously. This property is called mutual exclusion. Here, by “execute” I mean that the thread’s program counter points to some line of code in the critical section. The thread may be suspended, sleeping, waiting for I/O, etc., but still no other thread may enter the critical section until the program counter of the thread currently inside is updated to point to some instruction outside the critical section. The idea is that each critical section is associated with some shared resources, and access to those resources should happen without interleaving with accesses to the same resources in other threads.

Usually we ensure the mutual exclusion property by protecting critical sections with locks.

Locks

Locks are special system objects which can be held by only a single thread at a time. A thread which acquires a lock holds it until it releases it. A thread which tries to acquire a lock already held by another thread is put to sleep until the current holder releases the lock. We can trust that the implementation guarantees that attempts to acquire a single lock do not lead to starvation, unless of course some thread never releases the lock after acquiring it.

The idea of using locks to protect critical sections is that each critical section has an associated lock. A thread is then obliged to acquire the associated lock before entering the critical section, and it releases the lock when leaving:

# some code outside critical section...
lock.acquire() # we acquire the lock
# code inside critical section...
lock.release() # we release the lock
# code outside critical section...

This works because only one thread at a time can hold a given lock. When a thread is inside the critical section, it holds the lock. Any other thread trying to enter the critical section will have to acquire the same lock, which will not succeed until the lock is released by the thread currently holding it, which in turn will not happen until that thread leaves the critical section.

In Python 3 the threading module provides two types of locks, represented, respectively, by instances of the classes Lock and RLock. Both classes support the methods acquire() and release(). The differences between them are somewhat subtle, but occasionally important: an RLock is reentrant, i.e., the thread already holding it may acquire it again without blocking (and must then release it the same number of times), and only the holding thread may release it. A plain Lock is not reentrant: a thread which tries to acquire a Lock it already holds deadlocks. On the other hand, a Lock may be released by any thread, not only by the one which acquired it.
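The reentrancy difference can be demonstrated with non-blocking acquires (acquire(blocking=False) returns immediately with True or False instead of putting the thread to sleep):

```python
from threading import Lock, RLock

rlock = RLock()
rlock.acquire()
# Reentrant: the thread already holding the RLock may acquire it again...
assert rlock.acquire(blocking=False)
# ...but must then release it once per acquire:
rlock.release()
rlock.release()

lock = Lock()
lock.acquire()
# A plain Lock is not reentrant: a second (blocking) acquire by the same
# thread would sleep forever, so the non-blocking attempt fails:
assert not lock.acquire(blocking=False)
lock.release()
```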

Let us use RLock to correct our counter code from above:

from threading import Thread, RLock
from time import sleep
from random import random

counter = 0
lock = RLock()

def count():
    global counter 
    sleep(5*random())
    lock.acquire() # we acquire lock before modifying the counter
    x = counter + 1
    sleep(random())
    counter = x 
    lock.release() # we release the lock after modifying the counter
    print(x)

threads = [Thread(target=count) for i in range(20)]
for t in threads:
    t.start()

for t in threads:
    t.join()

print(f'Threads finished, counter={counter}')

Now it reports 20 in the end (no increments lost). One subtlety in the code above is that it is vital that the same RLock object is used by all the threads. If each thread created its own lock, there would be no synchronization; this is another error commonly made by students (and by exhausted professionals).
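The failure mode is easy to see directly: two distinct lock objects know nothing about each other, so holding one does not prevent anyone from acquiring the other. (Nothing here is specific to threads; the point is simply that distinct locks do not exclude each other.)

```python
from threading import RLock

lock_a = RLock()  # as if created privately by one thread
lock_b = RLock()  # as if created privately by another thread

lock_a.acquire()
# Holding lock_a does not block acquisition of lock_b -- two threads each
# holding "their own" lock could both be inside the critical section:
assert lock_b.acquire(blocking=False)
lock_b.release()
lock_a.release()
```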

Locks and with

Locks provided by threading are context managers, i.e., they can be used with the with statement. More precisely, instead of

lock.acquire()
# code in critical section
lock.release()

we may write

with lock:
    # code in critical section

and lock.acquire() and lock.release() will be executed automatically at the beginning and the end of the with block, respectively. The advantage (aside from shorter code) is that it is now impossible to forget to release an acquired lock. Also, lock.release() will be called even if the with block contains a return statement or throws an exception. The latter may or may not be an advantage, as deadlocking the system may be preferable to releasing the lock while, perhaps, leaving the shared data in an inconsistent state.
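A short demonstration that the lock really is released when an exception escapes the with block: afterwards, a non-blocking acquire succeeds immediately.

```python
from threading import RLock

lock = RLock()

try:
    with lock:
        raise ValueError('something went wrong in the critical section')
except ValueError:
    pass

# The with statement released the lock despite the exception, so a
# non-blocking acquire succeeds right away:
assert lock.acquire(blocking=False)
lock.release()
```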

There is nothing mysterious about with. You can use with with any object which implements two methods: __enter__() and __exit__(). The first is called when entering the block, the second when leaving it. In the case of locks from the threading module, the __enter__() and __exit__() methods simply call acquire() and release(), respectively.
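As an illustration, here is a toy context manager of our own (the class name Tracer is made up for this sketch). It just records when __enter__() and __exit__() are called:

```python
class Tracer:
    # A toy context manager: __enter__() runs on entering the with block,
    # __exit__() on leaving it (even if the block raises an exception).
    def __init__(self):
        self.events = []
    def __enter__(self):
        self.events.append('enter')
        return self
    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append('exit')
        return False  # do not suppress exceptions

t = Tracer()
with t:
    t.events.append('body')

assert t.events == ['enter', 'body', 'exit']
```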

Other synchronization mechanisms

The problem of mutual exclusion is the most important, but by far not the only, synchronization problem. Locks are specialized tools for solving the mutual exclusion problem and protecting critical sections. More general synchronization problems require stronger synchronization objects, such as semaphores or condition variables.
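For a taste of what lies beyond locks, here is a sketch using a Semaphore from the threading module. A semaphore generalizes a lock by admitting up to n threads at once; below we limit concurrency to 2 and record the peak number of threads inside the protected section (the bookkeeping itself is guarded by an ordinary lock):

```python
from threading import Thread, Semaphore, Lock
from time import sleep

sem = Semaphore(2)   # at most 2 threads inside the protected section
state_lock = Lock()  # protects the counters below
current = 0
peak = 0

def worker():
    global current, peak
    with sem:
        with state_lock:
            current += 1
            peak = max(peak, current)
        sleep(0.1)  # simulate work done while holding a semaphore slot
        with state_lock:
            current -= 1

threads = [Thread(target=worker) for _ in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# With 6 workers but a semaphore initialized to 2, at most 2 ran at once.
print(f'peak concurrency: {peak}')
```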