multiprocessing | Binders Full of Codes

I was revisiting Jeff Knupp’s great article from 2012, Python’s hardest problem. It covers the Global Interpreter Lock, or GIL, in Python: how it works, and why it’s such an important problem for Python coders.

Probably the most notable consequence of the GIL is that Python (more precisely CPython, the reference interpreter) cannot do “pure” multi-threaded operations, in the sense that only one thread can execute Python bytecode at any given time. The GIL prevents strange things from happening when more than one thread writes to the same chunk of memory. Knupp also wrote a follow-up to that article, Python’s hardest problem, revisited, where he advises people who want to do many things at the same time (parallelism) to use the multiprocessing module.

It’s great advice, and I’ve used multiprocessing in the wild. It takes a bit more effort to communicate data between the processes (typically using queues), but it’s well worth the safety that separate processes afford you. In essence, every process runs its own interpreter with its own GIL, and shares no memory with the others.
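To make the queue-based communication concrete, here is a minimal sketch. The names count_appends and run_workers are made up for illustration, and the workload is assumed to be a small list-building loop. Each child process builds its list privately and sends only the length back to the parent through a multiprocessing.Queue.

```python
from multiprocessing import Process, Queue


def count_appends(n, results):
    """Build a list of n integers and report its length on the queue."""
    a = []
    for x in range(n):
        a.append(x)
    # The list itself stays in this process; only the length travels back.
    results.put(len(a))


def run_workers(nb_workers, n):
    """Start nb_workers processes and collect one result from each."""
    results = Queue()
    workers = [Process(target=count_appends, args=(n, results))
               for _ in range(nb_workers)]
    for w in workers:
        w.start()
    # Read the results before joining, so no child is left waiting
    # to flush its data into the queue.
    lengths = [results.get() for _ in workers]
    for w in workers:
        w.join()
    return lengths


if __name__ == "__main__":
    print(run_workers(4, 1000))
```

Nothing is shared here: the parent only ever sees what a child explicitly puts on the queue, which is the extra effort (and the safety) mentioned above.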

As I was reading through the article, I noticed he didn’t have any examples! So I started playing around, just for fun. Let’s make a very simple program that appends the integers from 0 to 999998 to a list and then discards the list, 50 times.

Version 1: simple single-threaded

import time
 
nb_repeat = 50
 
 
def a_complex_operation(*args):
    # Build a list of 999999 integers, then throw it away.
    a = []
    for x in range(999999):
        a.append(x)
    return None


# Time 50 sequential calls.
t1 = time.time()
for _ in range(nb_repeat):
    a_complex_operation()
print(time.time() - t1)

Running time: 4.82960796356 seconds

Version 2: with processes

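from multiprocessing import Pool
import time

nb_repeat = 50

def a_complex_operation(*args):
    a = []
    for x in range(999999):
        a.append(x)
    return None

# The guard keeps worker processes from re-running this block on import.
if __name__ == '__main__':
    t1 = time.time()  # timing mirrors the other two versions
    pool = Pool(processes=nb_repeat)  # one worker process per task
    results = pool.map(a_complex_operation, [None] * nb_repeat)
    pool.close()
    pool.join()
    print(time.time() - t1)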

Running time: 2.74916887283 seconds! Almost half the initial time.

Version 3: threaded version

from threading import Thread
import time
 
nb_repeat = 50
 
def a_complex_operation(*args):
    a = []
    for x in range(999999):
        a.append(x)
    return None
 
t1 = time.time()
threads = []
for _ in range(nb_repeat):
    threads.append(Thread(target=a_complex_operation))

# Start every thread, then wait for all of them to finish.
for x in threads:
    x.start()
for x in threads:
    x.join()
print(time.time() - t1)

Running time: 14.0888431072 seconds!

Not extremely surprising, but still quite interesting. As expected, the version with processes is the fastest. Threading our program, however, not only fails to improve its running time, it actually slows it down by almost a factor of three. This is probably due to the 50 threads contending for the GIL: the work is pure Python bytecode, so only one thread can make progress at a time, and all the switching between threads is overhead. The multiprocess version is a nice little optimization, but it’s fair to say that the normal, single-threaded version runs pretty quickly too. When in doubt, keep it simple!
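A single timing can be noisy, so a comparison like this is easier to trust when each variant is measured a few times. The sketch below is one way to do that for the serial and threaded versions. The helper names time_serial, time_threaded and best_of are hypothetical, and the demo sizes are deliberately smaller than the 50 x 999999 used above so that it finishes quickly. How large the gap is will depend on the Python version and the machine.

```python
import time
from threading import Thread


def a_complex_operation(n):
    # Same kind of workload as above: fill a list, then discard it.
    a = []
    for x in range(n):
        a.append(x)


def time_serial(nb_repeat, n):
    """Seconds taken to run the workload nb_repeat times in one thread."""
    start = time.perf_counter()
    for _ in range(nb_repeat):
        a_complex_operation(n)
    return time.perf_counter() - start


def time_threaded(nb_repeat, n):
    """Seconds taken to run the workload in nb_repeat concurrent threads."""
    start = time.perf_counter()
    threads = [Thread(target=a_complex_operation, args=(n,))
               for _ in range(nb_repeat)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


def best_of(timer, runs, nb_repeat, n):
    """Smallest of several measurements, to damp one-off noise."""
    return min(timer(nb_repeat, n) for _ in range(runs))


if __name__ == "__main__":
    # Assumed demo sizes, chosen only to keep the run short.
    for label, timer in (("serial", time_serial), ("threaded", time_threaded)):
        print(label, best_of(timer, runs=3, nb_repeat=10, n=100000))
```

Taking the minimum rather than the average is a common choice for micro-benchmarks, since background activity on the machine can only make a run slower, never faster.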