I was revisiting Jeff Knupp’s great article from 2012, Python’s hardest problem. It talks about the Global Interpreter Lock, or GIL, in Python. It explains how the GIL works, and why it’s such an important problem for Python programmers.
Probably the most notable consequence of the GIL is that Python cannot do “pure” multi-threaded execution, in the sense that only one thread can execute bytecode at any given time. The GIL prevents strange things from happening when more than one thread can write to the same chunk of memory. Knupp also wrote a follow-up to that article, Python’s hardest problem, revisited, where he advises people who want to do many things at the same time (parallelism) to use the multiprocessing module.
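To make “strange things” a bit more concrete: the GIL protects the interpreter’s internal state, but it doesn’t make your own code thread-safe. An operation like counter += 1 spans several bytecodes, so threads can interleave between the read and the write. Here’s a toy sketch of my own (not from Knupp’s articles) that will often lose updates:

from threading import Thread

counter = 0

def increment():
    global counter
    for _ in range(100000):
        counter += 1  # read-modify-write, not a single atomic step

threads = [Thread(target=increment) for _ in range(4)]
[x.start() for x in threads]
[x.join() for x in threads]
print(counter)  # often prints less than 400000: some updates got lost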
It’s great advice, and I’ve used multiprocessing in the wild. It takes a bit more effort to communicate data between the processes (typically using queues), but it’s well worth the safety that separate processes afford you. In essence, every process can then run its own thread without sharing any memory.
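As a quick illustration of that queue pattern, here’s a minimal sketch of my own (the worker function is hypothetical, not from Knupp’s articles): each worker process pushes its result onto a shared Queue, and the parent collects them.

from multiprocessing import Process, Queue

def worker(n, q):
    # do some work in a separate process and ship the result back
    q.put(sum(range(n)))

q = Queue()
processes = [Process(target=worker, args=(999999, q)) for _ in range(4)]
[x.start() for x in processes]
# collect results before joining, so a full queue can't block the workers
results = [q.get() for _ in processes]
[x.join() for x in processes]
print(results)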
As I was reading down the article, I noticed it didn’t have any examples! So I started playing around, just for fun. Let’s make a very simple program that builds a list of the integers from 0 to 999998 and then discards it, 50 times over.
Version 1: simple single-threaded
import time

nb_repeat = 50

def a_complex_operation(*args):
    a = []
    for x in range(999999):
        a.append(x)
    return None

t1 = time.time()
for _ in range(nb_repeat):
    a_complex_operation()
print time.time() - t1
Running time: 4.82960796356 seconds
Version 2: with processes
from multiprocessing import Pool
import time

nb_repeat = 50

def a_complex_operation(*args):
    a = []
    for x in range(999999):
        a.append(x)
    return None

t1 = time.time()
pool = Pool(processes=nb_repeat)
results = pool.map(a_complex_operation, [None for _ in range(nb_repeat)])
print time.time() - t1
Running time: 2.74916887283 seconds! Almost half the initial time.
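One caveat of my own: Pool(processes=50) starts one worker per task, which is far more processes than most machines can actually run in parallel. A variant worth trying (untested here) sizes the pool to the number of CPU cores and lets pool.map distribute the 50 tasks among them:

from multiprocessing import Pool, cpu_count

# one worker per core; pool.map queues the 50 tasks among them
pool = Pool(processes=cpu_count())
results = pool.map(a_complex_operation, [None for _ in range(nb_repeat)])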
Version 3: threaded version
from threading import Thread
import time

nb_repeat = 50

def a_complex_operation(*args):
    a = []
    for x in range(999999):
        a.append(x)
    return None

t1 = time.time()
threads = []
for _ in range(nb_repeat):
    threads.append(Thread(target=a_complex_operation))
[x.start() for x in threads]
[x.join() for x in threads]
print time.time() - t1
Running time: 14.0888431072 seconds!
Not extremely surprising, but quite interesting still. As expected, the version with processes is the fastest. But threading our program not only fails to improve its running time, it actually slows it down quite a bit. This is probably the GIL at work: the 50 threads still execute one at a time, and the overhead of lock contention and context switching between them gets added on top. The multiprocessing version is a nice little optimization, but it’s fair to say that the normal, single-threaded version is running pretty quickly too. When in doubt, keep it simple!