Create groups from lists with itertools.groupby

Simple lists

Given a list that looks like this:

animals = ['cow', 'cow', 'bird', 'pony', 'pony', 'pony', 'fish', 'cow']

Let’s say we wanted to ‘group’ together all the animals that are the same. Instead of looping through all the elements and keeping temporary lists, let’s use itertools.groupby.

import itertools
for key, group in itertools.groupby(animals):
    print(key, group)

This gives us:

cow <itertools._grouper object at 0x7faa6266af50>
bird <itertools._grouper object at 0x7faa6266af90>
pony <itertools._grouper object at 0x7faa6266af50>
fish <itertools._grouper object at 0x7faa6266af90>
cow <itertools._grouper object at 0x7faa6266af50>

From the docs:

Make an iterator that returns consecutive keys and groups from the iterable. The key is a function computing a key value for each element. If not specified or is None, key defaults to an identity function and returns the element unchanged. Generally, the iterable needs to already be sorted on the same key function.

The returned group is itself an iterator that shares the underlying iterable with groupby(). Because the source is shared, when the groupby() object is advanced, the previous group is no longer visible. So, if that data is needed later, it should be stored as a list:
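As a minimal sketch of what that last sentence means in practice, the snippet below copies each group into a list while iterating, reusing the animals list from above. The name consecutive_groups is only illustrative and does not come from the docs.

```python
import itertools

animals = ['cow', 'cow', 'bird', 'pony', 'pony', 'pony', 'fish', 'cow']

# Copy each group into a list right away: the group iterator stops being
# usable as soon as groupby() moves on to the next key.
consecutive_groups = [
    (key, list(group)) for key, group in itertools.groupby(animals)
]

for key, members in consecutive_groups:
    print(key, members)
# cow ['cow', 'cow']
# bird ['bird']
# pony ['pony', 'pony', 'pony']
# fish ['fish']
# cow ['cow']
```

Note that 'cow' shows up twice, because only consecutive equal elements end up in the same group.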

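You’ll notice we have multiple groups of the same animal. That is because groupby only groups consecutive equal elements, so the two separate runs of cows end up in separate groups. To group ALL of them together, just sort the list first.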

import itertools
for key, group in itertools.groupby(sorted(animals)):
    print(key, group)

This prints:

bird <itertools._grouper object at 0x7f6e21c97f10>
cow <itertools._grouper object at 0x7f6e21c97f50>
fish <itertools._grouper object at 0x7f6e21c97f10>
pony <itertools._grouper object at 0x7f6e21c97f50>

And since we sorted, we also get things in alphabetical order.
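One possible use of the sorted grouping, sketched here under the assumption that we only care about how many of each animal there are, is to count the members of each group. The name counts is illustrative.

```python
import itertools

animals = ['cow', 'cow', 'bird', 'pony', 'pony', 'pony', 'fish', 'cow']

# Sorting first puts equal animals next to each other, so every key
# appears exactly once and the dict is not overwritten by a later group.
counts = {
    key: len(list(group))
    for key, group in itertools.groupby(sorted(animals))
}

print(counts)
# {'bird': 1, 'cow': 3, 'fish': 1, 'pony': 3}
```

Without the sorted() call, the second run of 'cow' would overwrite the first one in the dict and the count would come out wrong.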

Lists of dictionaries

Now, given the following list of dictionaries:

animals = [
    {'name':'cow', 'size':'large'},
    {'name':'bird', 'size':'small'},
    {'name':'fish', 'size':'small'},
    {'name':'rabbit', 'size':'medium'},
    {'name':'pony', 'size':'large'},
    {'name':'squirrel', 'size':'medium'},
    {'name':'fox', 'size':'medium'}
]

Let’s say we wanted to group by animal size. We could do something like:

for x in range(len(animals)):
    # compare each animal with the one just before it
    if x > 0 and animals[x]['size'] == animals[x-1]['size']:
        pass  # add to a new dict or something...

But that wouldn’t be very Pythonic. With itertools:

import itertools
for key, group in itertools.groupby(animals, key=lambda x: x['size']):
    print(key, group)

This prints:

large <itertools._grouper object at 0x7fce989a2f50>
small <itertools._grouper object at 0x7fce989a2f90>
medium <itertools._grouper object at 0x7fce989a2f50>
large <itertools._grouper object at 0x7fce989a2f90>
medium <itertools._grouper object at 0x7fce989a2f50>

Once again, let’s sort that list:

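import itertools
from operator import itemgetter
# sort by size so that animals of the same size sit next to each other
sorted_animals = sorted(animals, key=itemgetter('size'))
# group with the same key that was used for sorting
for key, group in itertools.groupby(sorted_animals, key=itemgetter('size')):
    # list(group) copies the group's items so we can see them
    print(key, list(group))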

And the result:

large [{'name': 'cow', 'size': 'large'}, {'name': 'pony', 'size': 'large'}]
medium [{'name': 'rabbit', 'size': 'medium'}, {'name': 'squirrel', 'size': 'medium'}, {'name': 'fox', 'size': 'medium'}]
small [{'name': 'bird', 'size': 'small'}, {'name': 'fish', 'size': 'small'}]
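If the goal is a lookup table rather than printed output, the same sort-then-group approach can feed a dict comprehension. This is only a sketch: the names by_size and names_by_size are made up for illustration, and it keeps just each animal's name.

```python
import itertools
from operator import itemgetter

animals = [
    {'name': 'cow', 'size': 'large'},
    {'name': 'bird', 'size': 'small'},
    {'name': 'fish', 'size': 'small'},
    {'name': 'rabbit', 'size': 'medium'},
    {'name': 'pony', 'size': 'large'},
    {'name': 'squirrel', 'size': 'medium'},
    {'name': 'fox', 'size': 'medium'},
]

# Use one key function for both sorting and grouping so they always agree.
by_size = itemgetter('size')

names_by_size = {
    size: [animal['name'] for animal in group]
    for size, group in itertools.groupby(sorted(animals, key=by_size), key=by_size)
}

print(names_by_size)
# {'large': ['cow', 'pony'], 'medium': ['rabbit', 'squirrel', 'fox'], 'small': ['bird', 'fish']}
```

Because sorted() is stable, the names inside each size keep the order they had in the original list.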