Python: Tips, Tricks and Idioms

My programming language of preference is Python because I feel I write better code faster with it than I do with other languages. It also has a lot of nice tricks and idioms for doing things well. Partly as a reminder to myself to use them, and partly because I thought this might be of general interest, I have put together this collection of some of my favourite idioms. I am also putting this on gist.github.com so that anyone who wants to contribute their own can, and I will try to keep this post up to date.

enumerate

A fairly common thing to do is loop over a list while keeping track of the index we are up to. We could use a count variable, but Python gives us a nicer syntax for this with the enumerate() function.

students = ('James', 'Andrew', 'Mark')
for i, student in enumerate(students):
    print i, student
# output:
# 0 James
# 1 Andrew
# 2 Mark
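
As a small sketch of a related option: enumerate() also accepts a start argument, which is handy when the numbering is meant for people rather than for indexing. The name numbered below is only for illustration.

```python
students = ('James', 'Andrew', 'Mark')

# start=1 makes the counter begin at 1 instead of the default 0
numbered = ['%d. %s' % (position, student)
            for position, student in enumerate(students, start=1)]

# numbered is now ['1. James', '2. Andrew', '3. Mark']
```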

set

set is a useful little data structure. It is kind of like a list, but it is unordered and each value in it is unique. Besides creating a collection of unique items, there are some valuable operations we can do with it. For example, let us try different ways of validating input lists.

colours = set(['red', 'green', 'blue', 'yellow', 'orange', 'black', 'white'])

# or using the newer syntax to declare the set.
input_values = {'red', 'black', 'pizza'}

# get the set of valid colours
valid_values = input_values.intersection(colours)

print valid_values
# output set(['black', 'red'])

# get the set of invalid colours
invalid_values = input_values.difference(colours)

print invalid_values
# output set(['pizza'])

# throw exception if there is something invalid 
if not input_values.issubset(colours):
    raise ValueError("Invalid colour: " + ", ".join(input_values.difference(colours)))
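
The same checks can also be written with operators, which some people find easier to read. This is a minimal sketch that reuses the colour data from above. One difference worth knowing: the operators need a set on both sides, while the methods accept any iterable.

```python
colours = {'red', 'green', 'blue', 'yellow', 'orange', 'black', 'white'}
input_values = {'red', 'black', 'pizza'}

valid_values = input_values & colours     # same as input_values.intersection(colours)
invalid_values = input_values - colours   # same as input_values.difference(colours)
all_valid = input_values <= colours       # same as input_values.issubset(colours)

# valid_values is the set of 'red' and 'black', invalid_values contains only
# 'pizza', and all_valid is False
```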

Control statements

with

The with statement is useful when accessing anything that supports the context management protocol, such as the file objects returned by open(). It ensures that any set-up and clean-up code, such as closing files, is run without you having to worry about it. So, for example, to open a file:

with open('/etc/passwd', 'r') as f:
    print f.read()
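
To illustrate the clean-up guarantee, here is a rough sketch of a home-made context manager built with contextlib.contextmanager from the standard library. The names tracked and events are made up for this example. The point is that the exit step still runs even though the body raises.

```python
from contextlib import contextmanager

events = []  # hypothetical log so we can see the order things happen in


@contextmanager
def tracked(label):
    events.append('enter ' + label)      # set-up code
    try:
        yield label                      # the body of the with block runs here
    finally:
        events.append('exit ' + label)   # clean-up code, runs even on error


try:
    with tracked('job') as name:
        events.append('working on ' + name)
        raise RuntimeError('something went wrong')
except RuntimeError:
    pass

# events is now ['enter job', 'working on job', 'exit job']
```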

for … else

This is an interesting bit of syntax. It allows you to run some code if the loop finished without ever reaching the break statement. It removes the need for a tracking variable to record whether you broke out or not. Looking over my own code, here is a pseudocode version of something I was doing.

# some code

for file_name in file_list:
    if is_build_file(file_name):
        break
else: # no break
    make_built_file()

# something else here
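
For something you can actually run, here is a small sketch along the same lines. The function first_build_file and the '.build' suffix are invented for illustration and are not from my real code.

```python
def first_build_file(file_list):
    """Return the first name ending in '.build', or None if there is none."""
    for file_name in file_list:
        if file_name.endswith('.build'):
            found = file_name
            break
    else:  # no break: the loop ran to the end without a match
        found = None
    return found


# first_build_file(['a.txt', 'b.build', 'c.build']) returns 'b.build'
# first_build_file(['a.txt']) returns None
```

Note that the else branch also runs when the list is empty, since the loop finishes without hitting break.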

Conditional Expressions

Python allows for conditional expressions, so instead of writing an if .. else with just one variable assignment in each branch, you can do the following:

# make number always be odd
number = count if count % 2 else count - 1

# call the method only if the object is not None
name = user.name() if user is not None else 'Guest'
print "Hello", name

This is one of the reasons I like Python. The above is very readable compared to the ternary operator, written a ? b : c, that exists in other languages. That one always confuses me.

List Comprehension

List comprehensions are supposed to replace building a list by looping and calling append. Compare the following.

numbers = [1, 2, 3, 4, 5, 6, 7]
squares = []
for num in numbers:
    squares.append(num * num)

# with a list comprehension
squares = [num * num for num in numbers]

We can also make this more complicated by adding in filtering or by using a conditional expression for the value:

numbers = [1, 2, 3, 4, 5, 6, 7]

# squares of all the odd numbers
squares = [num * num for num in numbers if num % 2]

# times even numbers by 2 and odd numbers by 3
mul = [num * 3 if num % 2 else num * 2 for num in numbers]
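
The two forms can be combined, and it helps to remember where each one goes: the conditional expression that chooses the value comes before the for, and the filter comes after it. A minimal sketch, with the equivalent loop for comparison (the names result and expected are just for this example):

```python
numbers = [1, 2, 3, 4, 5, 6, 7]

# the conditional expression picks the value; the trailing "if" filters
result = [num * 3 if num % 2 else num * 2 for num in numbers if num > 2]

# the same thing written as an explicit loop
expected = []
for num in numbers:
    if num > 2:
        expected.append(num * 3 if num % 2 else num * 2)

# both lists are [9, 8, 15, 12, 21]
```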

Generator expressions

List comprehensions have one possible problem: they build the whole list in memory right away. If you are dealing with big data sets, that can be a big problem, but even with small lists it is extra overhead that might not be needed. If you are only going to loop over the results once, there is no gain in building the list. So if you can give up being able to index into the result and do other list operations, you can use a generator expression, which uses very similar syntax but creates a lazy object that computes nothing until you ask for a value.

# generator expression for the square of all the numbers
squares = (num * num for num in numbers)

# where you would likely get a memory problem otherwise

with open('/some/number/file', 'r') as f:
    squares = (int(num) * int(num) for num in f)
    # do something with these numbers
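
One consequence of that laziness is that a generator expression can only be walked once. A short sketch of what that looks like in practice (the names total and leftover are just for illustration):

```python
numbers = [1, 2, 3, 4, 5, 6, 7]

squares = (num * num for num in numbers)

total = sum(squares)       # consumes the generator, one value at a time
leftover = list(squares)   # nothing is left the second time around

# total is 140 and leftover is []
```

If you need to go over the values more than once, build a list instead, or create the generator expression again.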

Generators using yield

Generator expressions are great, but sometimes you want something with similar properties that is not limited by the syntax generator expressions use. Enter the yield statement. For example, the function below creates a generator that is an infinite series of random numbers. As long as we keep asking for another random number, it will happily supply one.

import random
def random_numbers(high=1000):
    while True:
        yield random.randint(0, high)
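
Because the series never ends, you would not want to call list() on it directly. One way to take just a few values, sketched here, is itertools.islice from the standard library. The generator is repeated so the snippet runs on its own, and first_five is just an illustrative name.

```python
import random
from itertools import islice


def random_numbers(high=1000):
    while True:
        yield random.randint(0, high)


# take only the first five values from the infinite generator
first_five = list(islice(random_numbers(high=10), 5))

# first_five is a list of five integers, each between 0 and 10 inclusive
```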

Dictionary Comprehensions

One use of a generator expression is to build a dictionary, as in the first example below. This proved common enough that there is now a dedicated dictionary comprehension syntax for it. Both of these examples swap the keys and values of the dictionary.

teachers = {
    'Andy': 'English',
    'Joan': 'Maths',
    'Alice': 'Computer Science',
}
# using a generator expression
subjects = dict((subject, teacher) for teacher, subject in teachers.items())

# using a dictionary comprehension
subjects = {subject: teacher for teacher, subject in teachers.items()}
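
Dictionary comprehensions accept the same trailing filter as list comprehensions. A minimal sketch using the same data, where the length cut-off and the name long_subjects are arbitrary choices for illustration:

```python
teachers = {
    'Andy': 'English',
    'Joan': 'Maths',
    'Alice': 'Computer Science',
}

# keep only the entries whose subject name is longer than five characters
long_subjects = {teacher: subject
                 for teacher, subject in teachers.items()
                 if len(subject) > 5}

# long_subjects is {'Andy': 'English', 'Alice': 'Computer Science'}
```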

zip

If you thought that generating an infinite number of random integers was not that useful, well, here I want to use it to show another function that I like to use: zip(). zip() takes several iterables and joins the nth item of each into a tuple. So, for example:

names = ('James', 'Andrew', 'Mark')
for i, name in zip(random_numbers(), names):
    print i, name

# output:
# 288 James
# 884 Andrew
# 133 Mark

So basically, it prints each name with a random number (from our earlier random number generator) next to it. Notice that zip() stops as soon as it reaches the end of the shortest iterable. If that is not desired, the itertools module has izip_longest() (named zip_longest() in Python 3), which goes until the end of the longest.

We could also do something similar to get a dict of each name mapped to a random number like this.

dict(zip(names, random_numbers()))

# output: {'James': 992, 'Andrew': 173, 'Mark': 329}
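
As a rough sketch of the itertools variant that runs to the end of the longest iterable: it is called izip_longest in Python 2 and zip_longest in Python 3, so the import below tries both. The scores tuple is made-up data for illustration.

```python
try:
    from itertools import zip_longest                    # Python 3
except ImportError:
    from itertools import izip_longest as zip_longest    # Python 2

names = ('James', 'Andrew', 'Mark')
scores = (82, 91)  # hypothetical data, deliberately one item short

# plain zip() stops at the shortest iterable
short_pairs = list(zip(names, scores))

# zip_longest() keeps going and pads the gaps with fillvalue
pairs = list(zip_longest(names, scores, fillvalue=0))

# short_pairs is [('James', 82), ('Andrew', 91)]
# pairs is [('James', 82), ('Andrew', 91), ('Mark', 0)]
```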

itertools

I mentioned itertools before. Its documentation is worth reading through if you have not looked at it before. At the end, there is a whole section of recipes on how to use the module to create even more interesting operators on iterables.
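
To give a small taste, here is a sketch using two of its functions, chain() and count(). The lists are made-up data for illustration.

```python
from itertools import chain, count

morning = ['James', 'Andrew']   # hypothetical data
afternoon = ['Mark']

# chain() walks several iterables one after the other without copying them
everyone = list(chain(morning, afternoon))

# count(1) is an infinite counter; zip() stops when the names run out
numbered = list(zip(count(1), everyone))

# everyone is ['James', 'Andrew', 'Mark']
# numbered is [(1, 'James'), (2, 'Andrew'), (3, 'Mark')]
```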

Collections

Python comes with a module called collections that contains several container data types. Though I only want to look at two right now, it also has three more: namedtuple(), deque (a linked-list-like structure), and OrderedDict.
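
Since I am not covering them in detail, here is only a brief sketch of two of the others. The Student type and the sample values are invented for illustration.

```python
from collections import namedtuple, deque

# namedtuple: a tuple whose fields can also be read by name
Student = namedtuple('Student', ['name', 'grade'])
mark = Student(name='Mark', grade=7)
# mark.name is 'Mark', and mark[1] is 7

# deque: fast appends and pops at both ends; maxlen drops the oldest items
recent = deque(maxlen=2)
for item in ('Steak', 'Beer', 'Wine'):
    recent.append(item)
# recent now holds only the last two items, 'Beer' and 'Wine'
```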

defaultdict

This is a data type that I use a fair bit. One practical case is when you are appending to lists inside a dictionary. If you are using a plain dict, you need to check whether the key exists before appending, but with defaultdict this is not required. For example:

from collections import defaultdict

order = (
    ('Mark', 'Steak'),
    ('Andrew', 'Veggie Burger'),
    ('James', 'Steak'),
    ('Mark', 'Beer'),
    ('Andrew', 'Beer'),
    ('James', 'Wine'),
)

group_order = defaultdict(list)

for name, menu_item in order:
    group_order[name].append(menu_item)

print group_order

# output
# defaultdict(<type 'list'>, {
#     'James': ['Steak', 'Wine'],
#     'Andrew': ['Veggie Burger', 'Beer'],
#     'Mark': ['Steak', 'Beer']
# })

We could also count them like this.

order_count = defaultdict(int)

for name, menu_item in order:
    order_count[menu_item] += 1

print order_count

# output
# defaultdict(<type 'int'>, {
#     'Beer': 2, 
#     'Steak': 2, 
#     'Wine': 1, 
#     'Veggie Burger': 1
# })
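
For comparison, this is roughly what the grouping example looks like with a plain dict, first with an explicit key check and then with dict.setdefault(). It is a sketch to show what defaultdict saves you from writing. The name group_order_2 is just for illustration.

```python
order = (
    ('Mark', 'Steak'),
    ('Andrew', 'Veggie Burger'),
    ('James', 'Steak'),
    ('Mark', 'Beer'),
    ('Andrew', 'Beer'),
    ('James', 'Wine'),
)

# plain dict: check for the key before appending
group_order = {}
for name, menu_item in order:
    if name not in group_order:
        group_order[name] = []
    group_order[name].append(menu_item)

# dict.setdefault() folds the check and the insert into one call
group_order_2 = {}
for name, menu_item in order:
    group_order_2.setdefault(name, []).append(menu_item)

# both dicts map 'Mark' to ['Steak', 'Beer'], 'Andrew' to
# ['Veggie Burger', 'Beer'] and 'James' to ['Steak', 'Wine']
```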

Counter

But the last example is redundant because collections already contains a class for doing this, called Counter. In this case, I first need to extract the second item from each tuple, for which I can use a generator expression.

from collections import Counter

order_count = Counter(menu_item for name, menu_item in order)
print order_count

# output
# Counter({
#    'Beer': 2,
#    'Steak': 2,
#    'Wine': 1,
#    'Veggie Burger': 1
# })

A better example might be counting how often each distinct line appears in a file. It becomes straightforward.

with open('/some/file', 'r') as f:
    line_count = Counter(f)
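
Once you have a Counter, two things are worth knowing: most_common() gives the entries sorted by count, and looking up a missing key returns 0 instead of raising KeyError. A small sketch with made-up data:

```python
from collections import Counter

# hypothetical list of ordered items
items = ['Beer', 'Steak', 'Beer', 'Wine', 'Beer', 'Steak']
item_count = Counter(items)

top_two = item_count.most_common(2)   # the two most frequent items
missing = item_count['Pizza']         # a key that was never counted

# top_two is [('Beer', 3), ('Steak', 2)] and missing is 0
```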

If you enjoyed this post or found it helpful, please leave a comment or share it on Twitter. Also, if people find this useful, I will try and do some follow-up posts explaining some things in more detail and with additional examples.

Edit: if you have found this helpful but want more, there is an excellent book, Python Cookbook, Third edition by O’Reilly Media, that has a whole lot more. If you want something simpler, try Learning Python, 5th Edition.