How to Create an Ordered Counter class in Python

by Michael

After I made my post How to Group and Count with Dictionaries I noticed there was something nice I could have added in: how to make a counter class that also remembers the order in which it first saw each item. It is super easy to do in Python, as the following shows.

from collections import Counter, OrderedDict

class OrderedCounter(Counter, OrderedDict):
    pass

This works because of the method resolution order. You can find out really easily what that is by typing help(OrderedCounter) into the Python console. I have reproduced what you would get for Counter and OrderedCounter below.

""" Counter | Method resolution order: | Counter | __builtin__.dict | __builtin__.object OrderedCounter | Method resolution order: | OrderedCounter | collections.Counter | collections.OrderedDict | __builtin__.dict | __builtin__.object """

What has changed is that OrderedDict has now been inserted, just before dict, into the order in which methods will be searched for when using OrderedCounter. So any of the dictionary methods will come from OrderedDict instead of dict, giving us the result we want.
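A quick check of the combined behaviour (note that on modern Python 3, a plain Counter already remembers insertion order; OrderedCounter gave this guarantee on older versions too):

```python
from collections import Counter, OrderedDict

class OrderedCounter(Counter, OrderedDict):
    pass

c = OrderedCounter("abracadabra")
# keys come out in the order each letter was first seen
print(list(c))    # ['a', 'b', 'r', 'c', 'd']
print(c['a'])     # 5
```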

This can be used on any class that inherits from dict where you want to keep the order of its keys as that in which they were inserted. And in the more general case, it can be used to create a new class that uses an alternative implementation of some interface.

For more information on this, there is a great post by Raymond Hettinger called Python’s super() considered super! that goes into all this in way more detail.

The property function: why Python does not have Private Methods

by Michael

Every now and then, I see it come up that Python should have private or protected methods on its classes. This typically comes from someone who is used to programming in Java or C++ or the like, who misses them and insists they are important for any programming language. But Python does not have them and never will. Private methods are a solution to a problem in statically typed languages. They are not the solution for a highly dynamic language like Python.

tl;dr Python has a dynamic way of fixing this problem, just scroll down to the end to see the code.

But let us look at the problem, starting with an example class. Let's do something non-trivial that might actually be used in the real world: a class that handles colour manipulation, maybe because I am building a colour picker, or doing something else cool with colours. So my code might start out like this.

class Colour(object):
    """A colour wrangler class"""
    def __init__(self, hexcode="ff0000"):
        self.hexcode = hexcode

I would then go and add a pile of other methods that will be needed, but let us focus on this. The class has this hexcode attribute that anyone can now access. So I build the rest of my project around this, and access the hexcode in a lot of places in my code, and others start doing it in theirs as well.

But then I hit a problem: storing the colour as a hex string is clumsy, since I have to keep extracting the RGB components any time I want to manipulate it, and when converting to other colour spaces and back I get rounding errors. I now want to store it as three floats. So I update my code.

class Colour(object):
    """A colour wrangler class"""
    def __init__(self, hexcode="ff0000"):
        self.r = float(int(hexcode[0:2], 16))
        self.g = float(int(hexcode[2:4], 16))
        self.b = float(int(hexcode[4:6], 16))

Now everyone else's code that depended on the hexcode attribute is broken. At this point the Java/C++ people would say: I told you so, if only you had kept the attribute private, you could have had getter and setter methods. But that is never going to happen in Python. Python is too dynamic to check this at compile time, and run-time checking would be far too expensive. But Python does have an elegant solution: the built-in property function. This is how I would now fix my code using it.

class Colour(object):
    """A colour wrangler class"""
    def __init__(self, hexcode="ff0000"):
        self.hexcode = hexcode

    @property
    def hexcode(self):
        """Return the colour as a hex string"""
        return ("{:02x}" * 3).format(int(self.r), int(self.g), int(self.b))

    @hexcode.setter
    def hexcode(self, hexcode):
        self.r = float(int(hexcode[0:2], 16))
        self.g = float(int(hexcode[2:4], 16))
        self.b = float(int(hexcode[4:6], 16))

I can now get and set the hexcode attribute as before, and it will all be transparently handled. I have even used the setter in my __init__ method, so I don't have to repeat the conversion.

Note that @property is put first, before the function that acts as the getter, and then @hexcode.setter, not @property.setter, is used to define the setter. You can't change the order and expect it to still work. Also note that the two functions have the same name, and the doc string from the getter is what you will see if you do help(Colour.hexcode).

There is also an alternative way of doing the same thing, this time not using property as a decorator.

class Colour(object):
    """A colour wrangler class"""
    def __init__(self, hexcode="ff0000"):
        self.hexcode = hexcode

    def gethexcode(self):
        return ("{:02x}" * 3).format(int(self.r), int(self.g), int(self.b))

    def sethexcode(self, hexcode):
        self.r = float(int(hexcode[0:2], 16))
        self.g = float(int(hexcode[2:4], 16))
        self.b = float(int(hexcode[4:6], 16))

    hexcode = property(gethexcode, sethexcode, doc="Return the colour as a hex string")

This gives the same result, except it also exposes the two functions as an alternative way of getting/setting the value. For completeness, I should also mention there is a deleter, which can be passed as the third positional argument to property, or used as @hexcode.deleter, but that seems less useful to me.

I would say I kind of prefer the Python way of doing it. It means I only have to define the getter and setter functions after the fact of an API change, not as some pre-emptive move just in case it changes later. Also it looks cleaner to me to access attributes than to call methods.

This can also be used to create read-only attributes by leaving out the setter (assigning then raises an AttributeError), or to add some validation around an attribute to prevent invalid values being set. I could in this case make self.r etc. into properties as well, to make sure they are in the correct range of 0 to 255.
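As a sketch of that validation idea (the class name and the 0-255 rule here are just illustrative), a property can reject bad values in its setter:

```python
class Channel(object):
    """One colour channel, validated to stay in the range 0-255."""
    def __init__(self, value=0.0):
        self.value = value          # goes through the setter below

    @property
    def value(self):
        return self._value          # the underscore name holds the real data

    @value.setter
    def value(self, value):
        if not 0 <= value <= 255:
            raise ValueError("channel must be between 0 and 255")
        self._value = float(value)

c = Channel(128)
print(c.value)    # 128.0
```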

This however is not going to stop people messing with variables that you wanted to be protected (indicated by putting an underscore before the name). But you told them they are playing with fire; if they get burned, that can't be helped. So this solution of Python's does not cover all the use cases that private and protected methods are supposed to, but it covers enough that I think the need for them is not great in Python. Instead Python has its own very powerful and flexible solution.

Python How To: Group and Count with Dictionaries

by Michael

In this post I want to have a look at the different possible solutions to two rather simple and closely related problems: how to group objects into a dictionary, and how to count them. It is something that, at least for me, comes up every now and then in various forms. I want to start from the simplest good solution, and work towards better and faster solutions. As always, the code can be downloaded from gist.github.com.

Edit: Webucator who provides Python training courses, turned this post into a video. You can go check it out at youtu.be/dt4JCYtc1dE.

Grouping

So the problem that we want to solve is that we have some items that we want to group according to some criterion. In Python that is going to mean turning a list or some other iterable into a dictionary of lists. We also need some function to create the value we group by; for the sake of simplicity we will use len(), since that is the simplest built-in function that gives a key we might group on. In your own code, you would replace that with whatever logic gives you the key or tag that you want to group things by.

Or to put that all in pseudo code (also working python), that would be the following:

names = ['mark', 'henry', 'matthew', 'paul', 'luke', 'robert', 'joseph', 'carl', 'michael']
d = {}
for name in names:
    key = len(name)
    if key not in d:
        d[key] = []
    d[key].append(name)
# result: d = {4: ['mark', 'paul', 'luke', 'carl'], 5: ['henry'],
#              6: ['robert', 'joseph'], 7: ['matthew', 'michael']}

So that is a nice start: loop over the data, create a key value for each item, then check whether the key is already there or not. This is the best way to check if a key is in a dictionary. The alternatives are either doing a try/except around a key lookup, which is really ugly and slow, or checking the return value of d.get(key), which prevents putting None values in the dictionary.

There is a downside to this, and that is that the key has to be hashed two or three times (Python dictionaries are internally a kind of hash map, which gives them their almost constant lookup time): first in the if statement, possibly a second time in the assignment of the empty list, and finally in the lookup for the append. Every time Python hashes the value there is an overhead. So how might we do better? The following is one possible better solution.

d = {}
for name in names:
    key = len(name)
    d.setdefault(key, []).append(name)

This uses the setdefault() method, a method that even the developers of Python freely admit is confusingly named. The problem is that any descriptive alternative would look like do_a_get_lookup_but_if_not_found_assign_the_second_argument(). It does more or less what the code we wrote ourselves before did, but since it is done by the dictionary itself, the key is only hashed once. It will be faster when we have lots of values.

This is still not the best code that we could do, there is a nicer way. It involves using a data structure called defaultdict that lives in the collections module. If you have not checked out the collections module recently, I recommend you read its docs, there are a number of very useful utilities in it. With that aside, defaultdict lets us create a dictionary like object that is different only in that if a lookup fails, it uses the argument passed to it during creation (in our case list) to fill that key in. It lets us now write code like this:

from collections import defaultdict

d = defaultdict(list)
for name in names:
    key = len(name)
    d[key].append(name)

So now we can just look up the key and append to it, not worrying about if it exists or not. If it does not, the defaultdict will create the value for us.

Counting

Now that we have mastered grouping, counting should be simple. We just have to know that int() called with no arguments returns 0, so it can be passed to defaultdict. So here we have:

from collections import defaultdict

d = defaultdict(int)
for name in names:
    key = len(name)
    d[key] += 1

A common use case here is not to use a key function at all, but to count just the number of times each item appears. In that case, we could do the following simplified version.

from collections import defaultdict

names = ["mark", "john", "mark", "fred", "paul", "john"]
d = defaultdict(int)
for name in names:
    d[name] += 1
# result: d = {'mark': 2, 'john': 2, 'fred': 1, 'paul': 1}

This was considered common enough that there is even a built-in way to do it, using Counter, so the above can be reduced to this.

from collections import Counter

names = ["mark", "john", "mark", "fred", "paul", "john"]
d = Counter(names)

Counter comes with some nice little extras, such as being able to add, or subtract results. So we could do something like the following.

from collections import Counter

boys = ["mark", "john", "mark", "fred", "paul", "john"]
girls = ["mary", "joan", "joan", "emma", "mary"]
b = Counter(boys)
g = Counter(girls)
c = b + g
# result: c = Counter({'mark': 2, 'joan': 2, 'john': 2, 'mary': 2,
#                      'fred': 1, 'paul': 1, 'emma': 1})

But what happens if you want to use Counter but need to pass each item through some key function first? How would you do it? The solution is to put a generator expression inside it, like the following.

from collections import Counter

names = ['mark', 'henry', 'matthew', 'paul', 'luke', 'robert', 'joseph', 'carl', 'michael']
d = Counter(len(name) for name in names)

Useful key functions

A common case when grouping or counting is that you want to do so based on some item in, or attribute of, the things you are grouping. So for example, your data might be tuples of first and last names, dictionaries with first and last name keys, or classes with first and last name attributes. If that is what you group or count by, there are two built-in functions that can help, without needing to write our own: itemgetter() and attrgetter() from the operator module. Some examples might help.

from collections import defaultdict
from operator import itemgetter, attrgetter

# if names are tuples
names = [('mary', 'smith'), ('mark', 'davis')]
# the last_name function would look like
last_name = itemgetter(1)

# if names are dictionaries
names = [{'first': 'mary', 'last': 'smith'}, {'first': 'mark', 'last': 'davis'}]
# the last_name function would look like
last_name = itemgetter('last')

# if names are classes with first and last as attributes
names = [Person('mary', 'smith'), Person('mark', 'davis')]
# the last_name function would look like
last_name = attrgetter('last')

d = defaultdict(list)
for name in names:
    key = last_name(name)
    d[key].append(name)

Bonus

When I was studying Software Engineering I got a job tutoring for the first-year programming course, which was in Python and had 200-300 students depending on the semester (hence the need for tutors to help with questions during practicals). One of the challenges some of the more curious students used to set was how I would do certain things in one line (I ended up doing their whole first assignment in a single 1500-character line). The results are often really bad code, but it is also often rather interesting trying to reduce a problem to a single statement. I had a shot at doing it for this, and the following is the solution I came up with in a few minutes. I leave working out how it works as an exercise to the reader. I would never use it in production code.

import itertools
import operator

names = ['mark', 'henry', 'matthew', 'paul', 'luke', 'robert', 'joseph', 'carl', 'michael']
# len is our 'key' function here
d = {k: [i for x, i in v]
     for k, v in itertools.groupby(sorted((len(x), x) for x in names),
                                   key=operator.itemgetter(0))}

Python: Tips, Tricks and Idioms - Part 2 - Decorators and Context Managers

by Michael

Two weeks ago I wrote a post called Python: tips, tricks and idioms where I went through a whole lot of features of Python. Now, however, I want to narrow down on just a few and look at them in more depth. The first is decorators, which I did not cover at all, and the second is context managers, which I only gave one example of. Again, all the code samples are on gist.github.com.

There is a reason that I put them together: they both have the same goal. They can both help separate what you are trying to do (the "business logic") from extra code added for clean-up or performance etc. (the "administrative logic"). So basically they help package away, in a reusable way, code that we don't really care too much about.

Decorators

Decorators are really easy to use, and even if you have never seen them before, you could probably guess what is going on, even if not how or why. Take this for example:
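The original sample is not reproduced in the text; it presumably looked something like this sketch, with functools.lru_cache standing in for the post's @cache decorator and urllib doing the fetch (both are my assumptions):

```python
import functools
from urllib.request import urlopen   # the original post likely used urllib2

@functools.lru_cache(maxsize=None)   # stand-in for the post's @cache
def web_lookup(url):
    """Fetch a url, remembering results so repeat lookups are free."""
    return urlopen(url).read()
```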

Overlooking for now exactly which library urlopen comes from, we can guess that the results of web_lookup() are going to be cached, so that the page is not fetched every time we ask for the same url. So it is simple: we now know we can put @some_decorator before our function and use any decorator. But how do we write one?

First we need to understand what the decorator is doing. @cache is actually just syntactic sugar for the following.
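The equivalence can be shown with a do-nothing stand-in for cache (names here are illustrative):

```python
def cache(func):            # a do-nothing stand-in, just to make this runnable
    return func

@cache
def web_lookup(url):
    return "page for " + url

# the @cache line above is exactly equivalent to writing:
def web_lookup2(url):
    return "page for " + url
web_lookup2 = cache(web_lookup2)
```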

So this is important: cache is a function that takes another function as an argument, and returns a new function that can be used just like the original, but presumably adds some extra logic. So for our first decorator, let us start with something simple: a decorator that squares the result of the function it wraps.
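A minimal sketch of what that decorator and the plus() function below might have looked like:

```python
def square(func):
    def _square(arg):
        return func(arg) ** 2    # square whatever the wrapped function returns
    return _square

@square
def plus(number):
    """Add one to the number"""
    return number + 1

print(plus(2))   # 9: (2 + 1) squared
```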

So in this little example every time we call plus() the number will have 1 added to it, but because of our decorator, the result will also be squared.

But there is a problem with this: plus() is no longer the plus() function that we defined, but another function wrapping it. So things like the doc string have gone missing, and help(plus) will no longer work. But the functools library has a decorator to fix that, functools.wraps(). Always use functools.wraps() when writing a decorator.
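Applying functools.wraps() to the sketch above, the wrapper keeps the wrapped function's name and doc string:

```python
import functools

def square(func):
    @functools.wraps(func)       # copy name, doc string etc. onto the wrapper
    def _square(arg):
        return func(arg) ** 2
    return _square

@square
def plus(number):
    """Add one to the number"""
    return number + 1

print(plus.__name__)   # 'plus', not '_square'
print(plus.__doc__)    # 'Add one to the number'
```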

But did you notice? wraps() is a decorator that takes arguments. How do we write something like that? It gets a little more involved, but let us start with the code. It will be a decorator that raises the result of the function to some power.
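A sketch of the three-level version, keeping the power()/_pow names the discussion below refers to:

```python
import functools

def power(pow):                      # takes the decorator's argument
    def _decorator(func):            # this is the actual decorator
        @functools.wraps(func)
        def _pow(arg):               # this wraps the decorated function
            return func(arg) ** pow  # pow comes from power()'s scope
        return _pow
    return _decorator

@power(3)
def plus(number):
    return number + 1

print(plus(1))   # 8: (1 + 1) cubed
```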

So yes, three functions, one inside the other. The result is that power() is no longer really the decorator, but a function that returns the decorator. The issue is one of scoping, which is why we have to put the functions inside each other. When _pow() is called, the value of pow comes from the scope of the power() function that contains it.

So now we know how to write highly reusable function decorators, or do we? There is still a problem: our internal function _square() or _pow() only takes one argument, so any function it wraps can only take one argument as well. What we want is to be able to handle any number of arguments. That is where the star operator comes in.

Star operator

The * (star) operator can be used in a function definition so that the function can take an arbitrary number of arguments, all of which are collected into a single tuple. An example might help.
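A small illustration (the function name is just for the example):

```python
def add(*args):
    # args is a tuple of however many positional arguments were passed
    total = 0
    for number in args:
        total += number
    return total

print(add(1, 2, 3))   # 6
print(add())          # 0
```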

The * operator can also be used for the reverse case, when we have an iterable but want to pass its items as the arguments to a function. This is called argument unpacking.
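For example:

```python
def add3(a, b, c):
    return a + b + c

numbers = [1, 2, 3]
print(add3(*numbers))   # 6, the same as calling add3(1, 2, 3)
```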

The same basic idea can also be used for keyword arguments, but for this we use the ** (double star) operator. Instead of getting a tuple of the arguments, we get a dictionary. We can also use both together. So some examples:
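A sketch of both directions (collecting into a dict, and unpacking a dict as keyword arguments):

```python
def describe(*args, **kwargs):
    return args, kwargs

args, kwargs = describe(1, 2, name="mark")
# args == (1, 2) and kwargs == {'name': 'mark'}

def greet(greeting, name):
    return "{}, {}!".format(greeting, name)

details = {"greeting": "hello", "name": "mark"}
print(greet(**details))   # hello, mark!
```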

Better Decorators

So now we can go back and change our decorator to be truly generic. Let's do it with the simplest one we wrote, @square.
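The generic version just forwards everything it receives:

```python
import functools

def square(func):
    @functools.wraps(func)
    def _square(*args, **kwargs):
        return func(*args, **kwargs) ** 2
    return _square

@square
def add(a, b):
    return a + b

print(add(1, 2))        # 9
print(add(a=2, b=3))    # 25
```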

Now no matter what arguments the function takes, we will happily just pass them through to the function that we are wrapping.

So let us go back to our web_lookup function, and write it first with the caching inline, and then with the decorator, to see the difference.
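The inline version probably looked something like this sketch (urllib.request is my assumption; the original likely used urllib2):

```python
from urllib.request import urlopen

cache = {}

def web_lookup(url):
    # the caching logic is tangled up with the actual work
    if url in cache:
        return cache[url]
    page = urlopen(url).read()
    cache[url] = page
    return page
```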

That is how it might look, and the problem here is that the caching code is mixed up with what web_lookup() is actually supposed to do. That makes it harder to maintain and reuse, and harder to update the way we cache things if we have done something like this all over our code. So our very generic decorator might look like this.
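Something like the following memoizing decorator, in the style of the Python Decorator Library's examples (slow_double here is just a stand-in for web_lookup so the snippet runs offline):

```python
import functools

def cache(func):
    saved = {}
    @functools.wraps(func)
    def _cache(*args):
        if args not in saved:
            saved[args] = func(*args)   # only do the real work once per args
        return saved[args]
    return _cache

calls = []

@cache
def slow_double(n):      # stands in for web_lookup
    calls.append(n)      # record how often the real work actually happens
    return n * 2

slow_double(2)
slow_double(2)
print(calls)   # [2]: the second call was answered from the cache
```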

So that can wrap any function with any number of arguments, just by putting @cache before it. But I did not write that function myself; I lifted it right off the Python Decorator Library, which has many, many examples of decorators you can use.

Context Managers

In the previous post I did a single example of using a context manager, opening a file. It looked like this:
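The example is not reproduced in the text; it would have been along these lines (the file name is arbitrary, and the file is created first so the snippet actually runs):

```python
# create a file so the example can run
with open('somefile.txt', 'w') as f:
    f.write('hello')

# without the context manager you must remember the clean-up yourself
f = open('somefile.txt')
try:
    data = f.read()
finally:
    f.close()

# with it, closing happens automatically, even if an exception is raised
with open('somefile.txt') as f:
    data = f.read()
```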

Admittedly the context manager version is only a little shorter, and the file will be garbage collected anyway (at least in CPython), but there are other cases where it might be a much bigger problem if you get it wrong. For example, the threading library can also use a context manager.
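The missing sample would have used a lock, something like:

```python
import threading

lock = threading.Lock()
counter = 0

def work():
    global counter
    with lock:          # acquired on entry, released on exit
        counter += 1    # the critical section

threads = [threading.Thread(target=work) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)   # 10
```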

This is nice and simple: you can see from the indent what the critical section is, and also be sure the lock is released.

If you are dealing with file-like objects that don't have a context manager, contextlib has the closing() context manager. So let us go back and improve our web_lookup() function.
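In the original the real call would have been closing(urlopen(url)); to keep this sketch runnable offline, FakePage stands in for the object urlopen() returns:

```python
from contextlib import closing

class FakePage(object):
    """Stands in for the file-like object urlopen() returns."""
    closed = False
    def read(self):
        return b'<html></html>'
    def close(self):
        self.closed = True

# real code would be: with closing(urlopen(url)) as page: ...
page = FakePage()
with closing(page) as p:
    data = p.read()

print(page.closed)   # True: closing() called close() for us
```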

We can also write our own context managers. All that is needed is to use the @contextmanager decorator and have a function with a yield in it. The yield marks the point at which the context manager pauses while the code within the with statement runs. So the following can be used to time how long it takes to do something.
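A sketch of such a timer (the label argument is illustrative):

```python
from contextlib import contextmanager
import time

@contextmanager
def timeit(label):
    start = time.time()
    try:
        yield                # the body of the with statement runs here
    finally:
        print('{}: {:.3f} seconds'.format(label, time.time() - start))

with timeit('sleeping'):
    time.sleep(0.1)
```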

The try/finally in this case is kind of optional, but without it the time would not be printed if an exception was raised inside the with statement.

We can also write more complicated context managers. The following is something like the redirect_stdout that was added to contextlib in Python 3.4. It takes whatever is printed to stdout and puts it in a file (or file-like object); useful, for example, if we had timeit() context managers all over our code and wanted to start putting the results into a log file. Here the yield is followed by a value, which is why we can then use the with ... as syntax.
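A hand-rolled sketch of the idea (the file name out.log is arbitrary):

```python
from contextlib import contextmanager
import io
import sys

@contextmanager
def redirect_stdout(fileobj):
    old = sys.stdout
    sys.stdout = fileobj
    try:
        yield fileobj        # the value after yield is what 'as' binds to
    finally:
        sys.stdout = old     # always restore the real stdout

log = io.StringIO()
with redirect_stdout(log) as f:
    print('hello log')

# a compound with statement, the same as nesting one inside the other
with open('out.log', 'w') as logfile, redirect_stdout(logfile):
    print('hello file')
```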

The last with statement also shows off the use of compound with statements. It is really just the same as putting one with inside another.

Finally, it is worth mentioning at least in passing that any class can be turned into a context manager by adding __enter__() and __exit__() methods. The code in each does more or less what the code on either side of the yield statement would do.
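A minimal class-based sketch (the class and attribute names are illustrative):

```python
class Recorder(object):
    def __init__(self):
        self.events = []
    def __enter__(self):
        self.events.append('enter')   # like the code before the yield
        return self                   # what 'as' binds to
    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append('exit')    # like the code after the yield
        return False                  # don't suppress exceptions

r = Recorder()
with r as ctx:
    ctx.events.append('body')

print(r.events)   # ['enter', 'body', 'exit']
```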

And that is all for this round; I hope you learned something new and interesting. Don't forget to follow me on twitter if you want more Python tips, such as when I next write about sets and dictionaries. In the meantime, if you are looking for more, there is a great book, Python Cookbook, Third Edition by O'Reilly Media. I have been reading parts of it, and might include a few things I learned from it in my next post. Or if you want something simpler, try Learning Python, 5th Edition.

Python: Tips, Tricks and Idioms

by Michael

My programming language of preference is Python, for the simple reason that I feel I write better code faster with it than I do with other languages. It also has a lot of nice tricks and idioms for doing things well. Partly as a reminder to myself to use them, and partly because I thought this might be of general interest, I have put together this collection of some of my favourite idioms. I am also putting this on gist.github.com so that anyone who wants to contribute their own can, and I will try to keep this post up to date.

enumerate

A fairly common thing to do is loop over a list while also keeping track of what index we are up to. Now we could use a count variable, but Python gives us a nicer syntax for this with the enumerate() function.
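The sample is missing from the text; it would have been something like:

```python
names = ['mark', 'henry', 'matthew']

# instead of keeping a count variable by hand
lines = []
for i, name in enumerate(names):
    lines.append('{}: {}'.format(i, name))

print(lines)   # ['0: mark', '1: henry', '2: matthew']
```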

set

set is a useful little data structure; it is kind of like a list, but each value in it is unique. There are some useful operations, besides creating a list of unique items, that we can do with it. For example, let's try some different ways of validating a list of input.
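A sketch of that kind of validation (the tag names are just for illustration):

```python
tags = ['red', 'green', 'red', 'blue']
valid = {'red', 'green', 'blue', 'yellow'}

unique = set(tags)             # {'red', 'green', 'blue'}: duplicates dropped
all_valid = unique <= valid    # True: every tag is in the valid set
bad = unique - valid           # set(): the tags that are not allowed
```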

Control statements

with

The with statement is useful when accessing anything that supports the context management protocol, which means open() for example. It makes sure that any set-up and clean-up code, such as closing files, is run without you having to worry about it. So for example, to open a file:
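Something like (the file name is arbitrary; written first so the snippet runs):

```python
with open('example.txt', 'w') as f:
    f.write('some data')
# the file is closed here, even if an exception was raised inside the block
```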

for ... else

This is an interesting bit of syntax: it allows you to run some code if the loop never reached the break statement. It basically replaces the need to keep a tracking variable recording whether you broke out or not. Just looking over my code, here is a pseudo version of something I was doing.
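The original pseudo version is missing; a small stand-in showing the pattern (the search task here is invented for the example):

```python
def first_long_name(names, min_len):
    for name in names:
        if len(name) >= min_len:
            result = name
            break
    else:                 # only runs if the loop finished without a break
        result = None
    return result

print(first_long_name(['mark', 'matthew'], 6))   # 'matthew'
print(first_long_name(['mark', 'paul'], 6))      # None
```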

Conditional Expressions

Python allows for conditional expressions, so instead of writing an if .. else with just one variable assignment in each branch, you can do the following:
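For instance:

```python
colour = 'red'

# instead of a four-line if/else just to assign one variable
hexcode = 'ff0000' if colour == 'red' else '000000'
print(hexcode)   # ff0000
```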

This is one of the reasons I really like Python. The above is very readable, compared to the ternary operator that looks like a ? b : c that exists in other languages. That one always confuses me.

List Comprehensions

List comprehensions are supposed to replace building a list by looping and calling append. Compare the following.
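The missing comparison would have been along these lines:

```python
names = ['mark', 'henry', 'matthew']

# building a list with a loop and append
lengths = []
for name in names:
    lengths.append(len(name))

# the same thing as a list comprehension
lengths = [len(name) for name in names]
print(lengths)   # [4, 5, 7]
```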

We can also make this more complicated, by adding in filtering or putting a conditional assignment in:
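For example:

```python
names = ['mark', 'henry', 'matthew']

# filtering with an if at the end
short = [name for name in names if len(name) <= 4]

# a conditional expression deciding each value
sizes = ['long' if len(name) > 4 else 'short' for name in names]

print(short)   # ['mark']
print(sizes)   # ['short', 'long', 'long']
```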

Generator expressions

List comprehensions have one possible problem: they build the whole list in memory right away. If you are dealing with big data sets that can be a big problem, but even with small lists it is still extra overhead that might not be needed. If you are only going to loop over the results once, there is no gain in building the list. So if you can give up being able to index into the result, and do other list operations, you can use a generator expression, which uses very similar syntax but creates a lazy object that computes nothing until you ask it for a value.
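For example:

```python
names = ['mark', 'henry', 'matthew']

gen = (len(name) for name in names)   # nothing is computed yet
total = sum(gen)                      # values are produced one at a time
print(total)   # 16
```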

Generators using yield

Generator expressions are great, but sometimes you want something with similar properties that is not limited by the expression syntax. Enter the yield statement. So for example, the below will create a generator yielding an infinite series of random numbers. As long as we keep asking for another random number, it will happily supply one.
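A sketch of that infinite generator:

```python
import random

def random_numbers():
    """An infinite stream of random ints; take values with next()."""
    while True:
        yield random.randint(0, 100)

gen = random_numbers()
first = next(gen)      # somewhere between 0 and 100
```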

Dictionary Comprehensions

One use for generator expressions can be to build a dictionary, like in the first example below. This proved common enough that there is now a dedicated dictionary comprehension syntax for it. Both of these examples swap the keys and values of a dictionary.
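The two versions side by side:

```python
d = {'mark': 4, 'henry': 5}

# building the swapped dict from a generator expression
swapped = dict((value, key) for key, value in d.items())

# the same with the dictionary comprehension syntax
swapped = {value: key for key, value in d.items()}
print(swapped)   # {4: 'mark', 5: 'henry'}
```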

zip

If you thought that generating an infinite number of random ints was not that useful, here I want to use it to show off another function that I like to use: zip(). zip() takes a number of iterables and joins the nth item of each into a tuple. So for example:
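The missing sample was probably something like this (reusing the random generator from before):

```python
import random

def random_numbers():
    while True:
        yield random.randint(0, 100)

names = ['mark', 'henry', 'matthew']
for name, number in zip(names, random_numbers()):
    print(name, number)
# zip stops after 'matthew', the end of the shortest iterable
```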

So basically, it prints out each name with a random number (from our random number generator from before) next to it. Notice that zip() will stop as soon as it reaches the end of the shortest iterable. If that is not desired, the itertools module has a version that goes to the end of the longest.

We could also do something similar to get a dict of each name mapped to a random number like this.
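For example:

```python
import random

def random_numbers():
    while True:
        yield random.randint(0, 100)

names = ['mark', 'henry', 'matthew']
d = dict(zip(names, random_numbers()))
# d maps each name to a random number, e.g. {'mark': 42, ...}
```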

itertools

I mentioned itertools before. If you have not looked at it, it is worth reading through; plus, at the end of its documentation there is a whole section of recipes showing how to use the module to create even more interesting operations on iterables.

Collections

Python comes with a module that contains a number of container data types, called collections. Though I only want to look at two right now, it also has three more: namedtuple(), deque (a linked-list-like structure), and OrderedDict.

defaultdict

This is a data type that I use a fair bit. One useful case is when you are appending to lists inside a dictionary. If you are using a plain dict you would need to check whether the key exists before appending, but with defaultdict this is not needed. So for example:
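Something like (the data here is invented for the example):

```python
from collections import defaultdict

names = [('smith', 'mary'), ('davis', 'mark'), ('smith', 'john')]
d = defaultdict(list)
for last, first in names:
    d[last].append(first)    # no need to check whether the key exists first

print(dict(d))   # {'smith': ['mary', 'john'], 'davis': ['mark']}
```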

We could also count them like this.
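For example:

```python
from collections import defaultdict

names = [('smith', 'mary'), ('davis', 'mark'), ('smith', 'john')]
d = defaultdict(int)
for last, first in names:
    d[last] += 1

print(dict(d))   # {'smith': 2, 'davis': 1}
```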

Counter

But the last example is kind of redundant, in that Collections already contains a class for doing this, called Counter. In this case, I need to first extract the second item from each tuple, for which I can use a generator expression.
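For example, counting last names stored as the second item of each tuple:

```python
from collections import Counter

names = [('mary', 'smith'), ('mark', 'davis'), ('john', 'smith')]
c = Counter(last for first, last in names)
print(c['smith'])   # 2
```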

Another, maybe better, example might be counting all the different lines that appear in a file. It becomes very simple.
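Along these lines (the file name and contents are just for illustration, written first so the snippet runs):

```python
from collections import Counter

with open('lines.txt', 'w') as f:
    f.write('GET /\nGET /about\nGET /\n')

# iterating over a file yields its lines, so Counter counts each distinct line
with open('lines.txt') as f:
    counts = Counter(f)

print(counts['GET /\n'])   # 2
```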

If you enjoyed this post, or found it useful, please do leave a comment or share it on twitter etc. If people find this useful I will try and do some follow up posts explaining some things in more detail and with extra examples.

Edit: if you have found this useful, but want more there is a great book Python Cookbook, Third edition by O'Reilly Media that has a whole lot more. If you want something simpler try Learning Python, 5th Edition.

CustomizableUI in Thunderbird and SeaMonkey

by Michael

I have been working towards making my extension Toolbar Buttons restartless. One of the tricky parts is the need to create all the buttons by script. Fortunately there is the CustomizableUI module that handles all this really nicely in Firefox 29+. Unfortunately, it only exists in Firefox 29+, and not in Thunderbird for example.

So what I have tried to do is implement a module that has as close an interface as possible to CustomizableUI, working as a wrapper on top of how toolbars function in Thunderbird. Really, it is just the code you would have to write anyway to get a restartless extension adding buttons to Thunderbird; the only difference is that I am using almost the same interface. Once (if) CustomizableUI is added, there would be no need to change other code much.

The only difference in the interface is because Thunderbird can have 2 (or 3 if you have Lightning installed) toolboxes in the main window, and there is a need to select which one you want to use. You can see the code below. Feedback or comments welcome.

Resources in Bootstraped Extensions

by Michael

I have been working on trying to make Toolbar Buttons install without needing to restart Firefox. One thing I needed was resource:// urls. The problem is that they are not supported in bootstrapped extensions, so you have to set them up manually.

I saw some code samples for how to do this, but I did not really like them, so I took a slightly different approach. I put my resources somewhere under chrome://, for example chrome://toolbar-buttons/content/resources/. Then I want to be able to access the contents of this folder as resource://toolbar-buttons/. For that I use the following code in my bootstrap.js.