Closures are an interesting concept in computer programming. They can be powerful and useful, but they also can be tricky to understand and use well.
In this post, I try to provide a clear explanation of closures, and go into specifics on closure support in Python.
Being an “interesting programming concept” is nice, but not enough for me. I will present a real use-case for closures in Python.
No programming language is perfect (though Python comes close 😉 ). Programming languages that choose to support closures often need to make difficult tradeoffs. I will discuss some of the choices that were made in Python, and their implications.
Closures as a concept
Derived from the Wikipedia entry on Closure:
In computer programming languages, a closure is a function together with a referencing environment of that function. A closure function is any function that uses a variable that is defined in an environment (or scope) that is external to that function, and is accessible within the function when invoked from a scope in which that free variable is not defined.
To illustrate that, here’s a toy example of closures in Python:
    def make_multiplier(factor):
        def multiply(number):
            return number * factor
        return multiply

    mult6 = make_multiplier(6)
    mult7 = make_multiplier(7)
    print mult6(7)  # prints 42
    print mult7(6)  # prints 42
In this example, `mult6` and `mult7` are function closures. They are functions because `make_multiplier` returns the function `multiply`. They are closures because the returned function refers to the free variable `factor`. The `factor` variable is local in the scope of `make_multiplier`, but not in the scope of `multiply`. Since this variable is referenced by the returned function, Python knows it needs to store that variable along with the returned function, and not garbage-collect it once `make_multiplier` completes.
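To see that Python really keeps `factor` alive alongside the returned function, the closure can be inspected. This is a minimal sketch that relies on the `__closure__` attribute of function objects (available since Python 2.6); the names `captured6` and `captured7` are only for illustration, and `print()` is called with a single argument so the snippet also runs on Python 3.

```python
def make_multiplier(factor):
    def multiply(number):
        return number * factor
    return multiply

mult6 = make_multiplier(6)
mult7 = make_multiplier(7)

# Each closure carries its own cell holding the captured `factor`.
captured6 = mult6.__closure__[0].cell_contents
captured7 = mult7.__closure__[0].cell_contents
print(captured6)  # 6
print(captured7)  # 7
```

Each call to `make_multiplier` produces a separate cell, which is why `mult6` and `mult7` do not interfere with each other.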
When the function closure is invoked, it has access to the `factor` variable as it was when the closure was created, even though the calling scope doesn’t know about any variable named `factor`. Even if a variable named `factor` were defined in the calling scope, it would have nothing to do with the `factor` variable in the closure:
    def main():
        factor = 10
        mult6 = make_multiplier(6)
        mult7 = make_multiplier(7)
        print mult6(7)  # still prints 42
        print mult7(6)  # still prints 42
        print 'my local factor is', factor  # prints 10

    main()
In the modified example, `main` is a function with a local variable named `factor`. The local `factor` and the closure `factor` have nothing to do with each other. For each instance of the `multiply` closure, `factor` is the one from the closure, so the results of invoking it are unchanged. In `main`, the `factor` variable is assigned 10 and not changed, so it is still 10 when printed at the end.
Before digging deeper into the details of Python closures, I’d like to explore the actual usefulness of closures, setting useless toy examples aside.
Motivating Python use-case — Lazy evaluation
Consider the following scenario. You’re developing a media library manager and player. You write the following function to search the library for media items whose names match a regular expression and “display” them:
    def display_media_by_name(name_query, media_items):
        """Display media items whose name matches the RegEx `name_query`.

        @param name_query   RegEx matcher object over media item names
        @param media_items  A dictionary of (item_name, callable_media_getter)
        """
        for media_item_name in media_items:
            if name_query.match(media_item_name):
                print 'Query matched media item', media_item_name
                get_media = media_items[media_item_name]
                print 'Calling display on matched media item'
                display(get_media())
You may be thinking – why write it like that? Media items are probably files. Why not have the dictionary contain `(item_name, media_file_path)`? That way the function can open and read the file, and pass the content to `display()`.
That’s a valid approach, if you know in advance that you deal with nothing but local media files! The approach I suggest here scales better. It removes the coupling between what is a media item, and how to get the content of a media item. For example, there’s no need to change the `display_media_by_name` function to support both local media files, and web-based media streams. As long as the dictionary entry value is a callable that returns the content of the media item – we’re good to go!
An alternative strategy may be to have the dictionary contain `(item_name, media_content)`. That way it’s the responsibility of whoever populates the dictionary to get the content. The `display_media_by_name` function has trivial access to the content.
That may have been valid, if “media” was only a few small local files. If the media library is expected to hold many large items, the dictionary would be prohibitively large. In addition, notice that it is reasonable to assume that “most” media items will never be read in a single session. In that case, opening and reading all of them is an abysmal waste of time and memory (think “online video streaming”). This is what “lazy evaluation” is all about!
Here’s a partial `main` that tries to build a media items dictionary that can be used with the `display_media_by_name` function:
    def main():
        media_items = dict()
        for media_file_path in os.listdir('.'):
            media_items[media_file_path] = ?! what to do here ?!
Not surprisingly, the answer to ?! what to do here ?! is to assign a function closure.
Now I have a real-world use-case! Before describing a solution, I’d like to stop and look deeper into how Python supports closures.
Python support — Statically Nested Scopes
Starting with version 2.1 of Python, the language implements “statically nested scopes”, also known as “lexical scoping”. This is described in detail in PEP 227.
The gist of the PEP describes a change in the rules of resolving “free variables” in a namespace (or block). A “free variable” is a variable that was not “bound” in a namespace that referenced it. Before version 2.1, Python defined exactly three namespaces to search for a variable: local, global, and builtin.
In the case of nested functions (say `inner` is defined inside a function `outer`), the local namespace of `outer` is not visible inside `inner`. According to the pre-2.1 rules, if `inner` tries to reference a name bound in `outer`, it would not see it.
The new rule, starting with 2.1, allows Python to also “search up” local namespaces of containing functions. This allows `inner` to “see” variables bound in `outer`.
I’ll use the toy example from the beginning of this post to clarify the details.
    # This is the global scope

    def make_multiplier(factor):
        # This is the local scope of make_multiplier
        def multiply(number):
            # This is the local scope of multiply
            return number * factor
        return multiply

    def main():
        # This is the local scope of main
        factor = 10
        mult6 = make_multiplier(6)
        mult7 = make_multiplier(7)
        print mult6(7)  # still prints 42
        print mult7(6)  # still prints 42
        print 'my local factor is', factor  # prints 10

    main()
Here are the namespaces and names in the code:
namespace | names bound in namespace | free variables in namespace |
---|---|---|
builtin namespace | All those builtin thingies (like `len` or `open`) | |
global namespace | `make_multiplier`, `main` | |
local namespace of `make_multiplier` | `factor` (arg), `multiply` | |
local namespace of `multiply` | `number` (arg) | `factor` |
local namespace of `main` | `factor` (10), `mult6`, `mult7` | `make_multiplier` |
A name in Python refers to an object (which may also be a function). A name is bound to an object by a “name binding operation”.
Name binding operations are: argument declaration, assignment, class and function definition, import statements, for statements, and except clauses.
References to names in the program text refer to the object bound to that name in the innermost function block containing the reference. If a name is used within a code block, but not bound there, the use is treated as a reference to the nearest enclosing function region.
This should provide the rules I used to construct the above table for the example. `make_multiplier` is a name bound by `def` in the global namespace. The reference to `make_multiplier` in `main` refers to that function because it was not bound in the local scope of `main`, and the global namespace is the next one to search in.
How does Python resolve the reference to the name `factor` in `multiply`? Following the old rules, it searches the global and builtin namespaces. The name `factor` is not bound to an object in these namespaces, so Python raises an error. According to the new rules, though, Python can use the binding of the name `factor` introduced in the local namespace of `make_multiplier`.
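The compiler records the outcome of this resolution on the code objects, so the classification of names can be checked directly. The sketch below assumes the `__code__` attribute of function objects (Python 2.6 and later); the variable names holding the results are only illustrative.

```python
def make_multiplier(factor):
    def multiply(number):
        return number * factor
    return multiply

multiply = make_multiplier(6)

# Names that `multiply` takes from an enclosing function scope.
free_in_multiply = multiply.__code__.co_freevars
# Locals of `make_multiplier` that nested functions refer to.
cells_in_outer = make_multiplier.__code__.co_cellvars
# Names bound locally in `multiply` itself.
local_in_multiply = multiply.__code__.co_varnames

print(free_in_multiply)   # ('factor',)
print(cells_in_outer)     # ('factor',)
print(local_in_multiply)  # ('number',)
```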
As a side note, in the pre-2.1 days it was possible to implement a similar behavior. This was done by passing the value of `factor` as a default value for a `factor` argument in `multiply`: `def multiply(number, factor=factor)`. This made `factor` a name bound also in the local namespace of `multiply`.
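Spelled out, the default-argument workaround looks roughly like this. It is a sketch of the idiom rather than a recommendation; note that the "captured" value becomes an ordinary parameter, so a caller can override it by accident.

```python
def make_multiplier(factor):
    # `factor=factor` evaluates the outer `factor` once, at definition
    # time, and binds it as a local name inside `multiply`.
    def multiply(number, factor=factor):
        return number * factor
    return multiply

mult6 = make_multiplier(6)
result = mult6(7)
overridden = mult6(7, 100)  # the caller replaces the stored value

print(result)      # 42
print(overridden)  # 700
```

Because `factor` is local to `multiply` here, no closure is created at all: `mult6.__closure__` is `None`.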
This should provide a preliminary understanding of PEP 227. I knowingly skipped exceptions to the rules and edge cases, so I can focus on the essence. I will cover these as well, in a later section. Before that, I want to go back to the motivating use-case, to demonstrate how the new rules allow an elegant implementation.
Implementing lazy evaluation with a closure
Now that you understand how lexical scoping works in Python, it should be straightforward to pick up where I stopped:
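    def make_media_getter(media_file_path):
        # Each call creates a new scope, so every getter captures its own
        # media_file_path. A function defined directly in the loop body
        # would see the loop variable's final value when called later.
        def get_media():
            """Open and return the content of a media file."""
            print 'Opening and reading', media_file_path
            with open(media_file_path, 'r') as media_file:
                return media_file.read()
        return get_media

    def main():
        media_items = dict()
        for media_file_path in os.listdir('.'):
            media_items[media_file_path] = make_media_getter(media_file_path)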
For each file, I bind the name `get_media` to a function (object) that returns the content of that file. I then add a record `(file_path, file_content_getter_function)` to the media items dictionary. It is important to note that the `get_media` function is never executed during the for-loop. This means that at this stage, the file is never opened and read, achieving the “laziness” I wanted.
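One caveat when closures are created in a loop: a closure captures the variable, not the value it had when the function was defined. If the getter is defined directly in the loop body, every getter reads the loop variable's final value once it is called after the loop. The sketch below uses hypothetical helper names (`make_getters_naive`, `make_getter`, `make_getters_factory`) and plain strings instead of files to show the effect and one common workaround, a factory function.

```python
def make_getters_naive(names):
    getters = {}
    for name in names:
        def get():
            return name  # looked up when get() is called, not now
        getters[name] = get
    return getters

def make_getter(name):
    # A new scope per call, so each getter has its own `name`.
    def get():
        return name
    return get

def make_getters_factory(names):
    return dict((name, make_getter(name)) for name in names)

naive = make_getters_naive(['a.txt', 'b.txt'])
fixed = make_getters_factory(['a.txt', 'b.txt'])

print(naive['a.txt']())  # b.txt
print(fixed['a.txt']())  # a.txt
```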
To wrap up the real-world example, here’s the full program:
    import os
    import re

    def display(media):
        """Display media."""
        print '{{{'
        print media
        print '}}}'

    def display_media_by_name(name_query, media_items):
        """Display media items whose name matches the RegEx `name_query`.

        @param name_query   RegEx matcher object over media item names
        @param media_items  A dictionary of (item_name, callable_media_getter)
        """
        for media_item_name in media_items:
            if name_query.match(media_item_name):
                print 'Query matched media item', media_item_name
                get_media = media_items[media_item_name]
                print 'Calling display on matched media item'
                display(get_media())

    def make_media_getter(media_file_path):
        """Return a function that lazily reads one media file."""
        # A factory gives each getter its own media_file_path. A function
        # defined directly in the loop body would see the loop variable's
        # final value when called after the loop.
        def get_media():
            """Open and return the content of a media file."""
            print 'Opening and reading', media_file_path
            with open(media_file_path, 'r') as media_file:
                return media_file.read()
        return get_media

    def main():
        """Create media library from current dir and display some."""
        media_items = dict()
        print 'Populating media items dictionary'
        for media_file_path in os.listdir('.'):
            media_items[media_file_path] = make_media_getter(media_file_path)
        query_str = r'toy_.*\.py'
        query = re.compile(query_str)
        print 'Displaying media that matches query', query_str
        display_media_by_name(query, media_items)
        print 'Done'

    if '__main__' == __name__:
        main()
For the sake of the example, the “media library” is the collection of plain text files in the current directory. “Displaying media” is just printing the content of a plain text file to STDOUT. Here’s the output of running the program in a directory that contains the `toy_closure.py` script from the toy example used above:
    itamar@legolas pyclosure $ python lazy.py
    Populating media items dictionary
    Displaying media that matches query toy_.*\.py
    Query matched media item toy_closure.py
    Calling display on matched media item
    Opening and reading toy_closure.py
    {{{
    # This is the global scope

    def make_multiplier(factor):
        # This is the local scope of make_multiplier
        def multiply(number):
            # This is the local scope of multiply
            return number * factor
        return multiply

    def main():
        # This is the local scope of main
        factor = 10
        mult6 = make_multiplier(6)
        mult7 = make_multiplier(7)
        print mult6(7)  # still prints 42
        print mult7(6)  # still prints 42
        print 'my local factor is', factor  # prints 10

    main()

    }}}
    Done
Note that the “Opening and reading” message appears only when the `get_media` function is executed. This happens only after the dictionary entry is matched. Maximal laziness achieved.
Exceptions, pitfalls and caveats
An Ostrich always pays its debts. Here are some extra rules and exceptions from PEP 227 that I omitted above:
- If the `global` *name* statement appears in a code block, all uses of *name* refer to the binding of *name* in the top-level namespace (i.e. global namespace, followed by `__builtin__` namespace). The `global` statement must precede all uses of *name*.
- It is an error to delete the name of a variable that is referenced in an enclosed scope.
- It is an error to use `exec` in a function with a nested block with free variables, unless the `exec` explicitly specifies the local namespace.
- Names in class scope are not accessible via the new rules.
- Variables are not declared.
There are two more related errors that I wasn’t able to reproduce. Maybe the behavior changed since version 2.1 / 2.2.
Some of these rules and errors show some of the tradeoffs that were made.
The global statement and lexical scoping
By its nature, Python doesn’t have variable declaration semantics. A variable is “declared” when its name is bound to an object. This is a major issue if you want to change the value of a global variable in a function! Consider the following code:
    # global state
    STATE = 1

    def change_state(new_state):
        print 'Changing state to', new_state
        STATE = new_state

    change_state(2)
    print 'State is', STATE  # prints "1"
The global `STATE` never changes! The statement `STATE = new_state` in `change_state` is a binding operation. It binds the name `STATE` in the local namespace of `change_state`, hiding the global name `STATE`.
To address this issue, Python introduced the `global` statement:
    # global state
    STATE = 1

    def change_state(new_state):
        global STATE
        print 'Changing state to', new_state
        STATE = new_state

    change_state(2)
    print 'State is', STATE  # prints "2"
This was a good-enough solution before introducing lexical scoping. But what happens here?
    def my_parser():
        # my_parser state
        state = 1

        def change_state(new_state):
            print 'Changing state to', new_state
            state = new_state

        change_state(2)
        print 'State is', state  # prints "1"

    my_parser()
The `state` name is now bound in both local scopes of `my_parser` and `change_state`. Python 2 has no syntax to tell the interpreter that you want the `state` name in `change_state` to refer to the object that was bound to that name in `my_parser`! To get the desired behavior, you either need to declare `global state` in both scopes, or significantly change the code structure. The first option may result in unwanted side effects, like other scopes messing with the global state. The second option may be a hassle.
That is one price for introducing lexical scoping in a language without variable declarations.
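For readers on Python 3: PEP 3104 later added the `nonlocal` statement, which addresses exactly this case. The sketch below requires Python 3 because of `nonlocal`; the second function, with the hypothetical name `my_parser_py2_style`, shows a workaround that also runs on Python 2 by mutating a container instead of rebinding the name.

```python
def my_parser():
    state = 1

    def change_state(new_state):
        nonlocal state  # Python 3 only: rebind `state` in my_parser
        state = new_state

    change_state(2)
    return state

def my_parser_py2_style():
    # Mutating the list does not rebind the name `state`,
    # so no declaration is needed.
    state = [1]

    def change_state(new_state):
        state[0] = new_state

    change_state(2)
    return state[0]

print(my_parser())            # 2
print(my_parser_py2_style())  # 2
```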
When the old gods and the new gods collide
The changed rules might cause identical code to behave differently in pre-2.1 and post-2.1 versions. Here’s an example, based on the toy example:
    # This is the global scope
    factor = 100

    def make_multiplier(factor):
        # This is the local scope of make_multiplier
        def multiply(number):
            # This is the local scope of multiply
            return number * factor
        return multiply

    def main():
        # This is the local scope of main
        factor = 10
        mult6 = make_multiplier(6)
        mult7 = make_multiplier(7)
        print mult6(7)  # still prints 42 with new rules, or 700 with old rules
        print mult7(6)  # still prints 42 with new rules, or 600 with old rules
        print 'my local factor is', factor  # prints 10

    main()
    print 'my global factor is', factor  # prints 100
The only difference in this example is the addition of a global `factor` variable. But now, according to the old rules, the `factor` name in `multiply` refers to the global `factor`, and not the `factor` from the enclosing scope.
According to the PEP, Python should print a warning for this case. In my testing it didn’t. Maybe somewhere between 2.1 and 2.7 the warning was dropped.
Deleting a captured variable
Why is it an error to delete the name of a variable that is referenced in a nested scope? That should be obvious:
    def make_multiplier(factor):
        # This is the local scope of make_multiplier
        def multiply(number):
            # This is the local scope of multiply
            return number * factor
        del factor
        return multiply

    def main():
        # This is the local scope of main
        mult6 = make_multiplier(6)
        mult7 = make_multiplier(7)
        print mult6(7)  # still prints 42
        print mult7(6)  # still prints 42

    main()
Running it:
    itamar@legolas pyclosure $ python toy_closure.py
    SyntaxError: can not delete variable 'factor' referenced in nested scope
Duh. If Python needs to be able to access `factor` when `mult6` is invoked, it cannot have it deleted!
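For comparison, Python 3.2 and later no longer reject this at compile time. The `del` is accepted, and the failure moves to the moment the closure reads the now-empty variable. A small sketch of that behavior, assuming Python 3.2+ (on Python 2 this snippet is still a `SyntaxError`):

```python
def make_multiplier(factor):
    def multiply(number):
        return number * factor
    del factor  # accepted by the compiler on Python 3.2+
    return multiply

mult6 = make_multiplier(6)
try:
    mult6(7)
    outcome = 'no error'
except NameError:
    # The captured variable no longer has a value.
    outcome = 'NameError'

print(outcome)  # NameError
```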
To exec or not to exec
The `exec` statement in Python allows executing a string as Python code. What happens with `exec` when lexical scoping is involved?
    # This is the global scope
    MODIFIER = 1

    def make_multiplier(factor, exec_str):
        # This is the local scope of make_multiplier
        exec exec_str
        def multiply(number):
            # This is the local scope of multiply
            return number * factor * MODIFIER
        return multiply

    def main():
        # This is the local scope of main
        mult6 = make_multiplier(6, 'print "yo 6"')
        mult7 = make_multiplier(7, 'MODIFIER = 10')
        print mult6(7)  # still prints 42
        print mult7(6)  # still prints 42

    main()
Running the script indeed results in an error:
    itamar@legolas pyclosure $ python toy_closure.py
      File "toy_closure.py", line 8
        exec exec_str
    SyntaxError: unqualified exec is not allowed in function 'make_multiplier' it contains a nested function with free variables
But why? There’s no harm in printing `yo 6`, but it is an issue to introduce a new bound name `MODIFIER = 10` to the local scope! The resolution of `MODIFIER` depends on the content of the `exec_str` string, which is generally not known statically.
It is possible to “sandbox” `exec` with `exec exec_str in {}`. With this modification, we get back the expected behavior:
    itamar@legolas pyclosure $ python toy_closure.py
    yo 6
    42
    42
Note that the `exec 'MODIFIER = 10'` was sandboxed, so the second “42” did not become “420”.
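The same sandboxing idea can be written with the call form `exec(code, namespace)`, which is accepted by both Python 2 and Python 3. In this sketch the helper name `run_sandboxed` is hypothetical; it shows that names bound by the executed string land in the supplied dictionary and leave the surrounding scopes alone.

```python
MODIFIER = 1

def run_sandboxed(code_str):
    # Bindings made by code_str go into this dictionary only.
    namespace = {}
    exec(code_str, namespace)
    return namespace

sandbox = run_sandboxed('MODIFIER = 10')

print(sandbox['MODIFIER'])  # 10
print(MODIFIER)             # 1
```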
Another cost of not declaring variables
This is a frequently quoted example of “the perils of Python”:
    i = 1

    def make_multiplier(factor):
        # This is the local scope of make_multiplier
        def multiply(number):
            # This is the local scope of multiply
            return number * factor * i  # i comes from the global scope, right?
        # shitloads of code
        for i in xrange(6):  # oops! i is locally bound in enclosing scope!
            print 'hi', i
        return multiply

    def main():
        # This is the local scope of main
        mult6 = make_multiplier(6)
        mult7 = make_multiplier(7)
        print mult6(7)  # what will I print??
        print mult7(6)  # what will I print??

    main()
Running it:
    itamar@legolas pyclosure $ python toy_closure.py
    hi 0
    hi 1
    hi 2
    hi 3
    hi 4
    hi 5
    hi 0
    hi 1
    hi 2
    hi 3
    hi 4
    hi 5
    210
    210
Somewhat unexpectedly, we got `42*5` instead of `42*1`. That’s because the `for i` loop in `make_multiplier` made `i` a locally bound name in the scope of `make_multiplier`, regardless of where in the block it was bound. Even though `multiply` supposedly referenced `i` “before” it was used in the for-loop, Python resolved that `i` following the rules. When `multiply` is finally called, it reads the enclosing `i`, which the loop left at 5.
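The "bound anywhere in the block means local everywhere in the block" rule can also be observed without any nested function. In this sketch (the function name `read_before_bind` is made up for illustration), the loop at the bottom makes `i` local to the whole function, so the earlier read never reaches the global `i`.

```python
i = 1

def read_before_bind():
    value = i * 2  # `i` is local here because of the loop below
    for i in range(3):
        pass
    return value

try:
    read_before_bind()
    outcome = 'no error'
except UnboundLocalError:
    outcome = 'UnboundLocalError'

print(outcome)  # UnboundLocalError
```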
I agree this may be counter-intuitive, and it’s one of the tradeoffs made in Python. Would you prefer your code to be sprinkled with `var i` instead?
Names in class scopes are not visible to nested scopes
If a class definition occurs in a chain of nested scopes, the resolution process skips the class definition. This exception was added to prevent odd interactions between class attributes and local variable access. Name binding operations in class definitions don’t create bound names in a scope – they create an attribute on the resulting class object! These attributes can be accessed in class methods with attribute references, either via `self` or via the class name.
The alternative was to allow regular name binding rules. This would create multiple ways to reference class attributes – either as class attributes, or as free variables. The decision was to introduce this exception.
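A short sketch of what this means in practice, using a hypothetical `Multiplier` class. The snippet assumes no module-level name `factor` exists; if one did, the bare reference in `via_bare_name` would silently pick up that global instead of failing.

```python
class Multiplier(object):
    factor = 6  # becomes a class attribute, not an enclosing-scope name

    def via_self(self, number):
        return number * self.factor

    def via_class(self, number):
        return number * Multiplier.factor

    def via_bare_name(self, number):
        # Name resolution skips the class body, so this looks for a
        # global or builtin `factor` and finds none.
        return number * factor

m = Multiplier()
print(m.via_self(7))   # 42
print(m.via_class(7))  # 42

try:
    m.via_bare_name(7)
    outcome = 'no error'
except NameError:
    outcome = 'NameError'

print(outcome)  # NameError
```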
Summary
That concludes my take on Python support for closures, via lexical scoping. I hope I was able to provide a clean explanation, as I intended.
There is a delicate balance between deep understanding of “advanced programming concepts”, and actually being able to use them for something relevant. I hope you were able to relate to my “motivating example” of function closure use, and that it helped you understand the concept as much as it helped me.
When planning this post, I had to choose one “motivating use case” to keep the length under control. In practice, I have various uses. Another use-case will be highlighted in an upcoming SCons series episode.
Do you have interesting use-cases for Python closures? Something to say about my primer? The comments are open!
October 15, 2014
An alternative approach to closures is partial() from functools. This is similar in most if not all cases.
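As a rough illustration of that suggestion, the toy multiplier could be built with `functools.partial` and `operator.mul` from the standard library. This is only a sketch of the alternative, not a claim that it covers every closure use-case.

```python
from functools import partial
from operator import mul

# partial() stores the first argument; the call supplies the second.
mult6 = partial(mul, 6)
mult7 = partial(mul, 7)

print(mult6(7))  # 42
print(mult7(6))  # 42
```

Unlike a closure, the stored argument is openly visible as `mult6.args`.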