Python 101: iterators, generators, coroutines
In this post I’m going to talk about what a generator is and how it compares to a coroutine, but to understand these two concepts (generators and coroutines) we’ll first need to take a step back and understand the underlying concept of an iterator.
Ultimately we’ll be discussing…
Each section leads on to the next, so it’s best to read this post in the order the sections are defined, unless you’re already familiar with the earlier segments and prefer to jump ahead.
Summary
The summary of everything we’ll be discussing below is this:
- Iterators let you iterate over your own custom object.
- Generators are built upon Iterators (they reduce boilerplate).
- Generator Expressions are even more concise Generators †
- Coroutines are Generators, but their `yield` accepts values.
- Coroutines can pause and resume execution (great for concurrency).
† think comprehensions.
Iterators
According to the official Python glossary, an ‘iterator’ is…
An object representing a stream of data.
Why use Iterators?
An iterator is useful because it enables any custom object to be iterated over using the standard Python `for-in` syntax. This is ultimately how the built-in list and dictionary types work, and how they allow `for-in` to iterate over them.
More importantly, an iterator (as we’ll discover) is very memory efficient, because only one element is ever handled at a time. You could therefore have an iterator object that provides an infinite sequence of elements and never find your program exhausting its memory allocation.
Iterator Implementation
An iterator is (typically) an object that implements both the `__iter__` and `__next__` ‘dunder’ methods, although the `__next__` method doesn’t have to be defined on the same object as `__iter__`. Let me clarify…
An ‘iterator’ is really just a container of some data. This ‘container’ must have an `__iter__` method which, according to the protocol documentation, should return an iterator object (i.e. something that has the `__next__` method). It’s the `__next__` method that moves forward through the relevant collection of data.
So you could design a single class that contains both the `__iter__` and `__next__` methods (like I demonstrate below), or you might want to have the `__next__` method defined as part of a separate class, as shown in the sketch after this paragraph (it’s up to you and whatever you feel works best for your project).
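Here’s a minimal sketch of that second approach (the `Container` and `ContainerIterator` names are purely illustrative), where `__iter__` returns an instance of a separate class that owns the iteration state:

```python
class ContainerIterator:
    def __init__(self, collection):
        self.collection = collection
        self.index = 0

    def __iter__(self):
        # iterators conventionally return themselves from __iter__
        return self

    def __next__(self):
        if self.index >= len(self.collection):
            raise StopIteration
        value = self.collection[self.index]
        self.index += 1
        return value

class Container:
    def __init__(self, collection):
        self.collection = collection

    def __iter__(self):
        # the iteration state lives in a separate object, so
        # multiple independent loops over the same Container
        # instance won't interfere with each other
        return ContainerIterator(self.collection)

for element in Container(["a", "b", "c"]):
    print(element)
```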
Note: the Python docs for `collections.abc` highlight the other ‘protocols’ that Python has and the various methods they require (see an earlier post of mine that discusses protocols + abstract classes in detail). If you’re unfamiliar with ‘dunder’ methods, then I’ll refer you to an excellent post: a guide to magic methods.
Implementing these two methods enables Python to iterate over a ‘collection’. It doesn’t matter what the collection is, as long as the iterator object defines the behaviour that lets Python know how to iterate over it.
Iterator Example
Below is a contrived example that shows how to create such an object. In this example we pass a list of strings to a class constructor, and the class implements the relevant methods that allow `for-in` to iterate over that collection of data:
```python
class Foo:
    def __init__(self, collection):
        self.collection = collection
        self.index = 0

    def __iter__(self):
        """
        we return self so the 'iterator object'
        is the Foo class instance itself,
        but we could have returned a new instance
        of a completely different class, so long as
        that other class had __next__ defined on it.
        """
        return self

    def __next__(self):
        """
        this method is handling state and informing
        the container of the iterator where we are
        currently pointing to within our data collection.
        """
        if self.index > len(self.collection) - 1:
            raise StopIteration
        value = self.collection[self.index]
        self.index += 1
        return value

# we are now able to loop over our custom Foo class!
for element in Foo(["a", "b", "c"]):
    print(element)
```
Note: raising the `StopIteration` exception is a requirement for implementing an iterator correctly.
With this example implementation, we can also iterate over our `Foo` class manually, using the `iter` and `next` functions, like so:
```python
foo = Foo(["a", "b", "c"])
iterator = iter(foo)

next(iterator)  # 'a'
next(iterator)  # 'b'
next(iterator)  # 'c'
```
Note: `iter(foo)` is the same as `foo.__iter__()`, while `next(iterator)` is the same as `iterator.__next__()` – so these built-in functions are basic syntactic sugar that helps make our code look nicer.
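In fact, a `for-in` loop is (roughly speaking) just these two functions plus some exception handling. Here’s a sketch of what Python effectively does for us behind the scenes:

```python
foo = Foo(["a", "b", "c"])
iterator = iter(foo)

# approximately what `for element in foo` expands to
while True:
    try:
        element = next(iterator)
    except StopIteration:
        break  # the loop ends once the iterator is exhausted
    print(element)
```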
This type of iterator is referred to as a ‘class-based iterator’ and isn’t the only way to implement an iterable object. Generators and Generator Expressions (see the following sections) are other ways of iterating over an object in a memory efficient way.
We can also realize the full collection by using the `list` function, like so:
```python
iterator = Foo(["a", "b", "c"])
list(iterator)  # ['a', 'b', 'c']
```
Note: be careful doing this, because if the iterator is yielding an unbounded number of elements, then this will exhaust your application’s memory!
Generators
According to the official Python documentation, a ‘generator’ provides…
A convenient way to implement the iterator protocol. If a container object’s `__iter__()` method is implemented as a generator, it will automatically return an iterator object.
Why use Generators?
Generators offer nice syntactic sugar around creating a simple Iterator: they reduce the boilerplate associated with a ‘class-based’ iterator because they’re designed to handle the ‘state management’ logic you would otherwise have to write yourself.
Generator Implementation
A Generator is a function that returns a ‘generator iterator’, so it acts similarly to how `__iter__` works (remember, it returns an iterator). In fact a Generator is a subclass of an Iterator. The generator function itself should use a `yield` statement to return control back to the caller of the generator function.
The caller can then advance the generator iterator by using either the `for-in` statement or the `next` function (as we saw earlier with the ‘class-based’ Iterator examples), which again highlights how generators are indeed a subclass of an Iterator.
When a generator ‘yields’, it actually pauses the function at that point in time and returns a value. Calling `next` (or iterating via `for-in`) will move the function forward, where it will either complete the generator function or stop at the next `yield` statement within it.
Generator Example
The following example prints `a`, then `b`, and finally `c`:
```python
def generator():
    yield "a"
    yield "b"
    yield "c"

for v in generator():
    print(v)
```
If we used the `next()` function instead, then we would do something like the following:
```python
gen = generator()

next(gen)  # 'a'
next(gen)  # 'b'
next(gen)  # 'c'
next(gen)  # raises StopIteration
```
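Note: if you’d rather avoid handling the exception yourself, the built-in `next` function also accepts a default value to return once the generator is exhausted:

```python
gen = generator()

next(gen, None)  # 'a'
next(gen, None)  # 'b'
next(gen, None)  # 'c'
next(gen, None)  # None (no StopIteration raised)
```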
Notice that this has greatly reduced our code boilerplate compared to the custom ‘class-based’ Iterator we created earlier, as there is no need to define the `__iter__` or `__next__` methods on a class instance (nor manage any state ourselves). We simply use `yield`!
If our use case is simple enough, then Generators are the way to go; if we have very specific logic to execute, we might need a custom ‘class-based’ Iterator instead.
Remember, Iterators (and by extension Generators) are very memory efficient and thus we could have a generator that yields an unbounded number of elements like so:
```python
def unbounded_generator():
    while True:
        yield "some value"

gen = unbounded_generator()

next(gen)  # some value
next(gen)  # some value
next(gen)  # some value
next(gen)  # some value
next(gen)  # ...
```
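If you need a bounded number of elements from an unbounded generator, `itertools.islice` will lazily consume only as many as you ask for. A minimal sketch:

```python
from itertools import islice

def unbounded_generator():
    while True:
        yield "some value"

# islice stops after three elements, so the infinite
# generator never gets a chance to exhaust our memory
first_three = list(islice(unbounded_generator(), 3))
print(first_three)  # ['some value', 'some value', 'some value']
```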
So, as mentioned earlier, be careful when using `list()` over a generator function (see the below example), as that will realize the entire collection and could exhaust your application’s memory:
```python
def generator():
    yield "a"
    yield "b"
    yield "c"

gen = generator()
list(gen)  # ['a', 'b', 'c']
```
Generator Expressions
According to the official PEP 289 document for generator expressions…
Generator expressions are a high-performance, memory-efficient generalization of list comprehensions and generators.
In essence they are a way of creating a generator using a syntax very similar to list comprehensions.
Below is an example of a generator function that will print `"foo"` five times:
```python
def generator(limit):
    for i in range(limit):
        yield "foo"

for v in generator(5):
    print(v)
```
Now here is the same thing as a generator expression:
```python
for v in ("foo" for i in range(5)):
    print(v)
```
The syntax for a generator expression is also very similar to that used by comprehensions, except that instead of the boundary/delimiter characters being `[]` or `{}`, we use `()`:

```python
(expression for item in collection if condition)
```
Note: as the syntax above suggests, you can also ‘filter’ yielded values thanks to the support for `if` conditions.
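For example, this generator expression only yields the squares of even numbers:

```python
# yields 0, 4, 16, 36, 64 (odd numbers are filtered out)
even_squares = (n * n for n in range(10) if n % 2 == 0)

for square in even_squares:
    print(square)
```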
Nested Generators (i.e. `yield from`)
Python 3.3 provided the `yield from` statement, which offered some basic syntactic sugar around dealing with nested generators.
Let’s see an example of what we would have to do if we didn’t have `yield from`:
```python
def baz():
    for i in range(10):
        yield i

def bar():
    for i in range(5):
        yield i

def foo():
    for v in bar():
        yield v
    for v in baz():
        yield v

for v in foo():
    print(v)
```
Notice how (inside the `foo` generator function) we have two separate `for-in` loops, one for each nested generator.
Now look at what this becomes when using `yield from`:
```python
def baz():
    for i in range(10):
        yield i

def bar():
    for i in range(5):
        yield i

def foo():
    yield from bar()
    yield from baz()

for v in foo():
    print(v)
```
OK, so not exactly a ground-breaking feature, but if you were ever confused by `yield from`, you now know that (for simple cases like this, at least) it’s a facade over the `for-in` syntax.
Although it’s worth pointing out that if we didn’t have `yield from`, we still could have reworked our original code using the `itertools` module’s `chain()` function, like so:
```python
from itertools import chain

def baz():
    for i in range(10):
        yield i

def bar():
    for i in range(5):
        yield i

def foo():
    for v in chain(bar(), baz()):
        yield v

for v in foo():
    print(v)
```
Note: refer to PEP 380 for more details on `yield from` and the rationale for its inclusion in the Python language.
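One detail from PEP 380 that a plain `for-in` loop can’t replicate: `yield from` also forwards any `.send()` and `.throw()` calls down to the nested generator, and it surfaces the nested generator’s `return` value as the value of the `yield from` expression. A minimal sketch of that last behaviour:

```python
def inner():
    yield 1
    yield 2
    return "inner finished"  # becomes the value of the yield from expression

def outer():
    result = yield from inner()
    yield result  # pass the returned value along to the caller

for v in outer():
    print(v)  # 1, then 2, then 'inner finished'
```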
Coroutines
Coroutines (as far as Python is concerned) have historically been designed to be an extension to Generators.
Coroutines are computer program components that generalize subroutines for non-preemptive multitasking, by allowing execution to be suspended and resumed. – Wikipedia
Why use Coroutines?
Because coroutines can pause and resume their execution context, they’re well suited to concurrent processing, as they enable the program to determine when to ‘context switch’ from one point of the code to another.
This is why coroutines are commonly used when dealing with concepts such as an event loop (which Python’s `asyncio` is built upon).
Coroutines Implementation
Generators use the `yield` keyword to return a value at some point in time within a function, but with coroutines the `yield` directive can also be used on the right-hand side of an `=` operator to signify it will accept a value at that point in time.
Coroutines Example
Below is an example of a coroutine. Remember! A coroutine is still a generator, and so you’ll see our example uses features that are related to generators (such as `yield` and the `next()` function):
Note: refer to the code comments for extra clarity.
```python
def foo():
    """
    notice we use yield in both the
    traditional generator sense and
    also in the coroutine sense.
    """
    msg = yield  # coroutine feature
    yield msg    # generator feature

coro = foo()

# because a coroutine is a generator
# we need to advance the returned generator
# to the first yield within the generator function
next(coro)

# the .send() syntax is specific to a coroutine
# this sends "bar" to the first yield
# so the msg variable will be assigned that value
result = coro.send("bar")

# because our coroutine also yields the msg variable
# it means we can print that value
print(result)  # bar
```
Note: `coro` is an identifier commonly used to refer to a coroutine. For more information on other available coroutine methods, please refer to the documentation.
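One of those methods is `.close()`, which raises a `GeneratorExit` exception inside the coroutine so it can run any clean-up logic. A minimal sketch:

```python
def foo():
    try:
        msg = yield
        yield msg
    except GeneratorExit:
        print("coroutine is being closed")
        raise  # re-raise so the coroutine actually terminates

coro = foo()
next(coro)    # advance to the first yield
coro.close()  # prints 'coroutine is being closed'
```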
Below is an example of a coroutine using `yield` to return a value to the caller, prior to it receiving a value via the caller’s use of the `.send()` method:
```python
def foo():
    msg = yield "beep"
    yield msg

coro = foo()

print(next(coro))  # beep

result = coro.send("bar")
print(result)  # bar
```
You can see in the above example that when we advanced the generator coroutine to the first `yield` statement (using `next(coro)`), the value `"beep"` was returned for us to print.
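Having to remember to call `next()` before `.send()` is a common trip hazard, so a popular pattern (the `prime` name here is my own, purely for illustration) is a small decorator that advances a new coroutine to its first `yield` for you:

```python
import functools

def prime(func):
    """automatically advance a new coroutine to its first yield"""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        coro = func(*args, **kwargs)
        next(coro)  # run up to the first yield
        return coro
    return wrapper

@prime
def foo():
    msg = yield "beep"
    yield msg

coro = foo()  # already primed, no explicit next() call needed
print(coro.send("bar"))  # bar
```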
Asyncio: generator based coroutines
When the `asyncio` module was first released it didn’t support the `async`/`await` syntax. So when that syntax was introduced, any legacy code with a function that needed to be run concurrently (i.e. awaited) had to use the `asyncio.coroutine` decorator function to make it compatible with the new `async`/`await` syntax.
Note: refer to the documentation for information on this deprecated (as of Python 3.10) feature, as well as some other functions, like `asyncio.iscoroutine`, that are specific to generator based coroutines.
The original generator based coroutines meant any `asyncio` based code would have used `yield from` to await on Futures and other coroutines.
The following example demonstrates how to use the new `async` coroutines alongside legacy generator based coroutines:
```python
import asyncio

@asyncio.coroutine  # removed in Python 3.11, so this needs an older interpreter
def old_style_coroutine():
    yield from asyncio.sleep(1)

async def main():
    await old_style_coroutine()

asyncio.run(main())
```
Asyncio: new async coroutines
Coroutines created with `async def` are implemented using the more recent `__await__` dunder method (see documentation here), while generator based coroutines use a legacy ‘generator’ based implementation.
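A quick way to see this for yourself:

```python
import inspect

async def native():
    return 42

coro = native()
print(hasattr(coro, "__await__"))  # True
print(inspect.iscoroutine(coro))   # True

coro.close()  # avoid a 'coroutine was never awaited' warning
```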
Types of Coroutines
This has led to the term ‘coroutine’ meaning multiple things in different contexts. We now have:
- simple coroutines: traditional generator coroutines (no async I/O).
- generator coroutines: async I/O using the legacy `asyncio` implementation.
- native coroutines: async I/O using the latest `async`/`await` implementation.
Miscellaneous
There are a couple of interesting decorator functions provided by Python that can be a bit confusing, due to these functions appearing to have overlapping functionality.
They don’t overlap, but do appear to be used together:
- `types.coroutine`: converts a generator function into a coroutine.
- `asyncio.coroutine`: abstraction ensuring `asyncio` compatibility.
Note: as we’ll see in a moment, `asyncio.coroutine` actually calls `types.coroutine`. You should ideally use the former (`asyncio.coroutine`) when dealing with `asyncio` code.
More specifically, if we look at the implementation of the `asyncio.coroutine` code we can see:

- If the decorated function is already a coroutine, then just return it.
- If the decorated function is a generator, then convert it to a coroutine (using `types.coroutine`).
- Otherwise, wrap the decorated function such that when it’s converted to a coroutine it’ll await any resulting awaitable value.
What’s interesting about `types.coroutine` is that if your decorated function doesn’t contain any reference to `yield`, then the function will be executed immediately rather than returning a generator. See this Stack Overflow answer for more information on where that behaviour was noticed.