Python 101: iterators, generators, coroutines
In this post I’m going to talk about what a generator is and how it compares to a coroutine, but to understand these two concepts (generators and coroutines) we’ll first need to take a step back and understand the underlying concept of an iterator.
Ultimately we’ll be discussing…
Each section leads on to the next, so it’s best to read this post in the order the sections are defined, unless you’re already familiar with the earlier segments and prefer to jump ahead.
Summary
The summary of everything we’ll be discussing below is this:
- Iterators let you iterate over your own custom object.
- Generators are built upon Iterators (they reduce boilerplate).
- Generator Expressions are even more concise Generators †
- Coroutines are Generators, but their `yield` accepts values.
- Coroutines can pause and resume execution (great for concurrency).
† think comprehensions.
Iterators
According to the official Python glossary, an ‘iterator’ is…
An object representing a stream of data.
Why use Iterators?
An iterator is useful because it enables any custom object to be iterated over using the standard Python `for-in` syntax. This is ultimately how the built-in list and dictionary types work, and how they allow `for-in` to iterate over them.
More importantly, an iterator (as we’ll discover) is very memory efficient, because only one element is ever handled at a time. You could therefore have an iterator object that provides an infinite sequence of elements and never find your program exhausting its memory allocation.
Iterator Implementation
An iterator is (typically) an object that implements both the `__iter__` and `__next__` ‘dunder’ methods, although the `__next__` method doesn’t have to be defined on the same object as `__iter__`. Let me clarify…
An ‘iterator’ is really just a container of some data. This ‘container’ must have an `__iter__` method which, according to the protocol documentation, should return an iterator object (i.e. something that has the `__next__` method). It’s the `__next__` method that moves forward through the relevant collection of data.
So you could design a single class that contains both the `__iter__` and `__next__` methods (like I demonstrate below), or you might want to have the `__next__` method defined as part of a separate class, as shown in the sketch after this paragraph (it’s up to you and whatever you feel works best for your project).
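Here’s a minimal sketch of that second approach (the `Container` and `ContainerIterator` names are purely illustrative), where `__iter__` returns an instance of a separate class that owns the iteration state:

```python
class ContainerIterator:
    def __init__(self, collection):
        self.collection = collection
        self.index = 0

    def __iter__(self):
        # iterators conventionally return themselves from __iter__
        return self

    def __next__(self):
        if self.index >= len(self.collection):
            raise StopIteration
        value = self.collection[self.index]
        self.index += 1
        return value

class Container:
    def __init__(self, collection):
        self.collection = collection

    def __iter__(self):
        # the iteration state lives in a separate object, so
        # multiple independent loops over the same Container
        # instance won't interfere with each other
        return ContainerIterator(self.collection)

for element in Container(["a", "b", "c"]):
    print(element)
```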
Note: the Python docs for `collections.abc` highlight the other ‘protocols’ that Python has and the various methods they require (see an earlier post of mine that discusses protocols + abstract classes in detail). If you’re unfamiliar with ‘dunder’ methods, then I’ll refer you to an excellent post: a guide to magic methods.
Implementing these two methods enables Python to iterate over a ‘collection’. It doesn’t matter what the collection is, as long as the iterator object defines the behaviour that lets Python know how to iterate over it.
Iterator Example
Below is a contrived example that shows how to create such an object. In this example we pass a list of strings to a class constructor, and the class implements the relevant methods that allow `for-in` to iterate over that collection of data:
```python
class Foo:
    def __init__(self, collection):
        self.collection = collection
        self.index = 0

    def __iter__(self):
        """
        we return self so the 'iterator object'
        is the Foo class instance itself,
        but we could have returned a new instance
        of a completely different class, so long as
        that other class had __next__ defined on it.
        """
        return self

    def __next__(self):
        """
        this method is handling state and informing
        the container of the iterator where we are
        currently pointing to within our data collection.
        """
        if self.index > len(self.collection) - 1:
            raise StopIteration
        value = self.collection[self.index]
        self.index += 1
        return value

# we are now able to loop over our custom Foo class!
for element in Foo(["a", "b", "c"]):
    print(element)
```
Note: raising the `StopIteration` exception is a requirement for implementing an iterator correctly.
With this example implementation, we can also iterate over our `Foo` class manually, using the `iter` and `next` functions, like so:
```python
foo = Foo(["a", "b", "c"])
iterator = iter(foo)

next(iterator)  # 'a'
next(iterator)  # 'b'
next(iterator)  # 'c'
```
Note: `iter(foo)` is the same as `foo.__iter__()`, while `next(iterator)` is the same as `iterator.__next__()` – so these built-in functions are basic syntactic sugar that helps make our code look nicer.
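In fact, a `for-in` loop is (roughly speaking) just these two functions plus some exception handling. Here’s a sketch of what Python effectively does for us behind the scenes:

```python
foo = Foo(["a", "b", "c"])
iterator = iter(foo)

# approximately what `for element in foo` expands to
while True:
    try:
        element = next(iterator)
    except StopIteration:
        break  # the loop ends once the iterator is exhausted
    print(element)
```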
This type of iterator is referred to as a ‘class-based iterator’ and isn’t the only way to implement an iterable object. Generators and Generator Expressions (see the following sections) are other ways of iterating over an object in a memory efficient way.
We can also realize the full collection by using the `list` function, like so:
```python
iterator = Foo(["a", "b", "c"])
list(iterator)  # ['a', 'b', 'c']
```
Note: be careful doing this, because if the iterator is yielding an unbounded number of elements, then this will exhaust your application’s memory!
Generators
According to the official Python documentation, a ‘generator’ provides…
A convenient way to implement the iterator protocol. If a container object’s `__iter__()` method is implemented as a generator, it will automatically return an iterator object.
Why use Generators?
Generators offer nice syntactic sugar around creating a simple Iterator: they reduce the boilerplate associated with a ‘class-based’ iterator because they’re designed to handle the ‘state management’ logic you would otherwise have to write yourself.
Generator Implementation
A Generator is a function that returns a ‘generator iterator’, so it acts similarly to how `__iter__` works (remember, it returns an iterator). In fact a Generator is a subclass of an Iterator. The generator function itself should use a `yield` statement to return control back to the caller of the generator function.
The caller can then advance the generator iterator by using either the `for-in` statement or the `next` function (as we saw earlier with the ‘class-based’ Iterator examples), which again highlights how generators are indeed a subclass of an Iterator.
When a generator ‘yields’, it actually pauses the function at that point in time and returns a value. Calling `next` (or iterating via `for-in`) will move the function forward, where it will either complete the generator function or stop at the next `yield` statement within it.
Generator Example
The following example prints `a`, then `b`, and finally `c`:
```python
def generator():
    yield "a"
    yield "b"
    yield "c"

for v in generator():
    print(v)
```
If we used the `next()` function instead, then we would do something like the following:
```python
gen = generator()

next(gen)  # 'a'
next(gen)  # 'b'
next(gen)  # 'c'
next(gen)  # raises StopIteration
```
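Note: if you’d rather avoid handling the exception yourself, the built-in `next` function also accepts a default value to return once the generator is exhausted:

```python
gen = generator()

next(gen, None)  # 'a'
next(gen, None)  # 'b'
next(gen, None)  # 'c'
next(gen, None)  # None (no StopIteration raised)
```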
Notice that this has greatly reduced our code boilerplate compared to the custom ‘class-based’ Iterator we created earlier, as there is no need to define the `__iter__` or `__next__` methods on a class instance (nor manage any state ourselves). We simply use `yield`!
If our use case is simple enough, then Generators are the way to go; if we have very specific logic to execute, we might need a custom ‘class-based’ Iterator instead.
Remember, Iterators (and by extension Generators) are very memory efficient and thus we could have a generator that yields an unbounded number of elements like so:
```python
def unbounded_generator():
    while True:
        yield "some value"

gen = unbounded_generator()

next(gen)  # some value
next(gen)  # some value
next(gen)  # some value
next(gen)  # some value
next(gen)  # ...
```
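If you need a bounded number of elements from an unbounded generator, `itertools.islice` will lazily consume only as many as you ask for. A minimal sketch:

```python
from itertools import islice

def unbounded_generator():
    while True:
        yield "some value"

# islice stops after three elements, so the infinite
# generator never gets a chance to exhaust our memory
first_three = list(islice(unbounded_generator(), 3))
print(first_three)  # ['some value', 'some value', 'some value']
```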
So, as mentioned earlier, be careful when using `list()` over a generator function (see the below example), as that will realize the entire collection and could exhaust your application’s memory:
```python
def generator():
    yield "a"
    yield "b"
    yield "c"

gen = generator()
list(gen)  # ['a', 'b', 'c']
```
Generator Expressions
According to the official PEP 289 document for generator expressions…
Generator expressions are a high-performance, memory-efficient generalization of list comprehensions and generators.
In essence they are a way of creating a generator using a syntax very similar to list comprehensions.
Below is an example of a generator function that will print `"foo"` five times:
```python
def generator(limit):
    for i in range(limit):
        yield "foo"

for v in generator(5):
    print(v)
```
Now here is the same thing as a generator expression:
```python
for v in ("foo" for i in range(5)):
    print(v)
```
The syntax for a generator expression is also very similar to that used by comprehensions, except that instead of the boundary/delimiter characters being `[]` or `{}`, we use `()`:

```python
(expression for item in collection if condition)
```
Note: as the syntax above suggests, you can also ‘filter’ yielded values thanks to the support for `if` conditions.
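For example, this generator expression only yields the squares of even numbers:

```python
# yields 0, 4, 16, 36, 64 (odd numbers are filtered out)
even_squares = (n * n for n in range(10) if n % 2 == 0)

for square in even_squares:
    print(square)
```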
Nested Generators (i.e. `yield from`)
Python 3.3 provided the `yield from` statement, which offered some basic syntactic sugar around dealing with nested generators.
Let’s see an example of what we would have to do if we didn’t have `yield from`:
```python
def baz():
    for i in range(10):
        yield i

def bar():
    for i in range(5):
        yield i

def foo():
    for v in bar():
        yield v
    for v in baz():
        yield v

for v in foo():
    print(v)
```
Notice how (inside the `foo` generator function) we have two separate `for-in` loops, one for each nested generator.
Now look at what this becomes when using `yield from`:
```python
def baz():
    for i in range(10):
        yield i

def bar():
    for i in range(5):
        yield i

def foo():
    yield from bar()
    yield from baz()

for v in foo():
    print(v)
```
OK, so not exactly a ground-breaking feature, but if you were ever confused by `yield from`, you now know that (for simple cases like this, at least) it’s a facade over the `for-in` syntax.
Although it’s worth pointing out that if we didn’t have `yield from`, we still could have reworked our original code using the `itertools` module’s `chain()` function, like so:
```python
from itertools import chain

def baz():
    for i in range(10):
        yield i

def bar():
    for i in range(5):
        yield i

def foo():
    for v in chain(bar(), baz()):
        yield v

for v in foo():
    print(v)
```
Note: refer to PEP 380 for more details on `yield from` and the rationale for its inclusion in the Python language.
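One detail from PEP 380 that a plain `for-in` loop can’t replicate: `yield from` also forwards any `.send()` and `.throw()` calls down to the nested generator, and it surfaces the nested generator’s `return` value as the value of the `yield from` expression. A minimal sketch of that last behaviour:

```python
def inner():
    yield 1
    yield 2
    return "inner finished"  # becomes the value of the yield from expression

def outer():
    result = yield from inner()
    yield result  # pass the returned value along to the caller

for v in outer():
    print(v)  # 1, then 2, then 'inner finished'
```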
Coroutines
Coroutines (as far as Python is concerned) have historically been designed to be an extension to Generators.
Coroutines are computer program components that generalize subroutines for non-preemptive multitasking, by allowing execution to be suspended and resumed. – Wikipedia
Why use Coroutines?
Because coroutines can pause and resume their execution context, they’re well suited to concurrent processing, as they enable the program to determine when to ‘context switch’ from one point of the code to another.
This is why coroutines are commonly used when dealing with concepts such as an event loop (which Python’s `asyncio` is built upon).
Coroutines Implementation
Generators use the `yield` keyword to return a value at some point in time within a function, but with coroutines the `yield` directive can also be used on the right-hand side of an `=` operator to signify it will accept a value at that point in time.
Coroutines Example
Below is an example of a coroutine. Remember! A coroutine is still a generator, and so you’ll see our example uses features that are related to generators (such as `yield` and the `next()` function):
Note: refer to the code comments for extra clarity.
```python
def foo():
    """
    notice we use yield in both the
    traditional generator sense and
    also in the coroutine sense.
    """
    msg = yield  # coroutine feature
    yield msg    # generator feature

coro = foo()

# because a coroutine is a generator
# we need to advance the returned generator
# to the first yield within the generator function
next(coro)

# the .send() syntax is specific to a coroutine
# this sends "bar" to the first yield
# so the msg variable will be assigned that value
result = coro.send("bar")

# because our coroutine also yields the msg variable
# it means we can print that value
print(result)  # bar
```
Note: `coro` is an identifier commonly used to refer to a coroutine. For more information on other available coroutine methods, please refer to the documentation.
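One of those methods is `.close()`, which raises a `GeneratorExit` exception inside the coroutine so it can run any clean-up logic. A minimal sketch:

```python
def foo():
    try:
        msg = yield
        yield msg
    except GeneratorExit:
        print("coroutine is being closed")
        raise  # re-raise so the coroutine actually terminates

coro = foo()
next(coro)    # advance to the first yield
coro.close()  # prints 'coroutine is being closed'
```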
Below is an example of a coroutine using `yield` to return a value to the caller, prior to it receiving a value via the caller’s use of the `.send()` method:
```python
def foo():
    msg = yield "beep"
    yield msg

coro = foo()

print(next(coro))  # beep

result = coro.send("bar")
print(result)  # bar
```
You can see in the above example that when we advanced the generator coroutine to the first `yield` statement (using `next(coro)`), the value `"beep"` was returned for us to print.
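Having to remember to call `next()` before `.send()` is a common trip hazard, so a popular pattern (the `prime` name here is my own, purely for illustration) is a small decorator that advances a new coroutine to its first `yield` for you:

```python
import functools

def prime(func):
    """automatically advance a new coroutine to its first yield"""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        coro = func(*args, **kwargs)
        next(coro)  # run up to the first yield
        return coro
    return wrapper

@prime
def foo():
    msg = yield "beep"
    yield msg

coro = foo()  # already primed, no explicit next() call needed
print(coro.send("bar"))  # bar
```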
Asyncio: generator based coroutines
When the `asyncio` module was first released it didn’t support the `async`/`await` syntax. So when that syntax was introduced, any legacy code with a function that needed to be run concurrently (i.e. awaited) had to use the `asyncio.coroutine` decorator function to make it compatible with the new `async`/`await` syntax.
Note: refer to the documentation for information on this deprecated (as of Python 3.10) feature, as well as some other functions, like `asyncio.iscoroutine`, that are specific to generator based coroutines.
The original generator based coroutines meant any `asyncio` based code would have used `yield from` to await on Futures and other coroutines.
The following example demonstrates how to use the new `async` coroutines alongside legacy generator based coroutines:
```python
import asyncio

@asyncio.coroutine  # removed in Python 3.11, so this needs an older interpreter
def old_style_coroutine():
    yield from asyncio.sleep(1)

async def main():
    await old_style_coroutine()

asyncio.run(main())
```
Asyncio: new async coroutines
Coroutines created with `async def` are implemented using the more recent `__await__` dunder method (see documentation here), while generator based coroutines use a legacy ‘generator’ based implementation.
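A quick way to see this for yourself:

```python
import inspect

async def native():
    return 42

coro = native()
print(hasattr(coro, "__await__"))  # True
print(inspect.iscoroutine(coro))   # True

coro.close()  # avoid a 'coroutine was never awaited' warning
```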
Types of Coroutines
This has led to the term ‘coroutine’ meaning multiple things in different contexts. We now have:
- simple coroutines: traditional generator coroutines (no async I/O).
- generator coroutines: async I/O using the legacy `asyncio` implementation.
- native coroutines: async I/O using the latest `async`/`await` implementation.
Miscellaneous
There are a couple of interesting decorator functions provided by Python that can be a bit confusing, due to these functions appearing to have overlapping functionality.
They don’t overlap, but do appear to be used together:
- `types.coroutine`: converts a generator function into a coroutine.
- `asyncio.coroutine`: abstraction ensuring `asyncio` compatibility.
Note: as we’ll see in a moment, `asyncio.coroutine` actually calls `types.coroutine`. You should ideally use the former (`asyncio.coroutine`) when dealing with `asyncio` code.
More specifically, if we look at the implementation of the `asyncio.coroutine` code we can see:

- If the decorated function is already a coroutine, then just return it.
- If the decorated function is a generator, then convert it to a coroutine (using `types.coroutine`).
- Otherwise, wrap the decorated function such that when it’s converted to a coroutine it’ll await any resulting awaitable value.
What’s interesting about `types.coroutine` is that if your decorated function doesn’t contain any reference to `yield`, then the function will be executed immediately rather than returning a generator. See this Stack Overflow answer for more information on where that behaviour was noticed.