Monday, December 8, 2014

Python Generators

Generators were first introduced in Python 2.2. Sometimes referred to as "weightless threads", they let you replace threads or processes for many tasks. Creation, entry, and return are virtually free, unlike the alternatives, which encourages an asynchronous approach to handling background events. However, generators run in a single thread and normally don't perform as well for intensive, blocking operations. We'll be going over the fundamentals in this article. If, after reading this, you're interested in learning more, I highly recommend David Beazley and Brian K. Jones' Python Cookbook, Third Edition.

Prerequisites: You'll want to familiarize yourself with list comprehensions by checking out this article.

To better understand generators, it helps to review some fundamental principles about functions. Whenever a function is called in Python, the program executes everything within that function's body, line by line, until a return statement or an exception is reached. Every function returns either the value you've explicitly indicated or None, at which point any local variables within the function are cleaned up. That's basically how functions in Python work, and we'll revisit this concept throughout the article.
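
As a quick sketch of that behavior (these two throwaway functions are just for illustration):

def with_return():
  return 42

def without_return():
  x = 42 # a local variable, cleaned up when the function exits

print(with_return())    # 42
print(without_return()) # None - no explicit return, so None comes back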

After going through the prerequisites, you should be familiar with iterators. Generators are a kind of iterator: they generate values as they go instead of storing all of the values in memory. Take the following Python 3 example:

gen = (x*2 for x in range(5))
for i in gen:
  print(i)

# 0
# 2
# 4
# 6
# 8

Unlike a list comprehension, there is no list to be returned. The generator just steps through the iteration and hands back each value, one by one. Also notice you're required to use parentheses instead of square brackets. Generators often use far less memory than building the equivalent list, and because values are produced lazily, you never pay for computations beyond the values you actually consume.
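
Here's a small sketch of that difference (the exact sizes printed will vary by platform):

import sys

# the list comprehension materializes every value up front
squares_list = [x * x for x in range(1000000)]
# the generator expression only stores its current state
squares_gen = (x * x for x in range(1000000))

print(sys.getsizeof(squares_list)) # millions of bytes
print(sys.getsizeof(squares_gen))  # a small, constant size

# stop early and the remaining values are never computed
for sq in squares_gen:
  if sq > 100:
    break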

Yield and Next

yield is really all you need to define a generator. As long as yield exists somewhere inside your function, calling it will return a generator.

def generator():
  yield

gen = generator()
print(gen) # <generator object generator at 0x101107fa0>
yield is a keyword, just like return. However, where a return statement permanently hands control back to the caller, yield hands back a value and suspends the function, which picks up right where it left off on the next step of the iteration. The benefit is that the generator keeps its own state between calls, so you no longer have to track that state yourself or build up a large in-memory result to return at the end of the function. Instead, you receive a value at each step of the iteration, either through a loop or by calling next() on the generator. This is best illustrated with an example:

# normal function which returns a list
def get_even(start):
  l = []
  for i in range(start):
    if i % 2 == 0:
      l.append(i)
  return l 

print(get_even(10)) # [0, 2, 4, 6, 8]

# list comprehension which returns the same list
print([i for i in range(10) if i % 2 == 0]) # [0, 2, 4, 6, 8]

# -------------
# Using Generators
# -------------
# is_even function
def is_even(num):
  return num % 2 == 0

# generator which yields each value
def get_even(cap):
  for i in range(cap):
    if is_even(i):
      yield i

# create the generator
gen = get_even(10)

# call the generator with a loop...
for i in gen:
  print(i)

# 0
# 2
# 4
# 6
# 8

# or explicitly one by one
# (the loop above exhausted gen, so create a fresh generator first)
gen = get_even(10)

# using Python 2.7
print(gen.next()) # 0
print(gen.next()) # 2

# using Python 3.x
print(next(gen)) # 0
print(next(gen)) # 2
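
One detail worth remembering: once a generator is exhausted, any further call to next() raises StopIteration, which is exactly how a for loop knows when to stop.

gen = get_even(4)
print(next(gen)) # 0
print(next(gen)) # 2
next(gen)        # raises StopIteration - the generator is exhausted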

Giving More Control of Returned Values to the Caller

In the example above, we're just printing each value the generator hands back, but of course you can pass those values into other functions. You also gain flexibility over the type of the final result: rather than writing a separate collection function for every return type you might want, a generator simply yields values and leaves it to the caller to decide how to collect them.

# -------------
# Multiple Functions for Multiple types
# -------------
# normal function which returns a list
def get_even_list(start):
  l = []
  for i in range(start):
    if i % 2 == 0:
      l.append(i)
  return l 

print(get_even_list(10)) # returns list

# normal function which returns a tuple
def get_even_tuple(start):
  t = ()
  for i in range(start):
    if i % 2 == 0:
      t += (i,) # tuples are immutable, so build a new one each time
  return t

print(get_even_tuple(10)) # returns tuple

# Note that we need two different functions, unless we cast
print(tuple(get_even_list(10))) # cast list to tuple

# -------------
# A More Elegant Solution Using Generators
# -------------

# generator which yields each value;
# the return type is never specified anywhere in the generator
def get_even(cap):
  for i in range(cap):
    if is_even(i):
      yield i

# call the generator with islice; the caller decides how many
# values to take and what container to collect them into
from itertools import islice
print(tuple(islice(get_even(100), 100))) # a tuple of the 50 even numbers below 100
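
To underline the point, here's the same generator collected into different containers, entirely at the caller's discretion:

print(list(get_even(10)))  # [0, 2, 4, 6, 8]
print(tuple(get_even(10))) # (0, 2, 4, 6, 8)
print(set(get_even(10)))   # {0, 2, 4, 6, 8}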

Sending Values

By calling send() on a generator, you're able to pass values as you iterate:

## declare the generator with an infinite loop...
## ...that prints each sent value
def generator():
  while True:
    item = yield
    print("Next:", item)

gen = generator() # instantiate the generator
print(gen)
next(gen) # advance to the first yield
gen.send('a') # Next: a
gen.send('b') # Next: b
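
For a slightly more practical sketch (the running_average generator below is just an illustration, not part of the examples above), send() can feed values into a generator that keeps running state and yields a result back at each step:

def running_average():
  total = 0.0
  count = 0
  average = None
  while True:
    value = yield average # hand back the current average, wait for the next value
    total += value
    count += 1
    average = total / count

avg = running_average()
next(avg)           # prime the generator so it reaches the first yield
print(avg.send(10)) # 10.0
print(avg.send(20)) # 15.0
print(avg.send(30)) # 20.0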

Terminating

You can also terminate generators by calling close() or throwing exceptions with throw().

# close
def generator():
  try:
    yield
  except GeneratorExit:
    print("Terminating")

gen = generator()
next(gen)
gen.close()

# throw
def generator():
  try:
    yield
  except RuntimeError as e:
    print("Caught:", e)
    yield 'recovered' # you may optionally yield another value

gen = generator()
next(gen)
val = gen.throw(RuntimeError, "Something went wrong") # prints: Caught: Something went wrong
print(val) # recovered
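
One common reason termination matters, sketched below with a hypothetical read_lines generator: any finally block inside the generator runs when the generator is closed, so resources still get cleaned up even if the caller stops early.

def read_lines(path):
  f = open(path)
  try:
    for line in f:
      yield line
  finally:
    f.close() # runs when the generator is exhausted, closed, or garbage collected

lines = read_lines('example.txt') # hypothetical file name
print(next(lines)) # read only the first line...
lines.close()      # ...and the file is still closed properly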

Chaining and Delegation with Python 3.3+

Generators may also chain operations using the yield from <iterable> syntax, available since Python 3.3. Think of it as shorthand for for i in iterable: yield i.

# chaining
def generator(n):
  yield from range(n)
  yield from range(n)

print(list(generator(3))) # [0,1,2,0,1,2]

yield from also has an added benefit over a plain for loop: it lets a subgenerator receive sent values and thrown exceptions directly from the calling scope, and return a final value to the outer generator. This next example revisits some of what we learned about send.

# delegation

# generator 1
def counter():
  cnt = 0
  while True:
    item = yield
    if item is None: # a None sentinel ends the count...
      return cnt     # ...and the return value travels back through yield from
    cnt += item

# generator 2
def store_totals(totals):
  while True:
    cnt = yield from counter()
    totals.append(cnt)

totals = [] # the list we'll pass to the generator
total = store_totals(totals)
next(total) # get ready to yield

for i in range(5):
  total.send(i) # send the values to be totaled up...
total.send(None) # ...and send None so counter returns its total

for i in range(3):
  total.send(i) # start back up again...
total.send(None) # ...and finish the second count

print(totals) # [10, 3]

As you've learned in previous sections, all it takes to make a function into a generator is the yield keyword. So we have two generators in this example, with the second generator delegating to the first.
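
Another common use of delegation, sketched here purely as an illustration, is recursing into nested structures:

# flatten arbitrarily nested lists by delegating to the recursive call
def flatten(items):
  for item in items:
    if isinstance(item, list):
      yield from flatten(item)
    else:
      yield item

print(list(flatten([1, [2, [3, 4]], 5]))) # [1, 2, 3, 4, 5]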

That concludes this article. If you'd like to learn some of the more advanced topics, I strongly recommend watching David Beazley's Generators: The Final Frontier.
