A regular function that returns a list computes all the values and stores them in memory. A generator, however, yields one result at a time, avoiding the need to store the entire sequence in memory.
# Function returning a list
def square_numbers_list(nums):
    result = []
    for i in nums:
        result.append(i*i)
    return result

my_numbers_list = square_numbers_list([1, 2, 3, 4, 5])
print(my_numbers_list) # Output: [1, 4, 9, 16, 25]

# Generator function
def square_numbers_generator(nums):
    for i in nums:
        yield i*i

my_numbers_generator = square_numbers_generator([1, 2, 3, 4, 5])
print(my_numbers_generator) # Output: <generator object square_numbers_generator at 0x...>
To create a generator, you define a function that uses the yield keyword instead of return. Each time yield is encountered, the function's state is saved, and the yielded value is returned. The function can be resumed later from where it left off.
def simple_generator():
    yield 1
    yield 2
    yield 3

for value in simple_generator():
    print(value)
# Output:
# 1
# 2
# 3
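To make the pause-and-resume behaviour visible, here is a minimal sketch that records what happens inside a generator between calls. The names traced_generator and events are hypothetical and exist only for this illustration.

```python
events = []  # hypothetical log of what the generator body has executed so far

def traced_generator():
    events.append("started")
    yield 1
    events.append("resumed after first yield")
    yield 2
    events.append("resumed after second yield")

gen = traced_generator()   # calling the function runs none of its body yet
print(events)              # still empty at this point

first = next(gen)          # runs the body up to the first yield, then pauses
second = next(gen)         # resumes right after the first yield, pauses at the second
print(first, second)
print(events)
```

Note that the last append never runs here, because the generator is still paused at the second yield when the script ends.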
You can access values from a generator using the next() function or, more commonly, by iterating over it with a for loop. The for loop automatically handles the iteration and the StopIteration exception, which is raised when the generator has no more values to yield.
my_gen = square_numbers_generator([1, 2])
print(next(my_gen)) # Output: 1
print(next(my_gen)) # Output: 4
# print(next(my_gen)) # Raises StopIteration

my_gen = square_numbers_generator([1, 2, 3, 4, 5])
for num in my_gen:
    print(num)
# Output:
# 1
# 4
# 9
# 16
# 25
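As a rough sketch of what the for loop does on your behalf, the loop below calls next() by hand and stops when StopIteration is raised. It is an approximation for illustration, not the interpreter's actual implementation; the variable names collected and leftover are assumptions.

```python
def square_numbers_generator(nums):
    for i in nums:
        yield i * i

# Roughly equivalent to: for value in my_gen: collected.append(value)
my_gen = square_numbers_generator([1, 2, 3])
collected = []
while True:
    try:
        value = next(my_gen)
    except StopIteration:
        break  # the generator has no more values
    collected.append(value)
print(collected)  # [1, 4, 9]

# next() also accepts a default, returned instead of raising StopIteration
leftover = next(my_gen, None)
print(leftover)  # None
```

Passing a default to next() is handy when you only want to peek at one value and an exhausted generator is not an error.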
Similar to list comprehensions, generator expressions provide a concise way to create generators. They use parentheses () instead of square brackets [].
my_gen_exp = (x*x for x in range(1, 6))
print(my_gen_exp) # Output: <generator object <genexpr> at 0x...>

for num in my_gen_exp:
    print(num)
# Output:
# 1
# 4
# 9
# 16
# 25
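A generator expression can also be passed directly to a function that consumes an iterable, in which case the extra parentheses can be dropped. The short sketch below uses the built-ins sum() and any(); the variable names are illustrative.

```python
# The squares are produced one at a time and added up; no intermediate list is built
total = sum(x * x for x in range(1, 6))
print(total)  # 55

# any() stops consuming the generator as soon as one value satisfies the test
has_large = any(x * x > 20 for x in range(1, 6))
print(has_large)  # True
```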
You can convert a generator to a list using the list() function. However, this negates the memory-saving benefits of using a generator, as all values will be stored in memory at once.
my_gen = (x*x for x in range(1, 6))
my_list = list(my_gen)
print(my_list) # Output: [1, 4, 9, 16, 25]
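If you only need the first few values, one alternative to materializing everything with list() is itertools.islice from the standard library, which takes a bounded number of items from any iterator. This is a sketch of one possible approach; the names first_three and next_value are illustrative.

```python
from itertools import islice

def square_numbers_generator(nums):
    for i in nums:
        yield i * i

# A large input, but only the first three squares are ever computed here
squares = square_numbers_generator(range(1, 1_000_001))
first_three = list(islice(squares, 3))
print(first_three)  # [1, 4, 9]

# The generator resumes from where islice stopped
next_value = next(squares)
print(next_value)  # 16
```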
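Generators can offer significant performance advantages, especially when dealing with large datasets. Because they don't store all values in memory at once, memory usage stays low, and the first value is available without waiting for the whole sequence to be built. The following example, which uses the third-party psutil package to read the process's memory usage, demonstrates this: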
import random
import time
import psutil  # third-party package: pip install psutil

names = ['John', 'Corey', 'Adam', 'Steve', 'Rick', 'Thomas']
majors = ['Math', 'Engineering', 'CompSci', 'Arts', 'Business']

def people_list(num_people):
    result = []
    for i in range(num_people):
        person = {
            'id': i,
            'name': random.choice(names),
            'major': random.choice(majors)
        }
        result.append(person)
    return result

def people_generator(num_people):
    for i in range(num_people):
        person = {
            'id': i,
            'name': random.choice(names),
            'major': random.choice(majors)
        }
        yield person

num_people = 1000000

# Note: rss is the resident set size of the whole process, so later readings
# may still include memory that was allocated by earlier steps.
t1 = time.process_time()
people = people_list(num_people)
t2 = time.process_time()
print(f'Memory (After List): {psutil.Process().memory_info().rss / 1024 ** 2} MB')
print(f'Time (List): {t2-t1} seconds')

t1 = time.process_time()
people = people_generator(num_people) # Only creates the generator object, doesn't generate data yet
t2 = time.process_time()
print(f'Memory (After Generator): {psutil.Process().memory_info().rss / 1024 ** 2} MB')
print(f'Time (Generator Creation): {t2-t1} seconds') # Much faster, as no data is generated yet

t1 = time.process_time()
people_list_from_generator = list(people_generator(num_people)) # Generates all data and stores in a list
t2 = time.process_time()
print(f'Memory (After List from Generator): {psutil.Process().memory_info().rss / 1024 ** 2} MB')
print(f'Time (List from Generator): {t2-t1} seconds') # Similar time to creating the list directly
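For a quick check that needs only the standard library, the sketch below compares the size of a list object with the size of an equivalent generator object using sys.getsizeof. This is a rough indication rather than a full measurement: getsizeof reports the container itself, not the elements it refers to, and the exact numbers vary by Python version and platform. The names squares_list and squares_gen are illustrative.

```python
import sys

n = 100_000
squares_list = [x * x for x in range(n)]   # all values computed and stored now
squares_gen = (x * x for x in range(n))    # nothing computed yet

list_size = sys.getsizeof(squares_list)    # grows with the number of elements
gen_size = sys.getsizeof(squares_gen)      # small and independent of n
print(f'list: {list_size} bytes, generator: {gen_size} bytes')
```

The generator's size stays the same however large n is, since it only holds the state needed to produce the next value.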
While generators offer many advantages, it's important to be aware of potential drawbacks: