Python Generators: Memory-Efficient Iteration
Python generators provide a memory-efficient way to work with sequences of data. Unlike lists, which store all their elements in memory at once, generators produce values on demand, one at a time. This is particularly useful when dealing with large datasets or infinite sequences.
Ryan McBride


1. Generators vs. Lists

A regular function that returns a list computes all the values and stores them in memory. A generator, however, yields one result at a time, avoiding the need to store the entire sequence in memory.


    # Function returning a list
    def square_numbers_list(nums):
        result = []
        for i in nums:
            result.append(i*i)
        return result

    my_numbers_list = square_numbers_list([1, 2, 3, 4, 5])
    print(my_numbers_list)  # Output: [1, 4, 9, 16, 25]

    # Generator function
    def square_numbers_generator(nums):
        for i in nums:
            yield i*i

    my_numbers_generator = square_numbers_generator([1, 2, 3, 4, 5])
    print(my_numbers_generator)  # Output: <generator object square_numbers_generator at 0x...>
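
To put a rough number on the difference, here is a minimal sketch that compares the size of a list with the size of a generator object using sys.getsizeof. The variable names and the 100,000-element input are assumptions chosen for illustration, and the generator function is repeated so the snippet runs on its own. Exact sizes vary by Python version and platform.

```python
import sys

def square_numbers_generator(nums):
    # Same generator as above, repeated so this snippet is self-contained
    for i in nums:
        yield i * i

source = range(100_000)

squares_list = [i * i for i in source]          # all values computed and stored now
squares_gen = square_numbers_generator(source)  # nothing computed yet

# getsizeof reports the container itself: for the list, that is its array of
# references (the integer objects it points to take additional memory).
list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)

print(f"list: {list_size} bytes, generator: {gen_size} bytes")
```

The generator's size stays the same no matter how long the input is, while the list grows with every element.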
   

2. Creating Generators

To create a generator, you define a function that uses the yield keyword. Calling the function does not run its body; it returns a generator object. Each time a value is requested, the body runs until it reaches a yield, hands back the yielded value, and pauses with its local state saved. The next request resumes the function from where it left off.


    def simple_generator():
        yield 1
        yield 2
        yield 3

    for value in simple_generator():
        print(value)
    # Output:
    # 1
    # 2
    # 3
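
The pause-and-resume behavior is easier to see with a generator that records what it is doing. The following is a small sketch, not a pattern you would use in production: traced_counter and the events list are hypothetical names introduced here. It also shows that a generator can describe an infinite sequence, as long as the caller decides when to stop.

```python
from itertools import islice

events = []  # records when the generator body actually runs

def traced_counter():
    events.append("body started")
    n = 0
    while True:  # infinite sequence: the caller decides how much to consume
        yield n
        events.append(f"resumed after {n}")
        n += 1

gen = traced_counter()
events_before_first_next = list(events)  # still empty: the body has not run

first = next(gen)    # runs the body up to the first yield
second = next(gen)   # resumes after the first yield, with n preserved

# islice takes a bounded number of items from the (endless) generator
next_three = list(islice(gen, 3))

print(first, second, next_three)
print(events)
```

Because the local variable n survives between calls to next(), the counter keeps its place without any external bookkeeping.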
   

3. Accessing Generator Values

You can access values from a generator using the next() function or, more commonly, by iterating over it with a for loop. The for loop automatically handles the iteration and the StopIteration exception, which is raised when the generator has no more values to yield.


    my_gen = square_numbers_generator([1, 2])
    print(next(my_gen))  # Output: 1
    print(next(my_gen))  # Output: 4
    # print(next(my_gen))  # Raises StopIteration

    my_gen = square_numbers_generator([1, 2, 3, 4, 5])
    for num in my_gen:
        print(num)
    # Output:
    # 1
    # 4
    # 9
    # 16
    # 25
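
If you are curious what the for loop is doing for you, the sketch below spells it out by hand. It is only an approximation of the loop's behavior (a real for loop also calls iter() on the object first), and the variable names are assumptions for illustration. It also shows the optional default argument of next(), which avoids the exception entirely.

```python
def square_numbers_generator(nums):
    # Same generator as above, repeated so this snippet is self-contained
    for i in nums:
        yield i * i

gen = square_numbers_generator([1, 2, 3])

# Roughly what `for value in gen:` does behind the scenes
collected = []
while True:
    try:
        value = next(gen)
    except StopIteration:
        break  # the generator has no more values
    collected.append(value)

print(collected)

# With a default, next() returns it instead of raising StopIteration
exhausted_result = next(gen, None)
print(exhausted_result)
```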
   

4. Generator Expressions

Similar to list comprehensions, generator expressions provide a concise way to create generators. They use parentheses () instead of square brackets [].


    my_gen_exp = (x*x for x in range(1, 6))
    print(my_gen_exp)  # Output: <generator object <genexpr> at 0x...>

    for num in my_gen_exp:
        print(num)
    # Output:
    # 1
    # 4
    # 9
    # 16
    # 25
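
Generator expressions are especially handy as arguments to functions that consume an iterable, such as sum(), any(), or max(). When the generator expression is the only argument, the extra pair of parentheses can be dropped. A short sketch, with example values chosen for illustration:

```python
# No intermediate list is built; values are fed to sum() one at a time
total = sum(x * x for x in range(1, 6))

# any() stops pulling values as soon as it finds a match
has_large_square = any(x * x > 20 for x in range(1, 6))

words = ["yield", "next", "generator"]
longest_length = max(len(word) for word in words)

print(total, has_large_square, longest_length)
```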
   

5. Converting to a List

You can convert a generator to a list using the list() function. However, this negates the memory-saving benefits of using a generator, as all values will be stored in memory at once.


    my_gen = (x*x for x in range(1, 6))
    my_list = list(my_gen)
    print(my_list)  # Output: [1, 4, 9, 16, 25]
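
If you only need part of the sequence, you do not have to materialize all of it. One option, sketched here with itertools.islice from the standard library, is to pull a bounded number of items into a list. The example also shows that a generator is used up as it is consumed; the variable names are assumptions for illustration.

```python
from itertools import islice

squares = (x * x for x in range(1, 6))

first_three = list(islice(squares, 3))  # only three values are computed
remaining = list(squares)               # picks up where islice stopped
after_exhaustion = list(squares)        # nothing left to yield

print(first_three, remaining, after_exhaustion)
```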
   

6. Performance Advantages

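Generators offer significant performance advantages, especially when dealing with large datasets. Because they never hold the whole sequence, memory usage stays low, and creating a generator is almost instantaneous since no values are computed until you iterate. The total time to produce every value, however, is roughly the same as building the list. The following example demonstrates this: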


    import random
    import time
    import psutil  # third-party package: pip install psutil

    names = ['John', 'Corey', 'Adam', 'Steve', 'Rick', 'Thomas']
    majors = ['Math', 'Engineering', 'CompSci', 'Arts', 'Business']

    def people_list(num_people):
        result = []
        for i in range(num_people):
            person = {
                'id': i,
                'name': random.choice(names),
                'major': random.choice(majors)
            }
            result.append(person)
        return result

    def people_generator(num_people):
        for i in range(num_people):
            person = {
                'id': i,
                'name': random.choice(names),
                'major': random.choice(majors)
            }
            yield person

    num_people = 1000000

    t1 = time.process_time()
    people = people_list(num_people)
    t2 = time.process_time()
    print(f'Memory (After List): {psutil.Process().memory_info().rss / 1024 ** 2} MB')
    print(f'Time (List): {t2-t1} seconds')

    t1 = time.process_time()
    people = people_generator(num_people) # Only creates the generator object, doesn't generate data yet
    t2 = time.process_time()
    # RSS is process-wide, so this figure can still include memory from the list built above
    print(f'Memory (After Generator): {psutil.Process().memory_info().rss / 1024 ** 2} MB')
    print(f'Time (Generator Creation): {t2-t1} seconds') # Much faster, as no data is generated yet

    t1 = time.process_time()
    people_list_from_generator = list(people_generator(num_people)) # Generates all data and stores in a list
    t2 = time.process_time()
    print(f'Memory (After List from Generator): {psutil.Process().memory_info().rss / 1024 ** 2} MB')
    print(f'Time (List from Generator): {t2-t1} seconds') # Similar time to creating the list directly
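
If you would rather not install psutil, the standard library's tracemalloc module offers another way to compare the two approaches. The sketch below is an alternative measurement, not the benchmark above: peak_memory_bytes, total_via_list, and total_via_generator are hypothetical helper names, and the workload (summing squares) and the size N are assumptions picked so the example runs quickly. Absolute numbers will differ between machines and Python versions.

```python
import tracemalloc

def peak_memory_bytes(func):
    """Run func() and return the peak number of bytes allocated while it ran."""
    tracemalloc.start()
    try:
        func()
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak

N = 100_000

def total_via_list():
    # Builds the full list of squares before summing it
    return sum([i * i for i in range(N)])

def total_via_generator():
    # Feeds squares to sum() one at a time
    return sum(i * i for i in range(N))

list_peak = peak_memory_bytes(total_via_list)
gen_peak = peak_memory_bytes(total_via_generator)

print(f"Peak (list): {list_peak / 1024:.1f} KiB")
print(f"Peak (generator): {gen_peak / 1024:.1f} KiB")
```

Both functions return the same total; only the peak memory needed to get there differs.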
   

7. Disadvantages of Generators

While generators offer many advantages, it's important to be aware of potential drawbacks:

  • Not Suitable for All Scenarios: If you need to access all elements of a sequence multiple times or in a random order, a list might be more appropriate. Generators are best suited for situations where you iterate through the sequence once.
  • Loss of Efficiency When Converted to a List: Converting a generator to a list negates its memory efficiency, as all values are then stored in memory.
  • Can't Go Backwards: Once a generator has yielded a value, you can't go back and retrieve it again. A generator cannot be restarted either; to see the values again, you have to create a new generator (or store the values yourself).
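
The limitations above are easy to reproduce. This is a minimal sketch with a hypothetical squares function: it shows that a second pass over the same generator yields nothing, that indexing is not supported, and that calling the generator function again is the simplest way to get a fresh sequence.

```python
def squares(limit):
    for x in range(1, limit + 1):
        yield x * x

gen = squares(3)

first_pass = list(gen)   # consumes every value
second_pass = list(gen)  # the generator is already exhausted

# Generators do not support random access by index
try:
    gen[0]
    supports_indexing = True
except TypeError:
    supports_indexing = False

# Calling the generator function again gives a brand-new generator
fresh_pass = list(squares(3))

print(first_pass, second_pass, supports_indexing, fresh_pass)
```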