Programming
Python Threading: Achieving Concurrency for I/O-Bound Tasks
How to use Python's threading module to run code concurrently, improving the performance of I/O-bound tasks.
Ryan McBride
Ryan McBride
alt

Source: Warren Umoh on Unsplash

1. Why Use Threading?

Threading is beneficial for tasks that spend a significant amount of time waiting for input/output (I/O), such as network operations or file I/O. By running these tasks concurrently, your program can make progress on other tasks while waiting for I/O to complete, leading to faster overall execution.

2. Concurrency vs. Parallelism

  • Concurrency: Gives the illusion of running code at the same time. When one task is waiting, the program switches to another.
  • Parallelism: Involves actually running code at the same time, typically utilizing multiple CPU cores. Python's threading module primarily achieves concurrency, not true parallelism. Parallelism will be covered in the multiprocessing module.

3. Basic Threading Example

This example demonstrates how to create and start threads using the threading module.


import threading
import time

def do_something(seconds):
    print(f"Sleeping for {seconds} second(s)...")
    time.sleep(seconds)
    print("Done sleeping!")

start = time.perf_counter()

# Running the function sequentially
do_something(1)
do_something(1)

finish = time.perf_counter()
print(f"Finished in {round(finish-start, 2)} second(s)") # Approximately 2 seconds

start = time.perf_counter()

# Running the function concurrently using threads
t1 = threading.Thread(target=do_something, args=[1])
t2 = threading.Thread(target=do_something, args=[1])

t1.start()
t2.start()

t1.join() # Wait for t1 to complete
t2.join() # Wait for t2 to complete

finish = time.perf_counter()
print(f"Finished in {round(finish-start, 2)} second(s)") # Approximately 1 second
   

In this example, the threaded version completes faster because the threads run concurrently. While one thread is sleeping, the other thread can execute.

4. ThreadPoolExecutor

The ThreadPoolExecutor from the concurrent.futures module provides a more efficient and convenient way to manage threads.


import concurrent.futures
import time

def do_something(seconds):
    print(f"Sleeping for {seconds} second(s)...")
    time.sleep(seconds)
    return f"Done sleeping...{seconds}"

start = time.perf_counter()

with concurrent.futures.ThreadPoolExecutor() as executor:
    f1 = executor.submit(do_something, 1)
    f2 = executor.submit(do_something, 1)

    print(f1.result()) # Waits for the result of f1
    print(f2.result()) # Waits for the result of f2

finish = time.perf_counter()
print(f"Finished in {round(finish-start, 2)} second(s)")

# Using as_completed to get results as they become available:
start = time.perf_counter()

with concurrent.futures.ThreadPoolExecutor() as executor:
    futures = [executor.submit(do_something, 1) for _ in range(10)]

    for future in concurrent.futures.as_completed(futures):
        print(future.result())

finish = time.perf_counter()
print(f"Finished in {round(finish-start, 2)} second(s)")
   

5. Downloading Images Concurrently

This example demonstrates a practical use case of threading: downloading multiple images concurrently.


import time
import requests
import concurrent.futures

image_urls = [
    "https://images.unsplash.com/photo-1518791841217-8f162f1e1131?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format&fit=crop&w=500&q=60",
    "https://images.unsplash.com/photo-1493663284031-b7e3a1a05820?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format&fit=crop&w=500&q=60",
    "https://images.unsplash.com/photo-1522038992700-c151c4f47122?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format&fit=crop&w=500&q=60",
    # Add more image URLs here
]

def download_image(image_url):
    img_bytes = requests.get(image_url).content
    img_name = image_url.split('/')[-1]
    with open(img_name, 'wb') as img_file:
        img_file.write(img_bytes)
        print(f"{img_name} was downloaded...")

start = time.perf_counter()

with concurrent.futures.ThreadPoolExecutor() as executor:
    executor.map(download_image, image_urls)

finish = time.perf_counter()
print(f"Finished downloading {len(image_urls)} images in {round(finish - start, 2)} seconds")
   

This code downloads multiple images concurrently, significantly reducing the total download time compared to downloading them sequentially.