Parallel Computing using Python

Many scientific and engineering workloads now involve datasets, simulations, and analyses that are too large or too slow for straightforward serial Python code. This module introduces practical ways to improve Python performance by choosing the right parallel computing strategy for the work at hand.

The material covers shared-memory and distributed-memory parallelism, workflow parallelism, interactive parallel execution, GPU computing, and scalable analytics with Dask. Learners will compare threads, processes, MPI, Snakemake, ipyparallel, Numba/CUDA, and Dask, with attention to the tradeoffs that determine whether parallel code is faster, correct, and maintainable.

Prerequisites

  • Basic experience with Python

  • Basic experience working in a Linux-like terminal

  • Some prior experience working with datasets or computational scripts

Learning outcomes

This material is for researchers, engineers, and technical professionals who use Python for computational work and want to understand when and how to apply parallel computing techniques.

By the end of this module, learners should be able to:

  • Explain the differences between concurrency, multithreading, multiprocessing, distributed-memory parallelism, workflow parallelism, GPU acceleration, and task-based parallelism.

  • Describe how the Global Interpreter Lock affects CPU-bound Python code and when threads can still be useful.

  • Use multiprocessing concepts to parallelize independent CPU-bound work while accounting for process overhead and data movement.

  • Identify race conditions and choose safer designs, either by protecting shared state with locks or by avoiding shared mutable state altogether.

  • Use basic MPI concepts such as ranks, point-to-point communication, broadcast, and gather.

  • Recognize embarrassingly parallel workflow patterns and use Snakemake-style dependencies to expose parallel execution.

  • Describe how ipyparallel can be used for interactive parallel execution from Jupyter.

  • Explain when GPU computing is appropriate and how CUDA kernels, blocks, threads, memory transfers, and data types affect performance.

  • Use Dask concepts such as lazy evaluation, task graphs, collections, chunking, schedulers, workers, and dashboards to reason about scalable analytics.

  • Measure performance carefully and choose a parallel strategy based on workload type, communication cost, memory use, correctness, and maintainability.
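
As a small taste of the race-condition outcome above, the sketch below contrasts the two safer designs it mentions: guarding a shared counter with a lock, and avoiding shared mutable state by giving each thread its own result slot. It is a minimal illustration using only the standard library, not taken from the course exercises; the function names `count_with_lock` and `count_without_sharing` are hypothetical.

```python
import threading


def count_with_lock(n_threads=4, increments=10_000):
    """Increment one shared counter from several threads, guarded by a lock."""
    total = 0
    lock = threading.Lock()

    def worker():
        nonlocal total
        for _ in range(increments):
            # The lock makes the read-modify-write of `total` atomic
            # with respect to the other threads.
            with lock:
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def count_without_sharing(n_threads=4, increments=10_000):
    """Let each thread count privately, then combine the results after join."""
    partial = [0] * n_threads  # one slot per thread, never written by two threads

    def worker(index):
        local = 0
        for _ in range(increments):
            local += 1
        partial[index] = local

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partial)


if __name__ == "__main__":
    print(count_with_lock())
    print(count_without_sharing())
```

Both functions return `n_threads * increments`. The second design usually needs less synchronization because no two threads ever write to the same location. Neither version is expected to run faster than a serial loop for this CPU-bound work, which is one reason the Global Interpreter Lock appears in the outcomes above.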

Credit

This module is part of the EVITA course material collection. Please contact EVITA if you want to reuse these course materials in your teaching or share feedback with the community.

License