Quiz

Multiple choice, single answer

What is the main effect of Python’s Global Interpreter Lock (GIL) on CPU-bound Python threads?

A) It prevents more than one thread from executing Python bytecode at the same time within a single process
B) It prevents processes from running on multiple CPU cores
C) It makes all NumPy operations serial
D) It disables MPI communication

Which workload is most likely to benefit from Python threads?

A) A pure Python loop doing arithmetic on integers
B) A task that spends much of its time waiting for file, network, or other I/O
C) A program that needs separate memory spaces for each worker
D) A CUDA kernel running on a GPU

Why can multiprocessing speed up CPU-bound Python work?

A) Processes share one Python interpreter and one GIL
B) Processes can run in separate interpreters with separate GILs
C) Processes eliminate all communication overhead
D) Processes automatically move computation to the GPU

What is a race condition?

A) A program runs slower than expected because the CPU is old
B) Two or more concurrent tasks access shared state, at least one of them modifies it, and the result depends on timing
C) MPI ranks have different numeric identifiers
D) Dask chooses chunks that are too small

In MPI, what does a rank identify?

A) The GPU thread index inside a CUDA block
B) The priority of a Snakemake rule
C) One process within an MPI communicator
D) The size of a Dask chunk

Which MPI operation is most appropriate when rank 0 needs to collect partial results from all ranks?

A) broadcast
B) gather
C) sleep
D) syncthreads

What makes a workflow embarrassingly parallel?

A) Each task depends on the output of all previous tasks
B) Tasks can run independently with little or no communication
C) It must run on a GPU
D) It must use Python threads

What does Snakemake use to decide which workflow steps can run in parallel?

A) Python’s GIL
B) CUDA block dimensions
C) Declared input and output dependencies between rules
D) Dask chunk sizes

Which computation is usually a good candidate for GPU acceleration?

A) Many independent arithmetic operations over large arrays
B) A small script dominated by printing text
C) A highly branching task with tiny input data
D) A workflow waiting on network downloads

In a CUDA kernel, what are blocks and threads used for?

A) Organizing parallel work on the GPU
B) Choosing Dask dataframe partitions
C) Assigning MPI ranks to nodes
D) Locking shared Python variables

Why can copying data between CPU memory and GPU memory reduce speedup?

A) Transfers are free but make code harder to read
B) Transfer time can dominate if the computation is too small
C) The GIL blocks all GPU memory transfers
D) MPI disables GPU memory

What does lazy evaluation mean in Dask?

A) Computations are run immediately when an array or dataframe is created
B) Dask builds a task graph and delays execution until a result is requested
C) Dask only runs one task at a time
D) Dask stores all data in one Python list

Why does Dask chunk size matter?

A) It controls the balance between memory use, parallelism, and scheduling overhead
B) It changes the number of MPI ranks
C) It removes the need for a scheduler
D) It determines the CUDA thread index

Which Dask scheduler is designed for scaling work across multiple worker processes and potentially multiple machines?

A) serial
B) threads
C) distributed
D) synchronous

Short conceptual questions

Explain why CPU-bound pure Python code may not speed up when using threads.

For each task below, choose a suitable parallel strategy and briefly justify it:

downloading many files,
computing independent numerical integrals in pure Python,
running the same analysis on many input files with clear input/output dependencies,
applying the same arithmetic operation to a very large array on a suitable accelerator,
processing an array too large to fit comfortably in memory.

Explain why parallel code can be slower than serial code for small problems.

Describe two ways to avoid or fix race conditions.

Compare MPI, Snakemake, and Dask at a high level. What kind of problem is each one well suited for?

Coding and performance-analysis questions

A serial program computes a function work(x) independently for every value in a list called items. Sketch how to parallelize it with multiprocessing.Pool.

The following function is called concurrently from several threads and updates shared state. Identify the problem and describe one fix.

counter = 0

def update():
    global counter
    for _ in range(1000):
        counter += 1
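
To experiment with the snippet above, it may help to run it from several threads. The following is a minimal sketch of one way to do that. The helper run_threads and the choice of four threads are assumptions made for illustration and are not part of the question. The observed total can vary with the interpreter version and thread scheduling, so a single run is not conclusive.

```python
import threading

counter = 0  # shared state, as in the question


def update():
    global counter
    for _ in range(1000):
        counter += 1


def run_threads(n_threads=4):
    """Hypothetical helper: reset counter, run update() in n_threads threads,
    wait for all of them, and return the final counter value."""
    global counter
    counter = 0
    threads = [threading.Thread(target=update) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    n = 4  # assumed thread count, chosen only for illustration
    print("increments attempted:", n * 1000, "observed total:", run_threads(n))
```

Calling run_threads repeatedly, or with a larger loop count, gives more opportunities to compare the observed total with the number of increments attempted.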

Sketch the MPI communication pattern for a program where each rank computes a partial NumPy array and rank 0 combines all partial arrays into one total result.
A Dask array computation with very small chunks creates millions of tasks and runs slowly. Explain what you would change and how you would use the dashboard or timing measurements to evaluate the change.
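
For the timing measurements mentioned in the last question, and as a serial baseline for the multiprocessing.Pool question, a small standard-library harness may be a useful starting point. This is only a sketch: the body of work(x), the contents of items, and the helper names time_call and run_serial are hypothetical stand-ins, not part of the quiz.

```python
import time


def work(x):
    """Hypothetical stand-in for the quiz's work(x): a small CPU-bound loop."""
    total = 0
    for i in range(1000):
        total += (x * i) % 7
    return total


def time_call(func, *args, repeats=3):
    """Call func(*args) `repeats` times.

    Returns (result of the last call, best wall-clock time in seconds).
    """
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(*args)
        best = min(best, time.perf_counter() - start)
    return result, best


def run_serial(items):
    """Serial baseline: apply work to every item, in order."""
    return [work(x) for x in items]


if __name__ == "__main__":
    items = list(range(200))  # hypothetical input list
    results, seconds = time_call(run_serial, items)
    print(f"serial: {len(results)} results, best of 3 runs: {seconds:.4f} s")
```

The same time_call helper could wrap a parallel variant, or a function that triggers a Dask computation, so that timings before and after a change are measured in the same way.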