Sunday, January 05, 2025

MPI - Quick Tutorial

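MPI (Message Passing Interface) is a standard for writing programs that run as several cooperating processes and exchange data by passing messages; Open MPI, installed below, is one implementation. An MPI program (here written in C) can run on multiple processors, and those processors can be on multiple machines: by default, Open MPI uses SSH to start the processes on the remote machines.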

Install MPI

    sudo apt install openmpi-bin openmpi-common libopenmpi-dev

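Open MPI provides its own compiler wrapper (mpicc, which calls the system C compiler with the MPI include and library flags) and launcher (mpiexec). Check that both are installed:

    mpicc --version

    mpiexec --version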


Compile MPI

    mpicc -o hello-mpi hello-mpi.c


Run MPI

    mpiexec hello-mpi

This runs hello-mpi 16 times on my machine, which has 16 processors: without -np, mpiexec starts one process per available slot, and here every processor counts as a slot. Running ./hello-mpi directly starts it only once.

Create a file named hostfile that lists each host and how many processes (slots) it may run:

    localhost slots=4

Pass the hostfile to mpirun to run with just 4 processes:

    mpirun --hostfile hostfile hello-mpi

This starts 4 processes, one per slot.

You can also request 4 processes directly with -np, without a hostfile:

    mpirun -np 4 hello-mpi

If you request more processes than there are slots, mpirun refuses to start the job. For example:

    mpirun -np 17 hello-mpi

On my 16-processor machine this fails with "There are not enough slots available in the system to satisfy the 17 slots that were requested by the application." A similar error occurs when -np is combined with --hostfile and the -np value exceeds the number of slots defined in the hostfile.


In Open MPI, mpirun and mpiexec are synonyms, so they can be used interchangeably.


How to Get Results from Remote Processes

Results computed by different processes (possibly on different machines) can be combined with MPI_Reduce() onto one root process, usually rank 0. For example, to sum one double from every process:

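    double local_result = compute_local_result();  /* this process's partial result */
    double global_result;                          /* receives the sum, on rank 0 only */
    /* arguments: send buffer, receive buffer, count, datatype, operation, root rank, communicator */
    MPI_Reduce(&local_result, &global_result, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);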

MPI_Allreduce() performs the same reduction but delivers the result to every process, so it takes no root argument.


Use MPI_Gather() to collect the individual results, without reducing them, into an array on one root process (here rank 0):

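    double local_result = compute_local_result();
    double all_results[num_processes];  /* num_processes from MPI_Comm_size; only used on rank 0 */
    MPI_Gather(&local_result, 1, MPI_DOUBLE, all_results, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    /* on rank 0, all_results[i] now holds the value sent by rank i */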

MPI_Allgather() collects the same array but makes it available on every process.


You can also write the communication yourself with point-to-point calls. For example, every rank other than 0 can send its result to rank 0 with MPI_Send(), and rank 0 receives them with MPI_Recv():

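    if (rank != 0) {
        /* every other rank sends its result to rank 0 (tag 0) */
        MPI_Send(&local_result, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
    } else {
        /* rank 0 receives one result from each rank, in rank order */
        for (int i = 1; i < num_processes; i++) {
            double worker_result;
            MPI_Recv(&worker_result, 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            /* combine or store worker_result here */
        }
    }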


How to Run Python with MPI?

Instead of a C program, mpirun (or mpiexec) can launch a Python script. For example:

    mpirun -np 4 python 1.py

Of course, the script itself has to use MPI (for example through mpi4py); otherwise, each of the 4 processes just runs the same script independently. Install mpi4py with:

    sudo apt-get update

    sudo apt-get install libopenmpi-dev

    conda install python=3.10

    conda install mpi4py

Copy a basic example from the mpi4py tutorial (https://mpi4py.readthedocs.io/en/stable/tutorial.html) into 2.py, and run:

    mpiexec -np 2 python 2.py


References:

https://mpitutorial.com/tutorials/mpi-hello-world/

https://mpitutorial.com/tutorials/mpi-reduce-and-allreduce/

https://mpi4py.readthedocs.io/en/stable/tutorial.html