Lecture MPI-intro

What is MPI?

“The goal of the Message-Passing Interface simply stated is to develop a widely used standard for writing message-passing programs. As such the interface should establish a practical, portable, efficient, and flexible standard for message passing.”

  • Design an application programming interface.
  • Allow efficient communication: Avoid memory-to-memory copying, allow overlap of computation and communication, and offload to communication co-processors, where available.
  • Allow for implementations that can be used in a heterogeneous environment.
  • Allow convenient C and Fortran bindings for the interface.
  • Define an interface that can be implemented on many vendors’ platforms, with no significant changes in the underlying communication and system software.
  • Semantics of the interface should be language independent.

What is MPI?

  • The de facto standard for distributed-memory parallelism.
  • It is an interface, not a language.
  • It is significantly bigger than OpenMP; the MPI 3.0 standard is more than 800 pages.
  • In MPI there are multiple instances of the program at once (no fork/join).
  • In principle you can get by with six functions / subroutines:
    • MPI_INIT – Start MPI.
    • MPI_COMM_SIZE – What resources do you have access to?
    • MPI_COMM_RANK – Who am I?
    • MPI_SEND – Send stuff.
    • MPI_RECV – Receive stuff.
    • MPI_FINALIZE – Close and clean up MPI.
  • In practice we will need more functionality for efficient implementations.

MPI on your computer

  • Get the latest stable release of MPICH at https://www.mpich.org/downloads/
  • tar -xvf mpich-3.2.1.tar or tar -zxvf mpich-3.2.1.tar.gz – unpack.
  • mkdir /Users/appelo/libs/mpich-3.2.1-install – make an installation directory.
  • cd /Users/appelo/Downloads/mpich-3.2.1 – make sure you are in the downloaded and unpacked directory.
  • ./configure --prefix=/Users/appelo/libs/mpich-3.2.1-install
  • make
  • make install
  • Add /Users/appelo/libs/mpich-3.2.1-install/bin to your PATH.
  • To compile, use mpif90; to run with 2 processes, use mpirun -np 2 ./a.out

MPI on Summit

  • Remember to log in to a compile node first: ssh scompile
  • Submit jobs with sbatch script.sh (a sketch of such a script is below).
  • Example in the class repository: codes/mpi/example1
  • I could not get the Intel version of MPI to work…
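
A minimal sketch of what script.sh might look like. The #SBATCH values are placeholders, and Summit may require additional lines (partition, QOS, module loads); see codes/mpi/example1 for the script that actually works.

#!/bin/bash
#SBATCH --job-name=mpi_example
#SBATCH --nodes=1
#SBATCH --ntasks=2
#SBATCH --time=00:05:00
#SBATCH --output=mpi_example.out

mpirun -np 2 ./a.out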

A first example

program HOWDY
 use mpi            ! provides MPI constants, types, and interfaces
 implicit none
 integer :: ierr, nprocs, myid

 call mpi_init(ierr)                              ! start MPI
 call mpi_comm_size(MPI_COMM_WORLD, nprocs, ierr) ! how many processes in total?
 call mpi_comm_rank(MPI_COMM_WORLD, myid, ierr)   ! which one am I?

 write(*,*) 'Howdy, I am ', myid, ' out of ', nprocs

 call mpi_finalize(ierr)                          ! shut down MPI

end program HOWDY
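
Assuming the program is saved as howdy.f90, it can be compiled and run as below. The output order varies from run to run since the processes execute concurrently, so the sample output is only illustrative.

mpif90 howdy.f90 -o howdy
mpirun -np 4 ./howdy
 Howdy, I am            1  out of            4
 Howdy, I am            3  out of            4
 Howdy, I am            0  out of            4
 Howdy, I am            2  out of            4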

Communicators

In the call mpi_comm_size(MPI_COMM_WORLD, nprocs, ierr), the variable MPI_COMM_WORLD is a communicator that is pre-defined in MPI and is set up by the MPI_INIT call.

It is possible to define and use other communicators (we may cover this later); a sketch using MPI_COMM_SPLIT is below.
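
As an illustration (not one of the class codes), here is a minimal sketch that uses MPI_COMM_SPLIT to divide MPI_COMM_WORLD into an even-rank and an odd-rank communicator:

program split_demo
 use mpi
 implicit none
 integer :: ierr, nprocs, myid, color, newcomm, newid, newsize

 call mpi_init(ierr)
 call mpi_comm_size(MPI_COMM_WORLD, nprocs, ierr)
 call mpi_comm_rank(MPI_COMM_WORLD, myid, ierr)

 ! Ranks with the same color end up in the same new communicator;
 ! the key (here the world rank) determines the ordering within it.
 color = mod(myid, 2)
 call mpi_comm_split(MPI_COMM_WORLD, color, myid, newcomm, ierr)
 call mpi_comm_rank(newcomm, newid, ierr)
 call mpi_comm_size(newcomm, newsize, ierr)

 write(*,*) 'World rank ', myid, ' is rank ', newid, ' of ', newsize, ' in group ', color

 call mpi_comm_free(newcomm, ierr)
 call mpi_finalize(ierr)
end program split_demo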

Sending messages

If you are sending messages:

  • What are you sending?
  • How much are you sending?
  • Who are you sending to?
  • Are you labeling the shipment?
  • Do you need to know that the message is received?

If you are receiving messages:

  • You need to know what you are receiving and perhaps how much.
  • You might know who is sending.
  • You might know the label so you can infer things about it.
  • Do you have to wait at the mailbox to see when the package arrives?

Sending messages

If you are sending messages:

  • What are you sending? – Type: MPI_INTEGER, MPI_REAL, etc.
  • How much are you sending? – Count: how many items of that type.
  • Who are you sending to? – Destination: an integer in the range [0, nprocs-1].
  • Are you labeling the shipment? – Tag: an integer label.
  • Do you need to know that the message is received? – Blocking / non-blocking.

If you are receiving messages:

  • You need to know what you are receiving and perhaps how much. – Type and count; the count is an upper bound on how much may arrive.
  • You might know who is sending. – Source: a specific rank, or MPI_ANY_SOURCE to accept any sender.
  • You might know the label so you can infer things about it. – Tag: a specific integer, or MPI_ANY_TAG to accept any label.
  • Do you have to wait at the mailbox to see when the package arrives? – Blocking / non-blocking.

Point-to-point communication

Show the simple example simple_p2p.f90; a sketch of such a program is below.
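
A minimal sketch of what such a point-to-point program might contain (illustrative; the actual simple_p2p.f90 may differ). It must be run with at least two processes:

program simple_p2p
 use mpi
 implicit none
 integer, parameter :: n = 10
 integer :: ierr, nprocs, myid
 integer :: status(MPI_STATUS_SIZE)
 double precision :: x(n)

 call mpi_init(ierr)
 call mpi_comm_size(MPI_COMM_WORLD, nprocs, ierr)
 call mpi_comm_rank(MPI_COMM_WORLD, myid, ierr)

 if (myid == 0) then
    x = 1.d0
    ! Send n doubles to rank 1, labeled with tag 123.
    call mpi_send(x, n, MPI_DOUBLE_PRECISION, 1, 123, MPI_COMM_WORLD, ierr)
 elseif (myid == 1) then
    ! Receive n doubles from rank 0 with the matching tag.
    call mpi_recv(x, n, MPI_DOUBLE_PRECISION, 0, 123, MPI_COMM_WORLD, status, ierr)
    write(*,*) 'Rank 1 received x(1) = ', x(1)
 end if

 call mpi_finalize(ierr)
end program simple_p2p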

Blocking send / recv

MPI_SEND(buf, count, datatype, dest, tag, comm, ierror)

buf      ! of type datatype
count    ! integer
datatype ! e.g. MPI_REAL etc.
dest     ! integer
tag      ! integer
comm     ! MPI Communicator
ierror   ! integer

MPI_RECV(buf, count, datatype, source, tag, comm, status, ierror)

buf      ! of type datatype
count    ! integer
datatype ! e.g. MPI_REAL etc.
source   ! integer
tag      ! integer
comm     ! MPI Communicator
status   ! integer array of size MPI_STATUS_SIZE
ierror   ! integer
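
A sketch (illustrative, not one of the class codes) of how the status argument is used: every rank except 0 sends rank-many reals to rank 0, and rank 0 receives from any source and then reads the sender, tag, and message size out of status.

program status_demo
 use mpi
 implicit none
 integer, parameter :: nmax = 100   ! assumes fewer than nmax processes
 integer :: ierr, nprocs, myid, i, sender, tag, nreceived
 integer :: status(MPI_STATUS_SIZE)
 real :: buf(nmax)

 call mpi_init(ierr)
 call mpi_comm_size(MPI_COMM_WORLD, nprocs, ierr)
 call mpi_comm_rank(MPI_COMM_WORLD, myid, ierr)

 if (myid > 0) then
    buf = real(myid)
    ! Each rank sends myid reals, labeled with its own rank as tag.
    call mpi_send(buf, myid, MPI_REAL, 0, myid, MPI_COMM_WORLD, ierr)
 else
    do i = 1, nprocs-1
       ! Accept a message from any sender with any tag.
       call mpi_recv(buf, nmax, MPI_REAL, MPI_ANY_SOURCE, MPI_ANY_TAG, &
            MPI_COMM_WORLD, status, ierr)
       sender = status(MPI_SOURCE)                           ! who sent it
       tag = status(MPI_TAG)                                 ! its label
       call mpi_get_count(status, MPI_REAL, nreceived, ierr) ! how much arrived
       write(*,*) 'Got ', nreceived, ' reals from rank ', sender, ' with tag ', tag
    end do
 end if

 call mpi_finalize(ierr)
end program status_demo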

Communication modes of blocking send

The basic MPI_SEND is blocking, meaning that the call does not return until the message data and envelope (tag, source, destination) have been safely stored away, i.e., the send buffer can then safely be modified.

“Message buffering decouples the send and receive operations. A blocking send can complete as soon as the message was buffered, even if no matching receive has been executed by the receiver.

On the other hand, message buffering can be expensive, as it entails additional memory-to-memory copying, and it requires the allocation of memory for buffering.

Communication modes of blocking send

MPI offers the choice of several communication modes that allow one to control the choice of the communication protocol.”

  • Standard communication mode (the plain MPI_SEND) is non-local: MPI may or may not buffer the outgoing message, so completion of the send may depend on a matching receive having been posted.
  • Buffered-mode send, MPI_BSEND, is local and can start and finish regardless of whether a matching receive has been posted. It requires a buffer, supplied either by the user or by MPI; if the buffer is not big enough an error will occur.
  • Synchronous-mode send, MPI_SSEND, can start at any time but will not complete until a matching receive has been posted and has started to receive. This is non-local.
  • Ready-mode send, MPI_RSEND, may be started only if the matching receive has already been posted.

The calls are MPI_BSEND, MPI_SSEND, and MPI_RSEND; a sketch of a buffered-mode send is below.
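
A minimal sketch of a buffered-mode send (illustrative, not one of the class codes). The user must attach a buffer large enough for the message data plus MPI_BSEND_OVERHEAD bytes:

program bsend_demo
 use mpi
 implicit none
 integer, parameter :: n = 1000
 integer :: ierr, nprocs, myid, bufsize
 integer :: status(MPI_STATUS_SIZE)
 double precision :: x(n)
 character, allocatable :: mpibuf(:)

 call mpi_init(ierr)
 call mpi_comm_size(MPI_COMM_WORLD, nprocs, ierr)
 call mpi_comm_rank(MPI_COMM_WORLD, myid, ierr)

 ! Room for n doubles (8 bytes each) plus the bookkeeping overhead.
 bufsize = 8*n + MPI_BSEND_OVERHEAD
 allocate(mpibuf(bufsize))
 call mpi_buffer_attach(mpibuf, bufsize, ierr)

 if (myid == 0) then
    x = 1.d0
    ! Completes locally as soon as the message is copied into mpibuf,
    ! whether or not rank 1 has posted the receive yet.
    call mpi_bsend(x, n, MPI_DOUBLE_PRECISION, 1, 0, MPI_COMM_WORLD, ierr)
 elseif (myid == 1) then
    call mpi_recv(x, n, MPI_DOUBLE_PRECISION, 0, 0, MPI_COMM_WORLD, status, ierr)
 end if

 call mpi_buffer_detach(mpibuf, bufsize, ierr)
 call mpi_finalize(ierr)
end program bsend_demo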

Deadlock

Show the examples codes/mpi/example2/deadlock_p2p.f90 and codes/mpi/example2/deadlock_fixed_p2p.f90; a sketch of the pattern is below.
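
The repository files are not reproduced here, but the classic pattern is that both ranks call the blocking MPI_SEND before MPI_RECV; if MPI cannot buffer the messages, neither send returns and the program hangs. One standard remedy, sketched below, is MPI_SENDRECV, which pairs the send and the receive safely (the repository fix may instead reorder the send/recv calls):

program exchange
 use mpi
 implicit none
 integer, parameter :: n = 100000
 integer :: ierr, nprocs, myid, other
 integer :: status(MPI_STATUS_SIZE)
 double precision :: xout(n), xin(n)

 call mpi_init(ierr)
 call mpi_comm_size(MPI_COMM_WORLD, nprocs, ierr)
 call mpi_comm_rank(MPI_COMM_WORLD, myid, ierr)

 other = 1 - myid      ! intended to be run with exactly 2 processes
 xout = dble(myid)

 ! Deadlock-prone version: both ranks do
 !    call mpi_send(xout, ...) followed by call mpi_recv(xin, ...)
 ! which hangs once the messages are too large to be buffered.

 ! Safe version: the send and the receive are combined in one call.
 call mpi_sendrecv(xout, n, MPI_DOUBLE_PRECISION, other, 0, &
      xin, n, MPI_DOUBLE_PRECISION, other, 0, &
      MPI_COMM_WORLD, status, ierr)

 write(*,*) 'Rank ', myid, ' got xin(1) = ', xin(1)
 call mpi_finalize(ierr)
end program exchange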