Didactic period
Second Semester

Training objectives

The main objective of this course is to provide students with the basics of parallel computing for both distributed-memory and shared-memory architectures.

The acquired knowledge will include:
- characteristics of modern architectures for parallel computing;
- metrics for evaluating the performance of parallel codes through computing-time measurements (speedup, scaling and efficiency);
- introduction to the MPI paradigm for distributed-memory architectures: point-to-point and collective communications, reduction and synchronization directives;
- introduction to OpenMP for shared-memory architectures;
- examples of use of widely used libraries for linear algebra, factorization and eigenvalue problems: BLAS, ATLAS, LAPACK, MKL and their parallel implementations (a short sketch follows this list);
- examples of use of different CPU profilers;
- scientific computing for the solution of partial differential equations.
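As an illustration of the libraries listed above, here is a minimal sketch of a call to the BLAS routine DGEMM (double-precision matrix-matrix product) from Fortran; the matrix size and the random contents are assumptions made only for the example, not material from the course.

    program dgemm_example
      implicit none
      integer, parameter :: n = 4
      double precision :: a(n,n), b(n,n), c(n,n)
      ! Fill A and B with arbitrary data; C will hold the product.
      call random_number(a)
      call random_number(b)
      c = 0.0d0
      ! Standard BLAS interface: C := alpha*A*B + beta*C, here with alpha = 1, beta = 0.
      call dgemm('N', 'N', n, n, n, 1.0d0, a, n, b, n, 0.0d0, c, n)
      print *, 'C(1,1) =', c(1,1)
    end program dgemm_example

The same source can be linked against a reference BLAS, ATLAS or MKL (for example, gfortran dgemm_example.f90 -lblas), which is what makes these libraries convenient for performance comparisons.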

The main skills are:
- analyze serial algorithms and identify a suitable workload-partitioning strategy to obtain a significant speedup;
- analyze serial algorithms and identify data-access patterns that maximize aligned and contiguous memory accesses;
- analyze code performance when executing on real architectures.


Students need to master the following prerequisites, provided by the "Numerical Analysis" course:
- QR factorization, LU factorization and their use for solving linear systems;
- floating-point representation, machine precision, finite arithmetic.

Familiarity with programming in the FORTRAN language is advised.
A personal notebook (laptop) is highly recommended.

Course programme

Introduction (10 hours)
- introduction to parallel computing, CPU architecture, computational cost;
- introduction to FORTRAN;
- performance measures (speedup, scaling and efficiency), FORTRAN code to measure serial computational time (a timing sketch follows this list);
- Example: implementation of the conjugate gradient method for the solution of linear systems.
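For reference, speedup and efficiency on p processes are usually defined as S_p = T_1 / T_p and E_p = S_p / p, where T_1 is the serial time and T_p the parallel time; scaling describes how these quantities behave as p grows. The following is a minimal sketch of measuring serial wall-clock time with the standard Fortran intrinsic system_clock; the summation loop is only a placeholder workload.

    program timing_example
      implicit none
      integer :: count_start, count_end, count_rate, i
      double precision :: s, elapsed
      s = 0.0d0
      call system_clock(count_start, count_rate)   ! start of the timed region
      do i = 1, 100000000                          ! placeholder workload
         s = s + 1.0d0 / dble(i)
      end do
      call system_clock(count_end)                 ! end of the timed region
      elapsed = dble(count_end - count_start) / dble(count_rate)
      print *, 'sum =', s, '  elapsed seconds =', elapsed
    end program timing_example

The intrinsic cpu_time can be used in the same way when CPU time rather than wall-clock time is of interest.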

Using MPI (12 hours)
- introduction to MPI, scripts for compiling and launching parallel codes, blocking and non-blocking point-to-point communications, definition of deadlock, the send-receive deadlock and its solution using the sendrecv function (a sketch follows this list), collective communications, reductions, synchronization directives;
- Example: parallel implementation of the conjugate gradient method.
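As a sketch of the deadlock issue mentioned above (an illustration, not the code used in the lectures): if every rank in a ring first posts a blocking send to its neighbour and only then a receive, all ranks can wait forever for a matching receive; MPI_Sendrecv combines both operations and removes the ordering problem. The example below exchanges one double-precision value around a ring.

    program ring_exchange
      use mpi
      implicit none
      integer :: ierr, rank, nprocs, left, right
      integer :: status(MPI_STATUS_SIZE)
      double precision :: sendbuf, recvbuf

      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
      call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

      right = mod(rank + 1, nprocs)            ! neighbour to send to
      left  = mod(rank - 1 + nprocs, nprocs)   ! neighbour to receive from
      sendbuf = dble(rank)

      ! Combined send/receive: no ordering between send and receive, hence no send-receive deadlock.
      call MPI_Sendrecv(sendbuf, 1, MPI_DOUBLE_PRECISION, right, 0, &
                        recvbuf, 1, MPI_DOUBLE_PRECISION, left,  0, &
                        MPI_COMM_WORLD, status, ierr)

      print *, 'rank', rank, 'received', recvbuf, 'from rank', left
      call MPI_Finalize(ierr)
    end program ring_exchange

Such a program is typically compiled with the mpif90 wrapper and launched with mpirun (for example, mpirun -np 4 ./ring_exchange).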

OpenMP (8 hours)
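As a minimal sketch of the shared-memory constructs treated in this part (an illustration, not material taken from the lectures): an OpenMP parallel do that distributes the iterations of a vector update among threads, compiled for instance with gfortran -fopenmp.

    program omp_axpy
      use omp_lib
      implicit none
      integer, parameter :: n = 100000
      integer :: i
      double precision :: x(n), y(n), a

      a = 2.0d0
      x = 1.0d0
      y = 0.0d0

      ! Each thread updates a disjoint block of iterations (work sharing).
      !$omp parallel do
      do i = 1, n
         y(i) = y(i) + a * x(i)
      end do
      !$omp end parallel do

      print *, 'threads available:', omp_get_max_threads(), '  y(1) =', y(1)
    end program omp_axpy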

Examples of parallelization for scientific computing applied to hyperbolic partial differential equations (18 hours):
- heat conduction equation on Cartesian meshes (a serial finite-difference sketch follows this list);
- free-surface Navier-Stokes equations on Cartesian meshes;
- parallelization of general unstructured meshes using the Metis library.
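As a hedged illustration of the first item in this list (a serial sketch, before any parallelization; grid size, diffusivity, time step and boundary values are assumptions made for the example): an explicit finite-difference update for the one-dimensional heat conduction equation u_t = alpha * u_xx on a uniform Cartesian mesh.

    program heat_1d_explicit
      implicit none
      integer, parameter :: n = 101          ! number of grid points (assumed)
      integer :: i, step, nsteps
      double precision :: u(n), unew(n)
      double precision :: alpha, dx, dt

      alpha  = 1.0d0                         ! thermal diffusivity (assumed)
      dx     = 1.0d0 / dble(n - 1)
      dt     = 0.4d0 * dx * dx / alpha       ! within the explicit stability limit dt <= dx**2 / (2*alpha)
      nsteps = 1000

      u = 0.0d0
      u(1) = 1.0d0                           ! fixed boundary values (assumed)
      u(n) = 0.0d0

      do step = 1, nsteps
         do i = 2, n - 1
            unew(i) = u(i) + alpha * dt / (dx * dx) * (u(i+1) - 2.0d0 * u(i) + u(i-1))
         end do
         u(2:n-1) = unew(2:n-1)              ! interior update; boundaries stay fixed
      end do

      print *, 'temperature at the midpoint:', u((n + 1) / 2)
    end program heat_1d_explicit

A domain-decomposition parallelization of such a scheme typically splits the index range among MPI ranks and exchanges the values at subdomain boundaries with neighbouring ranks at every time step.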

Didactic methods

The main topics will be introduced through slide presentations as well as practical examples.
Parallel codes may also be run on the CINECA supercomputer.

Learning assessment procedures

The level of attainment of the learning objectives will be measured through an oral test composed of two parts:
- a talk on a project developed directly by the student;
- a discussion of some topics of the course.
The instructions for the projects will be provided to the students during the teaching period.

Reference texts

- Lecture notes.
- Textbook: V. Kumar, A. Grama, A. Gupta, G. Karypis, Introduction to Parallel Computing: Design and Analysis of Algorithms, Addison-Wesley, 2003.
- Suggested for further study: D.P. Bertsekas, J. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods, Prentice-Hall, 1989.