On Monday and Tuesday, August 26-27, 2013, alongside the Euro-Par 2013 workshop program, we offer one track of tutorials on parallel programming models and tools. All tutorials will take place in the Center for Computing and Communication, Kopernikusstraße 6.
The OpenMP Tutorial Slides are available for download here.
Monday, August 26, 2013
Brian Wylie - Jülich Supercomputing Centre
Martin Schulz - Lawrence Livermore National Laboratory
Developers are challenged to improve the reliability, performance, and scalability of their applications on computer systems with ever larger numbers of processors and cores, which demand more parallelism and efficiency from many processes and threads of execution. The use of appropriate tools can greatly improve the productivity of both developers and computer systems, and this is the founding motivation of the Virtual Institute - High Productivity Supercomputing (VI-HPS). This full-day tutorial gives an overview of and introduction to tools for execution monitoring, correctness checking, and performance analysis of parallel applications at large to extreme scale. The primary focus is on presenting and demonstrating tools provided by VI-HPS partners, most of which are available as open source and suited to a range of current HPC platforms and Linux clusters, so that attendees know which tools to look for on their development and production platforms and how to apply them to improve productivity.
Tuesday, August 27, 2013
Jiri Kraus - NVIDIA
Sandra Wienke - RWTH Aachen University
On the way to exascale computing, the HPC community has to focus on energy-efficient architectures. Here, a promising performance-per-watt ratio motivates the use of accelerators such as GPUs. While programming accelerators with low-level APIs may be difficult and can couple the code to a particular accelerator vendor, the directive-based accelerator programming paradigm OpenACC aims at high development productivity and portability. OpenACC enables the offloading of loops and regions of C/C++ and Fortran code to recent architectures such as NVIDIA GPUs, AMD GPUs, or Intel's Xeon Phi, and delegates responsibility for low-level programming tasks to the compiler. This tutorial provides an introduction to OpenACC programming with a focus on GPUs. It covers an overview of the NVIDIA GPU architecture and the main concepts of code acceleration and data movement with OpenACC. We introduce techniques for interoperating with libraries to improve performance with little effort. Asynchronous data updates and kernel executions can further increase performance and enable truly heterogeneous programming. In a hands-on session, attendees develop their first OpenACC programs on the GPU cluster of RWTH Aachen University using PGI's OpenACC compiler.
Tim Mattson - Intel Corporation
Christian Terboven - RWTH Aachen University
OpenMP is a popular, portable, widely supported, and easy-to-use shared-memory model. Developers usually find OpenMP easy to learn. However, they are often disappointed with the performance and scalability of the resulting code. This disappointment stems not from shortcomings of OpenMP itself but rather from the lack of depth with which it is applied. Our advanced OpenMP programming tutorial addresses this critical need by exploring the implications of possible OpenMP parallelization strategies, in terms of both correctness and performance. We assume that attendees understand basic parallelization concepts and the fundamentals of OpenMP. We focus on performance aspects, such as data and thread locality on NUMA architectures, false sharing, and the exploitation of vector units. We discuss language features in depth, with emphasis on features recently added to OpenMP, such as tasking. We close with an overview of the new OpenMP 4.0 directives for attached compute accelerators.