BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Europe/Stockholm
X-LIC-LOCATION:Europe/Stockholm
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:19700329T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:19701025T030000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20190719T085744Z
LOCATION:HG EO Nord
DTSTART;TZID=Europe/Stockholm:20190613T195000
DTEND;TZID=Europe/Stockholm:20190613T215000
UID:submissions.pasc-conference.org_PASC19_sess179_post118@linklings.com
SUMMARY:CHM05 - Many-Body Perturbation Theory Towards the Exascale: Yambo 
 on GPUs
DESCRIPTION:Poster\n\n\nCHM05 - Many-Body Perturbation Theory Towards the
  Exascale: Yambo on GPUs\n\nFerretti, Bonfa', Marri, Phillips, Romero...
 \n\nMany-body perturbation theory (MBPT) methods such as the GW
  approximation or the Bethe-Salpeter equation (BSE) are emerging
  approaches in the field of electronic structure simulations.
  Implemented since the 1980s, they have become a tool of choice for
  many computational studies and are among the methods that can
  benefit most from exascale HPC because of their computational
  workload. Here we focus on a modern MBPT code, Yambo, to discuss and
  validate a porting strategy for GW and BSE methods on
  GPU-accelerated systems. Our approach is based on CUDA Fortran, with
  extensive use of CUF kernel directives to automatically generate GPU
  code and of CUDA libraries (FFT and BLAS). This strategy allows us to
  keep the software base as close as possible to the CPU version of the
  code while fully exploiting the GPU accelerators. Hotspots were first
  identified and targeted for porting, limiting the global impact on
  the code: the loading of wavefunctions (overlapped with FFTs), the
  calculation of oscillator matrix elements, and the calculation of
  the response function (one of the main kernels in Yambo).
  Preliminary benchmarks show a speedup of 5-10x when comparing a full
  CPU socket (Intel Xeon E5-2690) with a GPU (NVIDIA P100) on Piz Daint
  (CSCS).
END:VEVENT
END:VCALENDAR

