BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Europe/Stockholm
X-LIC-LOCATION:Europe/Stockholm
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:19700329T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:19701025T030000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20190719T085744Z
LOCATION:HG E 3
DTSTART;TZID=Europe/Stockholm:20190613T151500
DTEND;TZID=Europe/Stockholm:20190613T154500
UID:submissions.pasc-conference.org_PASC19_sess166_msa271@linklings.com
SUMMARY:Lattice QCD on Modern GPU Systems
DESCRIPTION:Minisymposium\nComputer Science and Applied Mathematics, Physi
 cs\n\nLattice QCD on Modern GPU Systems\n\nWagner, Clark, Weinberg, Messme
 r\n\nQUDA, an open-source library for Lattice Quantum Chromodynamics (QCD)
 , has provided GPU acceleration for multiple Lattice Quantum Chromodynamic
 s applications like MILC and Chroma for close to a decade. We share our le
 arnings from running across six GPU generations with various network and n
 ode configurations, including IBM POWER9 and x86-based systems as well a
 s NVLink and NVSwitch-based systems. Beyond outstanding kernel performa
 nce, strong-scaling is a key development objective. The technologies dis
 cussed include peer-to-peer memory access, GPU Direct RDMA, and NVSH
 MEM. The ap
 plied techniques like auto-tuning of kernel launch configurations and comm
 unication policies are cross-cutting, and generally applicable across scal
 able HPC domains. Furthermore, we will discuss algorithmic developments li
 ke Multigrid for QCD, Block-Krylov as well as mixed-precision solvers and 
 their efficient implementation to prepare Lattice QCD applications for exa
 scale computing.
END:VEVENT
END:VCALENDAR

