BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Europe/Stockholm
X-LIC-LOCATION:Europe/Stockholm
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:19700329T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:19701025T030000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20190719T085743Z
LOCATION:HG E 3
DTSTART;TZID=Europe/Stockholm:20190612T160000
DTEND;TZID=Europe/Stockholm:20190612T163000
UID:submissions.pasc-conference.org_PASC19_sess154_msa243@linklings.com
SUMMARY:Programming Model Design Tradeoffs of Global vs. Local Recovery fo
 r Algorithm Based Fault Tolerance
DESCRIPTION:Minisymposium\nComputer Science and Applied Mathematics, Emerg
 ing Application Domains, Chemistry and Materials, Climate and Weather, Phy
 sics, Solid Earth Dynamics, Life Sciences, Engineering\n\nProgramming Mode
 l Design Tradeoffs of Global vs. Local Recovery for Algorithm Based Fault 
 Tolerance\n\nKolla, Teranishi, Mayo, Salloum, Armstrong\n\nWe present
  design tradeoffs between global and local recovery for algorithm-based
  fault tolerance (ABFT), and implications for programming model and
  runtime design
 . For ABFT algorithms that may rely on global agreement or a collective ac
 tion (e.g. ABFT for linear system solvers that involve a global collective
 ), a global recovery from a soft error may be permissible from a performan
 ce and scalability perspective, even when the error may have occurred in a
  single element (local). On the other hand, we have recently designed ABFT
  algorithms that leverage conservation properties of PDEs and detect an
  error from information that is entirely node-local. For such algorithms,
  it is far more desirable for the error detection to induce a local
  recovery, and we address this problem through resilient extensions of
  the existing parallel programming models. In the presentation, we
  introduce
  our ongoing efforts on MPI, asynchronous many-task (AMT) parallel
  programming models, and Kokkos (an abstraction of data representation
  and parallel computation for multiple node architectures), and discuss
  the design of their resilience capabilities and the performance
  tradeoffs with multiple benchmark programs. We will also discuss our
  latest results for our ABFT PDE solvers at scale.
END:VEVENT
END:VCALENDAR

