BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Europe/Stockholm
X-LIC-LOCATION:Europe/Stockholm
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:19700329T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:19701025T030000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20190719T085744Z
LOCATION:HG F 3
DTSTART;TZID=Europe/Stockholm:20190613T141500
DTEND;TZID=Europe/Stockholm:20190613T144500
UID:submissions.pasc-conference.org_PASC19_sess130_msa140@linklings.com
SUMMARY:Cross-Architecture Affinity for Exascale Computing
DESCRIPTION:Minisymposium\nComputer Science and Applied Mathematics\n\n
 Cross-Architecture Affinity for Exascale Computing\n\nLeon\, Hautreux\n\n
 To reach exascale computing\, vendors are developing complex systems
  that include many-core technologies\, hybrid machines with
  throughput-optimized and latency-optimized cores\, and multiple levels
  of memory. The complexity of extracting the performance these systems
  offer greatly challenges the productivity of computational
  scientists. A significant part of this challenge is mapping parallel
  applications efficiently to the underlying hardware. A poor mapping
  may result in dramatic performance loss. Furthermore\, an application
  mapping is often machine-dependent\, sacrificing portability for
  performance. In this presentation\, we demonstrate that the memory
  hierarchy of a system is key to achieving performance-portable
  mappings of parallel applications. We leverage a memory-driven
  algorithm called mpibind\, which gives computational scientists a
  simple interface for mapping applications and produces a full mapping
  of MPI tasks\, threads\, and GPU kernels to hardware compute units and
  memory domains. The interface is simple and portable: it takes only a
  hybrid application and a number of tasks. This work applies the
  concept of memory-driven mappings to three emerging system
  architectures: a system with latency-optimized cores\, a two-level
  memory system\, and a hybrid CPU+GPU system. Three use cases
  demonstrate the benefits of this approach: reducing system noise\,
  improving application performance\, and improving productivity.
END:VEVENT
END:VCALENDAR