BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Europe/Stockholm
X-LIC-LOCATION:Europe/Stockholm
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:19700329T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:19701025T030000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20190719T085744Z
LOCATION:HG EO Nord
DTSTART;TZID=Europe/Stockholm:20190613T195000
DTEND;TZID=Europe/Stockholm:20190613T215000
UID:submissions.pasc-conference.org_PASC19_sess179_post131@linklings.com
SUMMARY:CSM15 - Tasking Meets GPUs: Fighting Deadlocks and Other Monsters
DESCRIPTION:Poster\n\n\nCSM15 - Tasking Meets GPUs: Fighting Deadlocks and
  Other Monsters\n\nMorgenstern\, Beckmann\, Kabadshow\, Werner\n\nTask
  parallelism is omnipresent these days\, whether in data mining or
  machine learning\, for matrix factorization or even molecular dynamics
  (MD). Despite the success of task parallelism on CPUs\, there is
  currently no performant way to exploit the task parallelism of
  synchronization-critical algorithms on GPUs. To address this
  shortcoming\, we develop a tasking approach for GPU architectures. Our
  use case is a fast multipole method for MD simulations. Since the
  problem size in MD is typically small\, we have to target strong
  scaling. Hence\, the application tends to be latency- and
  synchronization-critical. Therefore\, offloading\, the classical
  programming model for GPUs\, is infeasible. The poster highlights our
  experience with the design and implementation of tasking as an
  alternative programming model for GPUs using CUDA. We describe the
  tasking approach for GPUs\, which builds on the design of our tasking
  approach for CPUs. Following this\, we reveal several pitfalls
  encountered when implementing it. Among others\, we consider
  warp-synchronous deadlocks\, weak memory consistency\, and hierarchical
  multi-producer multi-consumer queues. Finally\, we provide first
  performance results of a prototype implementation.
END:VEVENT
END:VCALENDAR

