PPoPP '14: Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

SESSION: Session order 1: opening and conference keynote address

21st century computer architecture

SESSION: Session order 2: bugs session

PREDATOR: predictive false sharing detection

Concurrency testing using schedule bounding: an empirical study

Trace driven dynamic deadlock detection and reproduction

Efficient search for inputs causing high floating-point errors

SESSION: Session order 3: HPC session

X10 and APGAS at Petascale

Resilient X10: efficient failure-aware programming

Portable, MPI-interoperable Coarray Fortran

SESSION: Session order 4: GPU session

CUDA-NP: realizing nested thread-level parallelism in GPGPU applications

yaSpMV: yet another SpMV framework on GPUs

Singe: leveraging warp specialization for high performance on GPUs

SESSION: Session order 5: synchronization session

Eliminating global interpreter locks in Ruby through hardware transactional memory

Leveraging hardware message passing for efficient thread synchronization

Well-structured futures and cache locality

Time-warp: lightweight abort minimization in transactional memory

SESSION: Session order 6: PPoPP keynote address

Beyond parallel programming with domain specific languages

SESSION: Session order 7: algorithms session

Designing and auto-tuning parallel 3-D FFT for computation-communication overlap

A decomposition for in-place matrix transposition

In-place transposition of rectangular matrices on accelerators

Parallelizing dynamic programming through rank convergence

SESSION: Session order 8: programming systems session

Revisiting loop fusion in the polyhedral framework

Triolet: a programming system that unifies algorithmic skeleton interfaces for high-performance cluster computing

A tool to analyze the performance of multithreaded programs on NUMA architectures

SESSION: Session order 9: scheduling and determinism session

Towards fair and efficient SMP virtual machine scheduling

Efficient deterministic multithreading without global barriers

Race directed scheduling of concurrent programs

SESSION: Session order 10: conference keynote address

Heterogeneous computing: what does it mean for compiler research?

SESSION: Session order 11: non-blocking data structures session

Fast concurrent lock-free binary search trees

A general technique for non-blocking trees

Practical concurrent binary search trees via logical ordering

A practical wait-free simulation for lock-free data structures

POSTER SESSION: Session order 11: poster session

Lock contention aware thread migrations

Infrastructure-free logging and replay of concurrent execution on multiple cores

Parallelization hints via code skeletonization

Concurrency bug localization using shared memory access pairs

Task mapping stencil computations for non-contiguous allocations

Data structures for task-based priority scheduling

Detecting silent data corruption through data dynamic monitoring for scientific applications

Fine-grain parallel megabase sequence comparison with multiple heterogeneous GPUs

Automatic semantic locking

Optimistic transactional boosting

Provably good scheduling for parallel programs that use data structures through implicit batching

Theoretical analysis of classic algorithms on highly-threaded many-core GPUs

SCCMulti: an improved parallel strongly connected components algorithm

Initial study of multi-endpoint runtime for MPI+OpenMP hybrid programming model on multi-core systems

Extracting logical structure and identifying stragglers in parallel execution traces