Nominated Papers (Reverse Chronological Order)
2008
Nominated November 2009:
Title: FastTrack: Efficient and Precise Dynamic Race Detection
Authors: Cormac Flanagan and Stephen N. Freund
Venue: PLDI 2009
Nomination Statement: Past work on dynamic data race detection had to choose between fast but imprecise race detectors that report false alarms vs. slow but precise race detectors that never report false alarms. The latter category is typically based on classical "vector clock" operations, where a single operation requires time linear in the number of program threads. The key insight behind the FastTrack dynamic data race detection algorithm introduced in this paper is that the full generality of vector clocks is not necessary in over 99% of these read and write operations in practice. Instead, a more lightweight representation of the happens-before information can be constructed using "epochs". Experimental results on Java benchmarks show that this approach is an order of magnitude faster than a traditional vector-clock race detector, thereby representing a significant advance over classical work in this area. This paper includes a comprehensive comparison with past work, including the previously nominated "Goldilocks" paper. Though both papers address the same high-level problem, they take very different approaches. FastTrack has shown how past algorithms based on vector clocks can be made significantly more efficient and scalable, thereby providing a perspective on a large body of past work in this area. Another SIGPLAN-nominated Research Highlights paper, “Goldilocks”, is also highly innovative and takes a different approach that is not based on vector clocks.
Nominated June 2009:
Title: The Theory of Deadlock Avoidance via Discrete Control
Authors: Yin Wang, Stephanie Lafortune, Terence Kelly, Manjunath Kudlur, and Scott Mahlke
Venue: POPL 2009
Nomination Statement: Deadlock is a difficult problem for the average programmer who must now confront multicore hardware. The Gadara project offers a very interesting possible solution. The attached POPL paper provides a theoretical foundation for dynamic deadlock avoidance in concurrent programs that employ conventional mutual exclusion and synchronization primitives. Beginning with control flow graphs extracted from program source code, it constructs a formal model of the program and then applies Discrete Control Theory to automatically synthesize deadlock avoidance control logic that is implemented by program instrumentation. At run time, the control logic avoids deadlocks by postponing lock acquisitions. Discrete Control Theory guarantees that the instrumented program cannot deadlock. The control logic is maximally permissive: it postpones lock acquisitions only when necessary to prevent deadlocks, and therefore permits maximal runtime concurrency. Additonal publications on the prototype system have appeared in top fora in DCT, systems, and programming languages: WODES '08, OSDI '08. The OSDI paper is attached as well. This research is a very nice combination of rigor and novelty. The authors have a working prototype and have applied it to real software including Apache, BIND, and OpenLDAP; it works as advertised, with very tolerable performance overhead. This approach potentially makes a significant contribution to enhancing the programmability of multicore processors for the mainstream.
Title: Generic Discrimination: Sorting and Partitioning Unshared Data in Linear Time
Author: Fritz Henglein
Venue: ICFP 2008
Nomination Statement: Everyone knows that sorting a collection of n elements requires a number of comparisons proportional to n log n. Distribution sort over certain kinds of data works in time linear in the number of elements, but is a "specialty" sort, useful only in special circumstances. After reading this paper, you may reverse this idea. Henglein shows how to generalize distribution sort to a wide range of data. Assuming a base type is suitable for distribution sort, he shows how products, sums, and lists of such data are also suitable for distribution sort, relegating n log n sorts to the "specialty" case. Henglein argues convincingly that the right interface for an abstract data type to supply is not comparison and not sorting, but a mild generalization of sorting which he calls a discriminator, in terms of which the other two may be defined. While the technique may be applied in any programming language, it fits particularly well with a functional language where it is easy to compose sort operators on component types to build sort operators on structures, and where advanced type systems allow one to ensure type safety while doing so. Henglein presents complete code in Haskell. We should expect the technique to spread to other languages; the new notion of C++ concepts, which resemble Haskell type classes, are well suited for supporting defining functions based on type structure. The paper is a joy to read. (Addendum)
Nominated January 2009:
Title: Goldilocks: A Race and Transaction-Aware Java Runtime
Authors: Tayfun Elmas, Shaz Qadeer, and Serdar Tasiran
Venue: PLDI 2007
Nomination Statement: The paper describes a precise and reasonably fast data-race-detection algorithm. Their implementation takes the form of a modified JVM that detects all data races as defined by the Java memory model, and throws an exception when a race is encountered. It also supports transactional memory. As far as I know, this is the first such implementation that is fast enough to at least raise the question of whether this could be done routinely as part of production code execution. Although the performance measurements in the paper are not such that we can declare this a solved problem, I think this paper points at an interesting possible solution path to two hard (and, thanks to the widespread adoption of multicore processors) critical parallel programming problems: 1. Dealing with data races is the most difficult aspect of parallel program debugging. Since such bugs are usually not easily reproducible, they may escape testing, and may go unreported for long periods of time. Debugging may require months, since the actual data race is often no longer visible when a failure results, requiring the problem be reproduced repeatedly with increasingly instrumented programs. An exception at the time of the race avoids this problem. 2. (Not pointed out in the paper, since I think nobody appreciated it at the time.) It has become increasingly clear that we do not know how to fully specify the semantics of a multithreaded language like Java that, for security reasons, must specify some semantics for data races. The Java memory model (POPL 05) is the most successful attempt, but it has recently become clear that it too has some surprising and undesirable properties. If we could requrie that data races produce exceptions, these issues disappear. I believe there is now some ongoing work that looks at hardware support for data-race detection, that might greatly reduce the performance costs, and eventually make the "data-race as exception" model viable for a much larger class of applications. This paper is a great first step toward something that might actually result in a major advance in shared memory parallel programming, whether it uses transactional memory, or continues to be based on locks.
Title: Sound, Complete and Scalable Path-Sensitive Analysis
Author: Isil Dillig, Thomas Dillig, and Alex Aiken
Venue: PLDI 2008
Published as a CACM Research Highlight (August 2010). Retitled as: Reasoning About the Unknown in Static Analysis
Nomination Statement: This paper appeared in PLDI 2008 and was coauthored by Alex Aiken and two of his Ph.D. students, Isil Dillig and Thomas Dillig. It provides an elegant and scalable solution to a decades-old challenge in the area of program analysis viz., how to perform precise intra- and inter-procedural path-sensitive program analysis without incurring the exponential increase in space and time that arises in a naive code duplication approach? And how to perform such an analysis for recursive programs? They introduce a new approach based on constraint systems to address this problem by defining constraints that capture the exact path- and context-sensitive condition under which a program property holds. The resulting constraint system may not be solvable in general. However, to answer a "may" or a "must" query regarding a program property, they can extract necessary or sufficient conditions from the constraint system which can be solved i.e., they specialize the exact path condition to the query being asked. Further, these extracted necessary/sufficient conditions are typically small enough to make the approach scalable to large software -- they have implemented this approach in the Saturn system and used it to analyze over 6MLOC in the entire Linux kernel. They also use "unobservable variables" to model interactions with the environment (such as user input) as well as to support modular program analysis. Their first set of experimental results quantify the size of necessary and sufficient conditions for pointer operations. Their second set of results applies their approach to identifying null-dereference errors. Both experiments were performed on OpenSSH, Samba, and the Linux kernel. The results for null-dereference analysis in Figure 7 compare inter- and intra-procedural path-sensitive analysis with just intra-procedural path-sensitive analysis, and shows that the former results in order-of-magnitude reductions in false positive reports compared to the latter. Like my other nominations, I believe that the authors can do a good job of rewriting the content of this paper for the general CACM audience. Some of the formalism in the current version of the paper may not be accessible to a large audience, and will need to be toned down in the CACM version. However, the basic concepts of inter- and intra-procedural path-sensitive analysis can be conveyed by simple examples such as the queryUser() example discussed in Section 1. This paper is also a good illustration of how far approaches such as constraint systems have progressed for program analysis tools.
Nominated September 2008:
Title: Formal Certification of a Compiler Back-end or: Programming a Compiler with a Proof Assistant
Author: Xavier Leroy
Venue: POPL 2006
Published as a CACM Research Highlight (July 2009). Retitled as: Formal Verification of a Realistic Compiler
Nomination Statement: In 2003, JACM published Tony Hoare's grand challenge paper on The Verifying Compiler. A verifying compiler guarantees correctness of a program before running it. Meeting such a challenge would mean that researchers have to make great progress on both how compilers are written and on verification technology. Leroy's paper presents the first *formally-verified* proof of the correctness of a *complete compilation chain*. "Formally verified" means that the entire proof is verified by the Coq proof assistant. "Complete compilation chain" means that, instead of verifying an isolated transformation or optimisation, Leroy has verified the whole chain of a lightly optimising compiler from a (deliberately modest) high level language called CMinor to PowerPC machine code, by way of four different intermediate languages. The paper shows how to certify that when the compiler generates code, then safety properties proved on the source program will hold for the executable code as well. We believe many researchers and practitioners will be interested in a glimpse of the progress on verifying software we all depend on.
Title: The Next 700 Data Description Languages
Authors: Kathleen Fisher, Yitzhak Mandelbaum, and David Walker
Venue: POPL 2006
Nomination Statement: Network administrators, financial analysts, scientists, and many others represent data in a myriad of ad hoc formats such as variable-width ASCII records and data-dependent number of fixed-width binary records. Those formats typically don't have parsing, querying, analysis, or transformation tools readily available. Thus motivated, researchers (including Fisher et al) have developed a variety of ad hoc data description languages. What is still missing, though, is a semantics and meta-theory for such languages, or anything to play the role the lambda calculus plays in underpinning programming language design and implementation. That is the gap that this paper fills, by defining a core calculus called the Data Dependency Calculus, and giving it a formal semantics and meta-theory. The paper demonstrates a formal translation from one data description language (IPADS) into DDC, and illustrates how features from other languages might similarly be translated. This is the first time anyone has tried to give a formal treatment to the wild and woolly world of data description languages. We believe many researchers and practitioners will be interested in catching up with what programming language technology can offer for solving common real-world problems related to ad hoc data.
Title: Scalable Synchronous Queues
Authors: William N. Scherer III, Doug Lea, and Michael L. Scott
VENUE: PPoPP 2006
Published as a CACM Research Highlight (May 2009)
Nomination Statement: Describes "synchronous queues", a fundamental data structure used to implement hand-offs or "rendezvous" between threads. This implementation was immediately integrated into the java.util.concurrent component of Sun's standard Java library, and has thus already had practical impact. The data structure is commonly used with the Java Executor framework and, as is shown by measurements in the paper, may rather spectacularly impact the performance of some applications using it. This is in some sense the canonical data structure that involves blocking or waiting, and for which conventional sequential performance metrics are hence inapplicable. It's a great example of how to design a data structure, for which, in a sense, only concurrency considerations matter. We believe this will expose readers to an area of increasing importance that is very different from the kind of algorithmic problems most of them will be familiar with. We have confidence that the authors will be able to retarget the paper at a much broader audience than that targeted by the current paper.
Title: Java Takes Flight: Time-portable Real-time Programming with Exotasks
Authors: Joshua Auerbach, David F. Bacon, Daniel T. Iercan, Christoph M. Kirsch, V. T. Rajan, Harald Roeck and Rainer Trummer
Venue: LCTES 2007
Nomination Statement: Java and real-time are terms that aren't often used together. This paper changes that. Real-time systems programming involves low levels of abstraction, making real-time systems difficult to test and requiring them to be re-certified every time the software or hardware environment changes. This paper introduces Exotasks to Java to raise the level of abstraction. Exotasks add deterministic timing, even in the presence of other Java threads and across changes of hardware and software platform. The authors built a quad-rotor model helicopter, the JAviator, to test the concepts of the work. They tested the system with actual flights. When the system fails, the helicopter crashes! Beyond the richness of the technical details, this paper should appeal to CACM readers. There is a real problem, clever language solutions, and an interesting application demonstration complete with photographs. The work is conducted within the broader context of IBM's effort to establish a common Java-based real time programming environment.
Title: Fault-Tolerant Typed Assembly Language
Authors: Frances Perry, Lester Mackey, George A. Reis, Jay Ligattii, David I. August, and David Walker
VENUE: PLDI 2007
Nomination Statement: This paper was one of two best papers from PLDI 2007. It spans the areas of fault-tolerant computing, type systems, and production-strength compilers to provide a unique approach to formalization and detection of Single Event Upset (SEU) faults, motivated by the increasing number of transient hardware faults observed in real systems. Their approach builds on the general strategy of using redundancy for fault tolerance by executing two independent copies of the program in two separate threads. Prior to writing data out to a memory-mapped output device, the results of the two computations are checked for equivalence. (This research focuses on fault detection in processors, and assumes that other techniques such as error-correcting codes are used to achieve fault tolerance in memories.) They introduce a Fault-Tolerant Typed Assembly Language, TALFT, that ensures that all well-typed programs exhibit fail-safe behavior in the presence of transient faults i.e., a corrupt value can never be written to a memory-mapped output device. TALFT can help identify cases when conventional optimizations may not be sound from the viewpoint of fault tolerance. Finally, they implemented their reliability transformations in the Princeton VELOCITY compiler, and provided empirical results for 22 SPECint2000 and MediaBench benchmarks. These results show that the overhead of fault tolerance is about 34% on average instead of the 100% that may have been expected from dual redundancy. I especially like this paper for CACM because it spans a number of different areas, and I believe that the authors can do a good job of rewriting the content of this paper for the general CACM audience. The basic single-event-upset fault model is well motivated by current technology trends and yet simple to understand. Overall, the paper should be an eye-opener to folks who equate compilers with the Dragon book and haven't been keeping up with the latest research in compiler technology such as the fault tolerance application studied in this paper.
Nominated January 2008:
Title: Dynamic Multigrained Parallelization on the Cell Broadband Engine
Authors: Filip Blagojevic, Dimitris S. Nikolopoulos, Alexandros Stamatakis, and Christos D. Antonopoulos
VENUE: PPoPP 2007
Nomination Statement: With the advent of multicores, parallel programming is again one of the most important issues in computer science. This paper describes and evaluates a scheduling strategy for parallel programs executing on an IBM cell processor. It exploits coarse grain parallelism whenever there is an adequate supply of tasks. Otherwise, it attempts to exploit fine grain-parallelism. Fine grain parallelism is given lower priority because it tends to incur more overhead than coarse-grain parallelism. The target machine has an interesting, much talked about architecture. It is a heterogeneous machine, a likely characteristic of future multicores. The application is representative of an important class of applications that contain relatively simple parallelism at multiple levels. The paper describes an excellent case study in parallel programming that illustrates some of the immense difficulties that programmers are likely to face in the brave new world of multicore systems.
Note: This paper was nominated by an ad hoc SIGPLAN Committee prior to the creation of the SIGPLAN CACM Research Highlights Nominating Committee. |