Microarchitectural code analyzers, i.e., tools that estimate the throughput of machine code basic blocks, are important utensils in the tool belt of performance engineers. Recent tools like llvm-mca, uiCA, and Ithemal use a variety of techniques and different models for their throughput predictions. When put to the test, it is common to see these state-of-the-art tools give very different results. These inconsistencies are either errors, or they point to different and rarely documented assumptions made by the tool designers.
In this paper, we present AnICA, a tool taking inspiration from differential testing and abstract interpretation to systematically analyze inconsistencies among these code analyzers. Our evaluation shows that AnICA can summarize thousands of inconsistencies in a few dozen descriptions that directly lead to high-level insights into the different behavior of the tools. In several case studies, we further demonstrate how AnICA automatically finds and characterizes known and unknown bugs in llvm-mca, as well as a quirk in AMD's Zen microarchitectures.
Algebraic effects and handlers are a promising technique for incorporating composable computational effects into functional programming languages. Effect handlers enable concisely programming with different effects, but they do not offer a convenient way to program with different instances of the same effect. As a solution to this inconvenience, previous studies have introduced _named effect handlers_, which allow the programmer to distinguish among different effect instances. However, existing formalizations of named handlers are both involved and restrictive, as they employ non-standard mechanisms to prevent the escaping of handler names.
In this paper, we propose a simple and flexible design of named handlers. Specifically, we treat handler names as first-class values, and prevent their escaping while staying within the ordinary λ-calculus. Such a design is enabled by combining named handlers with _scoped effects_, a novel variation of effects that maintain a scope via rank-2 polymorphism. We formalize two combinations of named handlers and scoped effects, and implement them in the Koka programming language. We also present practical applications of named handlers, including a neural network and a unification algorithm.
Real-time systems power safety-critical applications that require strong isolation among each other. Such isolation needs to be enforced at two orthogonal levels. On the micro-architectural level, this mainly involves avoiding interference through micro-architectural states, such as cache lines. On the algorithmic level, this is usually achieved by adopting real-time partitions to reserve resources for each application. Implementations of such systems are often complex and require formal verification to guarantee proper isolation.
In this paper, we focus on algorithmic isolation, which is mainly related to scheduling-induced interferences. We address earliest-deadline-first (EDF) partitions to achieve compositionality and utilization, while imposing constraints on tasks' periods and enforcing budgets on these periodic partitions to ensure isolation between each other. The formal verification of such a real-time OS kernel is challenging due to the inherent complexity of the dynamic priority assignment on the partition level. We tackle this problem by adopting a dynamically constructed abstraction to lift the reasoning of a concrete scheduler into an abstract domain. Using this framework, we verify a real-time operating system kernel with budget-enforcing EDF partitions and prove that it indeed ensures isolation between partitions. All the proofs are mechanized in Coq.
Random data generators can be thought of as parsers of streams of randomness. This perspective on generators for random data structures is established folklore in the programming languages community, but it has never been formalized, nor have its consequences been deeply explored.
We build on the idea of freer monads to develop free generators, which unify parsing and generation using a common structure that makes the relationship between the two concepts precise. Free generators lead naturally to a proof that a monadic generator can be factored into a parser plus a distribution over choice sequences. Free generators also support a notion of derivative, analogous to the familiar Brzozowski derivatives of formal languages, allowing analysis tools to "preview" the effect of a particular generator choice. This gives rise to a novel algorithm for generating data structures satisfying user-specified preconditions.
We propose a family of logical theories for capturing an abstract notion of consistency and show how to build a generic and efficient theory solver that works for all members in the family. The theories can be used to model the influence of memory consistency models on the semantics of concurrent programs. They are general enough to precisely capture important examples like TSO, POWER, ARMv8, RISC-V, RC11, IMM, and the Linux kernel memory model. To evaluate the expressiveness of our theories and the performance of our solver, we integrate them into a lazy SMT scheme that we use as a backend for a bounded model checking tool. An evaluation against related verification tools shows, besides flexibility, promising performance on challenging programs under complex memory models.
This paper presents a Hoare-style calculus for formal reasoning about reconfiguration programs of distributed systems. Such programs create and delete components and/or interactions (connectors) while the system components change state according to their internal behaviour. Our proof calculus uses a resource logic, in the spirit of Separation Logic, to give local specifications of reconfiguration actions. Moreover, distributed systems with an unbounded number of components are described using inductively defined predicates. The correctness of reconfiguration programs relies on havoc invariants, that are assertions about the ongoing interactions in a part of the system that is not affected by the structural change caused by the reconfiguration. We present a proof system for such invariants in an assume/rely-guarantee style. We illustrate the feasibility of our approach by proving the correctness of real-life distributed systems with reconfigurable (self-adjustable) tree architectures.
A common approach to defining domain-specific languages (DSLs) is via a direct embedding into a host language. There are several well-known techniques to do such embeddings, including shallow and deep embeddings. However, such embeddings come with various trade-offs in existing programming languages. Owing to such trade-offs, many embedded DSLs end up using a mix of approaches in practice, requiring a substantial amount of code, as well as some advanced coding techniques.
In this paper, we show that the recently proposed Compositional Programming paradigm and the CP language provide improved support for embedded DSLs. In CP we obtain a new form of embedding, which we call a compositional embedding, that has most of the advantages of both shallow and deep embeddings. On the one hand, compositional embeddings enable various forms of linguistic reuse that are characteristic of shallow embeddings, including the ability to reuse host-language optimizations in the DSL and add new DSL constructs easily. On the other hand, similarly to deep embeddings, compositional embeddings support definitions by pattern matching or dynamic dispatching (including dependent interpretations, transformations, and optimizations) over the abstract syntax of the DSL and have the ability to add new interpretations. We illustrate an instance of compositional embeddings with a DSL for document authoring called ExT. The DSL is highly flexible and extensible, allowing users to create various non-trivial extensions easily. For instance, ExT supports various extensions that enable the production of wiki-like documents, LaTeX documents, vector graphics or charts. The viability of compositional embeddings for ExT is evaluated with three applications.
Invariant generation is a classical problem to automatically generate invariants to aid the formal analysis of programs. In this work, we consider the problem of generating tight linear-invariants over affine programs (i.e., programs with affine guards and updates) without a prescribed goal property. In the literature, the only known sound and complete characterization to solve this problem is via Farkas’ Lemma (FL), and has been implemented through either quantifier elimination or reasonable heuristics. Although FL-based approaches can generate highly accurate linear invariants from the completeness of FL, the main bottleneck to applying these approaches is the scalability issue caused by either non-linear constraints or combinatorial explosion. We base our approach on the only practical FL-based approach [Sankaranarayanan et al., SAS 2004] that applies FL with reasonable heuristics, and develop two novel and independent improvements to leverage the scalability. The first improvement is the novel idea to generate invariants at one program location in a single invariant-generation process, so that the invariants for each location are generated separately rather than together in a single computation. This idea naturally leads to a parallel processing that divides the invariant-generation task for all program locations by assigning the locations separately to multiple processors. Moreover, the idea enables us to develop detailed technical improvements to further reduce the combinatorial explosion in the original work [Sankaranarayanan et al., SAS 2004]. The second improvement is a segmented subsumption testing in the CNF-to-DNF expansion that allows discovering more local subsumptions in advance. We formally prove that our approach has the same accuracy as the original work and thus does not incur accuracy loss on the generated invariants. Moreover, experimental results on representative benchmarks involving non-trivial linear invariants demonstrate that our approach improves the runtime of the original work by several orders of magnitude, even in the non-parallel scenario that sums up the execution time for all program locations. Hence, our approach constitutes the first significant improvement in FL-based approaches for linear invariant generation after almost two decades.
Block-based programming environments, already popular in computer science education, have been successfully used to make programming accessible to end-users in domains like robotics, mobile apps, and even DevOps. Most studies of these applications have examined small programs that fit within a single screen, yet real-world programs often grow large, and editing these large block-based programs quickly becomes unwieldy. Traditional programming language features, like functions, allow programmers to decompose their programs. Unfortunately, both previous work, and our own findings, suggest that end-users rarely use these features, resulting in large monolithic code blocks that are hard to understand. In this work, we introduce a block-based system that provides users with a hierarchical, domain-specific program structure and requires them to decompose their programs accordingly. Through a user study with 92 users, we compared this approach, which we call guided program decomposition, to a traditional system that supports functions, but does not require decomposition. We found that while almost all users could successfully complete smaller tasks, those who decomposed their programs were significantly more successful as the tasks grew larger. As expected, most users without guided decomposition did not decompose their programs, resulting in poor performance on larger problems. In comparison, users of guided decomposition performed significantly better on the same tasks. Though this study investigated only a limited selection of tasks in one specific domain, it suggests that guided decomposition can benefit end-user programmers. While no single decomposition strategy fits all domains, we believe that similar domain-specific sub-hierarchies could be found for other application areas, increasing the scale of code end-users can create and understand.
Emerging quantum algorithms for problems such as element distinctness, subset sum, and closest pair demonstrate computational advantages by relying on abstract data structures. Practically realizing such an algorithm as a program for a quantum computer requires an efficient implementation of the data structure whose operations correspond to unitary operators that manipulate quantum superpositions of data.
To correctly operate in superposition, an implementation must satisfy three properties --- reversibility, history independence, and bounded-time execution. Standard implementations, such as the representation of an abstract set as a hash table, fail these properties, calling for tools to develop specialized implementations.
In this work, we present Core Tower, the first language for quantum programming with random-access memory. Core Tower enables the developer to implement data structures as pointer-based, linked data. It features a reversible semantics enabling every valid program to be translated to a unitary quantum circuit.
We present Boson, the first memory allocator that supports reversible, history-independent, and constant-time dynamic memory allocation in quantum superposition. We also present Tower, a language for quantum programming with recursively defined data structures. Tower features a type system that bounds all recursion using classical parameters as is necessary for a program to execute on a quantum computer.
Using Tower, we implement Ground, the first quantum library of data structures, including lists, stacks, queues, strings, and sets. We provide the first executable implementation of sets that satisfies all three mandated properties of reversibility, history independence, and bounded-time execution.
Hypersafety properties of arity n are program properties that relate n traces of a program (or, more generally, traces of n programs). Classic examples include determinism, idempotence, and associativity. A number of relational program logics have been introduced to target this class of properties. Their aim is to construct simpler proofs by capitalizing on structural similarities between the n related programs. We propose an unexplored, complementary proof principle that establishes hyper-triples (i.e. hypersafety judgments) as a unifying compositional building block for proofs, and we use it to develop a Logic for Hyper-triple Composition (LHC), which supports forms of proof compositionality that were not achievable in previous logics. We prove LHC sound and apply it to a number of challenging examples.
Today’s distributed systems must satisfy both qualitative and quantitative properties. These properties are analyzed using very different formal frameworks: expressive untimed and non-probabilistic frameworks, such as TLA+ and Hoare/separation logics, for qualitative properties; and timed/probabilistic-automaton-based ones, such as Uppaal and Prism, for quantitative ones. This requires developing two quite different models of the same system, without guarantees of semantic consistency between them. Furthermore, it is very hard or impossible to represent intrinsic features of distributed object systems—such as unbounded data structures, dynamic object creation, and an unbounded number of messages—using finite automata.
In this paper we bridge this semantic gap, overcome the problem of manually having to develop two different models of a system, and solve the representation problem by: (i) defining a transformation from a very general class of distributed systems (a generalization of Agha’s actor model) that maps an untimed non-probabilistic distributed system model suitable for qualitative analysis to a probabilistic timed model suitable for quantitative analysis; and (ii) proving the two models semantically consistent. We formalize our models in rewriting logic, and can therefore use the Maude tool to analyze qualitative properties, and statistical model checking with PVeStA to analyze quantitative properties. We have automated this transformation and integrated it, together with the PVeStA statistical model checker, into the Actors2PMaude tool. We illustrate the expressiveness of our framework and our tool’s ease of use by automatically transforming untimed, qualitative models of numerous distributed system designs—including an industrial data store and a state-of-the-art transaction system—into quantitative models to analyze and compare the performance of different designs.
The size of a data structure (i.e., the number of elements in it) is a widely used property of a data set. However, for concurrent programs, obtaining a correct size efficiently is non-trivial. In fact, the literature does not offer a mechanism to obtain a correct (linearizable) size of a concurrent data set without resorting to inefficient solutions, such as taking a full snapshot of the data structure to count the elements, or acquiring one global lock in all update and size operations. This paper presents a methodology for adding a concurrent linearizable size operation to sets and dictionaries with a relatively low performance overhead. Theoretically, the proposed size operation is wait-free with asymptotic complexity linear in the number of threads (independently of data-structure size). Practically, we evaluated the performance overhead by adding size to various concurrent data structures in Java−a skip list, a hash table and a tree. The proposed linearizable size operation executes faster by orders of magnitude compared to the existing option of taking a snapshot, while incurring a throughput loss of 1%−20% on the original data structure’s operations.
In this paper, we show that the unsoundness discovered by Amin and Tate (2016) in Java’s wildcards is avoidable, even in the absence of a nullness-aware type system. The key insight of this paper is that soundness in type systems that implicitly introduce existential types through subtyping hinges on still making sure there are suitable witness types when introducing existentially quantified type variables. To show that this approach is viable, this paper formalizes a core calculus and proves it sound. We used a static analysis based on our approach to look for potential issues in a vast corpus of Java code and found none (with 1 false positive). This confirms both that Java's unsoundness has minimal practical consequence, and that our approach can avoid it entirely with minimal false positives.
Integrated Development Environments (IDEs) provide tool support to automate many source code editing tasks. Traditionally, IDEs use only the spatial context, i.e., the location where the developer is editing, to generate candidate edit recommendations. However, spatial context alone is often not sufficient to confidently predict the developer’s next edit, and thus IDEs generate many suggestions at a location. Therefore, IDEs generally do not actively offer suggestions and instead, the developer is usually required to click on a specific icon or menu and then select from a large list of potential suggestions. As a consequence, developers often miss the opportunity to use the tool support because they are not aware it exists or forget to use it. To better understand common patterns in developer behavior and produce better edit recommendations, we can additionally use the temporal context, i.e., the edits that a developer was recently performing. To enable edit recommendations based on temporal context, we present Overwatch, a novel technique for learning edit sequence patterns from traces of developers’ edits performed in an IDE. Our experiments show that Overwatch has 78% precision and that Overwatch not only completed edits when developers missed the opportunity to use the IDE tool support but also predicted new edits that have no tool support in the IDE.
Fast analysis response times in IDEs are essential for a good editor experience. Incremental type-checking can provide that in a scalable fashion. However, existing techniques are not reusable between languages. Moreover, mutual and dynamic dependencies preclude traditional approaches to incrementality. This makes finding automatic approaches to incremental type-checking a challenging but important open question.
In this paper, we present a technique that automatically derives incremental type-checkers from type system specifications written in the Statix meta-DSL. We use name resolution queries in scope graphs (a generic model of name binding embedded in Statix) to derive dependencies between compilation units. A novel query confirmation algorithm finds queries for which the answer changed due to an edit in the program. Only units with such queries require reanalysis. The effectiveness of this algorithm is improved by (1) splitting the type-checking task into a context-free and a context-sensitive part, and (2) reusing a generic mechanism to resolve mutual dependencies. This automatically yields incremental type-checkers for any Statix specification.
Compared to non-incremental parallel execution, we achieve speedups up to 147x on synthetic benchmarks, and up to 21x on real-world projects, with initial overheads below 10%. This suggests that our framework can provide efficient incremental type-checking to the wide range of languages supported by Statix.
Intersection and union types are becoming more popular by the day, entering the mainstream in programming languages like TypeScript and Scala 3. Yet, no language so far has managed to combine these powerful types with principal polymorphic type inference. We present a solution to this problem in MLstruct, a language with subtyped records, equirecursive types, first-class unions and intersections, class-based instance matching, and ML-style principal type inference. While MLstruct is mostly structurally typed, it contains a healthy sprinkle of nominality for classes, which gives it desirable semantics, enabling the expression of a powerful form of extensible variants that does not need row variables. Technically, we define the constructs of our language using conjunction, disjunction, and negation connectives, making sure they form a Boolean algebra, and we show that the addition of a few nonstandard but sound subtyping rules gives us enough structure to derive a sound and complete type inference algorithm. With this work, we hope to foster the development of better type inference for present and future programming languages with expressive subtyping systems.
Interactive proofs of theorems often require auxiliary helper lemmas to prove the desired theorem. Existing approaches for automatically synthesizing helper lemmas fall into two broad categories. Some approaches are goal-directed, producing lemmas specifically to help a user make progress from a given proof state, but they have limited expressiveness in terms of the lemmas that can be produced. Other approaches are highly expressive, able to generate arbitrary lemmas from a given grammar, but they are completely undirected and hence not amenable to interactive usage.
In this paper, we develop an approach to lemma synthesis that is both goal-directed and expressive. The key novelty is a technique for reducing lemma synthesis to a data-driven program synthesis problem, whereby examples for synthesis are generated from the current proof state. We also describe a technique to systematically introduce new variables for lemma synthesis, as well as techniques for filtering and ranking candidate lemmas for presentation to the user. We implement these ideas in a tool called lfind, which can be run as a Coq tactic. In an evaluation on four benchmark suites, lfind produces useful lemmas in 68% of the cases where a human prover used a lemma to make progress. In these cases lfind synthesizes a lemma that either enables a fully automated proof of the original goal or that matches the human-provided lemma.
We propose a new technique based on program synthesis for automatically generating visualizations from natural language queries. Our method parses the natural language query into a refinement type specification using the intents-and-slots paradigm and leverages type-directed synthesis to generate a set of visualization programs that are most likely to meet the user's intent. Our refinement type system captures useful hints present in the natural language query and allows the synthesis algorithm to reject visualizations that violate well-established design guidelines for the input data set. We have implemented our ideas in a tool called Graphy and evaluated it on NLVCorpus, which consists of 3 popular datasets and over 700 real-world natural language queries. Our experiments show that Graphy significantly outperforms state-of-the-art natural language based visualization tools, including transformer and rule-based ones.
Since executing a smart contract on the Ethereum blockchain costs money (measured in gas), smart contract developers spend significant effort in reducing gas usage. In this paper, we propose a new technique for reducing the gas usage of smart contracts by changing the underlying data layout. Given a smart contract P and a type-level transformation, our method automatically synthesizes a new contract P′ that is functionally equivalent to P. Our approach provides a convenient DSL for expressing data type refactorings and employs program synthesis to generate the new version of the contract. We have implemented our approach in a tool called Solidare and demonstrate its capabilities on real-world smart contracts from Etherscan and GasStation. In particular, we show that our approach is effective at automating the desired data layout transformation and that it is useful for reducing gas usage of smart contracts that use rich data structures.
Quantum algorithms often apply classical operations, such as arithmetic or predicate checks, over a quantum superposition of classical data; these so-called oracles are often the largest components of a quantum program. To ease the construction of efficient, correct oracle functions, this paper presents VQO, a high-assurance framework implemented with the Coq proof assistant. The core of VQO is OQASM, the oracle quantum assembly language. OQASM operations move qubits between two different bases via the quantum Fourier transform, thus admitting important optimizations, but without inducing entanglement and the exponential blowup that comes with it. OQASM’s design enabled us to prove correct VQO’s compilers—from a simple imperative language called OQIMP to OQASM, and from OQASM to SQIR, a general-purpose quantum assembly language—and allowed us to efficiently test properties of OQASM programs using the QuickChick property-based testing framework. We have used VQO to implement a variety of arithmetic and geometric operators that are building blocks for important oracles, including those used in Shor’s and Grover’s algorithms. We found that VQO’s QFT-based arithmetic oracles require fewer qubits, sometimes substantially fewer, than those constructed using “classical” gates; VQO’s versions of the latter were nevertheless on par with or better than (in terms of both qubit and gate counts) oracles produced by Quipper, a state-of-the-art but unverified quantum programming platform.
Component-based synthesis seeks to build programs using the APIs provided by a set of libraries. Oftentimes, these APIs have effects, which make it challenging to reason about the correctness of potential synthesis candidates. This is because changes to global state made by effectful library procedures affect how they may be composed together, yielding an intractably large search space that can confound typical enumerative synthesis techniques. If the nature of these effects are exposed as part of their specification, however, deductive synthesis approaches can be used to help guide the search for components. In this paper, we present a new specification-guided synthesis procedure that uses Hoare-style pre- and post-conditions to express fine-grained effects of potential library component candidates to drive a bi-directional synthesis search strategy. The procedure alternates between a forward search process that seeks to build larger terms given an existing context but which is otherwise unaware of the actual goal, alongside a backward search mechanism that seeks terms consistent with the desired goal but which is otherwise unaware of the context from which these terms must be synthesized. To further improve efficiency and scalability, we integrate a conflict-driven learning procedure into the synthesis algorithm that provides a semantic characterization of previously encountered unsuccessful search paths that is used to prune the space of possible candidates as synthesis proceeds. We have implemented our ideas in a tool called and demonstrate its effectiveness on a number of challenging synthesis problems defined over OCaml libraries equipped with effectful specifications.
WebAssembly (Wasm) is a compact, well-specified bytecode format that offers a portable compilation target with near-native execution speed. The bytecode format was specifically designed to be fast to parse, validate, and compile, positioning itself as a portable alternative to native code. It was pointedly not designed to be interpreted directly. Instead, design considerations at the time focused on competing with native code, utilizing optimizing compilers as the primary execution tier. Yet, in JIT scenarios, compilation time and memory consumption critically impact application startup, leading many Wasm engines to later deploy faster single-pass (baseline) compilers. Though faster, baseline compilers still take time and waste code space for infrequently executed code. A typical interpreter being infeasible, some engines resort to compiling Wasm not to machine code, but to a more compact, but easy to interpret format. This still takes time and wastes memory. Instead, we introduce in this article a fast in-place interpreter for WebAssembly, where no rewrite and no separate format is necessary. Our evaluation shows that in-place interpretation of Wasm code is space-efficient and fast, achieving performance on-par with interpreting a custom-designed internal format. This fills a hole in the execution tier space for Wasm, allowing for even faster startup and lower memory footprint than previous engine configurations.
This paper presents SigVM, the first blockchain virtual machine that extends EVM to support an event-driven execution model, enabling developers to build truly decentralized smart contracts. Contracts in SigVM can emit signal events, on which other contracts can listen. Once an event is triggered, corresponding handler functions are automatically executed as signal transactions. We build an end-to-end blockchain platform SigChain and a contract language compiler SigSolid to realize the potential of SigVM. Experimental results show that our benchmark applications can be reimplemented with SigVM in a truly decentralized way, eliminating the dependency on centralized and unreliable mechanisms like off-chain relay servers. The development effort of reimplementing these contracts with SigVM is small, i.e., we modified on average 3.17% of the contract code. The runtime and the gas overhead of SigVM on these contracts is negligible.
Existing approaches for statically enforcing differential privacy in higher order languages use either linear or relational refinement types. A barrier to adoption for these approaches is the lack of support for expressing these “fancy types” in mainstream programming languages. For example, no mainstream language supports relational refinement types, and although Rust and modern versions of Haskell both employ some linear typing techniques, they are inadequate for embedding enforcement of differential privacy, which requires “full” linear types. We propose a new type system that enforces differential privacy, avoids the use of linear and relational refinement types, and can be easily embedded in richly typed programming languages like Haskell. We demonstrate such an embedding in Haskell, demonstrate its expressiveness on case studies, and prove soundness of our type-based enforcement of differential privacy.
An object under initialization does not fulfill its class specification yet and can be unsafe to use as it may have uninitialized fields. It can sometimes be useful to call methods on such a partially initialized object in order to compute a complex initial value, or to let the object escape its constructor in order to create mutually recursive objects. However, inadvertent usage of uninitialized fields can lead to run-time crashes. Those subtle programming errors are not statically detected by most modern compilers.
While many other features of object-oriented programming languages have been thoroughly studied over the years, object initialization lacks a simple, systematic, and principled treatment. Building on the insights of previous work, we identify a set of four core principles for safe initialization: monotonicity, authority, stackability, and scopability. We capture the essence of the principles with a minimal calculus, Celsius, and show that the principles give rise to a practical initialization system that strikes a balance between expressiveness and simplicity. The meta-theory of the system is entirely mechanized using the Coq proof assistant. We believe that our approach based on well-identified core principles sheds new light on the underlying mechanisms ensuring safety and could serve as a basis for language design when faced with similar challenges.
Multi-execution memory models, such as Promising and Weakestmo, are an advanced class of weak memory consistency models that justify certain outcomes of a concurrent program by considering multiple candidate executions collectively. While this key characteristic allows them to support effective compilation to hardware models and a wide range of compiler optimizations, it makes reasoning about them substantially more difficult. In particular, we observe that Promising and Weakestmo inhibit effective model checking because they allow some suprisingly weak behaviors that cannot be generated by examining one execution at a time.
We therefore introduce Weakestmo2, a strengthening of Weakestmo by constraining its multi-execution nature, while preserving the important properties of Weakestmo: DRF theorems, compilation to hardware models, and correctness of local program transformations. Our strengthening rules out a class of surprisingly weak program behaviors, which we attempt to characterize with the help of two novel properties: load buffering race freedom and certification locality. In addition, we develop WMC, a model checker for Weakestmo2 with performance close to that of the best tools for per-execution models.
Context-sensitive inter-procedural alias analyses are more precise than intra-procedural alias analyses. However, context-sensitive inter-procedural alias analyses are not scalable. As a consequence, most of the production compilers sacrifice precision for scalability and implement intra-procedural alias analysis. The alias analysis is used by many compiler optimizations, including loop transformations. Due to the imprecision of alias analysis, the program’s performance may suffer, especially in the presence of loops.
Previous work proposed a general approach based on code-versioning with dynamic checks to disambiguate pointers at runtime. However, the overhead of dynamic checks in this approach is O(log n), which is substantially high to enable interesting optimizations. Other suggested approaches, e.g., polyhedral and symbolic range analysis, have O(1) overheads, but they only work for loops with certain constraints. The production compilers, such as LLVM and GCC, use scalar evolution analysis to compute an O(1) range check for loops to resolve memory dependencies at runtime. However, this approach also can only be applied to loops with certain constraints.
In this work, we present our tool, Scout, that can disambiguate two pointers at runtime using single memory access. Scout is based on the key idea to constrain the allocation size and alignment during memory allocations. Scout can also disambiguate array accesses within a loop for which the existing O(1) range checks technique cannot be applied. In addition, Scout uses feedback from static optimizations to reduce the number of dynamic checks needed for optimizations.
Our technique enabled new opportunities for loop-invariant code motion, dead store elimination, loop vectorization, and load elimination in an already optimized code. Our performance improvements are up to 51.11% for Polybench and up to 0.89% for CPU SPEC 2017 suites. The geometric means for our allocator’s CPU and memory overheads for CPU SPEC 2017 benchmarks are 1.05%, and 7.47%, respectively. For Polybench benchmarks, the geometric mean of CPU and memory overheads are 0.21% and 0.13%, respectively.
Robust modules guarantee to do only what they are supposed to do – even in the presence of untrusted malicious clients, and considering not just the direct behaviour of individual methods, but also the emergent behaviour from calls to more than one method. Necessity is a language for specifying robustness, based on novel necessity operators capturing temporal implication, and a proof logic that derives explicit robustness specifications from functional specifications. Soundness and an exemplar proof are mechanised in Coq.
The emergence of propositions-as-sessions, a Curry-Howard correspondence between propositions of Linear Logic and session types for concurrent processes, has settled the logical foundations of message-passing concurrency. Central to this approach is the resource consumption paradigm heralded by Linear Logic. In this paper, we investigate a new point in the design space of session type systems for message-passing concurrent programs. We identify O’Hearn and Pym’s Logic of Bunched Implications (BI) as a fruitful basis for an interpretation of the logic as a concurrent programming language. This leads to a treatment of non-linear resources that is radically different from existing approaches based on Linear Logic. We introduce a new π-calculus with sessions, called πBI; its most salient feature is a construct called spawn, which expresses new forms of sharing that are induced by structural principles in BI. We illustrate the expressiveness of πBI and lay out its fundamental theory: type preservation, deadlock-freedom, and weak normalization results for well-typed processes; an operationally sound and complete typed encoding of an affine λ-calculus; and a non-interference result for access of resources.
In type-and-coeffect systems, contexts are enriched by coeffects modeling how they are actually used, typically through annotations on single variables. Coeffects are computed bottom-up, combining, for each term, the coeffects of its subterms, through a fixed set of algebraic operators. We show that this principled approach can be adopted to track sharing in the imperative paradigm, that is, links among variables possibly introduced by the execution. This provides a significant example of non-structural coeffects, which cannot be computed by-variable, since the way a given variable is used can affect the coeffects of other variables. To illustrate the effectiveness of the approach, we enhance the type system tracking sharing to model a sophisticated set of features related to uniqueness and immutability. Thanks to the coeffect-based approach, we can express such features in a simple way and prove related properties with standard techniques.
Data processing systems are a fundamental component of the modern computing stack. These systems are routinely deployed online: they continuously receive the requests of data processing operations, and continuously return the results to end users or client applications. Online data processing systems have unique features beyond conventional data processing, and the optimizations designed for them are complex, especially when data themselves are structured and dynamic. This paper describes DON Calculus, the first rigorous foundation for online data processing. It captures the essential behavior of both the backend data processing engine and the frontend application, with the focus on two design dimensions essential yet unique to online data processing systems: incremental operation processing (IOP) and temporal locality optimization (TLO). A novel design insight is that the operations continuously applied to the data can be defined as an operation stream flowing through the data structure, and this abstraction unifies diverse designs of IOP and TLO in one calculus. DON Calculus is endowed with a mechanized metatheory centering around a key observable equivalence property: despite the significant non-deterministic executions introduced by IOP and TLO, the observable result of DON Calculus data processing is identical to that of conventional data processing without IOP and TLO. Broadly, DON Calculus is a novel instance in the active pursuit of providing rigorous guarantees to the software system stack. The specification and mechanization of DON Calculus provide a sound base for the designers of future data processing systems to build upon, helping them embrace rigorous semantic engineering without the need of developing from scratch.
The happens-before orders have been widely adopted to model thread interleaving behaviors of concurrent programs. A dedicated ordering theory solver, usually composed of theory propagation, consistency checking, and conflict clause generation, plays a central role in concurrent program verification. We propose a novel preventive reasoning approach that automatically preserves the ordering consistency and makes consistency checking and conflict clause generation omissible. We implement our approach in a prototype tool and conduct experiments on credible benchmarks; results reveal a significant improvement over existing state-of-the-art concurrent program verifiers.
The floating-point representation provides widely-used data types (such as “float” and “double”) for modern numerical software. Numerical errors are inherent due to floating-point’s approximate nature, and pose an important, well-known challenge. It is nontrivial to fix/repair numerical code to reduce numerical errors — it requires either numerical expertise (for manual fixing) or high-precision oracles (for automatic repair); both are difficult requirements. To tackle this challenge, this paper introduces a principled dynamic approach that is fully automated and oracle-free for effectively repairing floating-point errors. The key of our approach is the novel notion of micro-structure that characterizes structural patterns of floating-point errors. We leverage micro-structures’ statistical information on floating-point errors to effectively guide repair synthesis and validation. Compared with existing state-of-the-art repair approaches, our work is fully automatic and has the distinctive benefit of not relying on the difficult to obtain high-precision oracles. Evaluation results on 36 commonly-used numerical programs show that our approach is highly efficient and effective: (1) it is able to synthesize repairs instantaneously, and (2) versus the original programs, the repaired programs have orders of magnitude smaller floating-point errors, while having faster runtime performance.
Garbage-collected language runtimes carefully tune heap limits to reduce garbage collection time and memory usage. However, there's a trade-off: a lower heap limit reduces memory use but increases garbage collection time. Classic methods for setting heap limits include manually tuned heap limits and multiple-of-live-size rules of thumb, but it is not clear when one rule is better than another or how to compare them.
We address this problem with a new framework where heap limits are set for multiple heaps at once. Our key insight is that every heap limit rule induces a particular allocation of memory across multiple processes, and this allocation can be sub-optimal. We use our framework to derive an optimal "square-root" heap limit rule, which minimizes total memory usage for any amount of total garbage collection time. Paradoxically, the square-root heap limit rule achieves this coordination without communication: it allocates memory optimally across multiple heaps without requiring any communication between those heaps.
We present a novel, general construction to abstractly interpret higher-order automatic differentiation (AD). Our construction allows one to instantiate an abstract interpreter for computing derivatives up to a chosen order. Furthermore, since our construction reduces the problem of abstractly reasoning about derivatives to abstractly reasoning about real-valued straight-line programs, it can be instantiated with almost any numerical abstract domain, both relational and non-relational. We formally establish the soundness of this construction.
We implement our technique by instantiating our construction with both the non-relational interval domain and the relational zonotope domain to compute both first and higher-order derivatives. In the latter case, we are the first to apply a relational domain to automatic differentiation for abstracting higher-order derivatives, and hence we are also the first abstract interpretation work to track correlations across not only different variables, but different orders of derivatives.
We evaluate these instantiations on multiple case studies, namely robustly explaining a neural network and more precisely computing a neural network’s Lipschitz constant. For robust interpretation, first and second derivatives computed via zonotope AD are up to 4.76× and 6.98× more precise, respectively, compared to interval AD. For Lipschitz certification, we obtain bounds that are up to 11,850× more precise with zonotopes, compared to the state-of-the-art interval-based tool.
Recently, Graph Neural Networks (GNNs) have been applied for scheduling jobs over clusters, achieving better performance than hand-crafted heuristics. Despite their impressive performance, concerns remain over whether these GNN-based job schedulers meet users’ expectations about other important properties, such as strategy-proofness, sharing incentive, and stability. In this work, we consider formal verification of GNN-based job schedulers. We address several domain-specific challenges such as networks that are deeper and specifications that are richer than those encountered when verifying image and NLP classifiers. We develop vegas, the first general framework for verifying both single-step and multi-step properties of these schedulers based on carefully designed algorithms that combine abstractions, refinements, solvers, and proof transfer. Our experimental results show that vegas achieves significant speed-up when verifying important properties of a state-of-the-art GNN-based scheduler compared to previous methods.
Many separation logics support fractional permissions to distinguish between read and write access to a heap location, for instance, to allow concurrent reads while enforcing exclusive writes. Fractional permissions extend to composite assertions such as (co)inductive predicates and magic wands by allowing those to be multiplied by a fraction. Typical separation logic proofs require that this multiplication has three key properties: it needs to distribute over assertions, it should permit fractions to be factored out from assertions, and two fractions of the same assertion should be combinable into one larger fraction.
Existing formal semantics incorporating fractional assertions into a separation logic define multiplication semantically (via models), resulting in a semantics in which distributivity and combinability do not hold for key resource assertions such as magic wands, and fractions cannot be factored out from a separating conjunction. By contrast, existing automatic separation logic verifiers define multiplication syntactically, resulting in a different semantics for which it is unknown whether distributivity and combinability hold for all assertions.
In this paper, we present a novel semantics for separation logic assertions that allows states to hold more than a full permission to a heap location during the evaluation of an assertion. By reimposing upper bounds on the permissions held per location at statement boundaries, we retain key properties of separation logic, in particular, the frame rule. Our assertion semantics unifies semantic and syntactic multiplication and thereby reconciles the discrepancy between separation logic theory and tools and enjoys distributivity, factorisability, and combinability. We have formalised our semantics and proved its properties in Isabelle/HOL.
Most users of low-code platforms, such as Excel and PowerApps, write programs in domain-specific formula languages to carry out nontrivial tasks. Often users can write most of the program they want, but introduce small mistakes that yield broken formulas. These mistakes, which can be both syntactic and semantic, are hard for low-code users to identify and fix, even though they can be resolved with just a few edits. We formalize the problem of producing such edits as the last-mile repair problem. To address this problem, we developed LaMirage, a LAst-MIle RepAir-engine GEnerator that combines symbolic and neural techniques to perform last-mile repair in low-code formula languages. LaMirage takes a grammar and a set of domain-specific constraints/rules, which jointly approximate the target language, and uses these to generate a repair engine that can fix formulas in that language. To tackle the challenges of localizing errors and ranking candidate repairs, LaMirage leverages neural techniques, whereas it relies on symbolic methods to generate candidate edits. This combination allows LaMirage to find repairs that satisfy the provided grammar and constraints, and then pick the most natural repair. We compare LaMirage to state-of-the-art neural and symbolic approaches on 400 real Excel and Power Fx formulas, where LaMirage outperforms all baselines. We release these benchmarks to encourage subsequent work in low-code domains.
The Solidity programming language is the most widely used language for smart contract development. Improving smart contracts’ correctness, security, and performance has been the driving force for research in vulnerability detection, program analysis, and compiler techniques for Solidity. Similar to system-level languages such as C, Solidity enables the embedding of low-level code in programs, in the form of inline assembly code. Developers use inline assembly for low-level optimizations, extending the Solidity language through libraries, and using blockchain-specific opcodes only available through inline assembly. Nevertheless, inline assembly fragments are not well understood by an average developer and can introduce security threats as well as affect the optimizations that can be applied to programs by the compiler; it also significantly limits the effectiveness of source code static analyzers that operate on the Solidity level. A better understanding of how inline assembly is used in practice could in turn increase the performance, security, and support for inline assembly in Solidity. This paper presents a large-scale quantitative study of the use of inline assembly in 6.8M smart contracts deployed on the Ethereum blockchain. We find that 23% of the analyzed smart contracts contain inline assembly code, and that the use of inline assembly has become more widespread over time. We further performed a manual qualitative analysis for identifying usage patterns of inline assembly in Solidity smart contracts. Our findings are intended to help practitioners understand when they should use inline assembly and guide developers of Solidity tools in prioritizing which parts of inline assembly to implement first. Finally, the insights of this study could be used to enhance the Solidity language, improve the Solidity compiler, and to open up new research directions by driving future researchers to build appropriate methods and techniques for replacing inline assembly in Solidity programs when there is no real necessity to use it.
Neural architecture search (NAS) has become an increasingly important tool within the deep learning community in recent years, yielding many practical advancements in the design of deep neural network architectures. However, most existing approaches operate within highly structured design spaces, and hence (1) explore only a small fraction of the full search space of neural architectures while also (2) requiring significant manual effort from domain experts. In this work, we develop techniques that enable efficient NAS in a significantly larger design space. In particular, we propose to perform NAS in an abstract search space of program properties. Our key insights are as follows: (1) an abstract search space can be significantly smaller than the original search space, and (2) architectures with similar program properties should also have similar performance; thus, we can search more efficiently in the abstract search space. To enable this approach, we also introduce a novel efficient synthesis procedure, which performs the role of concretizing a set of promising program properties into a satisfying neural architecture. We implement our approach, αNAS, within an evolutionary framework, where the mutations are guided by the program properties. Starting with a ResNet-34 model, αNAS produces a model with slightly improved accuracy on CIFAR-10 but 96% fewer parameters. On ImageNet, αNAS is able to improve over Vision Transformer (30% fewer FLOPS and parameters), ResNet-50 (23% fewer FLOPS, 14% fewer parameters), and EfficientNet (7% fewer FLOPS and parameters) without any degradation in accuracy.
We present Seq2Parse, a language-agnostic neurosymbolic approach to automatically repairing parse errors. Seq2Parse is based on the insight that Symbolic Error Correcting (EC) Parsers can, in principle, synthesize repairs, but, in practice, are overwhelmed by the many error-correction rules that are not relevant to the particular program that requires repair. In contrast, Neural approaches are fooled by the large space of possible sequence level edits, but can precisely pinpoint the set of EC-rules that are relevant to a particular program. We show how to combine their complementary strengths by using neural methods to train a sequence classifier that predicts the small set of relevant EC-rules for an ill-parsed program, after which, the symbolic EC-parsing algorithm can make short work of generating useful repairs. We train and evaluate our method on a dataset of 1,100,000 Python programs, and show that Seq2Parse is accurate and efficient: it can parse 94% of our tests within 2.1 seconds, while generating the exact user fix in 1 out 3 of the cases; and useful: humans perceive both Seq2Parse-generated error locations and repairs to be almost as good as human-generated ones in a statistically-significant manner.
Go is a popular statically-typed industrial programming language. To aid the type safe reuse of code, the recent Go release (Go 1.18) published early 2022 includes bounded parametric polymorphism via generic types. Go 1.18 implements generic types using a combination of monomorphisation and call-graph based dictionary-passing called hybrid. This hybrid approach can be viewed as an optimised form of monomorphisation that statically generates specialised methods and types based on possible instantiations. A monolithic dictionary supplements information lost during monomorphisation, and is structured according to the program’s call graph. Unfortunately, the hybrid approach still suffers from code bloat, poor compilation speed, and limited code coverage.
In this paper we propose and formalise a new non-specialising call-site based dictionary-passing translation. Our call-site based translation creates individual dictionaries for each type parameter, with dictionary construction occurring in place of instantiation, overcoming the limitations of hybrid. We prove it correct using a novel and general bisimulation up to technique. To better understand how different generics translation approaches work in practice, we benchmark five translators, Go 1.18, two existing monomorphisation translators, our dictionary-passing translator, and an erasure translator. Our findings reveal several suggestions for improvements for Go 1.18— specifically how to overcome the expressiveness limitations of generic Go and improve compile time and compiled code size performance of Go 1.18.
Programming languages and software engineering tools routinely encounter components that are difficult to reason on via formal techniques or whose formal semantics are not even available—third-party libraries, inline assembly code, SIMD instructions, system calls, calls to machine learning models, etc. However, often access to these components is available as input-output oracles—interfaces are available to query these components on certain inputs to receive the respective outputs. We refer to such functions as closed-box functions. Regular SMT solvers are unable to handle such closed-box functions.
We propose Sādhak, a solver for SMT theories modulo closed-box functions. Our core idea is to use a synergistic combination of a fuzzer to reason on closed-box functions and an SMT engine to solve the constraints pertaining to the SMT theories. The fuzz and the SMT engines attempt to converge to a model by exchanging a rich set of interface constraints that are relevant and interpretable by them. Our implementation, Sādhak, demonstrates a significant advantage over the only other solver that is capable of handling such closed-box constraints: Sādhak solves 36.45% more benchmarks than the best-performing mode of this state-of-the-art solver and has 5.72x better PAR-2 score; on the benchmarks that are solved by both tools, Sādhak is (on an average) 14.62x faster.
Scheduling transformations reorder operations in a program to improve locality and/or parallelism. There are mature loop transformation frameworks such as the polyhedral model for composing and applying instance-wise scheduling transformations for loop nests.In recent years, there have been efforts to build frameworks for composing and applying scheduling transformations for nested recursion and loops, but these frameworks cannot employ the full power of transformations for loop nests since they have overly-restrictive representations. This paper describes a new framework, UniRec, that not only generalizes prior frameworks for reasoning about transformations on recursion, but also generalizes the unimodular framework, and hence unifies reasoning about perfectly-nested loops and recursion.
This paper addresses the problem of creating abstract transformers automatically. The method we present automates the construction of static analyzers in a fashion similar to the way yacc automates the construction of parsers. Our method treats the problem as a program-synthesis problem. The user provides specifications of (i) the concrete semantics of a given operation op, (ii) the abstract domain A to be used by the analyzer, and (iii) the semantics of a domain-specific language L in which the abstract transformer is to be expressed. As output, our method creates an abstract transformer for op in abstract domain A, expressed in L (an “L-transformer for op over A”). Moreover, the abstract transformer obtained is a most-precise L-transformer for op over A; that is, there is no other L-transformer for op over A that is strictly more precise.
We implemented our method in a tool called AMURTH. We used AMURTH to create sets of replacement abstract transformers for those used in two existing analyzers, and obtained essentially identical performance. However, when we compared the existing transformers with the transformers obtained using AMURTH, we discovered that four of the existing transformers were unsound, which demonstrates the risk of using manually created transformers.
Dependency analysis is vital to several applications in computer science. It lies at the essence of secure information flow analysis, binding-time analysis, etc. Various calculi have been proposed in the literature for analysing individual dependencies. Abadi et. al., by extending Moggi’s monadic metalanguage, unified several of these calculi into the Dependency Core Calculus (DCC). DCC has served as a foundational framework for dependency analysis for the last two decades. However, in spite of its success, DCC has its limitations. First, the monadic bind rule of the calculus is nonstandard and relies upon an auxiliary protection judgement. Second, being of a monadic nature, the calculus cannot capture dependency analyses that possess a comonadic nature, for example, the binding-time calculus, λ∘, of Davies. In this paper, we address these limitations by designing an alternative dependency calculus that is inspired by standard ideas from category theory. Our calculus is both monadic and comonadic in nature and subsumes both DCC and λ∘. Our construction explains the nonstandard bind rule and the protection judgement of DCC in terms of standard categorical concepts. It also leads to a novel technique for proving correctness of dependency analysis. We use this technique to present alternative proofs of correctness for DCC and λ∘.
Conflict-free replicated data types (CRDTs) are a promising tool for designing scalable, coordination-free distributed systems. However, constructing correct CRDTs is difficult, posing a challenge for even seasoned developers. As a result, CRDT development is still largely the domain of academics, with new designs often awaiting peer review and a manual proof of correctness. In this paper, we present Katara, a program synthesis-based system that takes sequential data type implementations and automatically synthesizes verified CRDT designs from them. Key to this process is a new formal definition of CRDT correctness that combines a reference sequential type with a lightweight ordering constraint that resolves conflicts between non-commutative operations. Our process follows the tradition of work in verified lifting, including an encoding of correctness into SMT logic using synthesized inductive invariants and hand-crafted grammars for the CRDT state and runtime. Katara is able to automatically synthesize CRDTs for a wide variety of scenarios, from reproducing classic CRDTs to synthesizing novel designs based on specifications in existing literature. Crucially, our synthesized CRDTs are fully, automatically verified, eliminating entire classes of common errors and reducing the process of producing a new CRDT from a painstaking paper proof of correctness to a lightweight specification.
Verifying fine-grained optimistic concurrent programs remains an open problem. Modern program logics provide abstraction mechanisms and compositional reasoning principles to deal with the inherent complexity. However, their use is mostly confined to pencil-and-paper or mechanized proofs. We devise a new separation logic geared towards the lacking automation. While local reasoning is known to be crucial for automation, we are the first to show how to retain this locality for (i) reasoning about inductive properties without the need for ghost code, and (ii) reasoning about computation histories in hindsight. We implemented our new logic in a tool and used it to automatically verify challenging concurrent search structures that require inductive properties and hindsight reasoning, such as the Harris set.
Many applications, from social network graph analytics to control flow analysis, compute on sparse data that evolves over the course of program execution. Such data can be represented as dynamic sparse tensors and efficiently stored in formats (data layouts) that utilize pointer-based data structures like block linked lists, binary search trees, B-trees, and C-trees among others. These specialized formats support fast in-place modification and are thus better suited than traditional, array-based data structures like CSR for storing dynamic sparse tensors. However, different dynamic sparse tensor formats have distinct benefits and drawbacks, and performing different computations on tensors that are stored in different formats can require vastly dissimilar code that are not straightforward to correctly implement and optimize.
This paper shows how a compiler can generate efficient code to compute tensor algebra operations on dynamic sparse tensors that may be stored in a wide range of disparate formats. We propose a language for precisely specifying recursive, pointer-based data structures, and we show how this language can express many different dynamic data structures, including all the ones named above as well as many more. We then describe how, given high-level specifications of such dynamic data structures, a compiler can emit code to efficiently access and compute on dynamic sparse tensors that are stored in the aforementioned data structures.
We evaluate our technique and find it generates efficient dynamic sparse tensor algebra kernels that have performance comparable to, if not better than, state-of-the-art libraries and frameworks such as PAM, Aspen, STINGER, and Terrace. At the same time, our technique supports a wider range of tensor algebra operations---such as those that simultaneously compute with static and dynamic sparse tensors---than Aspen, STINGER, and Terrace, while also achieving significantly better performance than PAM for those same operations.
Many context-sensitive dataflow analyses can be formulated as an extended Dyck-CFL reachability problem, where function calls and returns are modeled as partially matched parentheses. Unfortunately, despite many works on the standard Dyck-CFL reachability problem, solving the extended version is still of quadratic space complexity and nearly cubic time complexity, significantly limiting the scalability of program analyses. This paper, for the first time to the best of our knowledge, presents a cheap approach to transforming the extended Dyck-CFL reachability problem to conventional graph reachability, a much easier and well-studied problem. This transformation allows us to benefit from recent advances in reachability indexing schemes, making it possible to answer any reachability query in a context-sensitive dataflow analysis within almost constant time plus only a few extra spaces. We have implemented our approach in two common context-sensitive dataflow analyses, one determines pointer alias relations and the other tracks information flows. Experimental results demonstrate that, compared to their original analyses, we can achieve orders of magnitude (102× to 105×) speedup at the cost of only a moderate space overhead. Our implementation is publicly available.
Program equivalence checking is the task of confirming that two programs have the same behavior on corresponding inputs. We develop a calculus based on symbolic execution and coinduction to check the equivalence of programs in a non-strict functional language. Additionally, we show that our calculus can be used to derive counterexamples for pairs of inequivalent programs, including counterexamples that arise from non-termination. We describe a fully automated approach for finding both equivalence proofs and counterexamples. Our implementation, Nebula, proves equivalences of programs written in Haskell. We demonstrate Nebula's practical effectiveness at both proving equivalence and producing counterexamples automatically by applying Nebula to existing benchmark properties.
We present a novel static analysis technique to derive higher moments for program variables for a large class of probabilistic loops with potentially uncountable state spaces. Our approach is fully automatic, meaning it does not rely on externally provided invariants or templates. We employ algebraic techniques based on linear recurrences and introduce program transformations to simplify probabilistic programs while preserving their statistical properties. We develop power reduction techniques to further simplify the polynomial arithmetic of probabilistic programs and define the theory of moment-computable probabilistic loops for which higher moments can precisely be computed. Our work has applications towards recovering probability distributions of random variables and computing tail probabilities. The empirical evaluation of our results demonstrates the applicability of our work on many challenging examples.
Many programming languages in the OO tradition now support pattern matching in some form. Historical examples include Scala and Ceylon, with the more recent additions of Java, Kotlin, TypeScript, and Flow. But pattern matching on generic class hierarchies currently results in puzzling type errors in most of these languages. Yet this combination of features occurs naturally in many scenarios, such as when manipulating typed ASTs. To support it properly, compilers needs to implement a form of subtyping reconstruction: the ability to reconstruct subtyping information uncovered at runtime during pattern matching. We introduce cDOT, a new calculus in the family of Dependent Object Types (DOT) intended to serve as a formal foundation for subtyping reconstruction. Being descended from pDOT, itself a formal foundation for Scala, cDOT can be used to encode advanced object-oriented features such as generic inheritance, type constructor variance, F-bounded polymorphism, and first-class recursive modules. We demonstrate that subtyping reconstruction subsumes GADTs by encoding λ2,Gµ, a classical constraint-based GADT calculus, into cDOT.
Given an edge-labeled graph, context-free language reachability (CFL-reachability) computes reachable node pairs by deriving new edges and adding them to the graph. The redundancy that limits the scalability of CFL-reachability manifests as redundant derivations, i.e., identical edges can be derived multiple times due to the many paths between two reachable nodes. We observe that most redundancy arises from the derivations involving transitive relations of reachable node pairs. Unfortunately, existing techniques for reducing redundancy in transitive-closure-based problems are either ineffective or inapplicable to identifying and eliminating redundant derivations during on-the-fly CFL-reachability solving.
This paper proposes a scalable yet precision-preserving approach to all-pairs CFL-reachability analysis by taming its transitive redundancy. Our key insight is that transitive relations are intrinsically ordered, and utilizing the order for edge derivation can avoid most redundancy. To address the challenges in determining the derivation order from the dynamically changed graph during CFL-reachability solving, we introduce a hybrid graph representation by combining spanning trees and adjacency lists, together with a dynamic construction algorithm. Based on this representation, we propose a fast and effective partially ordered algorithm POCR to boost the performance of CFL-reachability analysis by reducing its transitive redundancy during on-the-fly solving. Our experiments on context-sensitive value-flow analysis and field-sensitive alias analysis for C/C++ demonstrate the promising performance of POCR. On average, POCR eliminates 98.50% and 97.26% redundant derivations respectively for the value-flow and alias analysis, achieving speedups of 21.48× and 19.57× over the standard CFL-reachability algorithm. We also compare POCR with two recent open-source tools, Graspan (a CFL-reachability solver) and Soufflé (a Datalog engine). The results demonstrate that POCR is over 3.67× faster than Graspan and Soufflé on average for both value-flow analysis and alias analysis.
We propose a symbolic execution method for programs that can draw random samples. In contrast to existing work, our method can verify randomized programs with unknown inputs and can prove probabilistic properties that universally quantify over all possible inputs. Our technique augments standard symbolic execution with a new class of probabilistic symbolic variables, which represent the results of random draws, and computes symbolic expressions representing the probability of taking individual paths. We implement our method on top of the KLEE symbolic execution engine alongside multiple optimizations and use it to prove properties about probabilities and expected values for a range of challenging case studies written in C++, including Freivalds’ algorithm, randomized quicksort, and a randomized property-testing algorithm for monotonicity. We evaluate our method against Psi, an exact probabilistic symbolic inference engine, and Storm, a probabilistic model checker, and show that our method significantly outperforms both tools.
Low-level systems code often needs to interact with data, such as page table entries or network packet headers, in which multiple pieces of information are packaged together as bitfield components of a single machine integer and accessed via bitfield manipulations (e.g., shifts and masking). Most existing approaches to verifying such code employ SMT solvers, instantiated with theories for bit vector reasoning: these provide a powerful hammer, but also significantly increase the trusted computing base of the verification toolchain.
In this work, we propose an alternative approach to the verification of bitfield-manipulating systems code, which we call BFF. Building on the RefinedC framework, BFF is not only highly automated (as SMT-based approaches are) but also foundational---i.e., it produces a machine-checked proof of program correctness against a formal semantics for C programs, fully mechanized in Coq. Unlike SMT-based approaches, we do not try to solve the general problem of arbitrary bit vector reasoning, but rather observe that real systems code typically accesses bitfields using simple, well-understood programming patterns: the layout of a bit vector is known up front, and its bitfields are accessed in predictable ways through a handful of bitwise operations involving bit masks. Correspondingly, we center our approach around the concept of a structured bit vector---i.e., a bit vector with a known bitfield layout---which we use to drive simple and predictable automation. We validate the BFF approach by verifying a range of bitfield-manipulating C functions drawn from real systems code, including page table manipulation code from the Linux kernel and the pKVM hypervisor.
Effect handlers allow the programmer to implement computational effects, such as custom error handling, various forms of lightweight concurrency, and dynamic binding, inside the programming language. We introduce cpp-effects, a C++ library for effect handlers with a typed high-level, object-oriented interface. We demonstrate that effect handlers can be successfully applied in imperative systems programming languages with manual memory management. Through a collection of examples, we explore how to program effectively with effect handlers in C++, discuss the intricacies and challenges of the implementation, and show that despite its limitations, cpp-effects performance is competitive and in some cases even outperforms state-of-the-art approaches such as C++20 coroutines and the libmprompt library for multiprompt delimited control.
A streaming probabilistic program receives a stream of observations and produces a stream of distributions that are conditioned on these observations. Efficient inference is often possible in a streaming context using Rao-Blackwellized particle filters (RBPFs), which exactly solve inference problems when possible and fall back on sampling approximations when necessary. While RBPFs can be implemented by hand to provide efficient inference, the goal of streaming probabilistic programming is to automatically generate such efficient inference implementations given input probabilistic programs.
In this work, we propose semi-symbolic inference, a technique for executing probabilistic programs using a runtime inference system that automatically implements Rao-Blackwellized particle filtering. To perform exact and approximate inference together, the semi-symbolic inference system manipulates symbolic distributions to perform exact inference when possible and falls back on approximate sampling when necessary. This approach enables the system to implement the same RBPF a developer would write by hand. To ensure this, we identify closed families of distributions – such as linear-Gaussian and finite discrete models – on which the inference system guarantees exact inference. We have implemented the runtime inference system in the ProbZelus streaming probabilistic programming language. Despite an average 1.6× slowdown compared to the state of the art on existing benchmarks, our evaluation shows that speedups of 3×-87× are obtainable on a new set of challenging benchmarks we have designed to exploit closed families.
Axioms and inference rules form the foundation of deductive systems and are crucial in the study of reasoning with logics over structures. Historically, axiomatizations have been discovered manually with much expertise and effort. In this paper we show the feasibility of using synthesis techniques to discover axiomatizations for different classes of structures, and in some contexts, automatically prove their completeness. For evaluation, we apply our technique to find axioms for (1) classes of frames in modal logic characterized in first-order logic and (2) the class of language models with regular operations.
There is an ongoing effort to provide programming abstractions that ease the burden of exploiting multicore hardware. Many programming abstractions (e.g., concurrent objects, transactional memory, etc.) simplify matters, but still involve intricate engineering. We argue that some difficulty of multicore programming can be meliorated through a declarative programming style in which programmers directly express the independence of fragments of sequential programs.
In our proposed paradigm, programmers write programs in a familiar, sequential manner, with the added ability to explicitly express the conditions under which code fragments sequentially commute. Putting such commutativity conditions into source code offers a new entry point for a compiler to exploit the known connection between commutativity and parallelism. We give a semantics for the programmer’s sequential perspective and, under a correctness condition, find that a compiler-transformed parallel execution is equivalent to the sequential semantics. Serializability/linearizability are not the right fit for this condition, so we introduce scoped serializability and show how it can be enforced with lock synthesis techniques.
We next describe a technique for automatically verifying and synthesizing commute conditions via a new reduction from our commute blocks to logical specifications, upon which symbolic commutativity reasoning can be performed. We implemented our work in a new language called Veracity, implemented in Multicore OCaml. We show that commutativity conditions can be automatically generated across a variety of new benchmark programs, confirm the expectation that concurrency speedups can be seen as the computation increases, and apply our work to a small in-memory filesystem and an adaptation of a crowdfund blockchain smart contract.
Static Analysis tools have rules for several code quality issues and these rules are created by experts manually. In this paper, we address the problem of automatic synthesis of code quality rules from examples. We formulate the rule synthesis problem as synthesizing first order logic formulas over graph representations of code. We present a new synthesis algorithm RhoSynth that is based on Integer Linear Programming-based graph alignment for identifying code elements of interest to the rule. We bootstrap RhoSynth by leveraging code changes made by developers as the source of positive and negative examples. We also address rule refinement in which the rules are incrementally improved with additional user-provided examples. We validate RhoSynth by synthesizing more than 30 Java code quality rules. These rules have been deployed as part of Amazon CodeGuru Reviewer and their precision exceeds 75% based on developer feedback collected during live code-reviews within Amazon. Through comparisons with recent baselines, we show that current state-of-the-art program synthesis approaches are unable to synthesize most of these rules.
Operation-based Conflict-free Replicated Data Types (op-based CRDTs) are a family of distributed data structures where all operations are designed to commute, so that replica states eventually converge. Additionally, op-based CRDTs require that operations be propagated between replicas in causal order. This paper presents a framework for verifying safety properties of CRDT implementations using separation logic. The framework consists of two libraries. One implements a Reliable Causal Broadcast (RCB) protocol so that replicas can exchange messages in causal order. A second “OpLib” library then uses RCB to simplify the creation and correctness proofs of op-based CRDTs. OpLib allows clients to implement new CRDTs as purely-functional data structures, without having to reason about network operations, concurrency control and mutable state, and without having to each time re-implement causal broadcast. Using OpLib, we have implemented 12 example CRDTs from the literature, including multiple versions of replicated registers and sets, two CRDT combinators for products and maps, and two example use cases of the map combinator. Our proofs are conducted in the Aneris distributed separation logic and are formalized in Coq. Our technique is the first work on verification of op-based CRDTs that satisfies both of the following properties: it is modular and targets executable implementations, as opposed to high-level protocols.
Transactional memory (TM) is an intensively studied synchronisation paradigm with many proposed implementations in software and hardware, and combinations thereof. However, TM under relaxed memory, e.g., C11 (the 2011 C/C++ standard) is still poorly understood, lacking rigorous foundations that support verifiable implementations. This paper addresses this gap by developing TMS2-ra, a relaxed operational TM specification. We integrate TMS2-ra with RC11 (the repaired C11 memory model that disallows load-buffering) to provide a formal semantics for TM libraries and their clients. We develop a logic, TARO, for verifying client programs that use TMS2-ra for synchronisation. We also show how TMS2-ra can be implemented by a C11 library, TML-ra, that uses relaxed and release-acquire atomics, yet guarantees the synchronisation properties required by TMS2-ra. We benchmark TML-ra and show that it outperforms its sequentially consistent counterpart in the STAMP benchmarks. Finally, we use a simulation-based verification technique to prove correctness of TML-ra. Our entire development is supported by the Isabelle/HOL proof assistant.
This paper proposes, EFTSanitizer, a fast shadow execution framework for detecting and debugging numerical errors during late stages of testing especially for long-running applications. Any shadow execution framework needs an oracle to compare against the floating point (FP) execution. This paper makes a case for using error free transformations, which is a sequence of operations to compute the error of a primitive operation with existing hardware supported FP operations, as an oracle for shadow execution. Although the error of a single correctly rounded FP operation is bounded, the accumulation of errors across operations can result in exceptions, slow convergences, and even crashes. To ease the job of debugging such errors, EFTSanitizer provides a directed acyclic graph (DAG) that highlights the propagation of errors, which results in exceptions or crashes. Unlike prior work, DAGs produced by EFTSanitizer include operations that span various function calls while keeping the memory usage bounded. To enable the use of such shadow execution tools with long-running applications, EFTSanitizer also supports starting the shadow execution at an arbitrary point in the dynamic execution, which we call selective shadow execution. EFTSanitizer is an order of magnitude faster than prior state-of-art shadow execution tools such as FPSanitizer and Herbgrind. We have discovered new numerical errors and debugged them using EFTSanitizer.
Recursively defined linked data structures embedded in a pointer-based heap and their properties are naturally expressed in pure first-order logic with least fixpoint definitions (FO+lfp) with background theories. Such logics, unlike pure first-order logic, do not admit even complete procedures. In this paper, we undertake a novel approach for synthesizing inductive hypotheses to prove validity in this logic. The idea is to utilize several kinds of finite first-order models as counterexamples that capture the non-provability and invalidity of formulas to guide the search for inductive hypotheses. We implement our procedures and evaluate them extensively over theorems involving heap data structures that require inductive proofs and demonstrate the effectiveness of our methodology.
Specifying and mechanically verifying type safe programming languages requires significant effort. This effort can in theory be reduced by defining and reusing pre-verified, modular components. In practice, however, existing approaches to modular mechanical verification require many times as much specification code as plain, monolithic definitions. This makes it hard to develop new reusable components, and makes existing component specifications hard to grasp. We present an alternative approach based on intrinsically-typed interpreters, which reduces the size and complexity of modular specifications as compared to existing approaches. Furthermore, we introduce a new abstraction for safe-by-construction specification and composition of pre-verified type safe language components: language fragments. Language fragments are about as concise and easy to develop as plain, monolithic intrinsically-typed interpreters, but require about 10 times less code than previous approaches to modular mechanical verification of type safety.