We present Zar: a formally verified compiler pipeline from discrete probabilistic programs with unbounded loops in the conditional probabilistic guarded command language (cpGCL) to proved-correct executable samplers in the random bit model. We exploit the key idea that all discrete probability distributions can be reduced to unbiased coin-flipping schemes. The compiler pipeline first translates cpGCL programs into choice-fix trees, an intermediate representation suitable for reduction of biased probabilistic choices. Choice-fix trees are then translated to coinductive interaction trees for execution within the random bit model. The correctness of the composed translations establishes the sampling equidistribution theorem: compiled samplers are correct wrt. the conditional weakest pre-expectation semantics of cpGCL source programs. Zar is implemented and fully verified in the Coq proof assistant. We extract verified samplers to OCaml and Python and empirically validate them on a number of illustrative examples.

The cost of maintaining formally specified and verified software is widely considered prohibitively high due to the need to constantly keep code and the proofs of its correctness in sync—the problem known as *proof repair*. One of the main challenges in automated proof repair for evolving code is to infer invariants for a new version of a once verified program that are strong enough to establish its full functional correctness.

In this work, we present the first proof repair methodology for higher-order imperative functions, whose initial versions were verified in the Coq proof assistant and whose specifications remained unchanged. Our proof repair procedure is based on the combination of dynamic program alignment, enumerative invariant synthesis, and a novel technique for efficiently pruning the space of invariant candidates, dubbed *proof-driven testing*, enabled by the constructive nature of Coq’s proof certificates.

We have implemented our approach in a mostly-automated proof repair tool called Sisyphus. Given an OCaml function verified in Coq and its unverified new version, Sisyphus produces a Coq proof for the new version, discharging most of the new proof goals automatically and suggesting high-confidence obligations for the programmer to prove for the cases when automation fails. We have evaluated Sisyphus on 10 OCaml programs taken from popular libraries, that manipulate arrays and mutable data structures, considering their verified original and unverified evolved versions. Sisyphus has managed to repair proofs for all those functions, suggesting correct invariants and generating a small number of easy-to-prove residual obligations.

Monolithic control plane verification cannot scale to hyperscale network architectures with tens of thousands of nodes, heterogeneous network policies and thousands of network changes a day. Instead, modular verification offers improved scalability, reasoning over diverse behaviors, and robustness following policy updates. We introduce Timepiece, a new modular control plane verification system. While one class of verifiers, starting with Minesweeper, were based on analysis of stable paths, we show that such models, when deployed naïvely for modular verification, are unsound. To rectify the situation, we adopt a routing model based around a logical notion of time and develop a sound, expressive, and scalable verification engine.

Our system requires that a user specifies interfaces between module components. We develop methods for defining these interfaces using predicates inspired by temporal logic, and show how to use those interfaces to verify a range of network-wide properties such as reachability or access control. Verifying a prefix-filtering policy using a non-modular verification engine times out on an 80-node fattree network after 2 hours. However, Timepiece verifies a 2,000-node fattree in 2.37 minutes on a 96-core virtual machine. Modular verification of individual routers is embarrassingly parallel and completes in seconds, which allows verification to scale beyond non-modular engines, while still allowing the full power of SMT-based symbolic reasoning.

3D Computer-Aided Design (CAD) modeling is ubiquitous in mechanical engineering and design. Modern CAD models are programs that produce geometry and can be used to implement high-level geometric changes by modifying input parameters. While there has been a surge of recent interest in program-based tooling for the CAD domain, one fundamental problem remains unsolved. CAD programs pass geometric arguments to operations using references, which are queries that select elements from the constructed geometry according to programmer intent. The challenge is designing reference semantics that can express programmer intent across all geometric topologies achievable with model parameters, including topologies where the desired elements are not present. In current systems, both users and automated tools may create invalid models when parameters are changed, as references to geometric elements are lost or silently and arbitrarily switched. While existing CAD systems use heuristics to attempt to infer user intent in cases of this undefined behavior, this best-effort solution is not suitable for constructing automated tools to edit and optimize CAD programs. We analyze the failure modes of existing referencing schemes and formalize a set of criteria on which to evaluate solutions to the CAD referencing problem. In turn, we propose a domain-specific language that exposes references as a first-class language construct, using user-authored queries to introspect element history and define references safely over all paths. We give a semantics for fine-grained element lineage that can subsequently be queried; and show that our language meets the desired properties. Finally, we provide an implementation of a lineage-based referencing system in a 2.5D CAD kernel, demonstrating realistic referencing scenarios and illustrating how our system safely represents models that cause reference breakage in existing CAD systems.

We present WasmRef-Isabelle, a monadic interpreter for WebAssembly written in Isabelle/HOL and proven correct with respect to the WasmCert-Isabelle mechanisation of WebAssembly. WasmRef-Isabelle has been adopted and deployed as a fuzzing oracle in the continuous integration infrastructure of Wasmtime, a widely used WebAssembly implementation. Previous efforts to fuzz Wasmtime against WebAssembly's official OCaml reference interpreter were abandoned by Wasmtime's developers after the reference interpreter exhibited unacceptable performance characteristics, which its maintainers decided not to fix in order to preserve the interpreter's close definitional correspondence with the official specification. With WasmRef-Isabelle, we achieve the best of both worlds - an interpreter fast enough to be useable as a fuzzing oracle that also maintains a close correspondence with the specification through a mechanised proof of correctness.

We verify the correctness of WasmRef-Isabelle through a two-step refinement proof in Isabelle/HOL. We demonstrate that WasmRef-Isabelle significantly outperforms the official reference interpreter, has performance comparable to a Rust debug build of the industry WebAssembly interpreter Wasmi, and competes with unverified oracles on fuzzing throughput when deployed in Wasmtime's fuzzing infrastructure. We also present several new extensions to WasmCert-Isabelle which enhance WasmRef-Isabelle's utility as a fuzzing oracle: we add support for a number of upcoming WebAssembly features, and fully mechanise the numeric semantics of WebAssembly's integer operations.

CUDA, OpenCL, and OpenACC are the primary means of writing general-purpose software for NVIDIA GPUs, all of which are subject to the same well-documented memory safety vulnerabilities currently plaguing software written in C and C++. One can argue that the GPU execution environment makes software development more error prone. Unlike C and C++, CUDA features multiple, distinct memory spaces to map to the GPU’s unique memory hierarchy, and a typical CUDA program has thousands of concurrently executing threads. Furthermore, the CUDA platform has fewer guardrails than CPU platforms that have been forced to incrementally adjust to a barrage of security attacks. Unfortunately, the peculiarities of the GPU make it difficult to directly port memory safety solutions from the CPU space.

This paper presents cuCatch, a new memory safety error detection tool designed specifically for the CUDA programming model. cuCatch combines optimized compiler instrumentation with driver support to implement a novel algorithm for catching spatial and temporal memory safety errors with low performance overheads. Our experimental results on a wide set of GPU applications show that cuCatch incurs a 19% runtime slowdown on average, which is orders of magnitude faster than state-of-the-art debugging tools on GPUs. Moreover, our quantitative evaluation demonstrates cuCatch’s higher error detection coverage compared to prior memory safety tools. The combination of high error detection coverage and low runtime overheads makes cuCatch an ideal candidate for accelerating memory safety debugging for GPU applications.

We present Lilac, a separation logic for reasoning about probabilistic programs where separating conjunction captures probabilistic independence. Inspired by an analogy with mutable state where sampling corresponds to dynamic allocation, we show how probability spaces over a fixed, ambient sample space appear to be the natural analogue of heap fragments, and present a new combining operation on them such that probability spaces behave like heaps and measurability of random variables behaves like ownership. This combining operation forms the basis for our model of separation, and produces a logic with many pleasant properties. In particular, Lilac has a frame rule identical to the ordinary one, and naturally accommodates advanced features like continuous random variables and reasoning about quantitative properties of programs. Then we propose a new modality based on disintegration theory for reasoning about conditional probability. We show how the resulting modal logic validates examples from prior work, and give a formal verification of an intricate weighted sampling algorithm whose correctness depends crucially on conditional independence structure.

The pervasive brittleness of deep neural networks has attracted significant attention in recent years. A particularly interesting finding is the existence of adversarial examples, imperceptibly perturbed natural inputs that induce erroneous predictions in state-of-the-art neural models. In this paper, we study a different type of adversarial examples specific to code models, called *discrete adversarial examples*, which are created through program transformations that preserve the semantics of original inputs.In particular, we propose a novel, general method that is highly effective in attacking a broad range of code models. From the defense perspective, our primary contribution is a theoretical foundation for the application of adversarial training — the most successful algorithm for training robust classifiers — to defending code models against discrete adversarial attack. Motivated by the theoretical results, we present a simple realization of adversarial training that substantially improves the robustness of code models against adversarial attacks in practice.

We extensively evaluate both our attack and defense methods. Results show that our discrete attack is significantly more effective than state-of-the-art whether or not defense mechanisms are in place to aid models in resisting attacks. In addition, our realization of adversarial training improves the robustness of all evaluated models by the widest margin against state-of-the-art adversarial attacks as well as our own.

Homomorphic encryption (HE) is an encryption scheme that provides arithmetic operations on the encrypted data without doing decryption. For Ring-based HE, an encryption scheme that uses arithmetic operations on a polynomial ring as building blocks, performance improvement of unit HE operations has been achieved by two kinds of efforts. The first one is through accelerating the building blocks, polynomial operations. However, it does not facilitate optimizations across polynomial operations such as fusing two polynomial operations. The second one is implementing highly optimized HE operations in an amalgamated manner. The written codes have superior performance, but they are hard to maintain.

To resolve these challenges, we propose HEaaN.MLIR, a compiler that performs optimizations across polynomial operations. Also, we propose Poly and ModArith, compiler intermediate representations (IRs) for integer polynomial arithmetic and modulus arithmetic on integer arrays. HEaaN.MLIR has compiler optimizations that are motivated by manual optimizations that HE developers do. These include optimizing modular arithmetic operations, fusing loops, and vectorizing integer arithmetic instructions. HEaaN.MLIR can parse a program consisting of the Poly and ModArith instructions and generate a high-performance, multithreaded machine code for a CPU. Our experiment shows that the compiled operations outperform heavily optimized open-source and commercial HE libraries by up to 3.06x in a single thread and 4.55x in multiple threads.

Region inference offers a mechanism to reduce (and sometimes entirely remove) the need for reference-tracing garbage collection by inferring where to insert allocation and deallocation instructions in a program at compile time. When the mechanism is combined with techniques for reference-tracing garbage collection, which is helpful in general to support programs with very dynamic memory behaviours, it turns out that region-inference is complementary to adding generations to a reference-tracing collector. However, region-inference and the associated region-representation analyses that make such a memory management strategy perform well in practice are complex, both from a theoretical point-of-view and from an implementation point-of-view.

In this paper, we demonstrate a soundness problem with existing theoretical developments, which have to do with ensuring that, even for higher-order polymorphic programs, no dangling-pointers appear during a reference-tracing collection. This problem has materialised as a practical soundness problem in a real implementation based on region inference. As a solution, we present a modified, yet simple, region type-system that captures garbage-collection effects, even for polymorphic higher-order code, and outline how region inference and region-representation analyses are adapted to the new type system. The new type system allows for associating simpler region type-schemes with functions, compared to original work, makes it possible to combine region-based memory management with partly tag-free reference-tracing (and generational) garbage-collection, and repairs previously derived work that is based on the erroneous published results.

Writing concurrent code that is both correct and efficient is notoriously difficult. Thus, programmers often prefer to use synchronization abstractions, which render code simpler and easier to reason about. Despite a wealth of work on this topic, there is still a gap between the rich semantics provided by synchronization abstractions in modern programming languages—specifically, *fair* FIFO ordering of synchronization requests and support for *abortable* operations—and frameworks for implementing it correctly and efficiently. Supporting such semantics is critical given the rising popularity of constructs for asynchronous programming, such as coroutines, which abort frequently and are cheaper to suspend and resume compared to native threads.

This paper introduces a new framework called CancellableQueueSynchronizer (CQS), which enables simple yet efficient implementations of a wide range of fair and abortable synchronization primitives: mutexes, semaphores, barriers, count-down latches, and blocking pools. Our main contribution is algorithmic, as implementing both fairness and abortability efficiently at this level of generality is non-trivial. Importantly, all our algorithms, including the CQS framework and the primitives built on top of it, come with *formal proofs* in the Iris framework for Coq for many of their properties. These proofs are modular, so it is easy to show correctness for new primitives implemented on top of CQS. From a practical perspective, implementation of CQS for native threads on the JVM improves throughput by up to two orders of magnitude over Java’s AbstractQueuedSynchronizer, the only practical abstraction offering similar semantics. Further, we successfully integrated CQS as a core component of the popular Kotlin Coroutines library, validating the framework’s practical impact and expressiveness in a real-world environment. In sum, CancellableQueueSynchronizer is the first framework to combine expressiveness with formal guarantees and solid practical performance. Our approach should be extensible to other languages and families of synchronization primitives.

As more organizations are leveraging third-party cloud and edge data centers to process data efficiently, the issue of preserving data confidentiality becomes increasingly important. In response, numerous security mechanisms have been introduced and promoted in recent years including software-based ones such as homomorphic encryption, as well as hardware-based ones such as Intel SGX and AMD SEV. However these mechanisms vary in their security properties, performance characteristics, availability, and application modalities, making it hard for programmers to judiciously choose and correctly employ the right one for a given data query.

This paper presents a mechanism-independent approach to distributed confidentiality-preserving data analytics. Our approach hinges on a core programming language which abstracts the intricacies of individual security mechanisms. Data is labeled using custom confidentiality levels arranged along a lattice in order to capture its exact confidentiality constraints. High-level mappings between available mechanisms and these labels are captured through a novel expressive form of security policy. Confidentiality is guaranteed through a type system based on a novel formulation of noninterference, generalized to support our security policy definition. Queries written in a largely security-agnostic subset of our language are transformed to the full language to automatically use mechanisms in an efficient, possibly combined manner, while provably preserving confidentiality in data queries end-to-end. We prototype our approach as an extension to the popular Apache Spark analytics engine, demonstrating the significant versatility and performance benefits of our approach over single hardwired mechanisms --- including in existing systems --- without compromising on confidentiality.

Persistent memory (PM) is an emerging class of storage technology that combines the performance of DRAM with the durability of SSD, offering the best of both worlds. This had led to a surge of research on persistent objects in PM. Among such persistent objects, concurrent data structures (DSs) are particularly interesting thanks to their performance and scalability. One of the most widely used correctness criteria for persistent concurrent DSs is *detectable recoverability*, ensuring both thread safety (for correctness in non-crashing concurrent executions) and crash consistency (for correctness in crashing executions). However, the existing approaches to designing detectably recoverable concurrent DSs are either limited to simple algorithms or suffer from high runtime overheads.

We present Memento: a *general* and *high-performance* programming framework for detectably recoverable concurrent DSs in PM. To ensure general applicability to various DSs, Memento supports primitive operations such as checkpoint and compare-and-swap and their composition with control constructs. To ensure high performance, Memento employs a timestamp-based recovery strategy that requires fewer writes and flushes to PM than the existing approaches. We formally prove that Memento ensures detectable recoverability in the presence of crashes. To showcase Memento, we implement a lock-free stack, list, queue, and hash table, and a combining queue that detectably recovers from random crashes in stress tests and performs comparably to existing hand-tuned persistent DSs with and without detectable recoverability.

Context-free language reachability (CFL-reachability) is a fundamental framework for program analysis. A large variety of static analyses can be formulated as CFL-reachability problems, which determines whether specific source-sink pairs in an edge-labeled graph are connected by a reachable path, i.e., a path whose edge labels form a string accepted by the given CFL. Computing CFL-reachability is expensive. The fastest algorithm exhibits a slightly subcubic time complexity with respect to the input graph size. Improving the scalability of CFL-reachability is of practical interest, but reducing the time complexity is inherently difficult.

In this paper, we focus on improving the scalability of CFL-reachability from a more practical perspective---reducing the input graph size. Our idea arises from the existence of trivial edges, i.e., edges that do not affect any reachable path in CFL-reachability. We observe that two nodes joined by trivial edges can be folded---by merging the two nodes with all the edges joining them removed---without affecting the CFL-reachability result. By studying the characteristic of the recursive state machines (RSMs), an alternative form of CFLs, we propose an approach to identify foldable node pairs without the need to verify the underlying reachable paths (which is equivalent to solving the CFL-reachability problem). In particular, given a CFL-reachability problem instance with an input graph G and an RSM, based on the correspondence between paths in G and state transitions in RSM, we propose a graph folding principle, which can determine whether two adjacent nodes are foldable by examining only their incoming and outgoing edges.

On top of the graph folding principle, we propose an efficient graph folding algorithm GF. The time complexity of GF is linear with respect to the number of nodes in the input graph. Our evaluations on two clients (alias analysis and value-flow analysis) show that GF significantly accelerates RSM/CFL-reachability by reducing the input graph size. On average, for value-flow analysis, GF reduces 60.96% of nodes and 42.67% of edges of the input graphs, obtaining a speedup of 4.65× and a memory usage reduction of 57.35%. For alias analysis, GF reduces 38.93% of nodes and 35.61% of edges of the input graphs, obtaining a speedup of 3.21× and a memory usage reduction of 65.19%.

Modular design is a key challenge for enabling large-scale reuse of hardware modules. Unlike software, however, hardware designs correspond to physical circuits and inherit constraints from them. Timing constraints—which cycle a signal arrives, when an input is read—and structural constraints—how often a multiplier accepts new inputs—are fundamental to hardware interfaces. Existing hardware design languages do not provide a way to encode these constraints; a user must read documentation, build scripts, or in the worst case, a module’s implementation to understand how to use it. We present Filament, a language for modular hardware design that supports the specification and enforcement of timing and structural constraints for statically scheduled pipelines. Filament uses *timeline types*, which describe the intervals of clock-cycle time when a given signal is available or required. Filament enables *safe composition* of hardware modules, ensures that the resulting designs are correctly pipelined, and predictably lowers them to efficient hardware.

Understanding and debugging the performance of distributed systems is a notoriously hard task, but a critical one. Traditional techniques like logging, tracing, and benchmarking represent a best-effort way to find performance bugs, but they either require a full deployment to be effective or can only find bugs after they manifest. Even with such techniques in place, real deployments often exhibit performance bugs that cause unwanted behavior.

In this paper, we present Performal, a novel methodology that leverages the recent advances in formal verification to provide rigorous latency guarantees for real, complex distributed systems. The task is not an easy one: it requires carefully decoupling the formal proofs from the execution environment, formally defining latency properties, and proving them on real, distributed implementations. We used Performal to prove rigorous upper bounds for the latency of three applications: a distributed lock, ZooKeeper and a MultiPaxos-based State Machine Replication system. Our experimental evaluation shows that these bounds are a good proxy for the behavior of the deployed system and can be used to identify performance bugs in real-world systems.

We introduce Mosaic, a sparse tensor algebra compiler that can bind tensor expressions to external functions of other tensor algebra libraries and compilers. Users can extend Mosaic by adding new functions and bind a sub-expression to a function using a scheduling API. Mosaic substitutes the bound sub-expressions with calls to the external functions and automatically generates the remaining code using a default code generator. As the generated code is fused by default, users can productively leverage both fusion and calls to specialized functions within the same compiler. We demonstrate the benefits of our dual approach by showing that calling hand-written CPU and specialized hardware functions can provide speedups of up to 206× against fused code in some cases, while generating fused code can provide speedups of up to 3.57× against code that calls external functions in other cases. Mosaic also offers a search system that can automatically map an expression to a set of registered external functions. Both the explicit binding and automatic search are verified by Mosaic. Additionally, the interface for adding new external functions is simple and general. Currently, 38 external functions have been added to Mosaic, with each addition averaging 20 lines of code.

We introduce the new problem of *hardware decompilation*. Analogous to software decompilation, hardware decompilation is about analyzing a low-level artifact—in this case a *netlist*, i.e., a graph of wires and logical gates representing a digital circuit—in order to recover higher-level programming abstractions, and using those abstractions to generate code written in a hardware description language (HDL). The overall problem of hardware decompilation requires a number of pieces. In this paper we focus on one specific piece of the puzzle: a technique we call *hardware loop rerolling*. Hardware loop rerolling leverages clone detection and program synthesis techniques to identify repeated logic in netlists (such as would be synthesized from loops in the original HDL code) and *reroll* them into syntactic loops in the recovered HDL code. We evaluate hardware loop rerolling for hardware decompilation over a set of hardware design benchmarks written in the PyRTL HDL and industry standard SystemVerilog. Our implementation identifies and rerolls loops in 52 out of 53 of the netlists in our benchmark suite, and we show three examples of how hardware decompilation can provide concrete benefits: transpilation between HDLs, faster simulation times over netlists (with mean speedup of 6x), and artifact compaction (39% smaller on average).

Deep neural networks (DNNs) are becoming increasingly important components of software, and are considered the state-of-the-art solution for a number of problems, such as image recognition. However, DNNs are far from infallible, and incorrect behavior of DNNs can have disastrous real-world consequences. This paper addresses the problem of architecture-preserving V-polytope provable repair of DNNs. A V-polytope defines a convex bounded polytope using its vertex representation. V-polytope provable repair guarantees that the repaired DNN satisfies the given specification on the infinite set of points in the given V-polytope. An architecture-preserving repair only modifies the parameters of the DNN, without modifying its architecture. The repair has the flexibility to modify multiple layers of the DNN, and runs in polynomial time. It supports DNNs with activation functions that have some linear pieces, as well as fully-connected, convolutional, pooling and residual layers. To the best our knowledge, this is the first provable repair approach that has all of these features. We implement our approach in a tool called APRNN. Using MNIST, ImageNet, and ACAS Xu DNNs, we show that it has better efficiency, scalability, and generalization compared to PRDNN and REASSURE, prior provable repair methods that are not architecture preserving.

We present egglog, a fixpoint reasoning system that unifies Datalog and equality saturation (EqSat). Like Datalog, egglog supports efficient incremental execution, cooperating analyses, and lattice-based reasoning. Like EqSat, egglog supports term rewriting, efficient congruence closure, and extraction of optimized terms.

We identify two recent applications -- a unification-based pointer analysis in Datalog and an EqSat-based floating-point term rewriter -- that have been hampered by features missing from Datalog but found in EqSat or vice-versa. We evaluate our system by reimplementing those projects in egglog. The resulting systems in egglog are faster, simpler, and fix bugs found in the original systems.

The conformance testing of programming language implementations is crucial to support correct and consistent execution environments. Because manually maintaining conformance tests for real-world programming languages is cumbersome and labor-intensive, researchers have presented various ways to make conformance tests effective and efficient. One such approach is to use graph coverage, one of the most widely-used coverage criteria, to generate tests that reach different parts of a mechanized language specification. Since mechanized specifications use functions or inductive definitions to describe the semantics of language features, traditional graph coverage criteria for software work as they are. However, they may not produce high-quality conformance tests because language implementations often have specialized execution paths for different features, even when their semantics descriptions use the same functions. Traditional graph coverage may not distinguish test requirements of such language features, which degrades the quality of conformance testing. Similarly, it may not distinguish test requirements of different parts of the same language feature when their semantics descriptions use the same functions.

We present feature-sensitive (FS) coverage as a novel coverage criterion to generate high-quality conformance tests for language implementations. It is a general extension of graph coverage, refining conventional test requirements using the innermost enclosing language features. We also introduce feature-call-path-sensitive (FCPS) coverage, a variant of FS coverage, and extend both coverage criteria using the 𝑘-limiting approach. To evaluate the effectiveness of the new coverage criteria for language implementations, we apply them to a mechanized specification of JavaScript. We extend JEST, the state-of-the-art JavaScript conformance test synthesizer using coverage-guided mutational fuzzing, with various FS and FCPS coverage criteria. For the latest JavaScript language specification (ES13, 2022), our tool automatically synthesizes 237,981 conformance tests in 50 hours with five coverage criteria. We evaluated the conformance of eight mainstream JavaScript implementations (four engines and four transpilers) with the synthesized conformance tests and discovered bugs in all of them. The tool detected 143 distinct conformance bugs (42 in engines and 101 in transpilers), 85 of which were confirmed by the developers and 83 of which were newly discovered bugs.

The *defunctionalization* translation that eliminates higher-order functions from programs forms a key part of many compilers. However, defunctionalization for dependently-typed languages has not been formally studied.

We present the first formally-specified defunctionalization translation for a dependently-typed language and establish key metatheoretical properties such as soundness and type preservation. The translation is suitable for incorporation into type-preserving compilers for dependently-typed languages

Over the past decades, context sensitivity has been considered as one of the most effective ideas for improving the precision of pointer analysis for Java. Different from the extremely fast context-insensitivity approach, context sensitivity requires every program method to be analyzed under different contexts for separating the static abstractions of different dynamic instantiations of the method’s variables and heap objects, and thus reducing spurious object flows introduced by method calls. However, despite great precision benefits, as each method is equivalently cloned and analyzed under each context, context sensitivity brings heavy efficiency costs. Recently, numerous selective context-sensitive approaches have been put forth for scaling pointer analysis to large and complex Java programs by applying contexts only to the selected methods while analyzing the remaining ones context-insensitively; however, because the selective approaches do not fundamentally alter the primary methodology of context sensitivity (and do not thus remove its efficiency bottleneck), they produce much improved but still limited results.

In this work, we present a fundamentally different approach called Cut-Shortcut for fast and precise pointer analysis for Java. Its insight is simple: the main effect of cloning methods under different contexts is to filter spurious object flows that have been merged inside a callee method; from the view of a typical pointer flow graph (PFG), such effect can be simulated by cutting off (Cut) the edges that introduce precision loss to certain pointers and adding Shortcut edges directly from source pointers to the target ones circumventing the method on PFG. As a result, we can achieve the effect of context sensitivity without contexts. We identify three general program patterns and develop algorithms based on them to safely cut off and add shortcut edges on PFG, formalize them and formally prove the soundness. To comprehensively validate Cut-Shortcut’s effectiveness, we implement two versions of Cut-Shortcut for two state-of-the-art pointer analysis frameworks for Java, one in Datalog for the declarative Doop and the other in Java for the imperative Tai-e, and we consider all the large and complex programs used in recent literatures that meet the experimental requirements. The evaluation results are extremely promising: Cut-Shortcut is even able to run faster than context insensitivity for most evaluated programs while obtaining high precision that is comparable to context sensitivity (if scalable) in both frameworks. This is for the first time that we have been able to achieve such a good efficiency and precision trade-off for those hard-to-analyze programs, and we hope Cut-Shortcut could offer new perspectives for developing more effective pointer analysis for Java in the future.

Modern applications, such as social networking systems and e-commerce platforms are centered around using large-scale databases for storing and retrieving data. Accesses to the database are typically enclosed in transactions that allow computations on shared data to be isolated from other concurrent computations and resilient to failures. Modern databases trade isolation for performance. The weaker the isolation level is, the more behaviors a database is allowed to exhibit and it is up to the developer to ensure that their application can tolerate those behaviors.

In this work, we propose stateless model checking algorithms for studying correctness of such applications that rely on dynamic partial order reduction. These algorithms work for a number of widely-used weak isolation levels, including Read Committed, Causal Consistency, Snapshot Isolation and Serializability. We show that they are complete, sound and optimal, and run with polynomial memory consumption in all cases. We report on an implementation of these algorithms in the context of Java Pathfinder applied to a number of challenging applications drawn from the literature of distributed systems and databases.

A growing number of libraries written in managed languages, such as Python and JavaScript, are bringing about new demand for a foreign language interface (FFI) between two managed languages. Such an FFI allows a host-language program to seamlessly call a library function written in a foreign language and exchange objects. It is often implemented by a user-level library but such implementation cannot reclaim cyclic garbage, or a group of objects with circular references, across the language boundary. This paper proposes *Refgraph GC*, which enables FFI implementation that can reclaim cyclic garbage. Refgraph GC coordinates the garbage collectors of two languages and it needs to modify the managed runtime of one language only. It does not modify that of the other language. This paper discusses the soundness and completeness of the proposed algorithm and also shows the results of the experiments with our implementation of FFI with Refgraph GC. This FFI allows a Ruby program to access a JavaScript library.

This paper presents ProbCompCert, a compiler for a subset of the Stan probabilistic programming language (PPL), in which several key compiler passes have been formally verified using the Coq proof assistant. Because of the probabilistic nature of PPLs, bugs in their compilers can be difficult to detect and fix, making verification an interesting possibility. However, proving correctness of PPL compilation requires new techniques because certain transformations performed by compilers for PPLs are quite different from other kinds of languages. This paper describes techniques for verifying such transformations and their application in ProbCompCert. In the course of verifying ProbCompCert, we found an error in the Stan language reference manual related to the semantics and implementation of a key language construct.

Modern web browsers rely on layout engines to convert HTML documents to layout trees that specify color, size, and position. However, existing layout engines are notoriously difficult to maintain because of the complexity of web standards. This is especially true for incremental layout engines, which are designed to improve performance by updating only the parts of the layout tree that need to be changed.

In this paper, we propose Medea, a new framework for automatically generating incremental layout engines. Medea separates the specification of the layout engine from its incremental implementation, and guarantees correctness through layout engine synthesis. The synthesis is driven by a new iterative algorithm based on detecting conflicts that prevent optimality of the incremental algorithm.

We evaluated Medea on a fragment of HTML layout that includes challenging features such as margin collapse, floating layout, and absolute positioning. Medea successfully synthesized an incremental layout engine for this fragment. The synthesized layout engine is both correct and efficient. In particular, we demonstrated that it avoids real-world bugs that have been reported in the layout engines of Chrome, Firefox, and Safari. The incremental layout engine synthesized by Medea is up to 1.82× faster than a naive incremental baseline. We also demonstrated that our conflict-driven algorithm produces engines that are 2.74× faster than a baseline without conflict analysis.

Verification of distributed systems using systematic exploration is daunting because of the many possible interleavings of messages and failures. When faced with this scalability challenge, existing approaches have traditionally mitigated state space explosion by avoiding exploration of redundant states (e.g., via state hashing) and redundant interleavings of transitions (e.g., via partial-order reductions). In this paper, we present an efficient symbolic exploration method that not only avoids redundancies in states and interleavings, but additionally avoids redundant computations that are performed during updates to states on transitions. Our symbolic explorer leverages a novel, fine-grained, canonical representation of distributed system configurations (states) to identify opportunities for avoiding such redundancies on-the-fly. The explorer also includes an interface that is compatible with abstractions for state-space reduction and with partial-order and other reductions for avoiding redundant interleavings. We implement our approach in the tool Psym and empirically demonstrate that it outperforms a state-of-the-art exploration tool, can successfully verify many common distributed protocols, and can scale to multiple real-world industrial case studies across

This paper presents a new synthesis-based approach for batch image processing. Unlike existing tools that can only apply global edits to the entire image, our method can apply fine-grained edits to individual objects within the image. For example, our method can selectively blur or crop specific objects that have a certain property. To facilitate such fine-grained image editing tasks, we propose a neuro-symbolic domain-specific language (DSL) that combines pre-trained neural networks for image classification with other language constructs that enable symbolic reasoning. Our method can automatically learn programs in this DSL from user demonstrations by utilizing a novel synthesis algorithm. We have implemented the proposed technique in a tool called ImageEye and evaluated it on 50 image editing tasks. Our evaluation shows that ImageEye is able to automate 96% of these tasks.

Many concurrent programs assign priorities to threads to improve responsiveness. When used in conjunction with synchronization mechanisms such as mutexes and condition variables, however, priorities can lead to priority inversions, in which high-priority threads are delayed by low-priority ones. Priority inversions in the use of mutexes are easily handled using dynamic techniques such as priority inheritance, but priority inversions in the use of condition variables are not well-studied and dynamic techniques are not suitable.

In this work, we use a combination of static and dynamic techniques to prevent priority inversion in code that uses mutexes and condition variables. A type system ensures that condition variables are used safely, even while dynamic techniques change thread priorities at runtime to eliminate priority inversions in the use of mutexes. We prove the soundness of our system, using a model of priority inversions based on cost models for parallel programs. To show that the type system is practical to implement, we encode it within the type systems of Rust and C++, and show that the restrictions are not overly burdensome by writing sizeable case studies using these encodings, including porting the Memcached object server to use our C++ implementation.

Batteryless energy-harvesting devices enable computing in inaccessible environments, at a cost to programmability and correctness. These devices operate intermittently as energy is available, using a recovery system to save and restore state. Some program tasks must execute atomically w.r.t. power failures, re-executing if power fails before completion. Any re-execution should typically be *idempotent*—its behavior should match the behavior of a single execution. Thus, a key aspect of correct intermittent execution is identifying and recovering state causing undesired non-idempotence. Unfortunately, past intermittent systems take an ad-hoc approach, using unsound dataflow analyses or conservatively recovering all written state. Moreover, no prior work allows the programmer to directly specify idempotence requirements (including allowable non-idempotence).

We present curricle, the first *type system* approach to safe intermittence, for Rust. Type level reasoning allows programmers to express requirements and retains alias information crucial for sound analyses. Curricle uses information flow and type qualifiers to reject programs causing undesired non-idempotence. We implement Curricle’s type system on top of Rust’s compiler, evaluating the prototype on benchmarks from prior work. We find that Curricle benefits application programmers by allowing them to express idempotence requirements that are checked to be satisfied, and that targeting programs checked with Curricle allows intermittent system designers to write simpler recovery systems that perform better.

Over the years, several memory models have been proposed to capture the subtle concurrency semantics of C/C++. One of the most fundamental problems associated with a memory model *M* is consistency checking: given an execution *X*, is *X* consistent with *M*? This problem lies at the heart of numerous applications, including specification testing and litmus tests, stateless model checking, and dynamic analyses. As such, it has been explored extensively and its complexity is well-understood for traditional models like SC and TSO. However, less is known for the numerous model variants of C/C++, for which the problem becomes challenging due to the intricacies of their concurrency primitives. In this work we study the problem of consistency checking for popular variants of the C11 memory model, in particular, the *RC*20 model, its release-acquire (*RA*) fragment, the strong and weak variants of *RA* (*SRA* and *WRA*), as well as the *Relaxed* fragment of *RC*20.

Motivated by applications in testing and model checking, we focus on reads-from consistency checking. The input is an execution *X* specifying a set of events, their program order and their reads-from relation, and the task is to decide the existence of a modification order on the writes of *X* that makes *X* consistent in a memory model. We draw a rich complexity landscape for this problem; our results include (i) nearly-linear-time algorithms for certain variants, which improve over prior results, (ii) fine-grained optimality results, as well as (iii) matching upper and lower bounds (NP-hardness) for other variants. To our knowledge, this is the first work to characterize the complexity of consistency checking for C11 memory models. We have implemented our algorithms inside the TruSt model checker and the C11Tester testing tool. Experiments on standard benchmarks show that our new algorithms improve consistency checking, often by a significant margin.

We present a new abstract interpretation framework for the precise over-approximation of numerical fixpoint iterators. Our key observation is that unlike in standard abstract interpretation (AI), typically used to over-approximate all reachable program states, in this setting, one only needs to abstract the concrete fixpoints, i.e., the final program states. Our framework targets numerical fixpoint iterators with convergence and uniqueness guarantees in the concrete and is based on two major technical contributions: (i) theoretical insights which allow us to compute sound and precise fixpoint abstractions without using joins, and (ii) a new abstract domain, CH-Zonotope, which admits efficient propagation and inclusion checks while retaining high precision.

We implement our framework in a tool called CRAFT and evaluate it on a novel fixpoint-based neural network architecture (monDEQ) that is particularly challenging to verify. Our extensive evaluation demonstrates that CRAFT exceeds the state-of-the-art performance in terms of speed (two orders of magnitude), scalability (one order of magnitude), and precision (25% higher certified accuracies).

Fairness properties, which state that a sequence of bad events cannot happen infinitely before a good event takes place, are often crucial in program verification. However, general methods for expressing and reasoning about various kinds of fairness properties are relatively underdeveloped compared to those for safety properties. This paper proposes FOS (Fair Operational Semantics), a theory capable of expressing arbitrary notions of fairness as an operational semantics and reasoning about these notions of fairness. In addition, FOS enables thread-local reasoning about fairness by providing thread-local simulation relations equipped with separation- logic-style resource algebras. We verify a ticket lock implementation and a client of the ticket lock under weak memory concurrency as an example, which requires reasoning about different notions of fairness including fairness of a scheduler, fairness of the ticket lock implementation, and even fairness of weak memory. The theory of FOS, as well as the examples in the paper, are fully formalized in Coq.

Near-term quantum computers are expected to work in an environment where each operation is noisy, with no error correction. Therefore, quantum-circuit optimizers are applied to minimize the number of noisy operations. Today, physicists are constantly experimenting with novel devices and architectures. For every new physical substrate and for every modification of a quantum computer, we need to modify or rewrite major pieces of the optimizer to run successful experiments. In this paper, we present QUESO, an efficient approach for automatically synthesizing a quantum-circuit optimizer for a given quantum device. For instance, in 1.2 minutes, QUESO can synthesize an optimizer with high-probability correctness guarantees for IBM computers that significantly outperforms leading compilers, such as IBM's Qiskit and TKET, on the majority (85%) of the circuits in a diverse benchmark suite.

A number of theoretical and algorithmic insights underlie QUESO: (1) An algebraic approach for representing rewrite rules and their semantics. This facilitates reasoning about complex *symbolic* rewrite rules that are beyond the scope of existing techniques. (2) A fast approach for probabilistically verifying equivalence of quantum circuits by reducing the problem to a special form of *polynomial identity testing*. (3) A novel probabilistic data structure, called a *polynomial identity filter* (PIF), for efficiently synthesizing rewrite rules. (4) A beam-search-based algorithm that efficiently applies the synthesized symbolic rewrite rules to optimize quantum circuits.

We propose a novel trace-guided approach to tackle the challenges of ambiguity and generalization in synthesis of recursive functional programs from input-output examples. Our approach augments the search space of programs with recursion traces consisting of recursive subcalls of the programs. Our method is based on a new version space algebra (VSA) for succinct representation and efficient manipulation of pairs of recursion traces and programs that are consistent with each other. We have implemented this approach in a tool called SyRup and evaluated it on benchmarks from prior work. Our evaluation demonstrates that SyRup not only requires fewer examples to achieve a certain success rate than existing synthesizers, but is also less sensitive to the quality of the examples.

Region inference is a type-based program analysis that takes a non-annotated program as input and constructs a program that explicitly manages memory allocation and deallocation by dividing the heap into a stack of regions, each of which can grow and shrink independently from other regions, using constant-time operations.

Whereas region-based memory management has shown useful in the contexts of explicit region-based memory management, and in particular, in combination with parallel execution of code, combining region inference with techniques for higher-order parallel programming has not been investigated.

In this paper, we present an implementation of a fork-join parallel construct suitable for a compiler based on region inference. We present a minimal higher-order language incorporating the parallel construct, including typing rules and a dynamic semantics for the language, and demonstrate type soundness. We present a novel effect-based region-protection inference algorithm and discuss benefits and shortcomings of the approach. We also describe an efficient implementation embedded in the MLKit Standard ML compiler. Finally, we evaluate the approach and the implementation based on a number of parallel benchmarks, and thereby demonstrate that the technique effectively utilises multi-core architectures in a higher-order functional setting.

Sanitizers are widely used compiler features that detect undefined behavior and resulting vulnerabilities by injecting runtime checks into programs. For better performance, sanitizers are often used in conjunction with optimization passes. But doing so combines two compiler features with conflicting objectives. While sanitizers want to expose undefined behavior, optimizers often exploit these same properties for performance. In this paper, we show that this clash can have serious consequences: optimizations can remove sanitizer failures, thereby hiding the presence of bugs or even introducing new ones.

We present LookUB, a differential-testing based framework for finding optimizer transformations that elide sanitizer failures. We used our method to find 17 such sanitizer-eliding optimizations in Clang. Next, we used static analysis and fuzzing to search for bugs in open-source projects that were previously hidden due to sanitizer-eliding optimizations. This led us to discover 20 new bugs in Linux Containers, libmpeg2, NTFS-3G, and WINE. Finally, we present an effective mitigation strategy based on a customization of the Clang optimizer with an overhead increase of 4%.

We present an automated approach to verify the correctness of programming assignments, such as the ones that arise in a functional programming course. Our approach takes as input student submissions and reference solutions, and uses equivalence checking to automatically prove or disprove correctness of each submission. To be effective in the context of a real-world programming course, an automated grading system must be both robust, to support programs written in a variety of style, and scalable, to treat hundreds of submissions at once. We achieve robustness by handling recursion using functional induction and by handling auxiliary functions using function call matching. We achieve scalability using a clustering algorithm that leverages the transitivity of equivalence to discover intermediate reference solutions among student submissions. We implement our approach on top of the Stainless verification system, to support equivalence checking of Scala programs. We evaluate our system and its components on over 4000 programs drawn from a functional programming course and from the program equivalence checking literature; this is the largest such evaluation to date. We show that our system is capable of proving program correctness by generating inductive equivalence proofs, and providing counterexamples for incorrect programs, with a high success rate.

We present PureCake, a mechanically-verified compiler for PureLang, a lazy, purely functional programming language with monadic effects. PureLang syntax is Haskell-like and indentation-sensitive, and its constraint-based Hindley-Milner type system guarantees safe execution. We derive sound equational reasoning principles over its operational semantics, dramatically simplifying some proofs. We prove end-to-end correctness for the compilation of PureLang down to machine code---the first such result for any lazy language---by targeting CakeML and composing with its verified compiler. Multiple optimisation passes are necessary to handle realistic lazy idioms effectively. We develop PureCake entirely within the HOL4 interactive theorem prover.

Higher-order functions pose a challenge for both static program analyses and optimizing compilers.
To simplify the analysis and compilation of languages with higher-order functions, a rich body of
prior work has proposed a variety of *defunctionalization* techniques, which can eliminate
higher-order functions from a program by transforming the program to a semantically-equivalent
first-order representation. Several modern languages take this a step further, *specializing*
higher-order functions with respect to the functions on which they operate, and in turn allowing
compilers to generate more efficient code. However, existing specializing defunctionalization
techniques restrict how function values may be used, forcing implementations to fall back on costly
dynamic alternatives. We propose *lambda set specialization* (LSS), the first specializing
defunctionalization technique which imposes no restrictions on how function values may be used. We
formulate LSS in terms of a polymorphic type system which tracks the flow of function values through
the program, and use this type system to recast specialization of higher-order functions with
respect to their arguments as a form of type monomorphization. We show that our type system admits a
simple and tractable type inference algorithm, and give a formalization and fully-mechanized proof
in the Isabelle/HOL proof assistant showing soundness and completeness of the type inference
algorithm with respect to the type system. To show the benefits of LSS, we evaluate its impact on
the run time performance of code generated by the MLton compiler for Standard ML, the OCaml
compiler, and the new Morphic functional programming language. We find that pre-processing with LSS
achieves run time speedups of up to 6.85x under MLton, 3.45x for OCaml, and 78.93x for Morphic.

Secure multiparty computation (MPC) allows for joint computation over private data from multiple entities, usually backed by powerful cryptographic techniques that protect sensitive data. Several high-level programming languages have been proposed to make writing MPC applications accessible to non-experts. These languages typically require developers to enforce security policies within the logic of the secure application itself, making it difficult to update security requirements, or to experiment with different policies. This paper presents the design and implementation of Taype, a language that permits security concerns to be decoupled from the program logic. To do so, Taype provides the first implementation of oblivious algebraic data types and tape semantics, two language features recently proposed by a core calculus for oblivious computation, λ_{OADT+}. We evaluate our implementation of Taype on a range of benchmarks, demonstrating its ability to encode a range of security polices for a rich class of data types.

We develop a new derivative based theory and algorithm for nonbacktracking regex matching that supports anchors and counting, preserves backtracking semantics, and can be extended with lookarounds. The algorithm has been implemented as a new regex backend in .NET and was extensively tested as part of the formal release process of .NET7. We present a formal proof of the correctness of the algorithm, which we believe to be the first of its kind concerning industrial implementations of regex matchers. The paper describes the complete foundation, the matching algorithm, and key aspects of the implementation involving a regex rewrite system, as well as a comprehensive evaluation over industrial case studies and other regex engines.

In this work, we study the fully automated inference of expected result values of probabilistic programs in the presence of natural programming constructs such as procedures, local variables and recursion. While crucial, capturing these constructs becomes highly non-trivial. The key contribution is the definition of a term representation, denoted as infer[.], translating a pre-expectation semantics into first-order constraints, susceptible to automation via standard methods. A crucial step is the use of logical variables, inspired by previous work on Hoare logics for recursive programs. Noteworthy, our methodology is not restricted to tail-recursion, which could unarguably be replaced by iteration and wouldn't need additional insights. We have implemented this analysis in our prototype ev-imp. We provide ample experimental evidence of the prototype's algorithmic expressibility.

File formats specify how data is encoded for persistent storage. They cannot be formalized as context-free grammars since their specifications include context-sensitive patterns such as the random access pattern and the type-length-value pattern. We propose a new grammar mechanism called Interval Parsing Grammars IPGs) for file format specifications. An IPG attaches to every nonterminal/terminal an interval, which specifies the range of input the nonterminal/terminal consumes. By connecting intervals and attributes, the context-sensitive patterns in file formats can be well handled. In this paper, we formalize IPGs' syntax as well as its semantics, and its semantics naturally leads to a parser generator that generates a recursive-descent parser from an IPG. In general, IPGs are declarative, modular, and enable termination checking. We have used IPGs to specify a number of file formats including ZIP, ELF, GIF, PE, and part of PDF; we have also evaluated the performance of the generated parsers.

WebAssembly makes it possible to run C/C++ applications on the web with near-native performance. A WebAssembly program is expressed as a collection of higher-order ML-like modules, which are composed together through a system of explicit imports and exports using a host language, enabling a form of higher- order modular programming. We present Iris-Wasm, a mechanized higher-order separation logic building on a specification of Wasm 1.0 mechanized in Coq and the Iris framework. Using Iris-Wasm, we are able to specify and verify individual modules separately, and then compose them modularly in a simple host language featuring the core operations of the WebAssembly JavaScript Interface. Building on Iris-Wasm, we develop a logical relation that enforces robust safety: unknown, adversarial code can only affect other modules through the functions that they explicitly export. Together, the program logic and the logical relation allow us to formally verify functional correctness of WebAssembly programs, even when they invoke and are invoked by unknown code, thereby demonstrating that WebAssembly enforces strong isolation between modules.

We have extended the verified CakeML compiler with a new language primitive, Eval, which permits evaluation of new CakeML syntax at runtime. This new implementation supports an ambitious form of compilation at runtime and dynamic execution, where the original and dynamically added code can share (higher-order) values and recursively call each other. This is, to our knowledge, the first verified run-time environment capable of supporting a standard LCF-style theorem prover design.

Modifying the modern CakeML compiler pipeline and proofs to support a dynamic computation semantics was an extensive project. We review the design decisions, proof techniques, and proof engineering lessons from the project, and highlight some unexpected complications.

Today's mobile, desktop, and server processors are heterogeneous, consisting not only of CPUs but also GPUs and other accelerators. Such heterogeneous processors are starting to expose a shared memory interface across these devices.Given that each of these individual devices typically supports a distinct instruction set architecture and a distinct memory consistency model, it is not clear what the memory consistency model of the heterogeneous machine should be. In this paper, we answer this question by formalizing "compound" memory models: we present a compositional operational model describing the resulting model when devices with distinct consistency models are fused together. We instantiate our model with the compound x86TSO/PTX model -- a CPU enforcing x86TSO and a GPU enforcing the PTX model. A key result is that the x86TSO/PTX compound model retains compiler mappings from the language-based (scoped) C memory model. This means that threads mapped to the x86TSO device can continue to use the already proven C-to-x86TSO compiler mapping, and the same for PTX.

We introduce indexed streams, a formal operational model and intermediate representation that describes the fused execution of a contraction language that encompasses both sparse tensor algebra and relational algebra. We prove that the indexed stream model is correct with respect to a functional semantics. We also develop a compiler for contraction expressions that uses indexed streams as an intermediate representation. The compiler is only 540 lines of code, but we show that its performance can match both the TACO compiler for sparse tensor algebra and the SQLite and DuckDB query processing libraries for relational algebra.

Lexers and parsers are typically defined separately and connected by a token stream. This separate definition is important for modularity and reduces the potential for parsing ambiguity. However, materializing tokens as data structures and case-switching on tokens comes with a cost.

We show how to *fuse* separately-defined lexers and parsers, drastically improving performance without compromising modularity or increasing ambiguity. We propose a deterministic variant of Greibach Normal Form that ensures deterministic parsing with a single token of lookahead and makes fusion strikingly simple, and prove that normalizing context free expressions into the deterministic normal form is semantics-preserving. Our staged parser combinator library, flap, provides a standard interface, but generates specialized token-free code that runs two to six times faster than ocamlyacc on a range of benchmarks.

We introduce a new paradigm for analysing and finding bugs in quantum circuits. In our approach, the problem is given by a triple {*P*} *C* {*Q*} and the question is whether, given a set *P* of quantum states on the input of a circuit *C*, the set of quantum states on the output is equal to (or included in) a set *Q*. While this is not suitable to specify, e.g., functional correctness of a quantum circuit, it is sufficient to detect many bugs in quantum circuits. We propose a technique based on tree automata to compactly represent sets of quantum states and develop transformers to implement the semantics of quantum gates over this representation. Our technique computes with an algebraic representation of quantum states, avoiding the inaccuracy of working with floating-point numbers. We implemented the proposed approach in a prototype tool and evaluated its performance against various benchmarks from the literature. The evaluation shows that our approach is quite scalable, e.g., we managed to verify a large circuit with 40 qubits and 141,527 gates, or catch bugs injected into a circuit with 320 qubits and 1,758 gates, where all tools we compared with failed. In addition, our work establishes a connection between quantum program verification and automata, opening new possibilities to exploit the richness of automata theory and automata-based verification in the world of quantum computing.

Test input generators are an important part of property-based testing (PBT) frameworks. Because PBT is intended to test deep semantic and structural properties of a program, the outputs produced by these generators can be complex data structures, constrained to satisfy properties the developer believes is most relevant to testing the function of interest. An important feature expected of these generators is that they be capable of producing *all* acceptable elements that satisfy the function’s input type and generator-provided constraints. However, it is not readily apparent how we might validate whether a particular generator’s output satisfies this *coverage* requirement. Typically, developers must rely on manual inspection and post-mortem analysis of test runs to determine if the generator is providing sufficient coverage; these approaches are error-prone and difficult to scale as generators become more complex. To address this important concern, we present a new refinement type-based verification procedure for validating the coverage provided by input test generators, based on a novel interpretation of types that embeds “*must*-style” underapproximate reasoning principles as a fundamental part of the type system. The types associated with expressions now capture the set of values *guaranteed* to be produced by the expression, rather than the typical formulation that uses types to represent the set of values an expression *may* produce. Beyond formalizing the notion of *coverage types* in the context of a rich core language with higher-order procedures and inductive datatypes, we also present a detailed evaluation study to justify the utility of our ideas.

Most software domains rely on compilers to translate high-level code to multiple different machine languages, with performance not too much worse than what developers would have the patience to write directly in assembly language. However, cryptography has been an exception, where many performance-critical routines have been written directly in assembly (sometimes through metaprogramming layers). Some past work has shown how to do formal verification of that assembly, and other work has shown how to generate C code automatically along with formal proof, but with consequent performance penalties vs. the best- known assembly. We present CryptOpt, the first compilation pipeline that specializes high-level cryptographic functional programs into assembly code significantly faster than what GCC or Clang produce, with mechanized proof (in Coq) whose final theorem statement mentions little beyond the input functional program and the operational semantics of x86-64 assembly. On the optimization side, we apply randomized search through the space of assembly programs, with repeated automatic benchmarking on target CPUs. On the formal-verification side, we connect to the Fiat Cryptography framework (which translates functional programs into C-like IR code) and extend it with a new formally verified program-equivalence checker, incorporating a modest subset of known features of SMT solvers and symbolic-execution engines. The overall prototype is quite practical, e.g. producing new fastest-known implementations of finite-field arithmetic for both Curve25519 (part of the TLS standard) and the Bitcoin elliptic curve secp256k1 for the Intel 12𝑡ℎ and 13𝑡ℎ generations.

Cloud developers have to build applications that are resilient to failures and interruptions. We advocate for a fault-tolerant programming model for the cloud based on actors, retry orchestration, and tail calls. This model builds upon persistent data stores and message queues readily available on the cloud. Retry orchestration not only guarantees that (1) failed actor invocations will be retried but also that (2) completed invocations are never repeated and (3) it preserves a strict happen-before relationship across failures within call stacks. Tail calls can break complex tasks into simple steps to minimize re-execution during recovery. We review key application patterns and failure scenarios. We formalize a process calculus to precisely capture the mechanisms of fault tolerance in this model. We briefly describe our implementation. Using an application inspired by a typical enterprise scenario, we validate the functional correctness of our implementation and assess the impact of fault preparedness and recovery on performance.

Regular expression inference (REI) is a supervised machine learning and program synthesis problem that takes a cost metric for regular expressions, and positive and negative examples of strings as input. It outputs a regular expression that is precise (i.e., accepts all positive and rejects all negative examples), and minimal w.r.t. to the cost metric. We present a novel algorithm for REI over arbitrary alphabets that is enumerative and trades off time for space. Our main algorithmic idea is to implement the search space of regular expressions succinctly as a contiguous matrix of bitvectors. Collectively, the bitvectors represent, as characteristic sequences, all sub-languages of the infix-closure of the union of positive and negative examples. Mathematically, this is a semiring of (a variant of) formal power series. Infix-closure enables bottom-up compositional construction of larger from smaller regular expressions using the operations of our semiring. This minimises data movement and data-dependent branching, hence maximises data-parallelism. In addition, the infix-closure remains unchanged during the search, hence search can be staged: first pre-compute various expensive operations, and then run the compute intensive search process. We provide two C++ implementations, one for general purpose CPUs and one for Nvidia GPUs (using CUDA). We benchmark both on Google Colab Pro: the GPU implementation is on average over 1000x faster than the CPU implementation on the hardest benchmarks.

Concurrent separation logic has been responsible for major advances in the formal verification of fine-grained concurrent algorithms and data structures such as locks, barriers, queues, and reference counters. The key ingredient of the verification of a fine-grained program is an invariant, which relates the physical data representation (on the heap) to a logical representation (in mathematics) and to the state of the threads (using a form of ghost state). An invariant is typically represented as a disjunction of logical states, but this disjunctive nature makes invariants a difficult target for automated verification. Current approaches roughly suffer from two problems. They use backtracking to introduce disjunctions in an uninformed manner, which can lead to unprovable goals if an appropriate case analysis has not been made before choosing the disjunct. Moreover, they eliminate disjunctions too eagerly, which can cause poor efficiency.

While disjunctions are no problem for automated provers based on classical (i.e., non-separating) logic, the challenges with disjunctions are prominent in the study of proof automation for intuitionistic logic. We take inspiration from that area—specifically, based on ideas from *connection calculus*, we design a simple multi-succedent calculus for separation logic with disjunctions featuring a novel concept of a *connection*. While our calculus is not complete, it has the advantage that it can be extended with features of the state-of-the-art concurrent separation logic Iris (such as modalities, higher-order quantification, ghost state, and invariants), and can be implemented effectively in the Coq proof assistant with little need for backtracking. We evaluate the practicality on 24 challenging benchmarks, 14 of which we can verify fully automatically.

Conflict-Free Replicated Data Types (CRDTs) are a recent approach for keeping replicated data consistent while guaranteeing the absence of conflicts among replicas. For correct operation, CRDTs rely on a merge function that is commutative, associative and idempotent. Ensuring that such algebraic properties are satisfied by implementations, however, is left to the programmer, resulting in a process that is complex and error-prone. While techniques based on testing, automatic verification of a model, and mechanized or handwritten proofs are available, we lack an approach that is able to verify such properties on concrete CRDT implementations.

In this paper, we present Propel, a programming language with a type system that captures the algebraic properties required by a correct CRDT implementation. The Propel type system deduces such properties by case analysis and induction: sum types guide the case analysis and algebraic properties in function types enable induction for free. Propel’s key feature is its capacity to reason about algebraic properties (a) in terms of rewrite rules and (b) to derive the equality or inequality of expressions from the properties. We provide an implementation of Propel as a Scala embedding, we implement several CRDTs, verify them with Propel and compare the verification process with four state-of-the-art verification tools. Our evaluation shows that Propel is able to automatically deduce the properties that are relevant for common CRDT implementations found in open-source libraries even in cases in which competitors timeout.

We propose algebraic protocols that enable the definition of protocol templates and session types analogous to the definition of domain-specific types with algebraic datatypes. Parameterized algebraic protocols subsume all regular as well as most context-free and nested session types and, at the same time, replace the expensive superlinear algorithms for type checking by a nominal check that runs in linear time. Algebraic protocols in combination with polymorphism increase expressiveness and modularity by facilitating new ways of parameterizing and composing session types.

The Rust type system guarantees memory safety and data-race freedom. However, to satisfy Rust's type rules, many familiar implementation patterns must be adapted substantially. These necessary adaptations complicate programming and might hinder language adoption. In this paper, we demonstrate that, in contrast to manual programming, automatic synthesis is not complicated by Rust's type system, but rather benefits in two major ways. First, a Rust synthesizer can get away with significantly simpler specifications. While in more traditional imperative languages, synthesizers often require lengthy annotations in a complex logic to describe the shape of data structures, aliasing, and potential side effects, in Rust, all this information can be inferred from the types, letting the user focus on specifying functional properties using a slight extension of Rust expressions. Second, the Rust type system reduces the search space for synthesis, which improves performance.

In this work, we present the first approach to automatically synthesizing correct-by-construction programs in safe Rust. The key ingredient of our synthesis procedure is Synthetic Ownership Logic, a new program logic for deriving programs that are guaranteed to satisfy both a user-provided functional specification and, importantly, Rust's intricate type system. We implement this logic in a new tool called RusSOL. Our evaluation shows the effectiveness of RusSOL, both in terms of annotation burden and performance, in synthesizing provably correct solutions to common problems faced by new Rust developers.

Thin hypervisors make it possible to isolate key security components like keychains, fingerprint readers, and digital wallets from the easily-compromised operating system. To work together, virtual machines running on top of the hypervisor can make hypercalls to the hypervisor to share pages between each other in a controlled way. However, the design of such hypercall ABIs remains a delicate balancing task between conflicting needs for expressivity, performance, and security. In particular, it raises the question of what makes the specification of a hypervisor, and of its hypercall ABIs, good enough for the virtual machines. In this paper, we validate the expressivity and security of the design of the hypercall ABIs of Arm's FF-A. We formalise a substantial fragment of FF-A as a machine with a simplified ISA in which hypercalls are steps of the machine. We then develop VMSL, a novel separation logic, which we prove sound with respect to the machine execution model, and use it to reason modularly about virtual machines which communicate through the hypercall ABIs, demonstrating the hypercall ABIs' expressivity. Moreover, we use the logic to prove robust safety of communicating virtual machines, that is, the guarantee that even if some of the virtual machines are compromised and execute unknown code, they cannot break the safety properties of other virtual machines running known code. This demonstrates the intended security guarantees of the hypercall ABIs. All the results in the paper have been formalised in Coq using the Iris framework.

We present Scallop, a language which combines the benefits of deep learning and logical reasoning. Scallop enables users to write a wide range of neurosymbolic applications and train them in a data- and compute-efficient manner. It achieves these goals through three key features: 1) a flexible symbolic representation that is based on the relational data model; 2) a declarative logic programming language that is based on Datalog and supports recursion, aggregation, and negation; and 3) a framework for automatic and efficient differentiable reasoning that is based on the theory of provenance semirings. We evaluate Scallop on a suite of eight neurosymbolic applications from the literature. Our evaluation demonstrates that Scallop is capable of expressing algorithmic reasoning in diverse and challenging AI tasks, provides a succinct interface for machine learning programmers to integrate logical domain knowledge, and yields solutions that are comparable or superior to state-of-the-art models in terms of accuracy. Furthermore, Scallop's solutions outperform these models in aspects such as runtime and data efficiency, interpretability, and generalizability.

Information leaks are a significant problem in modern software systems. In recent years, information theoretic concepts, such as Shannon entropy, have been applied to quantifying information leaks in programs. One recent approach is to use symbolic execution together with model counting constraints solvers in order to quantify information leakage. There are at least two reasons for unsoundness in quantifying information leakage using this approach: 1) Symbolic execution may not be able to explore all execution paths, 2) Model counting constraints solvers may not be able to provide an exact count. We present a sound symbolic quantitative information flow analysis that bounds the information leakage both for the cases where the program behavior is not fully explored and the model counting constraint solver is unable to provide a precise model count but provides an upper and a lower bound. We implemented our approach as an extension to KLEE for computing sound bounds for information leakage in C programs.

As zero-knowledge proofs gain increasing adoption, the cryptography community has designed domain-specific languages (DSLs) that facilitate the construction of zero-knowledge proofs (ZKPs). Many of these DSLs, such as Circom, facilitate the construction of arithmetic circuits, which are essentially polynomial equations over a finite field. In particular, given a program in a zero-knowledge proof DSL, the compiler automatically produces the corresponding arithmetic circuit. However, a common and serious problem is that the generated circuit may be underconstrained, either due to a bug in the program or a bug in the compiler itself. Underconstrained circuits admit multiple witnesses for a given input, so a malicious party can generate bogus witnesses, thereby causing the verifier to accept a proof that it should not. Because of the increasing prevalence of such arithmetic circuits in blockchain applications, several million dollars worth of cryptocurrency have been stolen due to underconstrained arithmetic circuits.

Motivated by this problem, we propose a new technique for finding ZKP bugs caused by underconstrained polynomial equations over finite fields. Our method performs semantic reasoning over the finite field equations generated by the compiler to prove whether or not each signal is uniquely determined by the input. Our proposed approach combines SMT solving with lightweight uniqueness inference to effectively reason about underconstrained circuits. We have implemented our proposed approach in a tool called *QED*^{2} and evaluate it on 163 Circom circuits. Our evaluation shows that *QED*^{2} can successfully solve 70% of these benchmarks, meaning that it either verifies the uniqueness of the output signals or finds a pair of witnesses that demonstrate non-uniqueness of the circuit. Furthermore, *QED*^{2} has found 8 previously unknown vulnerabilities in widely-used circuits.

We introduce Flux, which shows how logical refinements can work hand in glove with Rust's ownership mechanisms to yield ergonomic type-based verification of low-level pointer manipulating programs. First, we design a novel refined type system for Rust that indexes mutable locations, with pure (immutable) values that can appear in refinements, and then exploits Rust's ownership mechanisms to abstract sub-structural reasoning about locations within Rust's polymorphic type constructors, while supporting strong updates. We formalize the crucial dependency upon Rust's strong aliasing guarantees by exploiting the Stacked Borrows aliasing model to prove that "well-borrowed evaluations of well-typed programs do not get stuck". Second, we implement our type system in Flux, a plug-in to the Rust compiler that exploits the factoring of complex invariants into types and refinements to efficiently synthesize loop annotations-including complex quantified invariants describing the contents of containers-via liquid inference. Third, we evaluate Flux with a benchmark suite of vector manipulating programs and parts of a previously verified secure sandboxing library to demonstrate the advantages of refinement types over program logics as implemented in the state-of-the-art Prusti verifier. While Prusti's more expressive program logic can, in general, verify deep functional correctness specifications, for the lightweight but ubiquitous and important verification use-cases covered by our benchmarks, liquid typing makes verification ergonomic by slashing specification lines by a factor of two, verification time by an order of magnitude, and annotation overhead from up to 24% of code size (average 14%), to nothing at all.

Although functional programming languages simplify writing safe parallel programs by helping programmers to avoid data races, they have traditionally delivered poor performance. Recent work improved performance by using a hierarchical memory architecture that allows processors to allocate and reclaim memory independently without any synchronization, solving thus the key performance challenge afflicting functional programs. The approach, however, restricts mutation, or memory effects, so as to ensure "disentanglement", a low-level memory property that guarantees independence between different heaps in the hierarchy.

This paper proposes techniques for supporting entanglement and for allowing functional programs to use mutation at will. Our techniques manage entanglement by distinguishing between disentangled and entangled objects and shielding disentangled objects from the cost of entanglement management. We present a semantics that formalizes entanglement as a property at the granularity of memory objects, and define several cost metrics to reason about and bound the time and space cost of entanglement. We present an implementation of the techniques by extending the MPL compiler for Parallel ML. The extended compiler supports all features of the Parallel ML language, including unrestricted effects. Our experiments using a variety of benchmarks show that MPL incurs a small time and space overhead compared to sequential runs, scales well, and is competitive with languages such as C++, Go, Java, OCaml. These results show that our techniques can marry the safety benefits of functional programming with performance.

Synthesis tools have seen significant success in recent times. However, past approaches often require a complete and accurate embedding of the source language in the logic of the underlying solver, an approach difficult for industrial-grade languages. Other approaches couple the semantics of the source language with purpose-built synthesizers, necessarily tying the synthesis engine to a particular language model. In this paper, we propose Absynthe, an alternative approach based on user-defined abstract semantics that aims to be both lightweight and language agnostic, yet effective in guiding the search for programs. A synthesis goal in Absynthe is specified as an abstract specification in a lightweight user-defined abstract domain and concrete test cases. The synthesis engine is parameterized by the abstract semantics and independent of the source language. Absynthe validates candidate programs against test cases using the actual concrete language implementation to ensure correctness. We formalize the synthesis rules for Absynthe and describe how the key ideas are scaled-up in our implementation in Ruby. We evaluated Absynthe on SyGuS strings benchmark and found it competitive with other enumerative search solvers. Moreover, Absynthe's ability to combine abstract domains allows the user to move along a cost spectrum, i.e., expressive domains prune more programs but require more time. Finally, to verify Absynthe can act as a general purpose synthesis tool, we use Absynthe to synthesize Pandas data frame manipulating programs in Python using simple abstractions like types and column labels of a data frame. Absynthe reaches parity with AutoPandas, a deep learning based tool for the same benchmark suite. In summary, our results demonstrate Absynthe is a promising step forward towards a general-purpose approach to synthesis that may broaden the applicability of synthesis to more full-featured languages.

With the growing practice of mechanizing language metatheories, it has become ever more pressing that interactive theorem provers make it easy to write reusable, extensible code and proofs. This paper presents a novel language design geared towards extensible metatheory mechanization in a proof assistant. The new design achieves reuse and extensibility via a form of family polymorphism, an object-oriented idea, that allows code and proofs to be polymorphic to their enclosing families. Our development addresses technical challenges that arise from the underlying language of a proof assistant being simultaneously functional, dependently typed, a logic, and an interactive tool. Our results include (1) a prototypical implementation of the language design as a Coq plugin, (2) a dependent type theory capturing the essence of the language mechanism and its consistency and canonicity results, and (3) case studies showing how the new expressiveness naturally addresses real programming challenges in metatheory mechanization.

While synthesizing and repairing regular expressions (regexes) based on Programming-by-Examples (PBE) methods have seen rapid progress in recent years, all existing works only support synthesizing or repairing regexes for membership testing, and the support for extraction is still an open problem. This paper fills the void by proposing the first PBE-based method for synthesizing and repairing regexes for extraction. Our work supports regexes that have real-world extensions such as backreferences and lookarounds. The extensions significantly affect the PBE-based synthesis and repair problem. In fact, we show that there are unsolvable instances of the problem if the synthesized regexes are not allowed to use the extensions, i.e., there is no regex without the extensions that correctly classify the given set of examples, whereas every problem instance is solvable if the extensions are allowed. This is in stark contrast to the case for the membership where every instance is guaranteed to have a solution expressible by a pure regex without the extensions. The main contribution of the paper is an algorithm to solve the PBE-based synthesis and repair problem for extraction. Our algorithm builds on existing methods for synthesizing and repairing regexes for membership testing, i.e., the enumerative search algorithms with SMT constraint solving. However, significant extensions are needed because the SMT constraints in the previous works are based on a non-deterministic semantics of regexes. Non-deterministic semantics is sound for membership but not for extraction, because which substrings are extracted depends on the deterministic behavior of actual regex engines. To address the issue, we propose a new SMT constraint generation method that respects the deterministic behavior of regex engines. For this, we first define a novel formal semantics of an actual regex engine as a deterministic big-step operational semantics, and use it as a basis to design the new SMT constraint generation method. The key idea to simulate the determinism in the formal semantics and the constraints is to consider continuations of regex matching and use them for disambiguation. We also propose two new search space pruning techniques called approximation-by-pure-regex and approximation-by-backreferences that make use of the extraction information in the examples. We have implemented the synthesis and repair algorithm in a tool called R3 (Repairing Regex for extRaction) and evaluated it on 50 regexes that contain real-world extensions. Our evaluation shows the effectiveness of the algorithm and that our new pruning techniques substantially prune the search space.

A key challenge in example-based program synthesis is the gigantic search space of programs. To address this challenge, various work proposed to use abstract interpretation to prune the search space. However, most of existing approaches have focused only on forward abstract interpretation, and thus cannot fully exploit the power of abstract interpretation. In this paper, we propose a novel approach to inductive program synthesis via iterative forward-backward abstract interpretation. The forward abstract interpretation computes possible outputs of a program given inputs, while the backward abstract interpretation computes possible inputs of a program given outputs. By iteratively performing the two abstract interpretations in an alternating fashion, we can effectively determine if any completion of each partial program as a candidate can satisfy the input-output examples. We apply our approach to a standard formulation, syntax-guided synthesis (SyGuS), thereby supporting a wide range of inductive synthesis tasks. We have implemented our approach and evaluated it on a set of benchmarks from the prior work. The experimental results show that our approach significantly outperforms the state-of-the-art approaches thanks to the sophisticated abstract interpretation techniques.

Information flow security ensures that the secret data manipulated by a program does not influence its observable output. Proving information flow security is especially challenging for concurrent programs, where operations on secret data may influence the execution time of a thread and, thereby, the interleaving between different threads. Such internal timing channels may affect the observable outcome of a program even if an attacker does not observe execution times. Existing verification techniques for information flow security in concurrent programs attempt to prove that secret data does not influence the relative timing of threads. However, these techniques are often restrictive (for instance because they disallow branching on secret data) and make strong assumptions about the execution platform (ignoring caching, processor instructions with data-dependent runtime, and other common features that affect execution time).

In this paper, we present a novel verification technique for secure information flow in concurrent programs that lifts these restrictions and does not make any assumptions about timing behavior. The key idea is to prove that all mutating operations performed on shared data commute, such that different thread interleavings do not influence its final value. Crucially, commutativity is required only for an abstraction of the shared data that contains the information that will be leaked to a public output. Abstract commutativity is satisfied by many more operations than standard commutativity, which makes our technique widely applicable.

We formalize our technique in CommCSL, a relational concurrent separation logic with support for commutativity-based reasoning, and prove its soundness in Isabelle/HOL. We implemented CommCSL in HyperViper, an automated verifier based on the Viper verification infrastructure, and demonstrate its ability to verify challenging examples.

We present a new approach to the design and implementation of probabilistic programming languages (PPLs), based on the idea of stochastically estimating the probability density ratios necessary for probabilistic inference. By relaxing the usual PPL design constraint that these densities be computed exactly, we are able to eliminate many common restrictions in current PPLs, to deliver a language that, for the first time, simultaneously supports first-class constructs for marginalization and nested inference, unrestricted stochastic control flow, continuous and discrete sampling, and programmable inference with custom proposals. At the heart of our approach is a new technique for compiling these expressive probabilistic programs into randomized algorithms for unbiasedly estimating their densities and density reciprocals. We employ these stochastic probability estimators within modified Monte Carlo inference algorithms that are guaranteed to be sound despite their reliance on inexact estimates of density ratios. We establish the correctness of our compiler using logical relations over the semantics of λ_{SP}, a new core calculus for modeling and inference with stochastic probabilities. We also implement our approach in an open-source extension to Gen, called GenSP, and evaluate it on six challenging inference problems adapted from the modeling and inference literature. We find that: (1) can automate fast density estimators for programs with very expensive exact densities; (2) convergence of inference is mostly unaffected by the noise from these estimators; and (3) our sound-by-construction estimators are competitive with hand-coded density estimators, incurring only a small constant-factor overhead.

Deadlocks are one of the most notorious concurrency bugs, and significant research has focused on detecting them efficiently. Dynamic predictive analyses work by observing concurrent executions, and reason about alternative interleavings that can witness concurrency bugs. Such techniques offer scalability and sound bug reports, and have emerged as an effective approach for concurrency bug detection, such as data races. Effective dynamic deadlock prediction, however, has proven a challenging task, as no deadlock predictor currently meets the requirements of soundness, high-precision, and efficiency.

In this paper, we first formally establish that this tradeoff is unavoidable, by showing that (a) sound and complete deadlock prediction is intractable, in general, and (b) even the seemingly simpler task of determining the presence of potential deadlocks, which often serve as unsound witnesses for actual predictable deadlocks, is intractable. The main contribution of this work is a new class of predictable deadlocks, called sync(hronization)-preserving deadlocks. Informally, these are deadlocks that can be predicted by reordering the observed execution while preserving the relative order of conflicting critical sections. We present two algorithms for sound deadlock prediction based on this notion. Our first algorithm SPDOffline detects all sync-preserving deadlocks, with running time that is linear per abstract deadlock pattern, a novel notion also introduced in this work. Our second algorithm SPDOnline predicts all sync-preserving deadlocks that involve two threads in a strictly online fashion, runs in overall linear time, and is better suited for a runtime monitoring setting.

We implemented both our algorithms and evaluated their ability to perform offline and online deadlock-prediction on a large dataset of standard benchmarks. Our results indicate that our new notion of sync-preserving deadlocks is highly effective, as (i) it can characterize the vast majority of deadlocks and (ii) it can be detected using an online, sound, complete and highly efficient algorithm.

Inductive relations offer a powerful and expressive way of writing program specifications while facilitating compositional reasoning. Their widespread use by proof assistant users has made them a particularly attractive target for proof engineering tools such as QuickChick, a property-based testing tool for Coq which can automatically derive generators for values satisfying an inductive relation. However, while such generators are generally efficient, there is an infrequent yet seemingly inevitable situation where their performance greatly degrades: when multiple inductive relations constrain the same piece of data.

In this paper, we introduce an algorithm for merging two such inductively defined properties that share an index. The algorithm finds shared structure between the two relations, and creates a single merged relation that is provably equivalent to the conjunction of the two. We demonstrate, through a series of case studies, that the merged relations can improve the performance of automatic generation by orders of magnitude, as well as simplify mechanized proofs by getting rid of the need for nested induction and tedious low-level book-keeping.

The fair division literature in economics considers how to divide resources between multiple agents such that the allocation is envy-free: each agent receives their favorite piece. Researchers have developed a variety of fair division protocols for the most standard setting, where the agents want to split a single item, however, the protocols are highly intricate and the proofs of envy-freeness involve tedious case analysis.

We propose Slice, a domain specific language for fair-division. Programs in our language can be converted to logical formulas encoding envy-freeness and other target properties. Then, the constraints can be dispatched to automated solvers. We prove that our constraint generation procedure is sound and complete. We also report on a prototype implementation of Slice, which we have used to automatically check envy-freeness for several protocols from the fair division literature.

We introduce program reconditioning, a method for allowing program generation and differential testing to be used to find miscompilation bugs, and test-case reduction to be used to simplify bug-triggering programs, even when (a) the programming language of interest features undefined behaviour (UB) and (b) no tools exist to detect and avoid this UB. We present two program generation tools based on our reconditioning idea: GLSLsmith for the OpenGL Shading Language (GLSL), a widely-used language for graphics programming, and WGSLsmith for the WebGPU Shading Language (WGSL), a new language for web-based graphics rendering. GLSL features many UBs, but unlike for languages such as C and C++ no tools exist to detect them automatically. While the WGSL language specification features very limited UB, early WGSL implementations do exhibit UB, for reasons of initial implementation simplicity, making it challenging to test them to quickly detect and eliminate unrelated miscompilation bugs. Thanks to reconditioning, we show that GLSLsmith and WGSLsmith allow differential testing and test-case reduction to be applied to compilers for GLSL and WGSL for the first time, despite the unavailability of UB detection techniques for these languages. Through a large testing campaign, we have found 24 and 33 bugs in GLSL and WGSL compilers, respectively. We present experiments showing that when reconditioning is disabled, compiler testing leads to a high rate of test programs that appear to trigger miscompilation bugs, but actually just feature UB. We also present a novel approach to managing floating-point roundoff error using reconditioning, implemented for both GLSL and WGSL.

Compilers are part of the foundation upon which software systems are built; they need to be as correct as possible. This paper is about stress-testing loop optimizers; it presents a major reimplementation of Yet Another Random Program Generator (YARPGen), an open-source generative compiler fuzzer. This new version has found 122 bugs, both in compilers for data-parallel languages, such as the Intel® Implicit SPMD Program Compiler and the Intel® oneAPI DPC++ compiler, and in C++ compilers such as GCC and Clang/LLVM. The first main contribution of our work is a novel method for statically avoiding undefined behavior when generating loops; the resulting programs conform to the relevant language standard, enabling automated testing. The second main contribution is a collection of mechanisms for increasing the diversity of generated loop code; in our evaluation, we demonstrate that these make it possible to trigger loop optimizations significantly more often, providing opportunities to discover bugs in the optimizers.

Automatically proving linearizability of concurrent data structures remains a key challenge for verification. We present temporal interpolation as a new proof principle to guide automated proof search using hindsight arguments within concurrent separation logic. Temporal interpolation offers an easy-to-automate alternative to prophecy variables and has the advantage of structuring proofs into easy-to-discharge hypotheses. Additionally, we advance hindsight theory by integrating it into a program logic, bringing formal rigor and complementary proof machinery. We substantiate the usefulness of temporal interpolation by implementing it in a tool and using it to automatically verify the Logical Ordering tree. The proof is challenging due to future-dependent linearization points and complex structure overlays. It is the first formal proof of this data structure. Interestingly, our formalization revealed an unknown bug and an existing informal proof as erroneous.

We investigate the problem of developing an "in-order" shared-memory concurrency model for languages like C and C++, which executes instructions following their program order, and is thus more amenable to reasoning and verification compared to recent complex proposals with out-of-order execution. We demonstrate that it is possible to fully support non-atomic accesses in an in-order model in a way that validates all compiler optimizations that are performed in single-threaded code (including irrelevant load introduction). The key to doing so is to utilize the distinction between a source model (with catch-fire semantics) and an intermediate representation (IR) model (with undefined value for racy reads) and formally establish the soundness of mapping from source to IR. As for relaxed atomic accesses, an in-order model must forbid load-store reordering. We discuss the rather limited performance impact of this fact and present a pragmatic approach to this problem, which, in the long term, requires a new kind of hardware store instructions for implementing relaxed stores. The source and IR semantics proposed in this paper are based on recent versions of the promising semantics, and the correctness proofs of the mappings from the source to the IR and from the IR to Armv8 are mechanized in Coq. This work is the first to formally relate an in-order source model and an out-of-order IR model with the goal of having an in-order source semantics without any performance overhead for non-atomics.

While mixed integer linear programming (MILP) solvers are routinely used to solve a wide range of important science and engineering problems, it remains a challenging task for end users to write correct and efficient MILP constraints, especially for problems specified using the inherently non-linear Boolean logic operations. To overcome this challenge, we propose a syntax guided synthesis (SyGuS) method capable of generating high-quality MILP constraints from the specifications expressed using arbitrary combinations of Boolean logic operations. At the center of our method is an extensible domain specification language (DSL) whose expressiveness may be improved by adding new integer variables as decision variables, together with an iterative procedure for synthesizing linear constraints from non-linear Boolean logic operations using these integer variables. To make the synthesis method efficient, we also propose an over-approximation technique for soundly proving the correctness of the synthesized linear constraints, and an under-approximation technique for safely pruning away the incorrect constraints. We have implemented and evaluated the method on a wide range of benchmark specifications from statistics, machine learning, and data science applications. The experimental results show that the method is efficient in handling these benchmarks, and the quality of the synthesized MILP constraints is close to, or higher than, that of manually-written constraints in terms of both compactness and solving time.

Complete verification of deep neural networks (DNNs) can exactly determine whether the DNN satisfies a desired trustworthy property (e.g., robustness, fairness) on an infinite set of inputs or not. Despite the tremendous progress to improve the scalability of complete verifiers over the years on individual DNNs, they are inherently inefficient when a deployed DNN is updated to improve its inference speed or accuracy. The inefficiency is because the expensive verifier needs to be run from scratch on the updated DNN. To improve efficiency, we propose a new, general framework for incremental and complete DNN verification based on the design of novel theory, data structure, and algorithms. Our contributions implemented in a tool named IVAN yield an overall geometric mean speedup of 2.4x for verifying challenging MNIST and CIFAR10 classifiers and a geometric mean speedup of 3.8x for the ACAS-XU classifiers over the state-of-the-art baselines.

Large language models have demonstrated outstanding performance on a wide range of tasks such as question answering and code generation. On a high level, given an input, a language model can be used to automatically complete the sequence in a statistically-likely way. Based on this, users prompt these models with language instructions or examples, to implement a variety of downstream tasks. Advanced prompting methods can even imply interaction between the language model, a user, and external tools such as calculators. However, to obtain state-of-the-art performance or adapt language models for specific tasks, complex task- and model-specific programs have to be implemented, which may still require ad-hoc interaction.

Based on this, we present the novel idea of Language Model Programming (LMP). LMP generalizes language model prompting from pure text prompts to an intuitive combination of text prompting and scripting. Additionally, LMP allows constraints to be specified over the language model output. This enables easy adaption to many tasks while abstracting language model internals and providing high-level semantics.

To enable LMP, we implement LMQL (short for Language Model Query Language), which leverages the constraints and control flow from an LMP prompt to generate an efficient inference procedure that minimizes the number of expensive calls to the underlying language model.

We show that LMQL can capture a wide range of state-of-the-art prompting methods in an intuitive way, especially facilitating interactive flows that are challenging to implement with existing high-level APIs. Our evaluation shows that we retain or increase the accuracy on several downstream tasks, while also significantly reducing the required amount of computation or cost in the case of pay-to-use APIs (26-85% cost savings).

Neural networks are successful in various tasks but are also susceptible to adversarial examples. An adversarial example is generated by adding a small perturbation to a correctly-classified input with the goal of causing a network classifier to misclassify. In one pixel attacks, an attacker aims to fool an image classifier by modifying a single pixel. This setting is challenging for two reasons: the perturbation region is very small and the perturbation is not differentiable. To cope, one pixel attacks iteratively generate candidate adversarial examples and submit them to the network until finding a successful candidate. However, existing works require a very large number of queries, which is infeasible in many practical settings, where the attacker is limited to a few thousand queries to the network. We propose a novel approach for computing one pixel attacks. The key idea is to leverage program synthesis and identify an expressive program sketch that enables to compute adversarial examples using significantly fewer queries. We introduce OPPSLA, a synthesizer that, given a classifier and a training set, instantiates the sketch with customized conditions over the input’s pixels and the classifier’s output. OPPSLA employs a stochastic search, inspired by the Metropolis-Hastings algorithm, that synthesizes typed expressions enabling minimization of the number of queries to the classifier. We further show how to extend OPPSLA to compute few pixel attacks minimizing the number of perturbed pixels. We evaluate OPPSLA on several deep networks for CIFAR-10 and ImageNet. We show that OPPSLA obtains a state-of-the-art success rate, often with an order of magnitude fewer queries than existing attacks. We further show that OPPSLA’s programs are transferable to other classifiers, unlike existing one pixel attacks, which run from scratch on every classifier and input.

Unstructured sparse neural networks are an important class of machine learning (ML) models, as they compact model size and reduce floating point operations. The execution time of these models is frequently dominated by the sparse matrix multiplication (SpMM) kernel, *C*=*A*× *B*, where *A* is a sparse matrix, and *B* and *C* are dense matrices. The unstructured sparsity pattern of matrices in pruned machine learning models along with their sparsity ratio has rendered useless the large class of libraries and systems that optimize sparse matrix multiplications. Reusing registers is particularly difficult because accesses to memory locations should be known statically. This paper proposes Sparse Register Tiling, a new technique composed of an unroll-and-sparse-jam transformation followed by data compression that is specifically tailored to sparsity patterns in ML matrices. Unroll-and-sparse-jam uses sparsity information to jam the code while improving register reuse. Sparse register tiling is evaluated across 2396 weight matrices from transformer and convolutional models with a sparsity range of 60-95% and provides an average speedup of 1.72× and 2.65× over MKL SpMM and dense matrix multiplication, respectively, on a multicore CPU processor. It also provides an end-to-end speedup of 2.12× for MobileNetV1 with 70% sparsity on an ARM processor commonly used in edge devices.