VMIL 2019- Proceedings of the 11th ACM SIGPLAN International Workshop on Virtual Machines and Intermediate Languages

Full Citation in the ACM Digital Library


Designing a low-level virtual machine for implementing real-time managed languages

Applications of real-time systems have grown significantly in both diversity and popularity, and the appetite for real-time software has never been higher. In contrast, the choice of programming languages used to develop such systems has stagnated, mostly limited to decades-old languages, specifically Ada and C/C++, and more recently real-time Java. We posit that the high cost and difficulty of developing new programming languages for real-time systems is the main reason for this mono-culture.

To tackle the lack of diversity, we propose the design of a micro virtual machine on which managed programming languages for real-time systems can be developed. Our design facilitates bringing the advantages of correct managed languages to the real-time domain. We build on a previously published micro virtual machine specification, named Mu, and propose a set of modifications to its abstractions over concurrency and memory management to make it suitable for real-time systems.

The resulting design is a basis for a new micro virtual machine specification we call RTMu, designed as a reliable and efficient foundation for the development of managed languages for real-time systems.

Implementing a language with explicit assignment semantics

Anzen is a multi-paradigm programming language that aims to provide explicit and controllable assignment semantics. It is based on the observation that abstractions over memory management and data representation, as commonly adopted by contemporary programming languages, often transpire relics of the underlying memory model and lead to confusing assignment semantics in the presence of aliases. In response, Anzen’s goal is to offer a modern approach to programming, built on a sound and unambiguous semantics.This paper describes the implementation of a compiler for Anzen. Our implementation transpiles sources to an intermediate language inspired by the LLVM IR, designed to ease further analysis on Anzen’s statements. This intermediate representation is then consumed by a register-based virtual machine. We present the Anzen compiler’s architecture, introduce its intermediate language and describe the latter’s evaluation. Our work aims to set a reference implementation for future developments and extensions of the language.

Scalable comparison of JavaScript V8 bytecode traces

The comparison and alignment of runtime traces are essential, e.g., for semantic analysis or debugging. However, naive sequence alignment algorithms cannot address the needs of the modern web: (i) the bytecode generation process of V8 is not deterministic; (ii) bytecode traces are large.

We present STRAC, a scalable and extensible tool tailored to compare bytecode traces generated by the V8 JavaScript engine. Given two V8 bytecode traces and a distance function between trace events, STRAC computes and provides the best alignment. The key insight is to split access between memory and disk. STRAC can identify semantically equivalent web pages and is capable of processing huge V8 bytecode traces whose order of magnitude matches today's web like https://2019.splashcon.org, which generates approx. 150k of V8 bytecode instructions.

Efficient fail-fast dynamic subtype checking

We address the problem of dynamically checking if an instance of class S is also an instance of class T. Researchers have designed various strategies to perform constant-time subtype tests. Yet, well-known production implementations degrade to linear search in the worst case, in order to achieve other goals such as constant space and/or efficient dynamic class loading. The fast path is usually optimized for subtype tests that succeed. However, in workloads where dynamic type tests are common, such as Scala's pattern matching and LLVM compiler passes, we observe that 74%--93% of dynamic subtype tests return a negative result. We thus propose a scheme for fail-fast dynamic subtype checking. We assign each type a randomly generated type identifier with fixed size and fixed parity. In the compiled version of each class, we store a fixed-width bloom filter, which combines the type identifiers of all its transitive supertypes. At run-time, the bloom filters enable fast refutation of dynamic subtype tests with high probability. If such a refutation cannot be made, the scheme falls back to conventional techniques. This scheme works with multiple inheritance, separate compilation, and dynamic class loading. A prototype implementation of fail-fasts in the JVM provides provides 1.44x--2.74x speedup over HotSpot's native instanceof, on micro-benchmarks where worst-case behavior is likely.

Towards seamless interfacing between dynamic languages and native code

Existing approaches to interfacing high- and low-level code push considerable burdens onto the programmer, such as wrapper maintenance, explicit code generation, interface re-declaration, and/or signalling to garbage collectors. We note that run-time information on data layout and allocations in native code is available, and may be extended with knowledge of object lifetimes to assist in automating garbage collection. We describe work in progress towards an extension of the CPython virtual machine along these lines. We report initial experience building a first working prototype, and some early performance experiments.

Memory efficient CRDTs in dynamic environments

Modern distributed applications increasingly replicate data in order to guarantee both high availability of systems and an optimal user experience. Conflict-Free Replicated Data Types (CRDTs) are a family of data types specially designed for highly available systems which guarantee some form of eventual consistency. However, memory usage may grow unboundedly in their implementations, as garbage collection of meta-data is not tackled in most approaches.

In this paper, we explore a memory management model for operation-based CRDTs in dynamic setting, where nodes can dynamically join a network, and where the implementation can remove unnecessary meta-data employed by CRDTs used to determine the order of operations applied in different replicas. We first describe how new nodes will be brought up-to-date and fully linked with other replicas, and later we introduce our memory management model which allows meta-data to be removed.

We benchmark the memory usage of an add-wins set using different garbage collection techniques in various situations and show how our approach can be beneficial in comparison to state of the art techniques.

Which of my transient type checks are not (almost) free?

One form of type checking used in gradually typed language is transient type checking: whenever an object ‘flows’ through code with a type annotation, the object is dynamically checked to ensure it has the methods required by the annotation. Just-in-time compilation and optimisation in virtual machines can eliminate much of the overhead of run-time transient type checks. Unfortunately this optimisation is not uniform: some type checks will significantly decrease, or even increase, a program’s performance.

In this paper, we refine the so called “Takikawa” protocol, and use it to identify which type annotations have the greatest effects on performance. In particular, we show how graphing the performance of such benchmarks when varying which type annotations are present in the source code can be used to discern potential patterns in performance. We demonstrate our approach by testing the Moth virtual machine: for many of the benchmarks where Moth’s transient type checking impacts performance, we have been able to identify one or two specific type annotations that are the likely cause. Without these type annotations, the performance impact of transient type checking becomes negligible.

Using our technique programmers can optimise programs by removing expensive type checks, and VM engineers can identify new opportunities for compiler optimisation.