In robotics and computer vision, there are many inference problems that come down to fusing the information from many different sensors, or a small number of sensors deployed over time. Dellaert will review how these and other problems can be posed in terms of factor graphs, which provide a graphical language in which to develop and collaborate on such problems. The talk will emphasize the advantages and intuition that come with these graphical languages, leading to both offline and online sensor fusion algorithms. One of Frank Dellaert’s current research directions is exploring the use of code generation to create high-performance, certifiable code to support autonomous systems in the real world.
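To make the factor-graph idea concrete, here is a minimal sketch (our own illustration, not Dellaert's code) of the simplest possible sensor-fusion problem: two unary Gaussian factors on a scalar state, whose MAP estimate is the precision-weighted mean of the measurements.

```python
# Minimal illustration: fusing noisy measurements of a scalar state x,
# each modeled as a unary factor f_i(x) = exp(-(x - z_i)^2 / (2 * s_i^2)).
# The MAP estimate is the precision-weighted mean of the measurements.

def fuse(measurements):
    """measurements: list of (z, sigma) pairs, one per factor."""
    num = sum(z / s**2 for z, s in measurements)
    den = sum(1 / s**2 for z, s in measurements)
    return num / den  # MAP estimate of x

# A GPS-like factor (z=10.0, sigma=2.0) and an odometry-like factor (z=12.0, sigma=1.0):
x_map = fuse([(10.0, 2.0), (12.0, 1.0)])
print(round(x_map, 3))  # 11.6 -- pulled toward the more precise measurement
```

Real factor graphs generalize this to many variables and non-Gaussian factors, but the same intuition applies: each sensor contributes a factor, and inference combines them weighted by confidence.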
Precision medicine promises to deliver ultra-personalized care by casting medicine as an optimization problem: identifying the best possible treatment with respect to all available data. A slew of recent advances in biology, starting with the ability to sequence the human genome, has caused an explosion in the amount of data one can collect on a single patient, and a similar explosion in the complexity of reasoning about this data to solve that optimization problem. Computational support for the practicing physician is no longer optional. This talk covers precision medicine from the ground up for computer scientists, through a personal journey from programming languages research into academic medicine. It will demonstrate progress to date, including the now-routine use of relational programming in miniKanren to identify personalized treatments for patients with some of the rarest and most challenging diseases in the world.
Database applications typically undergo several schema changes during their life cycle for performance and maintainability reasons. Such changes to the database schema not only require migrating the underlying data to the new schema, but also re-implementing large chunks of the application code that queries and updates the database. In this talk, Dillig will describe her recent work on programming-language support for evolving database applications. Specifically, she will first describe her work on verifying equivalence between database applications that operate over different schemas, such as those that arise before and after schema refactoring. Next, she will describe how to use this verification procedure to solve the corresponding synthesis problem: that is, given a database application and a new schema, how can we automatically generate an equivalent program over the new schema?
The subject of Warren Sack’s talk will be The Software Arts, a book recently published in the MIT Press “Software Studies” series. Sack offers an alternative history of software that traces its roots to the step-by-step descriptions of how things were made in the workshops of eighteenth-century artists and artisans. He illustrates how software was born of a coupling of the liberal arts and the mechanical arts and argues that the arts are at the heart of computing. The Software Arts is an invitation to artists and humanists to see how their ideas are already at the very center of software; and an invitation to computer scientists to envision how they are artists and humanists too.
Computational energy efficiency is a critical aspect of many modern embedded devices, as it determines the level of autonomy in numerous scenarios. We present a component-based energy modeling approach that abstracts per-component energy in a dataflow computational network executed according to a given scheduling policy. The approach is based on a modeling tool and ultimately relies on battery state to support a wider range of energy-optimization strategies for power-critical devices.
To evaluate software maintenance techniques and tools in controlled experiments with human participants, researchers currently use projects and tasks selected on an ad-hoc basis. This can unrealistically favor the tool under study, and it makes results difficult to compare. We suggest gradually creating a benchmark repository of projects, tasks, and metadata relevant to human-based studies. In this paper, we discuss the requirements and challenges of such a repository, along with the steps that could lead to its construction.
This paper describes NAB, a novel framework to execute custom dynamic analyses on open-source software hosted in public repositories. NAB is fully automatic, language-agnostic, and scalable. We present NAB's key features and its architecture, and we discuss three large-scale case studies enabled by NAB on more than 56K Node.js, Java, and Scala projects.
This paper describes Renaissance, a new benchmark suite that covers modern JVM concurrency and parallelism paradigms.
Data replication is essential in scenarios like geo-distributed datacenters and edge computing, yet it poses a challenge for data consistency. Developers either adopt high consistency at the detriment of performance, or they embrace low consistency and face much higher programming complexity. We argue that language abstractions should support associating a level of consistency with each data type. We present ConSysT, a programming language and middleware that provides abstractions to specify consistency types, enabling different consistency levels to be mixed in the same application. This mechanism is fully integrated with object-oriented programming, and the type system guarantees that different levels can be mixed only in correct ways.
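As a hypothetical sketch of the idea (the names below are ours, not the ConSysT API): values carry a consistency level, and combining values propagates the weakest level involved, so a strongly consistent result can never silently depend on weakly consistent data.

```python
# Hypothetical sketch of consistency-typed values (not the ConSysT API):
# combining values propagates the weakest consistency level involved.

STRONG, WEAK = "strong", "weak"

class ConsistentValue:
    def __init__(self, value, level):
        self.value, self.level = value, level

    def combine(self, other, op):
        # The result is only as consistent as the weakest input.
        level = WEAK if WEAK in (self.level, other.level) else STRONG
        return ConsistentValue(op(self.value, other.value), level)

balance = ConsistentValue(100, STRONG)       # strongly replicated account balance
cached_bonus = ConsistentValue(5, WEAK)      # weakly replicated cached value
total = balance.combine(cached_bonus, lambda a, b: a + b)
print(total.value, total.level)  # 105 weak
```

In ConSysT this propagation is enforced statically by the type system rather than at run time as in this sketch.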
While Datalog provides a high-level language for expressing static program analyses, it depends on external tooling to extract the input facts from the analyzed programs. To remove this dependency, we present MetaDL. The MetaDL system consists of a Datalog language extension for source-level program analysis and tools for generating the language extension from a description of the analyzed language.
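To illustrate the style of analysis such systems express (this is our own toy fixpoint evaluator, not MetaDL code), consider call-graph reachability as a Datalog transitive-closure rule evaluated over extracted call-edge facts:

```python
# Illustrative only: the kind of analysis Datalog expresses, here
# reaches(a, c) :- edge(a, c).
# reaches(a, c) :- edge(a, b), reaches(b, c).
# evaluated naively to a fixpoint over extracted call-edge facts.

def reaches(edges):
    facts = set(edges)          # base case: every edge is a reaches-fact
    changed = True
    while changed:              # iterate the recursive rule to fixpoint
        changed = False
        for (a, b) in edges:
            for (b2, c) in list(facts):
                if b == b2 and (a, c) not in facts:
                    facts.add((a, c))
                    changed = True
    return facts

calls = {("main", "parse"), ("parse", "lex")}
print(sorted(reaches(calls)))
```

MetaDL's contribution is removing the external fact-extraction step: the `edge` facts here would instead be produced by source-level pattern matching inside the Datalog dialect itself.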
New programming languages often lack good IDE support, as developing advanced semantic editor services takes additional effort. In previous work we discussed the operational requirements of a constraint solver that leverages the declarative type-system specification of a language to provide language-parametric semantic editor services. In this work we describe the implementation of our solver as a two-stage process: inference and search. An editor-service-specific search strategy determines how and where the search is conducted, and when it terminates. We are currently implementing and evaluating this idea.
N-body simulation is a classic application in high-performance computing. While compiler optimizations for other classics such as matrix multiplication and FFT are well developed, those for N-body problems have been little studied, and hand optimization of N-body solvers is thus rampant. To remedy this situation, we design a domain-specific language (DSL) embedded in Python for describing N-body problems and develop a DSL compiler that exploits the symmetric nature of N-body problems. It produces code whose efficiency is comparable with hand-written code.
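A hand-written sketch (ours, not the paper's DSL) of the symmetry such a compiler can exploit: by Newton's third law the pairwise force satisfies F_ij = -F_ji, so each pair needs to be computed only once, halving the work of the naive double loop.

```python
# Sketch of the pairwise symmetry F_ij = -F_ji in a 1D N-body kernel:
# each (i, j) pair is visited once, and the force is applied with
# opposite signs to both bodies (action and reaction).

def accelerations(pos, mass, G=1.0, eps=1e-9):
    n = len(pos)
    acc = [0.0] * n
    for i in range(n):
        for j in range(i + 1, n):           # each pair visited once
            d = pos[j] - pos[i]
            f = G * d / (abs(d)**3 + eps)   # softened pairwise kernel
            acc[i] += mass[j] * f           # action ...
            acc[j] -= mass[i] * f           # ... and reaction
    return acc

print(accelerations([0.0, 1.0], [1.0, 1.0]))
```

A DSL compiler can apply this transformation automatically from a declarative description of the interaction kernel, which is hard for a general-purpose compiler to discover in hand-written loops.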
Modern web applications have stringent latency requirements while processing an ever-increasing amount of user data. To address these challenges and improve programmer productivity, Object-Relational Mapping (ORM) frameworks have been developed to allow developers to write database-processing code in an object-oriented manner. Despite such frameworks, prior work found that developers still struggle to develop ORM-based web applications. This paper presents a series of studies and tools we developed for optimizing web applications built with the Ruby on Rails ORM. Using automated static analysis, we detect ORM-related inefficiency problems and suggest fixes to developers. Our evaluation on 12 real-world applications shows that more than 1,000 performance issues can be detected and fixed.
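One representative inefficiency such analyses flag is the classic "N+1 query" pattern. The mock below (our illustration in Python, not Rails code) counts the queries issued by the naive loop versus the batched rewrite an analysis tool would suggest:

```python
# Illustrative mock of the "N+1 query" ORM inefficiency and its fix.
# A real ORM hides these queries behind object navigation, which is
# exactly why the pattern is easy to introduce and hard to spot.

QUERY_LOG = []

def query(sql):
    QUERY_LOG.append(sql)
    return []

def emails_n_plus_one(user_ids):
    query("SELECT * FROM users")                              # 1 query ...
    for uid in user_ids:
        query(f"SELECT * FROM emails WHERE user_id = {uid}")  # ... plus N

def emails_batched(user_ids):
    query("SELECT * FROM users")
    ids = ", ".join(map(str, user_ids))
    query(f"SELECT * FROM emails WHERE user_id IN ({ids})")   # 2 total

emails_n_plus_one([1, 2, 3])
print(len(QUERY_LOG))  # 4
QUERY_LOG.clear()
emails_batched([1, 2, 3])
print(len(QUERY_LOG))  # 2
```

In Rails, the batched form corresponds to eager loading associations instead of navigating them one record at a time.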
We present a novel approach to performance, portability, and productivity of data-parallel computations on multi- and many-core architectures. Our approach is based on Multi-Dimensional Homomorphisms (MDHs) -- a formally defined class of functions that cover important data-parallel computations, e.g., linear algebra routines (BLAS) and stencil computations. For MDHs, we present a high-level Domain-Specific Language (DSL) that contributes to high user productivity, and we propose a corresponding DSL compiler which automatically generates optimized (auto-tuned) OpenCL code, thereby providing high, portable performance, over different architectures and input sizes, for programs in our DSL. Our experimental results, on Intel CPU and NVIDIA GPU, demonstrate competitive and often significantly better performance of our approach as compared to state-of-practice approaches, e.g., Intel MKL/MKL-DNN and NVIDIA cuBLAS/cuDNN.
Partial evaluation, based on the first Futamura projection, allows compiling language interpreters together with given user programs into efficient target programs. GraalVM is an example system that implements this mechanism: it combines partial evaluation with profiling information and dynamic compilation to transform interpreters into high-performance machine code at run time. However, partial evaluation is compile-time intensive, as it requires the abstract interpretation of interpreter implementations, so optimizing it remains an active research topic. We present an approach to speed up partial evaluation by generating, ahead of time, source code that performs partial evaluation specific to an interpreter implementation. When executed for a given user program at run time, the generated code directly emits partially evaluated interpreter instructions for the language constructs it knows and sees in the program. This yields the target program faster than performing the first Futamura projection. The generated source code behaves like a specialized partial evaluator obtained by performing the second Futamura projection, although no self-applying partial evaluator is involved during code generation.
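A toy illustration of the first Futamura projection (our sketch, not GraalVM code): specializing a tiny interpreter to a fixed program removes the interpretive dispatch, leaving a residual function that computes the same result.

```python
# Toy first Futamura projection: specializing an interpreter for a
# tiny straight-line language to a fixed program yields a residual
# "target program" with the dispatch loop unrolled away.

def interpret(program, x):
    acc = x
    for op, arg in program:          # dispatch happens at run time
        if op == "add":
            acc += arg
        elif op == "mul":
            acc *= arg
    return acc

def specialize(program):
    """Partial evaluation: resolve the dispatch at 'compile' time."""
    steps = []
    for op, arg in program:
        if op == "add":
            steps.append(lambda a, k=arg: a + k)
        elif op == "mul":
            steps.append(lambda a, k=arg: a * k)
    def target(x):                   # residual program: no op dispatch
        for step in steps:
            x = step(x)
        return x
    return target

prog = [("add", 2), ("mul", 3)]
target = specialize(prog)
print(interpret(prog, 5), target(5))  # 21 21
```

The paper's approach corresponds, roughly, to generating code like `specialize` ahead of time for a given interpreter, rather than abstractly interpreting the interpreter at run time.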
Static analysis is the analysis of a program through inspection of the source code, usually carried out by an automated tool. One of the greatest challenges posed by real-world applications is that the whole program is rarely available at any point of the analysis process. One reason for this is separate compilation, where the source code of some libraries might not be accessible in order to protect intellectual property. But even if we have a complete view of the source code including the underlying operating system, we might still have trouble fitting the representation of the entire software into memory. Thus, industrial tools need to deal with uncertainty due to the lack of information.
In my dissertation I discuss state-of-the-art methods to deal with this uncertainty and attempt to improve upon each method to retain information that would otherwise be unavailable. I also propose guidelines on which methods to choose to solve certain problems.
In high-throughput distributed applications, such as large-scale banking systems, synchronization between objects becomes a bottleneck. This short paper focuses on research, in close collaboration with ING Bank, on leveraging application-specific knowledge captured by model-driven engineering approaches to increase application performance in high-contention scenarios while maintaining functional application-level consistency.
With the widespread adoption of concurrent programming, debugging of non-deterministic failures becomes increasingly important. Record & replay debugging aids developers in this effort by reliably reproducing recorded bugs. Because each concurrency model (e.g., threads vs actors) is particularly suited for different tasks, developers started combining them within the same application. Record & replay solutions are typically designed for one concurrency model only. In this paper we propose a novel multi-paradigm record & replay that is based on abstracting concurrency models to a common set of concepts and events.
By comparing the mutation scores achieved by test suites, one can judge which suite is more effective. However, it is not known whether the mutation score is a fair metric for such a comparison. In this paper, we present an empirical study that compares developer-written and automatically generated test suites in terms of mutation score and in relation to the detection ratios of 7 mutation types. Our results indicate that the mutation score is a fair metric for this comparison.
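For readers unfamiliar with the metric being compared, a minimal sketch: the mutation score of a test suite is the fraction of generated mutants (small seeded faults) that the suite kills, i.e., causes at least one test to fail.

```python
# Minimal sketch of the mutation score: killed mutants / total mutants.

def mutation_score(kill_results):
    """kill_results: mapping mutant_id -> True if any test failed on it."""
    killed = sum(1 for k in kill_results.values() if k)
    return killed / len(kill_results)

# A suite that kills 3 of 4 mutants scores 0.75.
results = {"m1": True, "m2": False, "m3": True, "m4": True}
print(mutation_score(results))  # 0.75
```

The fairness question the paper studies is whether this single ratio ranks suites consistently with their per-mutation-type detection ratios.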
Virtual reality (VR) is among the key and most promising emerging technologies in the field of education. This paper presents an innovative VR-based approach to teacher education. The development of the VR application followed a full design cycle, with active involvement of education experts throughout the development process. The evaluation results indicate a positive impact of the VR intervention on the cultivation of empathy skills. Moreover, the results are statistically significant with respect to other parameters under investigation, including the sense of presence, embodiment, and users' emotional experiences.
CHERI is an instruction set extension that adds capability-based addressing, in which forgeable pointers are replaced by capabilities: a program must show that it possesses an appropriate capability before it can access a memory location. Linear capabilities extend the idea of regular capabilities. Just like regular capabilities, they cannot be forged; unlike regular capabilities, they additionally cannot be copied or duplicated in any way. Linear capabilities allow us to enforce certain security principles; for example, they can be used to implement a secure calling convention called StkTokens. In this work we design an ISA extension for CHERI that supports linear capabilities. In doing so, we pose several questions one should consider when designing an extension of this kind, and propose some possible solutions.
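A hypothetical software model of the linearity invariant (ours, not the CHERI ISA): a linear capability can be moved exactly once, and moving it invalidates the source handle, so two live copies can never exist.

```python
# Hypothetical model of a linear capability: ownership can be moved
# but never duplicated. In hardware this invariant is enforced by the
# ISA; here we model it with an explicit validity flag.

class LinearCapability:
    def __init__(self, address, length):
        self.address, self.length = address, length
        self.valid = True

    def move(self):
        """Transfer ownership; the source handle is invalidated."""
        if not self.valid:
            raise RuntimeError("linear capability already moved")
        self.valid = False
        return LinearCapability(self.address, self.length)

cap = LinearCapability(0x1000, 64)
callee_cap = cap.move()       # ownership passes, e.g., across a call
print(callee_cap.valid)       # True
print(cap.valid)              # False: caller can no longer use it
```

This move-only discipline is what makes schemes like StkTokens work: a caller that hands its stack capability to a callee provably retains no usable copy.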
Language designers and early adopters make various claims about their languages to differentiate them from others and attract users. Unfortunately, some of these claims are not supported by strong evidence. Moreover, the nature of languages as a special kind of software tool makes it difficult to find objective measures to quantify and compare them per se. One approach to providing objective information about languages is empirical comparison. Hence, this research studies the usage and practice of programming languages by mining modern, popular software repositories, in order to understand and characterize their effect on developing open source software: that is, to compare open source projects written in different languages and examine their similarities and differences.
The designers of static analyses for null safety often try to reduce the number of false positives reported by the analysis through increased engineering effort, user-provided annotations, and/or weaker soundness guarantees. To produce a null-pointer analysis with little engineering effort, reduced false positives, and strong soundness guarantees in a principled way, we adapt the “Abstracting Gradual Typing” framework to the abstract-interpretation-based program analysis setting. In particular, a simple static dataflow analysis that relies on user-provided annotations and has nullability lattice N ⊑ ⊤ (where N means “definitely not null” and ⊤ means “possibly null”) is gradualized, producing a new lattice N ⊑ ? ⊑ ⊤. The question mark explicitly represents “optimistic uncertainty” in the analysis itself, supporting a formal soundness property and the “gradual guarantees” laid out in the gradual typing literature. We then implement a prototype of our gradual null-pointer analysis as a Facebook Infer checker and compare it to existing null-pointer analyses via a suite of GitHub repositories originally used by Uber to evaluate their NullAway tool. Our prototype has an architecture and output very similar to these existing tools, suggesting the value of applying our approach to more sophisticated program analyses in the future.
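The gradualized lattice N ⊑ ? ⊑ ⊤ is small enough to sketch directly (our illustration; names are not the Infer checker's API). The join operation below is what a dataflow analysis applies when merging facts at control-flow join points:

```python
# Sketch of the gradualized nullability lattice N ⊑ ? ⊑ ⊤, a chain
# where "T" stands for ⊤ (possibly null), "?" for optimistic
# uncertainty, and "N" for definitely not null.

ORDER = {"N": 0, "?": 1, "T": 2}

def leq(a, b):
    """The lattice ordering a ⊑ b."""
    return ORDER[a] <= ORDER[b]

def join(a, b):
    """Least upper bound: used to merge facts at control-flow joins."""
    return a if ORDER[a] >= ORDER[b] else b

print(join("N", "?"))   # ?  -- uncertainty absorbs definite facts
print(leq("?", "T"))    # True
```

The "?" element is where the gradual guarantees bite: the analysis treats it optimistically (no warning) while remaining formally sound.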
We present the Incremental Scannerless Generalized LR (ISGLR) parsing algorithm, which combines the benefits of Incremental Generalized LR (IGLR) parsing and Scannerless Generalized LR (SGLR) parsing. The parser preprocesses the input by modifying the previously saved parse forest. This allows the input to the parser to be a stream of parse nodes, instead of a stream of characters. Scannerless parsing relies heavily on non-determinism during parsing, negatively impacting the incrementality of ISGLR parsing. We evaluated the ISGLR parsing algorithm using file histories from Git, achieving a speedup of up to 25 times over non-incremental SGLR.
Domain-Specific Modeling raises the level of abstraction beyond programming by specifying the solution directly with visual models that express domain concepts. In many cases, final products can be generated automatically from these high-level specifications. This automation is possible because both the language and the generators fit the requirements of a single domain. This paper introduces Domain-Specific Modeling and describes the related workshop that took place on 20 October 2019.