Formal methods: practical applications and foundations: Editorial
Empirical software metrics for benchmarking of verification tools
We study empirical metrics for software source code, which can predict the performance of verification tools on specific types of software. Our metrics comprise variable usage patterns, loop patterns, as well as indicators of control-flow complexity and are extracted by simple data-flow analyses. We demonstrate that our metrics are powerful enough to devise a machine-learning based portfolio solver for software verification. We show that this portfolio solver would be the (hypothetical) overall winner of the international competition on software verification (SV-COMP) in three consecutive years (2014-2016). This gives strong empirical evidence for the predictive power of our metrics and demonstrates the viability of portfolio solvers for software verification. Moreover, we demonstrate the flexibility of our algorithm for portfolio construction in novel settings: originally conceived for SV-COMP'14, the construction works just as well for SV-COMP'15 (considerably more verification tasks) and for SV-COMP'16 (considerably more candidate verification tools).
From non-preemptive to preemptive scheduling using synchronization synthesis
We present a computer-aided programming approach to concurrency. The approach allows programmers to program assuming a friendly, non-preemptive scheduler, and our synthesis procedure inserts synchronization to ensure that the final program works even with a preemptive scheduler. The correctness specification is implicit, inferred from the non-preemptive behavior. Let us consider sequences of calls that the program makes to an external interface. The specification requires that any such sequence produced under a preemptive scheduler should be included in the set of sequences produced under a non-preemptive scheduler. We guarantee that our synthesis does not introduce deadlocks and that the synchronization inserted is optimal w.r.t. a given objective function. The solution is based on a finitary abstraction, an algorithm for bounded language inclusion modulo an independence relation, and generation of a set of global constraints over synchronization placements. Each model of the global constraints set corresponds to a correctness-ensuring synchronization placement. The placement that is optimal w.r.t. the given objective function is chosen as the synchronization solution. We apply the approach to device-driver programming, where the driver threads call the software interface of the device and the API provided by the operating system. Our experiments demonstrate that our synthesis method is precise and efficient. The implicit specification helped us find one concurrency bug previously missed when model-checking using an explicit, user-provided specification. We implemented objective functions for coarse-grained and fine-grained locking and observed that different synchronization placements are produced for our experiments, favoring a minimal number of synchronization operations or maximum concurrency, respectively.
Quantitative monitoring of STL with edit distance
In cyber-physical systems (CPS), physical behaviors are typically controlled by digital hardware. As a consequence, continuous behaviors are discretized by sampling and quantization prior to their processing. Quantifying the similarity between CPS behaviors and their specification is an important ingredient in evaluating correctness and quality of such systems. We propose a novel procedure for measuring robustness between digitized CPS signals and signal temporal logic (STL) specifications. We first equip STL with quantitative semantics based on the , a metric that quantifies both space and time mismatches between digitized CPS behaviors. We then develop a dynamic programming algorithm for computing the robustness degree between digitized signals and STL specifications. In order to promote hardware-based monitors we implemented our approach in FPGA. We evaluated it on automotive benchmarks defined by research community, and also on realistic data obtained from magnetic sensor used in modern cars.
Algorithmic games for full ground references
We present a full classification of decidable and undecidable cases for contextual equivalence in a finitary ML-like language equipped with full ground storage (both integers and reference names can be stored). The simplest undecidable type is . At the technical level, our results marry game semantics with automata-theoretic techniques developed to handle infinite alphabets. On the automata-theoretic front, we show decidability of the emptiness problem for register pushdown automata extended with fresh-symbol generation.
Monitoring hyperproperties
Hyperproperties, such as non-interference and observational determinism, relate multiple system executions to each other. They are not expressible in standard temporal logics, like LTL, CTL, and CTL*, and thus cannot be monitored with standard runtime verification techniques. extends linear-time temporal logic (LTL) with explicit quantification over traces in order to express hyperproperties. We investigate the runtime verification problem of formulas for three different input models: (1) The model, where a number of system executions is processed in parallel. (2) The model, where system executions are processed sequentially, one execution at a time. In this model, the number of incoming executions is a-priori unbounded and may in fact grow forever. (3) The where the traces are processed sequentially and the number of incoming executions is . We show that the existence of a bound in the parallel and bounded sequential models leads to a different notion of monitorability than in the unbounded sequential model. We show that deciding the monitoriability problem for alternation-free HyperLTL is -complete while the problem is undecidable in general. For every input model, we provide monitoring algorithms along with run-time and storage optimizations. By recognizing properties of specifications such as reflexivity, symmetry, and transitivity, we reduce the number of comparisons between traces. For the sequential models, we present a technique that minimizes the number of traces that need to be stored. We evaluate our optimizations, showing that this leads to a more scalable monitoring and, in particular, a significantly lower memory consumption.
From LTL to rLTL monitoring: improved monitorability through robust semantics
Runtime monitoring is commonly used to detect the violation of desired properties in safety critical cyber-physical systems by observing its executions. Bauer et al. introduced an influential framework for monitoring Linear Temporal Logic (LTL) properties based on a three-valued semantics for a finite execution: the formula is already satisfied by the given execution, it is already violated, or it is still undetermined, i.e., it can still be satisfied and violated by appropriate extensions of the given execution. However, a wide range of formulas are not monitorable under this approach, meaning that there are executions for which satisfaction and violation will always remain undetermined no matter how it is extended. In particular, Bauer et al. report that 44% of the formulas they consider in their experiments fall into this category. Recently, a robust semantics for LTL was introduced to capture different degrees by which a property can be violated. In this paper we introduce a robust semantics for finite strings and show its potential in monitoring: every formula considered by Bauer et al. is monitorable under our approach. Furthermore, we discuss which properties that come naturally in LTL monitoring-such as the realizability of all truth values-can be transferred to the robust setting. We show that LTL formulas with robust semantics can be monitored by deterministic automata, and provide tight bounds on the size of the constructed automaton. Lastly, we report on a prototype implementation and compare it to the LTL monitor of Bauer et al. on a sample of examples.
Under-approximating loops in C programs for fast counterexample detection
Many software model checkers only detect counterexamples with deep loops after exploring numerous spurious and increasingly longer counterexamples. We propose a technique that aims at eliminating this weakness by constructing auxiliary paths that represent the effect of a range of loop iterations. Unlike acceleration, which captures the exact effect of arbitrarily many loop iterations, these auxiliary paths may under-approximate the behaviour of the loops. In return, the approximation is sound with respect to the bit-vector semantics of programs. Our approach supports arbitrary conditions and assignments to arrays in the loop body, but may as a result introduce quantified conditionals. To reduce the resulting performance penalty, we present two quantifier elimination techniques specially geared towards our application. Loop under-approximation can be combined with a broad range of verification techniques. We paired our techniques with lazy abstraction and bounded model checking, and evaluated the resulting tool on a number of buffer overflow benchmarks, demonstrating its ability to efficiently detect deep counterexamples in C programs that manipulate arrays.
Towards efficient verification of population protocols
Population protocols are a well established model of computation by anonymous, identical finite-state agents. A protocol is well-specified if from every initial configuration, all fair executions of the protocol reach a common consensus. The central verification question for population protocols is the : deciding if a given protocol is well-specified. Esparza et al. have recently shown that this problem is decidable, but with very high complexity: it is at least as hard as the Petri net reachability problem, which is TOWER-hard, and for which only algorithms of non-primitive recursive complexity are currently known. In this paper we introduce the class of well-specified strongly-silent protocols and we prove that it is suitable for automatic verification. More precisely, we show that has the same computational power as general well-specified protocols, and captures standard protocols from the literature. Moreover, we show that the membership and correctness problems for reduce to solving boolean combinations of linear constraints over . This allowed us to develop the first software able to automatically prove correctness for of the infinitely many possible inputs.
Formal methods: practical applications and foundations: Editorial
SMT-based verification of program changes through summary repair
This article provides an innovative approach for verification by model checking of programs that undergo continuous changes. To tackle the problem of repeating the entire model checking for each new version of the program, our approach verifies programs incrementally. It reuses computational history of the previous program version, namely function summaries. In particular, the summaries are over-approximations of the bounded program behaviors. Whenever reusing of summaries is not possible straight away, our algorithm repairs the summaries to maximize the chance of reusability of them for subsequent runs. We base our approach on satisfiability modulo theories (SMT) to take full advantage of lightweight modeling approach and at the same time the ability to provide concise function summarization. Our approach leverages pre-computed function summaries in SMT to localize the checks of changed functions. Furthermore, to exploit the trade-off between precision and performance, our approach relies on the use of an SMT solver, not only for underlying reasoning, but also for program modeling and the adjustment of its precision. On the benchmark suite of primarily Linux device drivers versions, we demonstrate that our algorithm achieves an order of magnitude speedup compared to prior approaches.
Mining definitions in Kissat with Kittens
Bounded variable elimination is one of the most important preprocessing techniques in SAT solving. It benefits from discovering functional dependencies in the form of definitions encoded in the CNF. While the common approach pioneered in SatELite relies on syntactic pattern matching, our new approach uses cores produced by an embedded SAT solver, Kitten. In contrast to a similar semantic technique implemented in Lingeling based on BDD algorithms to generate irredundant CNFs, our new approach is able to generate DRAT proofs. We further discuss design choices for our embedded SAT solver Kitten. Experiments with Kissat show the effectiveness of this approach.
The probabilistic termination tool amber
We describe the Amber tool for proving and refuting the termination of a class of probabilistic while-programs with polynomial arithmetic, in a fully automated manner. Amber combines martingale theory with properties of asymptotic bounding functions and implements relaxed versions of existing probabilistic termination proof rules to prove/disprove (positive) almost sure termination of probabilistic loops. Amber supports programs parametrized by symbolic constants and drawing from common probability distributions. Our experimental comparisons give practical evidence of Amber outperforming existing state-of-the-art tools.
Shield synthesis
is an approach to enforce safety properties at runtime. A shield monitors the system and corrects any erroneous output values instantaneously. The shield deviates from the given outputs as little as it can and recovers to hand back control to the system as soon as possible. In the first part of this paper, we consider shield synthesis for reactive hardware systems. First, we define a general framework for solving the shield synthesis problem. Second, we discuss two concrete shield synthesis methods that automatically construct shields from a set of properties: (1) shields, which guarantee recovery in a finite time. (2) shields, which attempt to work with the system to recover as soon as possible. Next, we discuss an extension of -stabilizing and admissible shields, where erroneous output values of the reactive system are corrected while liveness properties of the system are preserved. Finally, we give experimental results for both synthesis methods. In the second part of the paper, we consider shielding a human operator instead of shielding a reactive system: the outputs to be corrected are not initiated by a system but by a human operator who works with an autonomous system. The challenge here lies in giving simple and intuitive explanations to the human for any interferences of the shield. We present results involving mission planning for unmanned aerial vehicles.
Incremental column-wise verification of arithmetic circuits using computer algebra
Verifying arithmetic circuits and most prominently multiplier circuits is an important problem which in practice still requires substantial manual effort. The currently most effective approach uses polynomial reasoning over pseudo boolean polynomials. In this approach a word-level specification is reduced by a Gröbner basis which is implied by the gate-level representation of the circuit. This reduction returns zero if and only if the circuit is correct. We give a rigorous formalization of this approach including soundness and completeness arguments. Furthermore we present a novel incremental column-wise technique to verify gate-level multipliers. This approach is further improved by extracting full- and half-adder constraints in the circuit which allows to rewrite and reduce the Gröbner basis. We also present a new technical theorem which allows to rewrite local parts of the Gröbner basis. Optimizing the Gröbner basis reduces computation time substantially. In addition we extend these algebraic techniques to verify the equivalence of bit-level multipliers without using a word-level specification. Our experiments show that regular multipliers can be verified efficiently by using off-the-shelf computer algebra tools, while more complex and optimized multipliers require more sophisticated techniques. We discuss in detail our complete verification approach including all optimizations.
Para : parameterized path reduction, acceleration, and SMT for reachability in threshold-guarded distributed algorithms
Automatic verification of threshold-based fault-tolerant distributed algorithms (FTDA) is challenging: FTDAs have multiple parameters that are restricted by arithmetic conditions, the number of processes and faults is parameterized, and the algorithm code is parameterized due to conditions counting the number of received messages. Recently, we introduced a technique that first applies data and counter abstraction and then runs bounded model checking (BMC). Given an FTDA, our technique computes an upper bound on the diameter of the system. This makes BMC complete for reachability properties: it always finds a counterexample, if there is an actual error. To verify state-of-the-art FTDAs, further improvement is needed. In contrast to encoding bounded executions of a counter system over an abstract finite domain in SAT, in this paper, we encode bounded executions over integer counters in SMT. In addition, we introduce a new form of reduction that exploits acceleration and the structure of the FTDAs. This aggressively prunes the execution space to be explored by the solver. In this way, we verified safety of seven FTDAs that were out of reach before.
Synthesizing adaptive test strategies from temporal logic specifications
Constructing good test cases is difficult and time-consuming, especially if the system under test is still under development and its exact behavior is not yet fixed. We propose a new approach to compute test strategies for reactive systems from a given temporal logic specification using formal methods. The computed strategies are guaranteed to reveal certain simple faults in realization of the specification and for behavior of the uncontrollable part of the system's environment. The proposed approach supports different assumptions on occurrences of faults (ranging from a single transient fault to a persistent fault) and by default aims at unveiling the weakest one. We argue that such tests are also sensitive for more complex bugs. Since the specification may not define the system behavior completely, we use reactive synthesis algorithms with partial information. The computed strategies are that react to behavior at runtime. We work out the underlying theory of adaptive test strategy synthesis and present experiments for a safety-critical component of a real-world satellite system. We demonstrate that our approach can be applied to industrial specifications and that the synthesized test strategies are capable of detecting bugs that are hard to detect with random testing.
Runtime verification of embedded real-time systems
We present a runtime verification framework that allows on-line monitoring of past-time Metric Temporal Logic (ptMTL) specifications in a discrete time setting. We design observer algorithms for the time-bounded modalities of ptMTL, which take advantage of the highly parallel nature of hardware designs. The algorithms can be translated into efficient hardware blocks, which are designed for reconfigurability, thus, facilitate applications of the framework in both a prototyping and a post-deployment phase of embedded real-time systems. We provide formal correctness proofs for all presented observer algorithms and analyze their time and space complexity. For example, for the most general operator considered, the time-bounded Since operator, we obtain a time complexity that is doubly logarithmic both in the point in time the operator is executed and the operator's time bounds. This result is promising with respect to a self-contained, non-interfering monitoring approach that evaluates real-time specifications in parallel to the system-under-test. We implement our framework on a Field Programmable Gate Array platform and use extensive simulation and logic synthesis runs to assess the benefits of the approach in terms of resource usage and operating frequency.
Isla: integrating full-scale ISA semantics and axiomatic concurrency models (extended version)
Architecture specifications such as Armv8-A and RISC-V are the ultimate foundation for software verification and the correctness criteria for hardware verification. They should define the allowed sequential and relaxed-memory concurrency behaviour of programs, but hitherto there has been no integration of full-scale instruction-set architecture (ISA) semantics with axiomatic concurrency models, either in mathematics or in tools. These ISA semantics can be surprisingly large and intricate, e.g. 100k lines for Armv8-A. In this paper we present a tool, Isla, for computing the allowed behaviours of concurrent litmus tests with respect to full-scale ISA definitions, in the Sail language, and arbitrary axiomatic relaxed-memory concurrency models, in the Cat language. It is based on a generic symbolic engine for Sail ISA specifications. We equip the tool with a web interface to make it widely accessible, and illustrate and evaluate it for Armv8-A and RISC-V. The symbolic execution engine is valuable also for other verification tasks: it has been used in automated ISA test generation for the Arm Morello prototype architecture, extending Armv8-A with CHERI capabilities, and for Iris program-logic reasoning about binary code above the Armv8-A and RISC-V ISA specifications. By using full-scale and authoritative ISA semantics, Isla lets one evaluate litmus tests using arbitrary user instructions with high confidence. Moreover, because these ISA specifications give detailed and validated definitions of the sequential aspects of functionality, as used by hypervisors and operating systems, e.g. instruction fetch, exceptions, and address translation, our tool provides a basis for developing concurrency semantics for these. We demonstrate this for the Armv8-A instruction-fetch and virtual-memory models and examples of Simner et al.