Klee Paper Review

Last modified: 2025-03-05 Category: Systems ..

KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs

source code

What is the Problem?

Testing large and complex programs is difficult. Writing manual tests is extremely time consuming and often still misses many edge cases. Traditional automated testing frameworks see limited success in reaching high coverage, mainly due in part to the unsemantic nature of randomization. While other symbolic execution tools exist, many are limited in their ability to handle environmental dependencies, such as file I/O, network I/O, and system calls, which contribute significantly to the state space of system programs.

Summary

Klee is a symbolic execution tool that automatically explores the state space of a program by maintaining a set of constraints on symbolic inputs, and any downstream operations that are performed on those inputs. Klee transparently optimizes these constraints, while also internally managing the programs "execution" state so as to be able to "fork" at branches in the program without rerunning all previous operations. The authors used Klee to test a variety of programs, both by generating test cases with coverage reports, as a bug finding tool, and as a verification tool.

Key Insights

Semantic understanding of constraints not only leads to much more coverage overall as compared to random testing, but also allows for more efficient exploration of the state space, involving far fewer test cases to reach the same coverage.
Using symbolic execution, you can determine the functional equivalence of two programs by comparing the constraints generated by each program on the same symbolic input. Furthermore, automatic bug finding can be done by checking for constraints that are unsatisfiable, which indicate a bug in the program.

Notable Design Details/Strengths

Implementing the execution state within the symbolic execution engine allows for efficient forking and backtracking, while also allowing for object-level copy-on-write semantics that vastly reduce overhead.
You are able to symbolically execute environmental dependencies by providing a model of the environment to Klee, which allows for testing of programs symbolically, similar to how you'd use mocks in unit testing.

Limitations/Weaknesses

State space explosion remains a problem, and even while Klee optimizes this, it still ends up being one of the limiting factors in practice.
Limited support for some features, e.g. floating point, dynamic memory allocation
Compiling to LLVM Bitcode is hard (use wllvm for larger systems)

Summary of Key Results

GNU Core Utils: 84.5% overall line coverage vs 67.7% for developers' test suite
On Busybox: 90.5% overall line coverage vs 44.8% for developers' test suite
Found 56 serious bugs across all tested applications
100% line coverage on 16 Core Utils tools and 31 Busybox tools
Successfully cross-checked equivalent Busybox and Core Utils utilities, found inconsistencies (some of which were bugs)

Open Questions

Is it possible to use symbolic execution on non-deterministic programs/algorithms? Would the theoretical bounds on randomized algorithms be semantically understood by Klee?
Is symbolic execution possible at a lower level, e.g. on the instruction set level, or would the semantics not be interpretable at that level? If so, is it possible to build a symbolic execution engine that is language agnostic?
Can similar constraint techniques be used to simplify program logic by detecting redundant constraints?

Followup Paper

"ESD" synthesis: https://dl.acm.org/doi/pdf/10.1145/1755913.1755946

Tags: correctness klee program analysis software engineering symbolic execution systems testing verification