# Writing Tests
Tests for Shadow generally fall into four categories:
- system call tests
- regression tests
- application tests
- unit tests
Some of these tests may be marked as "extra" tests, which means they are not run by default.
## System call tests
Shadow executes real unmodified applications and co-opts them by intercepting and interposing at the system call API. This means that Shadow must try to emulate Linux system calls. Shadow doesn't need to emulate every system call exactly as Linux does, but it should match Linux as closely as possible: the more Shadow deviates from Linux behaviour, the less accurately its simulations represent real-world behaviour.
When writing new system call handlers or modifying the behaviour of existing ones, it's important to write tests that verify the correctness of the new behaviour. These system call tests are required in pull requests that add to or modify the behaviour of Shadow's system calls. Usually this means writing tests that execute the system call with a variety of arguments and verifying that the system call returns the same values in both Linux and Shadow.
These tests fall into two categories: domain-specific system call tests and fuzz tests. The domain-specific tests should exercise the system call under a variety of typical use cases, as well as some edge cases (for example passing NULL pointers, negative lengths, etc.). The fuzz tests should test many combinations of the possible argument values. These two types of tests are discussed further below.
Our existing tests are not always consistent in how they're organized or designed, so you don't need to follow the exact same design as other tests in the Shadow repository. If you're adding new tests to an existing file, though, you should try to write them in a similar style to the existing tests.
These tests typically use the libc library to test the system calls; for example `libc::listen(fd, 10)`. For the most part the tests assume that the libc system call wrappers are the same as the kernel system calls themselves, but this is not always the case. Sometimes they differ and you might want to make the system call directly (for example the glibc `fork()` wrapper usually makes a `clone` system call, not a `fork` system call), or there might not be a libc wrapper for the system call that you wish to test (for example `set_tid_address`). In this case you probably want to use the linux-api library, which makes the system call directly without using a third-party library like glibc. The linux-api library only implements a handful of system calls, and we've been adding more as we need them, so you may need to add support for the system call you wish to test.
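To illustrate the difference, here's a minimal sketch that bypasses the glibc wrappers by passing a raw syscall number to `libc::syscall`. This shows the general idea only; it uses the libc crate's raw-syscall entry point rather than linux-api, whose API isn't shown here:

```rust
fn main() {
    // Through the glibc wrapper: on Linux, glibc's fork() typically issues
    // a clone syscall rather than a fork syscall.
    let child = unsafe { libc::fork() };
    assert!(child >= 0);
    if child == 0 {
        // Child process: exit immediately.
        unsafe { libc::_exit(0) };
    }
    unsafe { libc::waitpid(child, std::ptr::null_mut(), 0) };

    // Directly, bypassing any wrapper: set_tid_address has no glibc wrapper,
    // so we pass its syscall number to libc::syscall(). It returns the
    // caller's thread ID.
    let mut tid: libc::c_int = 0;
    let rv =
        unsafe { libc::syscall(libc::SYS_set_tid_address, &mut tid as *mut libc::c_int) };
    assert!(rv > 0);
}
```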
These tests are run emulated within Shadow and natively outside of Shadow. This is done using the CMake `add_linux_tests` and `add_shadow_tests` macros. The tests are built by Cargo and then run by CMake. For example the `listen` tests use:
```cmake
add_linux_tests(BASENAME listen COMMAND sh -c "../../../target/debug/test_listen --libc-passing")
add_shadow_tests(BASENAME listen)
```
which results in the CMake tests:
```text
1/2 Test #110: listen-shadow .................... Passed 0.56 sec
2/2 Test #109: listen-linux ..................... Passed 10.12 sec
```
### Domain-specific system call tests
Here is an example of an existing test for the `listen` system call:
```rust
/// Test listen using a backlog of 0.
fn test_zero_backlog(
    domain: libc::c_int,
    sock_type: libc::c_int,
    flag: libc::c_int,
    bind: Option<SockAddr>,
) -> Result<(), String> {
    let fd = unsafe { libc::socket(domain, sock_type | flag, 0) };
    assert!(fd >= 0);

    if let Some(address) = bind {
        bind_fd(fd, address);
    }

    let args = ListenArguments { fd, backlog: 0 };

    let expected_errno = match (domain, sock_type, bind) {
        (libc::AF_INET, libc::SOCK_STREAM, _) => None,
        (libc::AF_UNIX, libc::SOCK_STREAM | libc::SOCK_SEQPACKET, Some(_)) => None,
        (libc::AF_UNIX, libc::SOCK_STREAM | libc::SOCK_SEQPACKET, None) => Some(libc::EINVAL),
        (_, libc::SOCK_DGRAM, _) => Some(libc::EOPNOTSUPP),
        _ => unimplemented!(),
    };

    test_utils::run_and_close_fds(&[fd], || check_listen_call(&args, expected_errno))
}
```
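The `ListenArguments` struct and `check_listen_call` helper above live in the test's support code. A simplified, hypothetical version of that helper (not the exact code in the repository) might look like this: make the call, then compare the return value and errno against what we expect:

```rust
struct ListenArguments {
    fd: libc::c_int,
    backlog: libc::c_int,
}

/// Hypothetical simplified helper: run listen() and verify that the result
/// matches the expected errno (or success if `expected_errno` is None).
fn check_listen_call(
    args: &ListenArguments,
    expected_errno: Option<libc::c_int>,
) -> Result<(), String> {
    let rv = unsafe { libc::listen(args.fd, args.backlog) };
    let errno = std::io::Error::last_os_error().raw_os_error().unwrap_or(0);

    match (rv, expected_errno) {
        // the call succeeded and we expected success
        (0, None) => Ok(()),
        // the call failed with exactly the errno we expected
        (-1, Some(e)) if errno == e => Ok(()),
        _ => Err(format!(
            "listen() returned {rv} with errno {errno}, but expected errno {expected_errno:?}"
        )),
    }
}
```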
There are many `listen` tests, including `test_zero_backlog` above as well as `test_negative_backlog`, `test_large_backlog`, `test_listen_twice`, `test_reduced_backlog`, and more.
### Fuzz tests
"Fuzz"-style testing is another way we test syscalls: they use some support code to test many various combinations of the possible argument values expected by a syscall, and verify that the return value for each combination of arguments is the same as what Linux returns. Because the developer usually writes these tests to cover most or all possible argument combinations, it ensures that Shadow's emulation of the syscall is highly accurate.
Fuzz tests can be a bit trickier to write, especially for more complicated syscalls, and sometimes they don't make sense (e.g., when testing what happens when trying to `connect()` to a TCP server with a full accept queue). But they often help us find inconsistent behaviour between Shadow and Linux and help us make Shadow more accurate, so we prefer that fuzz tests are included with pull requests when possible.
There are some good examples of fuzz tests in our time-related test code in `src/test/time`. For example, the `clock_nanosleep` test demonstrates how to test the syscall with all combinations of its arguments, using both valid and invalid values.
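As a rough sketch of the pattern (illustrative only; the actual support code in `src/test/time` is more structured than this), a fuzz test iterates over combinations of argument values, including invalid ones, and checks each result against native Linux behaviour:

```rust
/// Illustrative only: exercise listen() with many argument combinations.
fn fuzz_listen() {
    let domains = [libc::AF_INET, libc::AF_UNIX];
    let backlogs = [-1, 0, 1, 128, libc::c_int::MAX];

    for &domain in &domains {
        for &backlog in &backlogs {
            let fd = unsafe { libc::socket(domain, libc::SOCK_STREAM, 0) };
            assert!(fd >= 0);

            let rv = unsafe { libc::listen(fd, backlog) };
            let errno = std::io::Error::last_os_error().raw_os_error().unwrap_or(0);

            // A real fuzz test asserts that (rv, errno) for this combination
            // matches what Linux returns natively.
            println!("domain={domain} backlog={backlog} -> rv={rv} errno={errno}");

            unsafe { libc::close(fd) };
        }
    }
}
```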
## Unit tests
Shadow supports unit tests for Rust code, which can be written as standard Rust unit tests. These tests run natively rather than under Shadow, but they are also run under Miri and Loom as "extra" tests.
For example, see the `IntervalMap` tests.
```rust
#[cfg(test)]
mod tests {
    use super::*;
    // ...

    #[test]
    fn test_insert_into_empty() {
        let mut m = IntervalMap::new();
        insert_and_validate(&mut m, 10..20, "x", &[], &[(10..20, "x")]);
    }

    // ...
}
```
```text
1/1 Test #1: rust-unit-tests .................. Passed 149.52 sec
```
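The unit tests run through CTest as shown above, but during development it's often convenient to run them directly with Cargo. The Miri and Loom invocations below follow those tools' standard usage and may not exactly match what our setup scripts do:

```bash
# run the unit tests natively
cargo test

# run under Miri (requires a nightly toolchain with the miri component)
cargo +nightly miri test

# run Loom-based tests (Loom's documented convention)
RUSTFLAGS="--cfg loom" cargo test
```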
## Regression tests
Sometimes it's useful to write a regression test that doesn't belong under any specific system call's tests. These tests can be written like the system call tests above, but are stored in the `src/test/regression/` directory.
## Application tests
It's often useful to test that applications behave correctly in Shadow. These tests do not replace the need for the system call tests above, but they can complement them. For example we have Tor and TGen tests, which help prevent regressions where we accidentally break Tor behaviour.
We also run our examples as tests. These examples include those in our documentation (for example, see the "getting started" example) as well as other application examples.
## Extra tests
Any of the tests above may be configured as an "extra" test. These tests are not run by default, and require that Shadow is built and tested using the `--extra` flag:
```bash
./setup build --test --extra
./setup test --extra
```
These are usually tests that require extra dependencies, tests which take a long time to build or run, or tests which might be difficult to maintain. These tests may be removed at any time if they become difficult to maintain, or if they update to require features that Shadow doesn't or can't support. For example, if an application that uses epoll updated to use io_uring, which Shadow doesn't support (and which would take a lot of effort to support), we would need to remove its test.
Extra tests currently run in the CI environment, but only on a single platform, so they're not as well tested as non-"extra" tests.