The Shadow Simulator
What is Shadow?
Shadow is a discrete-event network simulator that directly executes real application code, enabling you to simulate distributed systems with thousands of network-connected processes in realistic and scalable private network experiments using your laptop, desktop, or server running Linux.
Shadow experiments can be scientifically controlled and deterministically replicated, making it easier for you to reproduce bugs and eliminate confounding factors in your experiments.
How Does Shadow Work?
Shadow directly executes real applications:
- Shadow directly executes unmodified, real application code using native OS (Linux) processes.
- Shadow co-opts the native processes into a discrete-event simulation by interposing at the system call API.
- The necessary system calls are emulated such that the applications need not be aware that they are running in a Shadow simulation.
Shadow connects the applications in a simulated network:
- Shadow constructs a private, virtual network through which the managed processes can communicate.
- Shadow internally implements simulated versions of common network protocols (e.g., TCP and UDP).
- Shadow internally models network routing characteristics (e.g., path latency and packet loss) using a configurable network graph.
Why is Shadow Needed?
Network emulators (e.g., mininet) run real application code on top of real OS kernels in real time, but they are non-deterministic and have limited scalability: time distortion can occur if the emulated processes exceed an unknown computational threshold, leading to undefined behavior.
Network simulators (e.g., ns-3) offer more experimental control and scalability, but have limited application-layer realism because they run application abstractions in place of real application code.
Shadow offers a novel, hybrid emulation/simulation architecture: it directly executes real applications as native OS processes in order to faithfully reproduce application-layer behavior while also co-opting the processes into a high-performance network simulation that can scale to large distributed systems with hundreds of thousands of processes.
Caveats
Shadow implements over 150 functions from the system call API, but does not yet fully support all API features. Although applications that make basic use of the supported system calls should work out of the box, those that use more complex features or functions may not yet function correctly when running in Shadow. Extending support for the API is a work-in-progress.
That being said, we are particularly motivated to run large-scale Tor Network simulations. This use-case is already fairly well-supported and we are eager to continue extending support for it.
More Information
Homepage:
Documentation:
Community Support:
Bug Reports:
Shadow Design Overview
Shadow is a multi-threaded network experimentation tool that is designed as a hybrid between simulation and emulation architectures: it directly executes applications as Linux processes, but runs them in the context of a discrete-event network simulation.
Shadow's version 2 design is summarized in the following sections. Please see the end of this document for references to published design articles with more details.
Executing Applications
Shadow directly executes real, unmodified application binaries natively in Linux
as standard OS processes (using vfork()
and execvpe()
): we call these
processes executed by Shadow managed processes. When executing each managed
process, Shadow dynamically injects a shim library using preloading (via the
LD_PRELOAD
environment variable) and establishes an inter-process control
channel using shared memory and semaphores. The control channel enables Shadow
to exchange messages with the shim and to instruct the shim to perform actions
in the managed process space.
Intercepting System Calls
The shim co-opts each running managed process into the simulation environment by intercepting all of the system calls it makes rather than allowing them to be handled by the Linux kernel. System call interception happens through two methods: first via preloading and second via a seccomp filter.
- Preloading: Because the shim is preloaded, it is the first library searched when dynamically resolving symbols. We use the shim to override functions in other shared libraries (e.g., system call wrapper functions from libc) by supplying identically named functions with alternative implementations inside the shim. Note that preloading works on dynamically linked function calls (e.g., to libc system call wrappers), but not on statically linked function calls (e.g., those made from inside of libc) or system calls made directly with a syscall instruction.
- seccomp: System calls that are not interceptable via preloading are intercepted using the kernel's seccomp facility. The shim of each managed process installs a seccomp filter that traps all system calls (except those made from the shim) and a handler function to handle the trapped system calls. Trapping a call this way adds a small overhead (the installed filter runs in kernel mode), but we infrequently incur this overhead in practice since most system calls are interceptable via the more efficient preloading method.
Emulating System Calls
System calls that are intercepted by the shim (using either preloading or seccomp) are emulated by Shadow. Hot-path system calls (e.g., time-related system calls) are handled directly in the shim by using state that is stored in shared memory. Other system calls are sent from the shim to Shadow via the control channel and handled in Shadow (the shim sends the system call number and argument registers). While the shim is waiting for a system call to be serviced by Shadow, the managed process is blocked; this allows Shadow to precisely control the running state of each process.
Shadow emulates system calls using its simulated kernel. The simulated kernel (re)implements (i.e., simulates) important system functionality, including: the passage of time; input and output operations on file, socket, pipe, timer, and event descriptors; signals; packet transmissions with respect to transport layer protocols such as TCP and UDP; and aspects of computer networking including routing, queuing, and bandwidth limits. Thus, Shadow establishes a private, simulated network environment that is completely isolated from the real network, but is internally interoperable and entirely controllable.
Care is taken to ensure that all random bytes that are needed during the
simulation are initiated from a seeded pseudorandom source, including during the
emulation of system calls such as getrandom()
and when emulating reads from
files like /dev/*random
. This enables Shadow to produce deterministic
simulations, i.e., running a simulation twice using the same inputs and the same
seed should produce the same sequence of operations in the managed process.
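In configuration terms, this seed corresponds to the general.seed option described later in this document; a minimal sketch (the seed value is arbitrary):
general:
  stop_time: 10 min
  seed: 12345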
Managing Memory
Some system calls pass dynamically allocated memory addresses (e.g., the buffer
address in the sendto()
system call). To handle this system call in Shadow,
the shim sends the buffer address but not the buffer contents to Shadow. Shadow
uses an inter-process memory access manager to directly and efficiently read and
write the memory of each managed process without extraneous data copies or
control messages. Briefly, the memory manager (re)maps the memory of each
managed process into a shared memory file that is accessible by both Shadow and
the managed process. When Shadow needs to copy data from a memory address passed
to it by the shim, the memory manager translates the managed process's memory
address to a shared memory address and brokers requested data copies. This
approach minimizes the number of data copies and system calls needed to transfer
the buffer contents from the managed process to Shadow.
Scheduling
Shadow is designed to be high performance: it uses a thread for every virtual host configured in an experiment while only allowing a number of threads equal to the number of available CPU cores to run in parallel to avoid performance degradation caused by CPU oversubscription. Work stealing is used to ensure that each core is always running a worker thread as long as remaining work exists. Shadow also effectively uses CPU pinning to reduce the frequency of cache misses, CPU migrations, and context switches.
Research
Shadow's design is based on the following published research articles. Please cite our work when using Shadow in your projects.
Shadow version 2 (latest)
This is the latest v2 design described above:
Co-opting Linux Processes for High-Performance Network Simulation
by Rob Jansen, Jim Newsome, and Ryan Wails
in the 2022 USENIX Annual Technical Conference, 2022.
@inproceedings{netsim-atc2022,
author = {Rob Jansen and Jim Newsome and Ryan Wails},
title = {Co-opting Linux Processes for High-Performance Network Simulation},
booktitle = {USENIX Annual Technical Conference},
year = {2022},
note = {See also \url{https://netsim-atc2022.github.io}},
}
Shadow version 1 (original)
This is the original v1 design, using plugins loaded into the Shadow process rather than independent processes:
Shadow: Running Tor in a Box for Accurate and Efficient Experimentation
by Rob Jansen and Nicholas Hopper
in the Symposium on Network and Distributed System Security, 2012.
@inproceedings{shadow-ndss12,
title = {Shadow: Running Tor in a Box for Accurate and Efficient Experimentation},
author = {Rob Jansen and Nicholas Hopper},
booktitle = {Symposium on Network and Distributed System Security},
year = {2012},
note = {See also \url{https://shadow.github.io}},
}
Supported Platforms
Officially supported platforms
We support the following Linux x86-64 distributions:
- Ubuntu 20.04, 22.04, 24.04
- Debian 10, 11, and 12
- Fedora 40
We do not provide official support for other platforms. This means that we do not ensure that Shadow successfully builds and passes tests on other platforms. However, we will review pull requests that allow Shadow to build and run on unsupported platforms.
Our policy regarding supported platforms can be found in our "stability guarantees".
Supported Linux kernel versions
Some Linux distributions support multiple kernel versions, for example an older General Availability (GA) kernel and newer hardware-enablement (HWE) kernels. We try to allow Shadow to run on the oldest kernel supported on each distribution (the GA kernel). However:
- On Debian 10 (buster), we do not support the GA kernel. We do support the HWE kernel (e.g. installed via backports).
- We are currently only able to regularly test on the latest Ubuntu kernel, since that's what GitHub Actions provides.
By these criteria, Shadow's oldest supported kernel version is currently 5.4 (the GA kernel in Ubuntu 20.04.0).
Docker
If you are installing Shadow within a Docker container, you must increase the
size of the container's /dev/shm
mount and disable the seccomp security
profile. You can do this by passing additional flags to docker run
.
Example:
docker run -it --shm-size=1024g --security-opt seccomp=unconfined ubuntu:24.04
If you are having difficulty installing Shadow on any supported platforms, you may find the continuous integration build steps helpful.
Installing Dependencies
Required:
- gcc, gcc-c++
- python (version >= 3.6)
- glib (version >= 2.58.0)
- cmake (version >= 3.13.4)
- make
- pkg-config
- xz-utils
- lscpu
- rustup (version ~ latest)
- libclang (version >= 9)
APT (Debian/Ubuntu):
# required dependencies
sudo apt-get install -y \
cmake \
findutils \
libclang-dev \
libc-dbg \
libglib2.0-0 \
libglib2.0-dev \
make \
netbase \
python3 \
python3-networkx \
xz-utils \
util-linux \
gcc \
g++
# rustup: https://rustup.rs
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
On older versions of Debian or Ubuntu, the default version of libclang is too
old, which may cause bindgen to have errors finding system header files,
particularly when compiling with gcc. In this case you will need to explicitly
install a newer-than-default version of libclang. e.g. on debian-10
install
libclang-13-dev
.
DNF (Fedora):
Warning: dnf
often installs 32-bit (i686
) versions of
libraries. You may want to use the --best
option to make sure you're
installing the 64-bit (x86_64
) versions, which are required by Shadow.
# required dependencies
sudo dnf install -y \
cmake \
findutils \
clang-devel \
glib2 \
glib2-devel \
make \
python3 \
python3-networkx \
xz \
xz-devel \
yum-utils \
diffutils \
util-linux \
gcc \
gcc-c++
# rustup: https://rustup.rs
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
Shadow Setup
After building and testing Shadow, the install step is optional. If you do not
wish to install Shadow, you can run it directly from the build directory
(./build/src/main/shadow
). Shadow only supports building from directories
that do not have whitespace characters.
git clone https://github.com/shadow/shadow.git
cd shadow
./setup build --clean --test
./setup test
# Optionally install (to ~/.local/bin by default). Can otherwise run the binary
# directly at build/src/main/shadow.
./setup install
For the remainder of this documentation, we assume the Shadow binary is in your
PATH
. The default installed location of /home/${USER}/.local/bin
is
probably already in your PATH
. If it isn't, you can add it by running:
echo 'export PATH="${PATH}:/home/${USER}/.local/bin"' >> ~/.bashrc && source ~/.bashrc
The path that Shadow is installed to must not contain any space characters as
they are not supported by the dynamic linker's LD_PRELOAD
mechanism.
Check that Shadow is installed and runs:
shadow --version
shadow --help
Uninstall Shadow
After running ./setup install
, you can find the list of installed files in
./build/install_manifest.txt
. To uninstall Shadow, remove any files listed.
Setup Notes
- All build output is generated to the ./build directory.
- Use ./setup build --help to see all build options; some useful build options are:
  - -g or --debug to build Shadow with debugging symbols and additional runtime checks. This option will significantly reduce the simulator's performance.
  - --search if you installed dependencies to non-standard locations. Used when searching for libraries, headers, and pkg-config files. Appropriate suffixes like /lib and /include of the provided path are also searched when looking for files of the corresponding type.
  - --prefix if you want to install Shadow somewhere besides ~/.local.
- The setup script is a wrapper to cmake and make. Using cmake and make directly is also possible, but unsupported. For example:
  # alternative installation method
  rm -r build && mkdir build && cd build
  cmake -DCMAKE_INSTALL_PREFIX="~/.local" -DSHADOW_TEST=ON ..
  make
  ctest
  make install
System Configs and Limits
Some Linux system configuration changes are needed to run large-scale Shadow simulations (more than about 1000 processes). If you're just trying Shadow or running small simulations, you can skip these steps.
Number of Open Files
There is a default Linux system limit on the total number of open files. Since
Shadow opens files from within its own process space and not from within the
managed processes, both the system limit and the per-process limit must be
greater than the combined total number of files opened by all managed
processes. If each managed process in your simulation opens many files, you'll
likely want to increase the limit so that your application doesn't receive
EMFILE
errors when calling open()
.
System-wide Limits
Check the system-wide limits with:
sysctl fs.nr_open # per-process open file limit
sysctl fs.file-max # system-wide open file limit
Use cat /proc/sys/fs/file-nr
to find:
- the current, system-wide number of used file handles
- the current, system-wide number of free file handles
- and the system-wide limit on the maximum number of open files for all processes
Change the limits, make them persistent across reboots, and apply them now:
sudo sysctl -w fs.nr_open=10485760
echo "fs.nr_open = 10485760" | sudo tee -a /etc/sysctl.conf
sudo sysctl -w fs.file-max=10485760
echo "fs.file-max = 10485760" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
User Limits
Check the maximum number of open file descriptors currently allowed in your session:
ulimit -n
Check the number of files currently used in a process with pid=PID:
/bin/ls -l /proc/PID/fd/ | wc -l
You will almost certainly want to raise the user file limit by modifying
/etc/security/limits.conf
. For example:
rjansen soft nofile 10485760
rjansen hard nofile 10485760
The max you can use is your fs.nr_open
system-wide limit setting from above.
You need to either log out and back in or reboot for the changes to take effect.
You can watch /proc/sys/fs/file-nr
and reduce the limit according to your
usage, if you'd like.
systemd Limits
systemd may place a limit on the number of tasks that a user can run in its slice. You can check to see if a limit is in place by running
$ systemctl status user-$UID.slice
Here's a listing of an example response:
● user-1027.slice - User Slice of <user>
Loaded: loaded
Transient: yes
Drop-In: /run/systemd/system/user-1027.slice.d
└─50-After-systemd-logind\x2eservice.conf, 50-After-systemd-user-sessions\x2eservice.conf, 50-Description.conf, 50-TasksMax.conf
Active: active since Wed 2020-05-06 21:20:08 EDT; 1 years 2 months ago
Tasks: 81 (limit: 12288)
The last line of the listing shows that this user has a task limit of 12288 tasks.
If this task limit is too small, it can be removed with the following command:
$ sudo systemctl set-property user-$UID.slice TasksMax=infinity
Number of Maps
There is a system limit on the number of mmap()
mappings per process. Most
users will not have to modify these settings. However, if an application running
in Shadow makes extensive use of mmap()
, you may need to increase the limit.
Process Limit
The process limit can be queried in these ways:
sysctl vm.max_map_count
cat /proc/sys/vm/max_map_count
You can check the number of maps currently used in a process with pid=PID like this:
wc -l /proc/PID/maps
Set a new limit, make it persistent, and apply it now:
sudo sysctl -w vm.max_map_count=1073741824
echo "vm.max_map_count = 1073741824" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
Process / Thread Count Limits
System-Wide Limits
The kernel may limit the maximum PID value to a small number, which limits the total number of processes that can run on the machine at once. This limit can be raised with the following commands:
sudo sysctl -w kernel.pid_max=4194304
echo "kernel.pid_max = 4194304" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
The kernel may also limit the total number of threads running on the machine. This limit can be raised, too.
sudo sysctl -w kernel.threads-max=4194304
echo "kernel.threads-max = 4194304" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
The kernel has a fixed system-wide limit of 4,194,304 processes/threads. When running extremely large simulations, or when running multiple simulations in parallel, you should be aware of this limit and ensure the total number of processes/threads used by all simulations will not exceed this limit.
The kernel may automatically cap the kernel.threads-max value so that, at the maximum limit, the memory consumed by kernel thread control structures does not exceed approximately 1/8th of system memory (see https://stackoverflow.com/a/21926745).
User Limits
You may need to raise the maximum number of user processes allowed in
/etc/security/limits.conf
. For example, user limits can be removed with the
lines:
rjansen soft nproc unlimited
rjansen hard nproc unlimited
For more information
https://www.kernel.org/doc/Documentation/sysctl/fs.txt
https://www.kernel.org/doc/Documentation/sysctl/vm.txt
man proc
man ulimit
cat /proc/sys/fs/file-max
cat /proc/sys/fs/inode-max
Running Shadow
When Shadow is installed, the main executable is placed in the bin/ directory under your install prefix (~/.local/bin by default). As a reminder, it is helpful to have this location included in your environment PATH.
The main Shadow binary executable, shadow
, contains most of the simulator's
code, including events and the event engine, the network stack, and the routing
logic. Shadow's event engine supports multi-threading using the -p
or
--parallelism
flags (or their corresponding configuration file
option) to simulate multiple hosts
in parallel.
In the following sections we provide some examples to help you get started, but Shadow's configuration format is entirely specified in the "Shadow Config Specification" and "Network Graph Specification" documents. You will find these useful once you begin writing your own simulations.
Basic File Transfer Example
Here we present a basic example that simulates the network traffic of an HTTP server with 3 clients, each running on a different virtual host. If you do not have Python or cURL installed, you can install them through your distribution's package manager.
Configuring the Simulation
Each client uses cURL to make an HTTP request to a basic Python HTTP server.
Shadow requires a configuration file that specifies information about the network graph and the processes to run within the simulation. This example uses a built-in network graph for simplicity.
shadow.yaml
:
general:
# stop after 10 simulated seconds
stop_time: 10s
# old versions of cURL use a busy loop, so to avoid spinning in this busy
# loop indefinitely, we add a system call latency to advance the simulated
# time when running non-blocking system calls
model_unblocked_syscall_latency: true
network:
graph:
# use a built-in network graph containing
# a single vertex with a bandwidth of 1 Gbit
type: 1_gbit_switch
hosts:
# a host with the hostname 'server'
server:
network_node_id: 0
processes:
- path: python3
args: -m http.server 80
start_time: 3s
# tell shadow to expect this process to still be running at the end of the
# simulation
expected_final_state: running
# three hosts with hostnames 'client1', 'client2', and 'client3' using a yaml
# anchor to avoid duplicating the options for each host
client1: &client_host
network_node_id: 0
processes:
- path: curl
args: -s server
start_time: 5s
client2: *client_host
client3: *client_host
Running the Simulation
Shadow stores simulation data to the shadow.data/
directory by default. We
first remove this directory if it already exists, and then run Shadow.
# delete any existing simulation data
rm -rf shadow.data/
shadow shadow.yaml > shadow.log
This small Shadow simulation should complete almost immediately.
Viewing the Simulation Output
Shadow will write simulation output to the data directory shadow.data/
. Each
host has its own directory under shadow.data/hosts/
. For example:
$ ls -l shadow.data/hosts/
drwxrwxr-x 2 user user 4096 Jun 2 16:54 client1
drwxrwxr-x 2 user user 4096 Jun 2 16:54 client2
drwxrwxr-x 2 user user 4096 Jun 2 16:54 client3
drwxrwxr-x 2 user user 4096 Jun 2 16:54 server
Each host directory contains the output for each process running on that host. For example:
$ ls -l shadow.data/hosts/client1/
-rw-rw-r-- 1 user user 0 Jun 2 16:54 curl.1000.shimlog
-rw-r--r-- 1 user user 0 Jun 2 16:54 curl.1000.stderr
-rw-r--r-- 1 user user 542 Jun 2 16:54 curl.1000.stdout
$ cat shadow.data/hosts/client1/curl.1000.stdout
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Directory listing for /</title>
</head>
<body>
<h1>Directory listing for /</h1>
...
Traffic Generation Example
We recommend getting started with the basic file transfer before running this example. It contains some basics about running Shadow simulations that are not covered here.
During Shadow simulations, it is often useful to generate background traffic flows between your simulated hosts. This example uses the TGen traffic generator for this purpose.
TGen is capable of generating basic file transfers, where you can configure how much data is transferred in each direction, how long to wait in between each transfer, and how many transfers to perform. TGen also supports more complex behavior models: you can use Markov models to configure a state machine with precise inter-packet timing characteristics. We only make use of its basic features in this example.
If you don't have it installed, you can follow the instructions here. The following example runs TGen with 5 clients that each download 10 files from a server over a simple network graph.
A Shadow Simulation using TGen
The following example simulates a network with 1 TGen server and 5 TGen clients that generate TCP traffic to and from the server.
Configuring Shadow
The shadow.yaml
file instructs Shadow how to model the network that is used to
carry the traffic between the hosts, and about the bandwidth available to each
of the hosts. It also specifies how many processes to run in the simulation, and
the configuration options for those applications.
shadow.yaml
:
general:
stop_time: 10m
# Needed to avoid deadlock in some configurations of tgen.
# See below.
model_unblocked_syscall_latency: true
network:
graph:
# a custom single-node graph
type: gml
inline: |
graph [
node [
id 0
host_bandwidth_down "140 Mbit"
host_bandwidth_up "18 Mbit"
]
edge [
source 0
target 0
latency "50 ms"
packet_loss 0.01
]
]
hosts:
server:
network_node_id: 0
processes:
# Assumes `tgen` is on your shell's `PATH`.
# Otherwise use an absolute path here.
- path: tgen
# The ../../../ prefix assumes that tgen.server.graphml.xml is in the same
# directory as the data directory (specified with the -d CLI argument).
# See notes below explaining Shadow's directory structure.
args: ../../../tgen.server.graphml.xml
start_time: 1s
# Tell shadow to expect this process to still be running at the end of the
# simulation.
expected_final_state: running
client1: &client_host
network_node_id: 0
processes:
- path: tgen
args: ../../../tgen.client.graphml.xml
start_time: 2s
client2: *client_host
client3: *client_host
client4: *client_host
client5: *client_host
We can see that Shadow will be running 6 processes in total, and that those
processes are configured using graphml.xml
files (the configuration file
format for TGen) as arguments.
Each host directory is also the working
directory for the host's
processes, which is why we specified ../../../tgen.server.graphml.xml
as the
path to the TGen configuration in our Shadow configuration file
(./shadow.data/hosts/server/../../../tgen.server.graphml.xml
→
./tgen.server.graphml.xml
). The host directory structure is stable---it is
guaranteed not to change between minor releases, so the ../../../
prefix may
reliably be used to refer to files in the same directory as the data directory.
model_unblocked_syscall_latency
is used to avoid deadlock in case tgen was
compiled with libopenblas.
Configuring TGen
Each TGen process requires an action-dependency graph in order to configure the behavior of the clients and server. See the TGen documentation for more information about customizing TGen behaviors.
Our TGen Server
The main configuration here is the port number on which the server will listen.
tgen.server.graphml.xml
:
<?xml version="1.0" encoding="utf-8"?><graphml xmlns="http://graphml.graphdrawing.org/xmlns" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
<key attr.name="serverport" attr.type="string" for="node" id="d1" />
<key attr.name="loglevel" attr.type="string" for="node" id="d0" />
<graph edgedefault="directed">
<node id="start">
<data key="d0">info</data>
<data key="d1">8888</data>
</node>
</graph>
</graphml>
Our TGen Clients
The client config specifies that we connect to the server using its name and
port server:8888
, and that we download and upload 1 MiB
10 times, pausing 1,
2, or 3 seconds between each transfer.
tgen.client.graphml.xml
:
<?xml version="1.0" encoding="utf-8"?><graphml xmlns="http://graphml.graphdrawing.org/xmlns" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
<key attr.name="recvsize" attr.type="string" for="node" id="d5" />
<key attr.name="sendsize" attr.type="string" for="node" id="d4" />
<key attr.name="count" attr.type="string" for="node" id="d3" />
<key attr.name="time" attr.type="string" for="node" id="d2" />
<key attr.name="peers" attr.type="string" for="node" id="d1" />
<key attr.name="loglevel" attr.type="string" for="node" id="d0" />
<graph edgedefault="directed">
<node id="start">
<data key="d0">info</data>
<data key="d1">server:8888</data>
</node>
<node id="pause">
<data key="d2">1,2,3</data>
</node>
<node id="end">
<data key="d3">10</data>
</node>
<node id="stream">
<data key="d4">1 MiB</data>
<data key="d5">1 MiB</data>
</node>
<edge source="start" target="stream" />
<edge source="pause" target="start" />
<edge source="end" target="pause" />
<edge source="stream" target="end" />
</graph>
</graphml>
Running the Simulation
With the above three files saved in the same directory, you can start a
simulation. Shadow stores simulation data to the shadow.data/
directory by
default. We first remove this directory if it already exists, and then run
Shadow. This example may take a few minutes.
# delete any existing simulation data
rm -rf shadow.data/
shadow shadow.yaml > shadow.log
Simulation Output
Shadow will write simulation output to the data directory shadow.data/
. Each
host has its own directory under shadow.data/hosts/
.
In the TGen process output, lines containing stream-success
represent
completed downloads and contain useful timing statistics. From these lines we
should see that clients have completed a total of 50 streams:
$ for d in shadow.data/hosts/client*; do grep "stream-success" "${d}"/*.stdout ; done | wc -l
50
We can also look at the transfers from the servers' perspective:
$ for d in shadow.data/hosts/server*; do grep "stream-success" "${d}"/*.stdout ; done | wc -l
50
You can also parse the TGen output logged to the stdout files using the
tgentools
program from the TGen repo, and plot the data in graphical format to
visualize the performance characteristics of the transfers. This
page describes how
to get started.
Simple Tor Network Example
We recommend getting started with the basic file transfer and traffic generation examples to orient yourself with Shadow before running this slightly more complex Tor simulation.
This example requires that you have installed:
- tor; can typically be installed via your system package manager.
- tgen; will most likely need to be built from source.
Configuring Shadow
This simulation again uses tgen
as both client and server. In addition to a
tor
-oblivious client and server, we add a tor
network and a client that uses
tor
to connect to the server.
shadow.yaml
:
general:
stop_time: 30 min
network:
graph:
type: gml
inline: |
graph [
directed 0
node [
id 0
host_bandwidth_down "1 Gbit"
host_bandwidth_up "1 Gbit"
]
edge [
source 0
target 0
latency "50 ms"
jitter "0 ms"
packet_loss 0.0
]
]
hosts:
fileserver:
network_node_id: 0
processes:
- path: tgen
# See https://shadow.github.io/docs/guide/compatibility_notes.html#libopenblas
environment: { OPENBLAS_NUM_THREADS: "1" }
args: ../../../conf/tgen.server.graphml.xml
start_time: 1
expected_final_state: running
4uthority:
network_node_id: 0
ip_addr: 100.0.0.1
processes:
- path: tor
args: --Address 4uthority --Nickname 4uthority
--defaults-torrc torrc-defaults -f torrc
start_time: 1
expected_final_state: running
exit1:
network_node_id: 0
processes:
- path: tor
args: --Address exit1 --Nickname exit1
--defaults-torrc torrc-defaults -f torrc
start_time: 60
expected_final_state: running
exit2:
network_node_id: 0
processes:
- path: tor
args: --Address exit2 --Nickname exit2
--defaults-torrc torrc-defaults -f torrc
start_time: 60
expected_final_state: running
relay1:
network_node_id: 0
processes:
- path: tor
args: --Address relay1 --Nickname relay1
--defaults-torrc torrc-defaults -f torrc
start_time: 60
expected_final_state: running
relay2:
network_node_id: 0
processes:
- path: tor
args: --Address relay2 --Nickname relay2
--defaults-torrc torrc-defaults -f torrc
start_time: 60
expected_final_state: running
relay3:
network_node_id: 0
processes:
- path: tor
args: --Address relay3 --Nickname relay3
--defaults-torrc torrc-defaults -f torrc
start_time: 60
expected_final_state: running
relay4:
network_node_id: 0
processes:
- path: tor
args: --Address relay4 --Nickname relay4
--defaults-torrc torrc-defaults -f torrc
start_time: 60
expected_final_state: running
client:
network_node_id: 0
processes:
- path: tgen
# See https://shadow.github.io/docs/guide/compatibility_notes.html#libopenblas
environment: { OPENBLAS_NUM_THREADS: "1" }
args: ../../../conf/tgen.client.graphml.xml
start_time: 600
torclient:
network_node_id: 0
processes:
- path: tor
args: --Address torclient --Nickname torclient
--defaults-torrc torrc-defaults -f torrc
start_time: 900
expected_final_state: running
- path: tgen
# See https://shadow.github.io/docs/guide/compatibility_notes.html#libopenblas
environment: { OPENBLAS_NUM_THREADS: "1" }
args: ../../../conf/tgen.torclient.graphml.xml
start_time: 1500
Running the Simulation
We run this example similarly to before. Here we use an additional command-line
flag --template-directory
to copy a template directory layout containing each
host's tor
configuration files into its host directory before the simulation
begins.
For brevity we omit the contents of our template directory, and configuration files that are referenced from it, but you can find them at examples/docs/tor/shadow.data.template/
and examples/docs/tor/conf/
.
# delete any existing simulation data
rm -rf shadow.data/
shadow --template-directory shadow.data.template shadow.yaml > shadow.log
Simulation Output
As before, Shadow will write simulation output to the data directory
shadow.data/
. Each host has its own directory under shadow.data/hosts/
.
In the TGen process output, lines containing stream-success
represent
completed downloads and contain useful timing statistics. From these lines we
should see that clients have completed a total of 20 streams:
$ for d in shadow.data/hosts/*client*; do grep "stream-success" "${d}"/*.stdout ; done | wc -l
20
We can also look at the transfers from the servers' perspective:
$ for d in shadow.data/hosts/fileserver*; do grep "stream-success" "${d}"/*.stdout ; done | wc -l
20
You can also parse the TGen output logged to the stdout files using the
tgentools
program from the TGen repo, and plot the data in graphical format to
visualize the performance characteristics of the transfers. This
page describes how
to get started.
More Realistic Simulations
You can use the tornettools toolkit to run larger, more complex Tor networks that are meant to more accurately resemble the characteristics and state of the public Tor network.
Determinism
To improve determinism for your simulation, Shadow preloads an auxiliary library, libshadow_openssl_rng, which overrides some of OpenSSL's RNG routines. This is enabled by default, but can be controlled using the experimental use_preload_openssl_rng option.
Shadow Configuration Overview
Shadow requires a configuration file that provides a network graph and information about the processes to run during the simulation. This configuration file uses the YAML format. The options and their effect on the simulation are described in more detail (alongside a simple example configuration file) on the configuration options page.
Many of the configuration file options can also be overridden using command-line
options. For example, the configuration option
general.stop_time
can be
overridden with shadow's --stop-time
option, and
general.log_level
can be
overridden with --log-level
. See shadow --help
for other command-line
options.
The configuration file does not perform any shell expansion, other than home
directory ~/
expansion on some specific options.
Quantities with Units
Some options such as
hosts.<hostname>.bandwidth_down
accept quantity values containing a magnitude and a unit. For example bandwidth
values can be expressed as 1 Mbit
, 1000 Kbit
, 977 Kibit
, etc. The space
between the magnitude and unit is optional (for example 5Mbit
), and the unit
can be pluralized (for example 5 Mbits
). Units are case-sensitive.
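As an illustrative sketch (the hostname, program, and values below are made up for this example), quantities with units appear in a configuration like this:
general:
  stop_time: 10 min
hosts:
  myhost:
    network_node_id: 0
    bandwidth_down: "100 Mbit"
    bandwidth_up: 5Mbit   # the space before the unit is optional
    processes:
      - path: sleep
        args: "10"
        start_time: 2s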
Time
Time values are expressed as either sub-second units, seconds, minutes, or hours.
Acceptable units are:
- nanosecond / ns
- microsecond / us / μs
- millisecond / ms
- second / sec / s
- minute / min / m
- hour / hr / h
Examples: 30 s
, 2 hr
, 10 minutes
, 100 ms
Bandwidth
Bandwidth values are expressed in bits-per-second with the unit bit
. All
bandwidth values should be divisible by 8 bits-per-second (for example 30 bit
is invalid, but 30 Kbit
is valid).
Acceptable unit prefixes are:
- kilo / K
- kibi / Ki
- mega / M
- mebi / Mi
- giga / G
- gibi / Gi
- tera / T
- tebi / Ti
Examples: 100 Mbit
, 100 Mbits
, 10 kilobits
, 128 bits
Byte Sizes
Byte size values are expressed with the unit byte
or B
.
Acceptable unit prefixes are:
- kilo / K
- kibi / Ki
- mega / M
- mebi / Mi
- giga / G
- gibi / Gi
- tera / T
- tebi / Ti
Examples: 20 B
, 100 MB
, 100 megabyte
, 10 kibibytes
, 30 MiB
, 1024 Mbytes
Unix Signals
Several options allow the user to specify a Unix Signal. These can be specified
either as a string signal name (e.g. SIGKILL
), or an integer signal number (e.g. 9
).
String signal names must be capitalized and include the SIG
prefix.
Realtime signals (signal numbers 32+) are not supported.
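For illustration, here is a hedged sketch of both forms inside a host's process list; sleep is just a stand-in program and the times are arbitrary:
processes:
  - path: sleep
    args: "100"
    shutdown_time: 5s
    shutdown_signal: SIGTERM                    # by name: capitalized, with the SIG prefix
    expected_final_state: {signaled: SIGTERM}   # sleep does not catch SIGTERM, so it dies by signal
  - path: sleep
    args: "100"
    shutdown_time: 5s
    shutdown_signal: 9                          # by number: 9 is SIGKILL
    expected_final_state: {signaled: 9}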
YAML Extensions
Shadow supports the extended YAML conventions for merge keys and extension fields.
For examples, see Managing Complex Configurations.
Shadow Configuration Specification
Shadow uses the standard YAML 1.2 format to accept configuration options, extended with support for merge keys and extension fields as described above.
The following describes Shadow's YAML format and all of the options that Shadow supports for customizing a simulation.
Example:
general:
stop_time: 2 min
network:
graph:
type: gml
inline: |
graph [
node [
id 0
host_bandwidth_down "140 Mbit"
host_bandwidth_up "18 Mbit"
]
edge [
source 0
target 0
latency "50 ms"
packet_loss 0.01
]
]
hosts:
server:
network_node_id: 0
processes:
- path: /usr/sbin/nginx
args: -c ../../../nginx.conf -p .
start_time: 1
expected_final_state: running
client1: &client_host
network_node_id: 0
host_options:
log_level: debug
processes:
- path: /usr/bin/curl
args: server --silent
start_time: 5
client2: *client_host
client3: *client_host
Contents:
general
general.bootstrap_end_time
general.data_directory
general.heartbeat_interval
general.log_level
general.model_unblocked_syscall_latency
general.parallelism
general.progress
general.seed
general.stop_time
general.template_directory
network
network.graph
network.graph.type
network.graph.<file|inline>
network.graph.file.path
network.graph.file.compression
network.use_shortest_path
experimental
experimental.host_heartbeat_interval
experimental.host_heartbeat_log_info
experimental.host_heartbeat_log_level
experimental.interface_qdisc
experimental.max_unapplied_cpu_latency
experimental.report_errors_to_stderr
experimental.runahead
experimental.scheduler
experimental.socket_recv_autotune
experimental.socket_recv_buffer
experimental.socket_send_autotune
experimental.socket_send_buffer
experimental.strace_logging_mode
experimental.unblocked_syscall_latency
experimental.unblocked_vdso_latency
experimental.use_cpu_pinning
experimental.use_dynamic_runahead
experimental.use_memory_manager
experimental.use_new_tcp
experimental.use_object_counters
experimental.use_preload_libc
experimental.use_preload_openssl_crypto
experimental.use_preload_openssl_rng
experimental.use_sched_fifo
experimental.use_syscall_counters
experimental.use_worker_spinning
host_option_defaults
host_option_defaults.log_level
host_option_defaults.pcap_capture_size
host_option_defaults.pcap_enabled
hosts
hosts.<hostname>.bandwidth_down
hosts.<hostname>.bandwidth_up
hosts.<hostname>.ip_addr
hosts.<hostname>.network_node_id
hosts.<hostname>.host_options
hosts.<hostname>.processes
hosts.<hostname>.processes[*].args
hosts.<hostname>.processes[*].environment
hosts.<hostname>.processes[*].expected_final_state
hosts.<hostname>.processes[*].path
hosts.<hostname>.processes[*].shutdown_signal
hosts.<hostname>.processes[*].shutdown_time
hosts.<hostname>.processes[*].start_time
general
Required
General experiment settings.
general.bootstrap_end_time
Default: "0 sec"
Type: String OR Integer
The simulated time that ends Shadow's high network bandwidth/reliability bootstrap period.
If the bootstrap end time is greater than 0, Shadow uses a simulation bootstrapping period where hosts have unrestricted network bandwidth and no packet drop. This can help to bootstrap large networks quickly when the network hosts have low network bandwidth or low network reliability.
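For example, a small sketch giving hosts a 5-minute unrestricted bootstrapping window before a 30-minute simulation:
general:
  stop_time: 30 min
  bootstrap_end_time: 5 min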
general.data_directory
Default: "shadow.data"
Type: String
Path to store simulation output.
general.heartbeat_interval
Default: "1 sec"
Type: String OR Integer OR null
Interval at which to print simulation heartbeat messages.
general.log_level
Default: "info"
Type: "error" OR "warning" OR "info" OR "debug" OR "trace"
Log level of output written on stdout. If Shadow was built in release mode, then messages at level 'trace' will always be dropped.
general.model_unblocked_syscall_latency
Default: false
Type: Bool
Whether to model syscalls and VDSO functions that don't block as having some latency. This should have minimal effect on typical simulations, but can be helpful for programs with "busy loops" that otherwise deadlock under Shadow.
general.parallelism
Default: 0
Type: Integer
How many parallel threads to use to run the simulation. Optimal performance is usually obtained with
the number of physical CPU cores (nproc
without hyperthreading or nproc
/2 with
hyperthreading).
A value of 0 will allow Shadow to choose the number of threads, typically the number of physical CPU cores available in the current CPU affinity mask and cgroup.
Virtual hosts depend on network packets that can potentially arrive from other virtual hosts, so each worker can only advance according to the propagation delay to avoid dependency violations. Therefore, not all threads will have 100% CPU utilization.
general.progress
Default: false
Type: Bool
Show the simulation progress on stderr.
When running in a tty, the progress will be updated every second and shown at the bottom of the terminal. Otherwise the progress will be printed without ANSI escape codes at intervals which increase as the simulation progresses.
general.seed
Default: 1
Type: Integer
Initialize randomness using seed N.
general.stop_time
Required
Type: String OR Integer
The simulated time at which the simulation ends.
general.template_directory
Default: null
Type: String OR null
Path to recursively copy during startup and use as the data-directory.
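This is the configuration-file counterpart of the --template-directory command-line flag used in the Tor example above; a sketch with an illustrative directory name:
general:
  stop_time: 30 min
  template_directory: shadow.data.template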
network
Required
Network settings.
network.graph
Required
The network topology graph.
A network topology represented by a connected graph with certain attributes specified on the network nodes and edges. For more information on how to structure this data, see the Network Graph Overview.
Example:
network:
graph:
type: gml
inline: |
graph [
...
]
network.graph.type
Required
Type: "gml" OR "1_gbit_switch"
The network graph can be specified in the GML format, or a built-in "1_gbit_switch" graph with a single network node can be used instead.
The built-in "1_gbit_switch" graph contains the following:
graph [
directed 0
node [
id 0
host_bandwidth_up "1 Gbit"
host_bandwidth_down "1 Gbit"
]
edge [
source 0
target 0
latency "1 ms"
packet_loss 0.0
]
]
network.graph.<file|inline>
Required if network.graph.type
is "gml"
Type: Object OR String
If the network graph type is not a built-in network graph, the graph data can be specified as a path to an external file, or as an inline string.
network.graph.file.path
Required
Type: String
The path to the file.
If the path begins with ~/
, it will be considered relative to the current
user's home directory. No other shell expansion is performed on the path.
network.graph.file.compression
Default: null
Type: "xz" OR null
The file's compression format.
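Putting the file-form keys together, a sketch with an illustrative path:
network:
  graph:
    type: gml
    file:
      path: ~/shadow/topology.gml.xz
      compression: xz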
network.use_shortest_path
Default: true
Type: Bool
When routing packets, follow the shortest path rather than following a direct edge between network nodes. If false, the network graph is required to be complete (including self-loops) and to have exactly one edge between any two nodes.
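To make the completeness requirement concrete, here is a hedged sketch of a two-node graph with a self-loop on each node and exactly one edge between the pair; the attribute values are made up:
network:
  use_shortest_path: false
  graph:
    type: gml
    inline: |
      graph [
        directed 0
        node [
          id 0
          host_bandwidth_down "100 Mbit"
          host_bandwidth_up "100 Mbit"
        ]
        node [
          id 1
          host_bandwidth_down "100 Mbit"
          host_bandwidth_up "100 Mbit"
        ]
        edge [
          source 0
          target 0
          latency "10 ms"
          packet_loss 0.0
        ]
        edge [
          source 1
          target 1
          latency "10 ms"
          packet_loss 0.0
        ]
        edge [
          source 0
          target 1
          latency "50 ms"
          packet_loss 0.0
        ]
      ]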
experimental
Experimental experiment settings. Unstable and may change or be removed at any time, regardless of Shadow version.
experimental.host_heartbeat_interval
Default: "1 sec"
Type: String OR Integer OR null
Amount of time between host heartbeat messages.
experimental.host_heartbeat_log_info
Default: ["node"]
Type: Array of ("node" OR "socket" OR "ram")
List of information to show in the host's heartbeat message.
experimental.host_heartbeat_log_level
Default: "info"
Type: "error" OR "warning" OR "info" OR "debug" OR "trace"
Log level at which to print host heartbeat messages.
experimental.interface_qdisc
Default: "fifo"
Type: "fifo" OR "round-robin"
The queueing discipline to use at the network interface.
experimental.max_unapplied_cpu_latency
Default: "1 microsecond"
Type: String
Max amount of execution-time latency allowed to accumulate before the clock is moved forward. Moving the clock forward is a potentially expensive operation, so larger values reduce simulation overhead, at the cost of coarser time jumps.
Note also that accumulated-but-unapplied latency is discarded when a thread is blocked on a syscall.
Ignored when
general.model_unblocked_syscall_latency
is false.
experimental.report_errors_to_stderr
Default: true
Type: Bool
Report Error
-level log messages to shadow's stderr
in addition to logging
them to stdout
.
experimental.runahead
Default: "1 ms"
Type: String OR null
If set, overrides the automatically calculated minimum time workers may run ahead when sending events between virtual hosts.
experimental.scheduler
Default: "thread-per-core"
Type: "thread-per-core" OR "thread-per-host"
The host scheduler implementation, which decides how to assign hosts to threads and threads to CPU cores.
experimental.socket_recv_autotune
Default: true
Type: Bool
Enable receive window autotuning.
experimental.socket_recv_buffer
Default: "174760 B"
Type: String OR Integer
Initial size of the socket's receive buffer.
experimental.socket_send_autotune
Default: true
Type: Bool
Enable send window autotuning.
experimental.socket_send_buffer
Default: "131072 B"
Type: String OR Integer
Initial size of the socket's send buffer.
experimental.strace_logging_mode
Default: "off"
Type: "off" OR "standard" OR "deterministic"
Log the syscalls for each process to individual "strace" files.
The mode determines the format that the syscalls are logged in. For example, the "deterministic" mode will avoid logging memory addresses or potentially uninitialized memory.
The logs will be stored at
shadow.data/hosts/<hostname>/<procname>.<pid>.strace
.
Limitations:
- Syscalls run natively will not log the syscall arguments or return value (for example SYS_getcwd).
- Syscalls processed within Shadow's C code will not log the syscall arguments.
- Syscalls that are interrupted by a signal may not be logged (for example SYS_read).
- Syscalls that are interrupted by a signal may be logged inaccurately. For example, the log may show syscall(...) = -1 (EINTR), but the managed process may not actually see this return value. Instead the syscall may be restarted.
experimental.unblocked_syscall_latency
Default: "1 microseconds"
Type: String
The simulated latency of an unblocked syscall. For simulation efficiency, this
latency is only added when max_unapplied_cpu_latency
is reached.
Ignored when
general.model_unblocked_syscall_latency
is false.
experimental.unblocked_vdso_latency
Default: "10 nanoseconds"
Type: String
The simulated latency of an unblocked vdso function. For simulation efficiency, this
latency is only added when max_unapplied_cpu_latency
is reached.
Ignored when
general.model_unblocked_syscall_latency
is false.
experimental.use_cpu_pinning
Default: true
Type: Bool
Pin each thread and any processes it executes to the same logical CPU Core to improve cache affinity.
experimental.use_dynamic_runahead
Default: false
Type: Bool
Update the minimum runahead dynamically throughout the simulation.
experimental.use_memory_manager
Default: false
Type: Bool
Use the MemoryManager in memory-mapping mode. This can improve
performance, but disables support for dynamically spawning processes
inside the simulation (e.g. the fork
syscall).
experimental.use_new_tcp
Default: false
Type: Bool
Use the rust TCP implementation.
experimental.use_object_counters
Default: true
Type: Bool
Count object allocations and deallocations. If disabled, we will not be able to detect object memory leaks.
experimental.use_preload_libc
Default: true
Type: Bool
Preload our libc library for all managed processes for fast syscall interposition when possible.
experimental.use_preload_openssl_crypto
Default: false
Type: Bool
Preload our OpenSSL crypto library for all managed processes to skip some AES crypto operations, which may speed up simulation if your CPU lacks AES-NI support. However, it changes the behavior of your application and can cause bugs in OpenSSL that are hard to notice. You should probably not use this option unless you really know what you're doing.
experimental.use_preload_openssl_rng
Default: true
Type: Bool
Preload our OpenSSL RNG library for all managed processes to mitigate non-deterministic use of OpenSSL.
experimental.use_sched_fifo
Default: false
Type: Bool
Use the SCHED_FIFO
scheduler. Requires CAP_SYS_NICE
. See sched(7),
capabilities(7).
experimental.use_syscall_counters
Default: true
Type: Bool
Count the number of occurrences for individual syscalls.
experimental.use_worker_spinning
Default: true
Type: Bool
Each worker thread will spin in a sched_yield
loop while waiting for a new task. This is ignored
if not using the thread-per-core scheduler.
This may improve runtime performance in some environments.
host_option_defaults
Default options for all hosts. These options can also be overridden for each
host individually in the host's hosts.<hostname>.host_options
section.
host_option_defaults.log_level
Default: null
Type: "error" OR "warning" OR "info" OR "debug" OR "trace" OR null
Log level at which to print host log messages.
host_option_defaults.pcap_capture_size
Default: "65535 B"
Type: String OR Integer
How much data to capture per packet (header and payload) if pcap logging is enabled.
The default of 65535 bytes is the maximum length of an IP packet.
host_option_defaults.pcap_enabled
Default: false
Type: Bool
Should Shadow generate pcap files?
Logs all network input and output for this host in PCAP format (for viewing in
e.g. wireshark). The pcap files will be stored in the host's data directory,
for example shadow.data/hosts/myhost/eth0.pcap
.
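For example, a sketch that enables pcap capture for every host and limits each captured packet to its first 100 bytes (the capture size here is illustrative):
host_option_defaults:
  pcap_enabled: true
  pcap_capture_size: "100 B"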
hosts
Required
Type: Object
The simulated hosts which execute processes. Each field corresponds to a host configuration, with the field name being used as the network hostname. A hostname must follow the character requirements of hostname(7).
Shadow assigns each host to a network node in the network graph.
In Shadow, each host is given an RNG whose seed is derived from the global seed
(general.seed
) and the hostname. This means that changing a
host's name will change that host's RNG seed, subtly affecting the simulation
results.
hosts.<hostname>.bandwidth_down
Default: null
Type: String OR Integer OR null
Downstream bandwidth capacity of the host.
Overrides any default bandwidth values set in the assigned network graph node.
hosts.<hostname>.bandwidth_up
Default: null
Type: String OR Integer OR null
Upstream bandwidth capacity of the host.
Overrides any default bandwidth values set in the assigned network graph node.
hosts.<hostname>.ip_addr
Default: null
Type: String OR null
IP address to assign to the host.
This IP address must not conflict with the address of any other host (two hosts must not have the same IP address).
hosts.<hostname>.network_node_id
Required
Type: Integer
Network graph node ID to assign the host to.
hosts.<hostname>.host_options
See host_option_defaults
for supported fields.
Example:
hosts:
client:
...
host_options:
log_level: debug
hosts.<hostname>.processes
Required
Type: Array
Virtual software processes that the host will run.
hosts.<hostname>.processes[*].args
Default: ""
Type: String OR Array of String
Process arguments.
The arguments can be specified as a string in a shell command-line format:
args: "--user-agent 'Mozilla/5.0 (compatible; ...)' http://myserver:8080"
Or as an array of strings:
args: ['--user-agent', 'Mozilla/5.0 (compatible; ...)', 'http://myserver:8080']
Shell expansion (which includes ~/
expansion) is not performed on either
format. In the command-line format, the string is parsed as an argument vector
following typical shell quotation parsing rules.
hosts.<hostname>.processes[*].environment
Default: ""
Type: Object
Environment variables passed when executing this process.
Shell expansion (which includes ~/
expansion) is not performed on any fields.
Examples:
environment:
ENV_A: "1"
ENV_B: foo
environment: { ENV_A: "1", ENV_B: foo }
hosts.<hostname>.processes[*].expected_final_state
Default: {exited: 0}
Type: {"exited": <Integer>} OR {"signaled": Unix Signal} OR "running"
The expected state of the process at the end of the simulation. If the process exits before the end of the simulation with an unexpected state, or is still running at the end of the simulation when this option was not set to running, Shadow will log an error and return a non-zero status for the simulation.
Use exited
to indicate that a process should have exited normally; e.g. by returning
from main
or calling exit
.
Use signaled
to indicate that a process should have been killed by a signal.
Use running
for a process expected to still be running at the end of the simulation,
such as a server process that you didn't arrange to shutdown before the end of the simulation.
(All processes will be killed by Shadow when the simulation ends).
Examples:
{exited: 0}
{exited: 1}
{signaled: SIGINT}
{signaled: 9}
running
Only processes started directly from the configuration have an
expected_final_state
. Processes that those processes start (e.g. via fork
in C, or running an executable in a shell script) don't have one. Generally it's
the parent process's responsibility to do any necessary validation of the exit
status of its children (e.g. via waitpid
in C, or checking $?
in a bash
script).
hosts.<hostname>.processes[*].path
Required
Type: String
If the path begins with ~/
, it will be considered relative to the current
user's home directory. No other shell expansion is performed on the path.
Bare file basenames like sleep
will be located using Shadow's PATH
environment variable (e.g. to /usr/bin/sleep
).
hosts.<hostname>.processes[*].shutdown_signal
Default: "SIGTERM"
Type: Unix Signal
The signal that will be sent to the process at
hosts.<hostname>.processes[*].shutdown_time
.
Signals specified by name should be all-caps and include the SIG prefix; e.g.
"SIGTERM".
Many long-running processes support exiting cleanly when sent SIGTERM
or
SIGINT
.
If the process is expected to be killed directly by the signal instead of
catching it and exiting cleanly, you can set
expected_final_state
to prevent
Shadow from interpreting this as an error. e.g. SIGKILL
cannot be caught, so
will always result in an end state of {signaled: SIGKILL}
if the process didn't
already exit before the signal was sent.
path: sleep
args: "1000"
start_time: 1s
shutdown_time: 2s
shutdown_signal: SIGKILL
expected_final_state: {signaled: SIGKILL}
hosts.<hostname>.processes[*].shutdown_time
Default: null
Type: String OR Integer OR null
The simulated time at which to send
hosts.<hostname>.processes[*].shutdown_signal
to the process. This must be before general.stop_time
.
hosts.<hostname>.processes[*].start_time
Default: "0 sec"
Type: String OR Integer
The simulated time at which to execute the process. This must be before
general.stop_time
.
Managing Complex Configurations
It is sometimes useful to generate shadow configuration files dynamically. Since Shadow accepts configuration files in YAML 1.2 format, there are many options available; even more so since JSON is also valid YAML 1.2.
YAML templating
YAML itself has some features to help avoid repetition. When using these
features, it can be helpful to use shadow's --show-config
flag to examine the
"flat" generated config.
An individual node can be made into an anchor (&AnchorName x
), and
referenced via an alias (*AnchorName
). For example, here we create
and use the anchors Node
, Fast
, Slow
, ClientPath
, and ServerPath
:
general:
stop_time: 10s
network:
graph:
type: 1_gbit_switch
hosts:
fast_client:
network_node_id: &Node 0
bandwidth_up: &Fast "100 Mbit"
bandwidth_down: *Fast
processes:
- path: &ClientPath "/path/to/client"
# ...
slow_client:
network_node_id: *Node
bandwidth_up: &Slow "1 Mbit"
bandwidth_down: *Slow
processes:
- path: *ClientPath
# ...
fast_server:
network_node_id: *Node
bandwidth_up: *Fast
bandwidth_down: *Fast
processes:
- path: &ServerPath "/path/to/server"
# ...
slow_server:
network_node_id: *Node
bandwidth_up: *Slow
bandwidth_down: *Slow
processes:
- path: *ServerPath
We can use extension fields to move our constants into one place:
x-constants:
- &Node 0
- &Fast "100 Mbit"
- &Slow "1 Mbit"
- &ClientPath "/path/to/client"
- &ServerPath "/path/to/server"
general:
stop_time: 10s
network:
graph:
type: 1_gbit_switch
hosts:
fast_client:
network_node_id: *Node
bandwidth_up: *Fast
bandwidth_down: *Fast
processes:
- path: *ClientPath
slow_client:
network_node_id: *Node
bandwidth_up: *Slow
bandwidth_down: *Slow
processes:
- path: *ClientPath
fast_server:
network_node_id: *Node
bandwidth_up: *Fast
bandwidth_down: *Fast
processes:
- path: *ServerPath
slow_server:
network_node_id: *Node
bandwidth_up: *Slow
bandwidth_down: *Slow
processes:
- path: *ServerPath
We can also use merge keys to make extendable templates for fast and slow hosts:
x-constants:
- &Node 0
- &Fast "100 Mbit"
- &Slow "1 Mbit"
- &ClientPath "/path/to/client"
- &ServerPath "/path/to/server"
- &FastHost
network_node_id: *Node
bandwidth_up: *Fast
bandwidth_down: *Fast
- &SlowHost
network_node_id: *Node
bandwidth_up: *Slow
bandwidth_down: *Slow
general:
stop_time: 10s
network:
graph:
type: 1_gbit_switch
hosts:
fast_client:
<<: *FastHost
processes:
- path: *ClientPath
slow_client:
<<: *SlowHost
processes:
- path: *ClientPath
fast_server:
<<: *FastHost
processes:
- path: *ServerPath
slow_server:
<<: *SlowHost
processes:
- path: *ServerPath
Dynamic Generation
There are many tools and libraries for generating YAML and JSON. These can be helpful for representing more complex relationships between parameter values.
Suppose we want to add a cleanup process to each host that runs one second before the simulation ends. Since YAML doesn't support arithmetic, the following doesn't work:
x-constants:
- &StopTimeSec 10
- &CleanupProcess
# This will evaluate to the invalid time string "10 - 1"; not "9"
start_time: *StopTimeSec - 1
...
# ...
In such cases it may be helpful to write your configuration in a language that does support more advanced features that can generate YAML or JSON.
Python example
We can achieve the desired effect in Python like so:
#!/usr/bin/env python3
Node = 0
StopTimeSec = 10
Fast = "100 Mbit"
Slow = "1 Mbit"
ClientPath = "/path/to/client"
ServerPath = "/path/to/server"
FastHost = {
'network_node_id': Node,
'bandwidth_up': Fast,
'bandwidth_down': Fast,
}
SlowHost = {
'network_node_id': Node,
'bandwidth_up': Slow,
'bandwidth_down': Slow,
}
CleanupProcess = {
'start_time': f'{StopTimeSec - 1}s',
'path': '/path/to/cleanup',
}
config = {
'general': {
'stop_time': '10s',
},
'network': {
'graph': {
'type': '1_gbit_switch'
},
},
'hosts': {
'fast_client': {
**FastHost,
'processes': [
{'path': ClientPath},
CleanupProcess,
],
},
'slow_client': {
**SlowHost,
'processes': [
{'path': ClientPath},
CleanupProcess,
],
},
'fast_server': {
**FastHost,
'processes': [
{'path': ServerPath},
CleanupProcess,
],
},
'slow_server': {
**SlowHost,
'processes': [
{'path': ServerPath},
CleanupProcess,
],
},
},
}
import yaml
print(yaml.safe_dump(config))
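Assuming the script above is saved as generate_config.py (a name chosen here only for illustration), the Shadow config can then be generated with:
python3 generate_config.py > shadow.yaml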
Nix example
There are also languages that specialize in doing this kind of advanced configuration generation. For example, using the Nix language (used by NixOS):
let
Node = 0;
StopTimeSec = 10;
Fast = "100 Mbit";
Slow = "1 Mbit";
ClientPath = "/path/to/client";
ServerPath = "/path/to/server";
FastHost = {
network_node_id = Node;
bandwidth_up = Fast;
bandwidth_down = Fast;
};
SlowHost = {
network_node_id = Node;
bandwidth_up = Slow;
bandwidth_down = Slow;
};
CleanupProcess = {
start_time = (toString (StopTimeSec - 1)) + "s";
path = "/path/to/cleanup";
};
in
{
general = {
stop_time = (toString StopTimeSec) + "s";
};
network = {
graph = {
type = "1_gbit_switch";
};
};
hosts = {
fast_client = FastHost // {
processes = [
{path = ClientPath;}
CleanupProcess
];
};
slow_client = SlowHost // {
processes = [
{path = ClientPath;}
CleanupProcess
];
};
fast_server = FastHost // {
processes = [
{path = ServerPath;}
CleanupProcess
];
};
slow_server = SlowHost // {
processes = [
{path = ServerPath;}
CleanupProcess
];
};
};
}
This can be converted to JSON, which is also valid YAML, with:
nix eval -f example.nix --json
Network Graph Overview
Processes running in Shadow do not have access to the internet; instead, processes running on Shadow virtual hosts utilize an internal routing module to communicate with other processes running on other virtual hosts in the simulation. The routing module is used to position virtual hosts within a network topology, to compute communication paths between virtual hosts, and to enforce network path characteristics like latency and packet loss.
Importantly, the routing module is currently used to model the performance characteristics of internet paths; we do not simulate the behavior of network routers (we do not run routing protocols like BGP).
This page describes the routing module and how it can be configured.
Graph
Shadow represents a network topology over which processes can communicate using a weighted graph. The graph contains vertices that abstractly represent network locations, and edges representing network paths between those locations.
When referring to a network graph, the terms vertices and nodes are interchangeable. In our documentation, we refer to these as nodes. Note that nodes in the network graph are distinct from virtual hosts in the Shadow config file: a virtual host models an end-host machine, whereas a network node represents a location at which a host can connect to the simulated network.
Shadow requires that the network graph is connected such that there exists at least one path (a series of one or more edges) between every pair of nodes.
Behavior
The graph encodes network positioning and path characteristics as attributes on the nodes and edges. Shadow uses the connectivity graph along with the information encoded in node and edge attributes to:
- attach virtual hosts to specific nodes (i.e., locations) in the network graph;
- assign the bandwidth allowed for each attached virtual host;
- compute the shortest path (weighted by edge latency) between two virtual hosts using Dijkstra's algorithm; and
- compute the end-to-end latency and packet loss for the shortest path.
The bandwidth of the virtual hosts and the end-to-end latency and packet loss for a shortest path between two virtual hosts are then enforced for all network communication.
Important Notes
- The network graph may be directed or undirected, as long as the graph is structured such that every node can reach every other node through a series of edges.
- If the network graph is a complete graph (there exists a single unique edge between every pair of nodes), then we can avoid running the shortest path algorithm as a performance optimization by setting the use_shortest_path option to False.
- Each node in the graph must have a self-loop (an edge from the node to itself). This edge will be used for communication between two hosts attached to the same node, regardless of whether a shorter path exists.
Network Graph Attributes
We encode attributes on the nodes and edges that allow for configuring the simulated network characteristics. The attributes and their effect on the simulated network are described in more detail (alongside a simple example graph) on the network graph specification page.
Using an Existing Graph
We created a large network graph representing worldwide latencies and bandwidths as of 2018 using the RIPE Atlas measurement platform. The graph contains network bandwidths and latencies in and between major cities around the world, and is suitable for general usage for most types of Shadow simulations. The graph (updated for Shadow version 2.x) is available for download as a research artifact, and more details about the measurement methodology are available on the research artifacts site.
Note: the scripts we used to create the graph are also available, but are not recommended for general use. The scripts require advanced knowledge of RIPE Atlas and also require that you possess RIPE Atlas credits to conduct the measurements needed to create a new graph. We recommend using our existing graph linked above instead, which we may periodically update.
Creating Your Own Graph
The python module networkx can be used to create and manipulate more complicated graphs.
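For example, the following sketch (not from the Shadow repository) uses networkx to build a small two-node graph with the attributes described in the network graph specification below, including the required self-loops, and writes it out in GML format. The bandwidth, latency, and packet loss values and the output file name are arbitrary placeholders:
import networkx as nx

# Build an undirected graph (use nx.DiGraph() for a directed graph).
G = nx.Graph()

# Two network nodes with the per-host bandwidth attributes that Shadow expects.
G.add_node(0, host_bandwidth_down="100 Mbit", host_bandwidth_up="100 Mbit")
G.add_node(1, host_bandwidth_down="10 Mbit", host_bandwidth_up="10 Mbit")

# Every node needs a self-loop; also add an edge connecting the two nodes.
G.add_edge(0, 0, latency="5 ms", packet_loss=0.0)
G.add_edge(1, 1, latency="5 ms", packet_loss=0.0)
G.add_edge(0, 1, latency="50 ms", packet_loss=0.01)

# Write the graph in GML format so it can be referenced from the Shadow config.
nx.write_gml(G, "my_network_graph.gml")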
Network Graph Specification
The network graph overview provides a general summary of Shadow's use of a network graph to abstractly model network position and to connect virtual hosts in a network topology while enforcing network characteristics on paths between hosts. This page describes the specific attributes that can be configured in the network graph, and the effect that each attribute has on the simulation.
Example Graph
Below is an example of a simple network graph in the Shadow-supported GML format (note that GML refers to graph vertices as nodes, but these terms are generally interchangeable).
graph [
directed 0
node [
id 0
label "node at 1.2.3.4"
host_bandwidth_down "100 Mbit"
host_bandwidth_up "100 Mbit"
]
edge [
source 0
target 0
label "path from 1.2.3.4 to 1.2.3.4"
latency "10 ms"
jitter "0 ms"
packet_loss 0.0
]
]
Configurable Attributes
graph.directed
node.id
node.label
node.host_bandwidth_down
node.host_bandwidth_up
edge.source
edge.target
edge.label
edge.latency
edge.jitter
edge.packet_loss
graph.directed
Required: False
Default: 0
Type: Integer
Specifies the symmetry of the edges in the graph. If set to 0
(the default),
the graph is an undirected
graph: an edge
between node u
and node v
is symmetric and can be used to construct a
path both from u
to v
and from v
to u
. If set to 1
, the graph is a
directed graph: an edge from
node u
to node v
is asymmetric and can only be used to construct a path
from u
to v
(a separate edge from v
to u
must be specified to compose a
path in the reverse direction).
node.id
Required: True
Type: Integer
A unique integer identifier for a given node.
node.label
Required: False
Default: n/a
Type: String
An optional, human-meaningful string description of the node. The string may be used in log messages printed by Shadow.
node.host_bandwidth_down
Required: True
Type: String
A string defining the downstream (receive) bandwidth that will be allowed for
any host attached to this node. Hosts may individually override this value in
the Shadow config file.
The format of the string specifies the bandwidth and its unit as described in
the config documentation, e.g., 10 Mbit
. Note that
this bandwidth is allowed for every host that is attached to this node; it is
not the total bandwidth logically available at the node (which is not
defined).
node.host_bandwidth_up
Required: True
Type: String
A string defining the upstream (send) bandwidth that will be allowed for any
host attached to this node. Hosts may individually override this value in the
Shadow config file. The
format of the string specifies the bandwidth and its unit as described in the
config documentation, e.g., 10 Mbit
. Note that
this bandwidth is allowed for every host that is attached to this node; it is
not the total bandwidth logically available at the node (which is not
defined).
edge.source
Required: True
Type: Integer
The unique integer identifier of the first of two nodes of the edge. The node must exist in the graph. If the graph is directed, this node is treated as the source or start of the edge.
edge.target
Required: True
Type: Integer
The unique integer identifier of the second of two nodes of the edge. The node must exist in the graph. If the graph is directed, this node is treated as the target or end of the edge.
edge.label
Required: False
Default: n/a
Type: String
An optional, human-meaningful string description of the edge. The string may be used in log messages printed by Shadow.
edge.latency
Required: True
Type: String
The latency that will be added to packets traversing this edge. This value is
used as a weight while running Dijkstra's shortest path algorithm. The format of
the string specifies the latency and its unit, e.g., 10 ms
. If a unit is not
specified, it will be assumed that it is in the base unit of "seconds". The
latency must not be 0.
edge.jitter
Required: False
Default: n/a
Type: String
This keyword is allowed but currently nonfunctional; it is reserved for future use.
edge.packet_loss
Required: True
Type: Float
A fractional value between 0 and 1 representing the chance that a packet traversing this edge will get dropped.
Disabling Sidechannel Mitigations
Sidechannel attacks in the style of Spectre and Meltdown allow malicious code to access data it otherwise wouldn't be able to. Modern systems employ countermeasures to prevent these attacks, which typically incur some performance cost, and may not be necessary when running Shadow simulations. i.e. Shadow's performance can be improved by disabling these mitigations.
Keep in mind that Shadow already isn't designed to protect itself or its host system from malicious software. See Security.
Speculative Store Bypass
The Speculative Store Bypass attack allows malicious code to read data it otherwise wouldn't be able to, e.g. due to software sandboxing such as in a javascript engine. For a high-level overview of this attack and mitigations, see: https://www.redhat.com/en/blog/speculative-store-bypass-explained-what-it-how-it-works. For a more technical overview, see https://software.intel.com/content/dam/develop/external/us/en/documents/336996-speculative-execution-side-channel-mitigations.pdf.
We have observed the mitigation for this vulnerability to add roughly a 30% performance overhead to Shadow simulations. Because process isolation is already sufficient to mitigate this vulnerability (See "Process Isolation"), and because Shadow already makes no attempt to protect itself from malicious code within its own processes, and isn't designed to run in a managed-code environment itself, enabling this mitigation in Shadow and its managed processes doesn't have any clear benefit.
Shadow itself makes use of seccomp
, but uses the
SECCOMP_FILTER_FLAG_SPEC_ALLOW
flag to avoid turning on this mitigation. It
also logs a warning if it detects this mitigation is already enabled.
One common way this mitigation can be turned on inadvertently is by running
inside a Docker container, with seccomp enabled (which is the default). You can
avoid this by turning off seccomp entirely (using --security-opt seccomp=unconfined
), but this might not be an option when running in a
shared environment. Unfortunately, Docker currently doesn't
expose an option to use its seccomp
functionality without turning on this mitigation.
Another way to avoid enabling this mitigation is by changing the kernel
parameter
spec_store_bypass_disable
. Overriding its default value of seccomp
to
prctl
will still allow software sandboxes such as javascript engines to enable
this mitigation, but will no longer enable it by default when installing a
seccomp
filter. In principle this could create a vulnerability if there's code
running on the system that relies on the default behavior without explicitly
opting in via prctl
, so use some caution. For more discussion on this
parameter, see this discussion on the kernel mailing list about whether the
kernel default ought to be changed from seccomp
to prctl
:
https://lore.kernel.org/lkml/20201104215702.GG24993@redhat.com/
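For reference, here is a hedged sketch of how you might check the current state of this mitigation and override the kernel parameter on a Debian/Ubuntu-style system that uses GRUB; the exact procedure depends on your distribution and bootloader:
# Check the current speculative store bypass mitigation status
cat /sys/devices/system/cpu/vulnerabilities/spec_store_bypass

# Add spec_store_bypass_disable=prctl to GRUB_CMDLINE_LINUX in /etc/default/grub,
# then regenerate the bootloader config and reboot:
sudo update-grub
sudo reboot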
Other mitigations
In some ad-hoc measurements we've found that disabling all sidechannel
mitigations with
mitigations=off
also provides a significant performance boost. We haven't thoroughly evaluated
the exact benefits though, and this setting could expose your system to attack.
At a minimum, this isn't advised on a system that runs any untrusted code at
any privilege level, including in managed environments such as running
javascript in a web browser.
Parallel simulations
Some care must be taken when running multiple Shadow simulations on the same hardware at the same time. By default, Shadow pins threads to specific CPUs to avoid CPU migrations. The CPU selection logic isn't aware of other processes that may be using substantial CPU time and/or pinning, including other Shadow simulations. i.e. without some care, multiple Shadow simulations running on the same machine at the same time will generally end up trying to use the same set of CPUs, even if other CPUs on the machine are idle.
Disabling pinning
The simplest solution is to disable CPU pinning entirely. This has a substantial
performance penalty (with some reports as high as 3x), but can be a reasonable
solution for small simulations. Pinning can be disabled by passing
--use-cpu-pinning=false
to Shadow.
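For example, following the same invocation style used elsewhere in this documentation:
shadow --use-cpu-pinning=false shadow.yaml > shadow.log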
Setting an initial CPU affinity
Shadow checks the initial CPU affinity assigned to it, and only assigns to CPUs
within that set. The easiest way to run Shadow with a subset of CPUs is with the
taskset
utility. e.g. to start one Shadow simulation using CPUs 0-9 and
another using CPUs 10-19, you could use:
$ (cd sim1 && taskset --cpu-list 0-9 shadow sim1config.yml) &
$ (cd sim2 && taskset --cpu-list 10-19 shadow sim2config.yml) &
Shadow similarly avoids trying to pin to CPUs outside of its cgroup cpuset (see
cpuset(7)). This allows Shadow to work correctly in such scenarios (such as
running in a container on a shared machine that only has access to some CPUs),
but is generally more complex and requires higher privilege than setting the CPU
affinity with taskset
.
Choosing a CPU set
When assigning Shadow a subset of CPUs, some care must be taken to get optimal
performance. You can use the lscpu
utility to see the layout of the CPUs on
your machine.
- Avoid using multiple CPUs on the same core (aka hyperthreading). Such CPUs compete with each other for resources.
- Prefer CPUs on the same socket and (NUMA) node. Such CPUs share cache, which is typically beneficial in Shadow simulations.
For example, given the lscpu
output:
$ lscpu --parse=cpu,core,socket,node
# The following is the parsable format, which can be fed to other
# programs. Each different item in every column has an unique ID
# starting from zero.
# CPU,Core,Socket,Node
0,0,0,0
1,1,1,1
2,2,0,0
3,3,1,1
4,4,0,0
5,5,1,1
6,6,0,0
7,7,1,1
8,8,0,0
9,9,1,1
10,10,0,0
11,11,1,1
12,12,0,0
13,13,1,1
14,14,0,0
15,15,1,1
16,16,0,0
17,17,1,1
18,18,0,0
19,19,1,1
20,20,0,0
21,21,1,1
22,22,0,0
23,23,1,1
24,24,0,0
25,25,1,1
26,26,0,0
27,27,1,1
28,28,0,0
29,29,1,1
30,30,0,0
31,31,1,1
32,32,0,0
33,33,1,1
34,34,0,0
35,35,1,1
36,36,0,0
37,37,1,1
38,38,0,0
39,39,1,1
40,0,0,0
41,1,1,1
42,2,0,0
43,3,1,1
44,4,0,0
45,5,1,1
46,6,0,0
47,7,1,1
48,8,0,0
49,9,1,1
50,10,0,0
51,11,1,1
52,12,0,0
53,13,1,1
54,14,0,0
55,15,1,1
56,16,0,0
57,17,1,1
58,18,0,0
59,19,1,1
60,20,0,0
61,21,1,1
62,22,0,0
63,23,1,1
64,24,0,0
65,25,1,1
66,26,0,0
67,27,1,1
68,28,0,0
69,29,1,1
70,30,0,0
71,31,1,1
72,32,0,0
73,33,1,1
74,34,0,0
75,35,1,1
76,36,0,0
77,37,1,1
78,38,0,0
79,39,1,1
A reasonable configuration for two simulations might be taskset --cpu-list 0-39:2
(CPUs 0,2,...,38) and taskset --cpu-list 1-39:2
(CPUs 1,3,...,39).
This assignment leaves CPUs 40-79 idle (since those share the same physical
cores as CPUs 0-39), puts the first simulation on socket 0 and NUMA node 0, and
the second simulation on socket 1 and NUMA node 1.
Configuration options
Shadow's configuration options are generally tuned for optimal performance using Tor benchmarks, but not all system architectures and simulation workloads are the same. Shadow has several configuration options that may improve the simulation performance. Many of these options are considered "experimental", which means that they may be changed or removed at any time. If you find any of these options useful, let us know.
Be careful as these options may also worsen the simulation performance or in
some cases alter the simulation behaviour. Also remember that Shadow's
--debug
build flag will significantly reduce the simulation performance, so
don't use this flag when running long simulations.
bootstrap_end_time
Shadow supports an optional "bootstrapping period" of high network bandwidth and reliability for simulations which require network-related bootstrapping (for example Tor). While the network performance characteristics will be unrealistic during this time period, it can significantly reduce the simulation's wall clock time. After this bootstrapping period ends, the network bandwidth/reliability is reverted back to the values specified in the simulation and network configuration.
log_level
Using lower log levels such as debug
or trace
will lead to a much greater
volume of log messages. Writing these messages to disk can significantly impact
the simulation run time performance, even if you're writing to an SSD. Unless
you're actively debugging an issue in Shadow, you should use info
level or
higher.
Along these same lines, you should try to reduce the amount of disk I/O of the managed applications running within Shadow. Even if they each write a reasonably small amount, it can add up to a lot of disk I/O when running simulations with thousands of processes.
heartbeat_interval
and host_heartbeat_interval
Shadow logs simulation statistics at given simulation time intervals. If any of these time intervals are small relative to the total time of the simulation, a large number of log lines will be written. If the log is being written to disk, this increased disk I/O may slow down the simulation dramatically.
parallelism
Simulations with multiple hosts can be parallelized across multiple threads. By default Shadow tries to choose an optimal number of threads to run in parallel, but a different number of threads may yield better run time performance.
use_cpu_pinning
CPU pinning is enabled by default and should improve the simulation performance, but in shared computing environments it might be beneficial to disable this option.
scheduler
Shadow supports two different types of work schedulers. The default
thread_per_core
scheduler has been found to be significantly faster on most
machines, but may perform worse than the thread_per_host
scheduler in rare
circumstances.
use_memory_manager
Shadow supports a memory manager that uses shared memory maps to reduce the overhead of accessing a managed process' data from Shadow's main process, but this is disabled by default as it does not support other Shadow features such as emulating the fork/exec syscalls. If you do not need support for these features, enabling this memory manager may slightly improve simulation performance.
use_worker_spinning
Shadow's thread-per-core scheduler uses a spinloop by default. While this results in significant performance improvements in our benchmarks, it may be worth testing Shadow's performance with this disabled.
max_unapplied_cpu_latency
If model_unblocked_syscall_latency
is
enabled, increasing the max unapplied CPU latency may improve the simulation
run time performance.
runahead
This option effectively sets a minimum network latency. Increasing this value will allow for better simulation parallelisation and possibly better run time performance, but will affect the network characteristics of the simulation.
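As a rough illustration only, a tuned configuration combining several of the options above might look like the following sketch. The values are placeholders, and the exact option names and whether each one lives under general or experimental are documented on the configuration options page, so double-check them before use:
general:
  stop_time: 1000s
  bootstrap_end_time: 300s
  log_level: info
  heartbeat_interval: 60s
  parallelism: 8
experimental:
  use_cpu_pinning: true
  scheduler: thread_per_core
  runahead: 5ms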
Profiling
Profiling can be useful for improving the performance of experiments, either as improvements to the implementation of Shadow itself, or in altering the configuration of the experiments you are running.
Profiling with top
/htop
Tools like top
and htop
will give good first-order approximations for what
Shadow is doing. While they can only give system-wide to thread-level
granularity, this can often still tell you important details such as whether
Shadow, the simulated processes, or the kernel are consuming memory and
processor cycles. E.g., if you're running into memory constraints, the RES
or
MEM
column of these tools can tell you where to start looking for ways to
address that. If execution time is too long, sorting by CPU
or TIME
can
provide insight into where that time is being spent.
One limitation to note is that Shadow relies on spinlocks in barriers for some of its operation. Especially when running with many threads, these spinlocks will show as consuming most of the CPU anytime the simulation is bottlenecked on few simulated processes. Telling when this is happening can be difficult in these tools, because no symbol information is available.
Profiling with perf
The perf
tool is a powerful interface to the Linux kernel's performance
counter subsystem. See man perf
or the perf
wiki for full details on how
to use it, but some highlights most relevant to Shadow execution time are given
here.
Regardless of how you are using perf
, the aforementioned complication of
spinlocks in Shadow applies. Namely, when there is any bottleneck on the barrier,
the symbols associated with the spinlocks will dominate the sample
counts. Improving the performance of the spinlocks will not improve the
performance of the experiment, but improving the performance of whatever is
causing the bottleneck (likely something towards the top of non-spinlock
symbols) can.
perf top
The perf top
command will likely be the most practical mode of
perf
for profiling all parts of a Shadow experiment. It requires one
of: root access, appropriately set up Linux capabilities, or a system
configured to allow performance monitoring (similar to attaching to
processes with gdb
), so isn't always available, but is very simple
when it is. The interface is similar to top
's, but provides
information on the granularity of symbols, across the entire
system. This means you will be able to tell which specific functions
in Shadow, the simulated processes, and the kernel are consuming CPU
time.
When perf top
can't find symbol information for a process, it will display
the offset of the instruction as hex instead. (Note this means it will be
ranked by instruction, rather than the entire function.) If you know where the
respective executable or shared object file is, you can look up the name of the
symbol for that instruction's function by opening the file with gdb
and
running info symbol [ADDRESS]
. If gdb
can't find the symbols either, you
can look it up manually using readelf -s
and finding the symbol with the
largest address smaller than the offset you are looking for (note that
readelf
does not output the symbols in order of address; you can pipe the
output to awk '{$1=""; print $0}' | sort
to get a sorted list).
Details on more options (e.g., for filtering the sampled CPUs or processes) can
be found in man perf top
.
perf record
If you know which particular process you wish to profile, perf record
can
give far greater detail than other options. To use it for Shadow, either run it
when starting Shadow:
perf record shadow shadow.config.yaml > shadow.log
Or, attach to a running Shadow process:
perf record -p <PID>
Attaching to a process requires similar permissions as perf top
, but can be
used to profile any process, including the simulated processes launched by
Shadow.
The perf record
process will write a perf.data
file when you press Ctrl-c
or when Shadow ends. You can then analyze the report:
perf report
More details are available in man perf record
and man perf report
.
Stability Guarantees
Shadow generally follows the semantic versioning principles:
- PATCH version increases (ex: 2.0.1 to 2.0.2) are intended for bug fixes.
- MINOR version increases (ex: 2.0.2 to 2.1.0) are intended for new backwards-compatible features and changes.
- MAJOR version increases (ex: 1.2.2 to 2.0.0) are intended for incompatible changes.
More specifically, we aim to provide the following guarantees between MINOR versions:
- Command line and configuration option changes and additions will be
backwards-compatible.
- Default values for existing options will not change.
- File and directory names in Shadow's data directory (general.data_directory) will not change.
- Support for any of Shadow's supported platforms will not be dropped, unless those platforms no longer receive free updates and support from the distribution's developer.
- We will not change the criteria for the minimum supported Linux kernel version as documented in supported platforms. (Note though that this still allows us to increase the minimum kernel version as a result of dropping support for a platform, which we may do as noted above).
The following may change between ANY versions (MAJOR, MINOR, or PATCH):
- The log format and messages.
- Experimental options may change or be removed.
- The simulation may produce different results.
- New files may be added in Shadow's data directory (general.data_directory).
- If new files are added in Shadow's host-data directories, they will begin with the prefix <process name>.<pid>.
Non-goal: Security
Never run code under Shadow that you wouldn't trust enough to run outside of Shadow on the same system at the same level of privilege.
While Shadow uses some of the same techniques used by other systems to isolate potentially vulnerable or malicious software, this is not a design goal of Shadow. A managed program in a Shadow simulation can, if it tries to, detect that it's running under such a simulation and break out of the "sandbox" to issue native system calls.
For example:
- Shadow currently doesn't restrict access to the host file system. A malicious managed program can read and modify the same files that Shadow itself can.
- Shadow inserts some code via
LD_PRELOAD
into managed processes. This code intentionally has the ability to make non-interposed system calls (which it uses to communicate with the Shadow process), and makes no effort to protect itself from the managed code running in the same process.
Reporting security issues
Security issues can be reported to unique_halberd_0m@icloud.com .
Limitations and workarounds
Shadow can typically run applications without modification, but there are a few limitations to be aware of.
If you are severely affected by one of these limitations (or another not listed here) let us know, as this can help us prioritize our improvements to Shadow. You may reach out in our discussions or issues.
Unimplemented system calls and options
When Shadow encounters a syscall or a syscall option that it hasn't implemented,
it will generally return ENOSYS
and log at warn
level or higher. In many
such cases the application is able to recover, and this has little or no effect
on the ultimate results of the simulation.
There are some syscalls that shadow doesn't quite emulate faithfully, but has a
"best effort" implementation. As with unimplemented syscalls, shadow logs at
warn
level when encountering such a syscall.
vfork
A notable example of a not-quite faithfully implemented syscall is
vfork
, which shadow
effectively implements as a synonym for fork
. Usage of vfork
that is
compliant with the POSIX.1 specification that "behavior is undefined if the
process created by vfork() either modifies any data other than a variable of
type pid_t used to store the return value...". However, usage that relies on
specific Linux implementation details of vfork
(e.g. that a write to a global
variable from the child will be observed by the parent) won't work correctly.
As in other such cases, shadow logs a warning when it encounters vfork
, so
that users can identify it as the potential source of problems if a simulation
doesn't work as expected.
IPv6
Shadow does not yet implement IPv6. Most applications can be configured to use IPv4 instead. Tracking issue: #2216.
Statically linked executables
Shadow relies on LD_PRELOAD
to inject code into the managed processes. This
doesn't work for statically linked executables. Tracking issue:
#1839.
Most applications can be dynamically linked, though occasionally you may need to edit build scripts and/or recompile.
golang
golang
typically defaults to producing statically linked executables, unless
the application uses cgo
. Using the networking functionality of golang
's
standard library usually pulls in cgo
by default and thus results in a
dynamically linked executable.
You can also explicitly force go
to produce a dynamically linked executable. e.g.
# Install a dynamically linked `std`
go install -buildmode=shared std
# Build your application with dynamic linking
go build -linkshared myapp.go
Busy loops
By default, Shadow runs each thread of managed processes until it's blocked by a
syscall such as nanosleep
, read
, select
, futex
, or epoll
. Likewise,
time only moves forward when Shadow is blocked on such a call - Shadow
effectively models the CPU as being infinitely fast. This model is generally
sufficient for modeling non-CPU-bound network applications.
Unfortunately this model can lead to deadlock in the case of "busy loops", where
a thread repeatedly checks for something to happen indefinitely or until some
amount of wall-clock-time has passed. e.g., a worker thread might repeatedly
check whether work is available for some amount of time before going to sleep on
a futex
, to avoid the latency of going to sleep and waking back up in cases
where work arrives quickly. However since Shadow normally doesn't advance time
when making non-blocking syscalls or allow other threads to run, such a loop can
run indefinitely, deadlocking the whole simulation.
When feasible, it's usually good practice to modify such loops to have a bound on the number of iterations instead of or in addition to a bound on wallclock time.
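As an illustration (not taken from any particular application), here is a minimal C sketch of a spin-then-block pattern where the spin phase is bounded by an iteration count rather than by wall-clock time, so the loop terminates even if simulated time never advances while it spins:
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_SPIN_ITERATIONS 1000

static pthread_mutex_t work_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t work_available = PTHREAD_COND_INITIALIZER;
static atomic_bool work_ready = false;

static void wait_for_work(void) {
    // Bounded spin: give up after a fixed number of iterations instead of
    // after some amount of wall-clock time has passed.
    for (int i = 0; i < MAX_SPIN_ITERATIONS; i++) {
        if (atomic_load(&work_ready)) {
            return;
        }
    }
    // Fall back to a blocking wait; pthread_cond_wait blocks on a futex,
    // which lets Shadow advance simulated time and run other threads.
    pthread_mutex_lock(&work_lock);
    while (!atomic_load(&work_ready)) {
        pthread_cond_wait(&work_available, &work_lock);
    }
    pthread_mutex_unlock(&work_lock);
}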
For cases where modifying the loop is infeasible, Shadow provides the option
--model-unblocked-syscall-latency
. When this option is enabled, Shadow moves
time forward a small amount on every syscall (and VDSO function call), and
switches to another thread if one becomes runnable in the meantime (e.g. because
network data arrived when the clock moved forward, unblocking it).
This feature should only be used when it's needed to get around such loops. Some limitations:
- It may cause the simulation to run slower.
  - Enabling this feature forces Shadow to switch between threads more frequently, which is costly and hurts cache performance. We have minimized this effect to the extent that we can, but it can especially hurt performance when there are multiple unblocked threads on a single simulated Host, forcing Shadow to keep switching between them to keep the simulated time synchronized.
  - Busy loops intrinsically waste some CPU cycles. Outside of Shadow this can be a tradeoff for improved latency by avoiding a thread switch. However, in a Shadow simulation this latency isn't modeled, so busy-looping instead of blocking immediately has no benefit to simulated performance; only cost to simulation performance. If feasible, changing the busy-loop to block immediately instead of spinning should improve simulation performance without substantially affecting simulation results.
- It's not meant as an accurate model of syscall latency. It generally models syscalls as being somewhat faster than they would be on a real system to minimize the impact on simulation results.
  - Nonetheless it does affect simulation results. Arguably this model is more accurate, since syscalls on real systems do take non-zero time, but it makes the time model more complex to understand and reason about.
  - It still doesn't account for time spent by the CPU executing code, which also means that a busy-loop that makes no syscalls at all can still lead to deadlock. Fortunately such busy loops are rare and are generally agreed upon to be bugs, since they'd also potentially monopolize a CPU indefinitely when run natively.
For more about this topic, see #1792.
Compatibility Notes
- libopenblas
- cURL
- Wget2
- Nginx
- iPerf 2
- iPerf 3
- Jetty
- etcd (distributed key-value store)
- CTorrent and opentracker
- http-server
libopenblas
libopenblas is a fairly low-level library, and can get pulled in transitively via dependencies. e.g., tgen uses libigraph, which links against liblapack, which links against blas.
libopenblas, when compiled with pthread support, uses busy-loops in its worker threads.
There are several known workarounds:
- Use Shadow's --model-unblocked-syscall-latency feature. See busy-loops for details and caveats.
- Use a different implementation of libblas. e.g. on Ubuntu, there are several alternative packages that can provide libblas. In particular, libblas3 doesn't have this issue.
- Install libopenblas compiled without pthread support. e.g. on Ubuntu this can be obtained by installing libopenblas0-serial instead of libopenblas0-pthread.
- Configure libopenblas to not use threads at runtime. This can be done by setting the environment variable OPENBLAS_NUM_THREADS=1 in the process's environment attribute in the Shadow config. Example: tor-minimal.yaml:109
See also:
cURL
Example
general:
stop_time: 10s
model_unblocked_syscall_latency: true
network:
graph:
type: 1_gbit_switch
hosts:
server:
network_node_id: 0
processes:
- path: python3
args: -m http.server 80
start_time: 0s
expected_final_state: running
client1: &client_host
network_node_id: 0
processes:
- path: curl
args: -s server
start_time: 2s
client2: *client_host
client3: *client_host
rm -rf shadow.data; shadow shadow.yaml > shadow.log
cat shadow.data/hosts/client1/curl.1000.stdout
Notes
- Older versions of cURL use a busy loop that is incompatible with Shadow and
will cause Shadow to deadlock.
model_unblocked_syscall_latency
works around this (see busy-loops). Newer versions of cURL, such as the version provided in Ubuntu 20.04, don't have this issue. See issue #1794 for details.
Wget2
Example
general:
stop_time: 10s
network:
graph:
type: 1_gbit_switch
hosts:
server:
network_node_id: 0
processes:
- path: python3
args: -m http.server 80
start_time: 0s
expected_final_state: running
client1: &client_host
network_node_id: 0
processes:
- path: wget2
args: --no-tcp-fastopen server
start_time: 2s
client2: *client_host
client3: *client_host
rm -rf shadow.data; shadow shadow.yaml > shadow.log
cat shadow.data/hosts/client1/index.html
Notes
- Shadow doesn't support TCP_FASTOPEN so you must run Wget2 using the --no-tcp-fastopen option.
Nginx
Example
shadow.yaml
general:
stop_time: 10s
network:
graph:
type: 1_gbit_switch
hosts:
server:
network_node_id: 0
processes:
- path: nginx
args: -c ../../../nginx.conf -p .
start_time: 0s
expected_final_state: running
client1: &client_host
network_node_id: 0
processes:
- path: curl
args: -s server
start_time: 2s
client2: *client_host
client3: *client_host
nginx.conf
error_log stderr;
# shadow wants to run nginx in the foreground
daemon off;
# shadow doesn't support some syscalls that nginx uses to set up and control
# worker child processes.
# https://github.com/shadow/shadow/issues/3174
master_process off;
worker_processes 0;
# don't use the system pid file
pid nginx.pid;
events {
# we're not using any workers, so this is the maximum number
# of simultaneous connections we can support
worker_connections 1024;
}
http {
include /etc/nginx/mime.types;
default_type application/octet-stream;
# shadow does not support sendfile()
sendfile off;
access_log off;
server {
listen 80;
location / {
root /var/www/html;
index index.nginx-debian.html;
}
}
}
rm -rf shadow.data; shadow shadow.yaml > shadow.log
cat shadow.data/hosts/client1/curl.1000.stdout
Notes
- Shadow currently doesn't support some syscalls that nginx uses to set up and control worker child processes, so you must disable additional processes using master_process off and worker_processes 0. See https://github.com/shadow/shadow/issues/3174.
- Shadow doesn't support sendfile() so you must disable it using sendfile off.
iPerf 2
Example
general:
stop_time: 10s
network:
graph:
type: 1_gbit_switch
hosts:
server:
network_node_id: 0
processes:
- path: iperf
args: -s
start_time: 0s
expected_final_state: running
client:
network_node_id: 0
processes:
- path: iperf
args: -c server -t 5
start_time: 2s
rm -rf shadow.data; shadow shadow.yaml > shadow.log
Notes
- You must use an iPerf 2 version >=
2.1.1
. Older versions of iPerf have a no-syscall busy loop that is incompatible with Shadow.
iPerf 3
Example
general:
stop_time: 10s
model_unblocked_syscall_latency: true
network:
graph:
type: 1_gbit_switch
hosts:
server:
network_node_id: 0
processes:
- path: iperf3
args: -s --bind 0.0.0.0
start_time: 0s
# Tell shadow to expect this process to still be running at the end of the
# simulation.
expected_final_state: running
client:
network_node_id: 0
processes:
- path: iperf3
args: -c server -t 5
start_time: 2s
rm -rf shadow.data; shadow shadow.yaml > shadow.log
Notes
- By default iPerf 3 servers bind to an IPv6 address, but Shadow doesn't support IPv6. Instead you need to bind the server to an IPv4 address such as 0.0.0.0.
- The iPerf 3 server exits with a non-zero error code and the message "unable to start listener for connections: Address already in use" after the client disconnects. This is likely due to Shadow not supporting the SO_REUSEADDR socket option.
- iPerf 3 uses a busy loop that is incompatible with Shadow and will cause Shadow to deadlock. A workaround is to use the model_unblocked_syscall_latency option.
Jetty
Running Jetty with the http module works, but we haven't tested anything more than this.
Example
shadow.yaml
general:
stop_time: 10s
network:
graph:
type: 1_gbit_switch
hosts:
server:
network_node_id: 0
processes:
- path: java
args: -jar ../../../jetty-home-12.0.12/start.jar jetty.http.port=80 --modules=http
expected_final_state: running
client1: &client_host
network_node_id: 0
processes:
- path: curl
args: -s server
start_time: 2s
client2: *client_host
client3: *client_host
if [ ! -d jetty-home-12.0.12/ ]; then
wget https://repo1.maven.org/maven2/org/eclipse/jetty/jetty-home/12.0.12/jetty-home-12.0.12.zip
echo "2dc2c60a8a3cb84df64134bed4df1c45598118e9a228604eaeb8b9b42d80bc07 jetty-home-12.0.12.zip" | sha256sum -c
unzip -q jetty-home-12.0.12.zip && rm jetty-home-12.0.12.zip
fi
rm -rf shadow.data; shadow shadow.yaml > shadow.log
cat shadow.data/hosts/client1/curl.1000.stdout
etcd (distributed key-value store)
Example
Example for etcd version 3.3.x.
general:
stop_time: 30s
network:
graph:
type: gml
inline: |
graph [
node [
id 0
host_bandwidth_down "20 Mbit"
host_bandwidth_up "20 Mbit"
]
edge [
source 0
target 0
latency "150 ms"
packet_loss 0.01
]
]
hosts:
server1:
network_node_id: 0
processes:
- path: etcd
args:
--name server1
--log-output=stdout
--initial-cluster-token etcd-cluster-1
--initial-cluster 'server1=http://server1:2380,server2=http://server2:2380,server3=http://server3:2380'
--listen-client-urls http://0.0.0.0:2379
--advertise-client-urls http://server1:2379
--listen-peer-urls http://0.0.0.0:2380
--initial-advertise-peer-urls http://server1:2380
expected_final_state: running
- path: etcdctl
args: set my-key my-value
start_time: 10s
server2:
network_node_id: 0
processes:
- path: etcd
# each etcd peer must have a different start time
# https://github.com/shadow/shadow/issues/2858
start_time: 1ms
args:
--name server2
--log-output=stdout
--initial-cluster-token etcd-cluster-1
--initial-cluster 'server1=http://server1:2380,server2=http://server2:2380,server3=http://server3:2380'
--listen-client-urls http://0.0.0.0:2379
--advertise-client-urls http://server2:2379
--listen-peer-urls http://0.0.0.0:2380
--initial-advertise-peer-urls http://server2:2380
expected_final_state: running
- path: etcdctl
args: get my-key
start_time: 12s
server3:
network_node_id: 0
processes:
- path: etcd
start_time: 2ms
args:
--name server3
--log-output=stdout
--initial-cluster-token etcd-cluster-1
--initial-cluster 'server1=http://server1:2380,server2=http://server2:2380,server3=http://server3:2380'
--listen-client-urls http://0.0.0.0:2379
--advertise-client-urls http://server3:2379
--listen-peer-urls http://0.0.0.0:2380
--initial-advertise-peer-urls http://server3:2380
expected_final_state: running
- path: etcdctl
args: get my-key
start_time: 12s
rm -rf shadow.data; shadow shadow.yaml > shadow.log
cat shadow.data/hosts/*/etcdctl.*.stdout
Notes
- The etcd binary must not be statically linked. You can build a dynamically linked version by replacing CGO_ENABLED=0 with CGO_ENABLED=1 in etcd's scripts/build.sh and scripts/build_lib.sh scripts. The etcd packages included in the Debian and Ubuntu APT repositories are dynamically linked, so they can be used directly.
- Each etcd peer must be started at a different time since etcd uses the current time as an RNG seed. See issue #2858 for details.
- If using etcd version greater than 3.5.4, you must build etcd from source and comment out the keepalive period assignment, as Shadow does not support this.
CTorrent and opentracker
Example
general:
stop_time: 60s
network:
graph:
type: 1_gbit_switch
hosts:
tracker:
network_node_id: 0
processes:
- path: opentracker
# Tell shadow to expect this process to still be running at the end of the
# simulation.
expected_final_state: running
uploader:
network_node_id: 0
processes:
- path: cp
args: ../../../foo .
start_time: 10s
# Create the torrent file
- path: ctorrent
args: -t foo -s example.torrent -u http://tracker:6969/announce
start_time: 11s
# Serve the torrent
- path: ctorrent
args: example.torrent
start_time: 12s
expected_final_state: running
downloader1: &downloader_host
network_node_id: 0
processes:
# Download and share the torrent
- path: ctorrent
args: ../uploader/example.torrent
start_time: 30s
expected_final_state: running
downloader2: *downloader_host
downloader3: *downloader_host
downloader4: *downloader_host
downloader5: *downloader_host
echo "bar" > foo
rm -rf shadow.data; shadow shadow.yaml > shadow.log
cat shadow.data/hosts/downloader1/foo
Notes
- Shadow must be run as a non-root user since opentracker will attempt to drop privileges if it detects that the effective user is root.
http-server
Example
general:
stop_time: 10s
model_unblocked_syscall_latency: true
network:
graph:
type: 1_gbit_switch
hosts:
server:
network_node_id: 0
processes:
- path: node
args: /usr/local/bin/http-server -p 80 -d
start_time: 3s
expected_final_state: running
client:
network_node_id: 0
processes:
- path: curl
args: -s server
start_time: 5s
rm -rf shadow.data; shadow shadow.yaml > shadow.log
cat shadow.data/hosts/client/curl.1000.stdout
Notes
- Either the Node.js runtime or http-server uses a busy loop that is
incompatible with Shadow and will cause Shadow to deadlock.
model_unblocked_syscall_latency
works around this (see busy-loops).
Contributing
Summary:
- contribute changes through pull requests
- encouraged to use issue and discussion posts to notify us beforehand
- changes must include tests
  - System call tests (domain-specific and fuzz tests) (required)
  - Unit tests (preferred when possible)
  - Regression tests (as needed)
  - Application tests (as needed)
- pull requests should be easy to review
- changes should be easy to maintain
- new code should be written in Rust
- see our coding guide and best practices for Shadow
New features, bug fixes, documentation changes, etc. can be submitted through a GitHub pull request. For large changes we encourage you to post an issue or discussion before submitting your pull request so that you can make sure your changes fit well with the direction of the project. This way you won't spend time writing a pull request that we can't merge into Shadow. For details about how to draft pull requests and respond to reviewer feedback, see our additional documentation.
All pull requests with new or changed features should contain tests to validate that they work as expected and that they mirror similar behaviour in Linux. If changes or additions are made that affect Shadow's system call support, the pull request must also include system call tests that test the new or changed behaviour. The more tests you include, the more confident we'll be that the changes are correct, and the more likely it will be that your changes can be merged. We know that tests aren't very exciting to write, but Shadow relies heavily on tests to catch broken features and discrepancies with Linux. For more information about writing tests for Shadow, see our "Writing Tests" documentation.
Shadow is a community-supported project and the maintainers might not have a lot of time to review pull requests. Submitting pull requests with good documentation, tests, clear commit messages, and concise changes will help the maintainers with their reviews, and also help increase the likelihood that we will be able to merge your changes.
A core principle of Shadow development is that the project should be easy to maintain. This means that we try to reduce the number of dependencies when possible, and when we need to add new dependencies they should be popular well-used dependencies with community support. This also means that it is unlikely that we will add new dependencies for non-rust packages (for example distro packages). Shadow is supported on multiple Linux platforms with different packaging styles (APT and DNF) and different package versions, so distro packages are difficult to support and maintain across all of our supported platforms.
The main Shadow code base currently consists of both Rust and C code. We have been migrating our C code to Rust, but this migration is still in progress. All new code should be written in Rust. This includes the main Shadow application, the shim, and tests. Exceptions may be made for bug fixes or when the change is small and is in existing C code.
While we've been moving Shadow to Rust, we've learned a lot and have changed some designs. This means that the existing Shadow code is not always consistent in the way that it designs features or uses third-party libraries. For best practices and details about writing new code for Shadow, see our coding documentation.
If you have any questions about contributing to Shadow, feel free to ask us by making a new discussion post.
Coding style
Logging
In Rust code, we use the log framework for logging. In C code we use a wrapper library that also uses Rust's log framework internally.
For general guidance on what levels to log at, see log::Level.
Some shadow-specific log level policies:
- We reserve the Error level for situations in which the shadow process as a whole will exit with a non-zero code. Conversely, when shadow exits with a non-zero code, the user should be able to get some idea of what caused it by looking at the Error-level log entries.
- Warning should be used for messages that ought to be checked by the user before trusting the results of a simulation. For example, we use these in syscall handlers when an unimplemented syscall or option is used, and shadow is forced to return something like ENOTSUP, EINVAL or ENOSYS. In such cases the simulation is able to continue, and might still be representative of what would happen on a real Linux system; e.g. libc will often fall back to an older syscall, resulting in minimal impact on the simulated behavior of the managed process.
Clang-format
Our C code formatting style is defined in our
clang-format configuration
file. We try to avoid mass re-formatting, but generally any
lines you modify should be reformatted using clang-format
.
To add Ctrl-k as a "format region" in visual and select modes of vim, add the following to your .vimrc:
vmap <C-K> :py3f /usr/share/vim/addons/syntax/clang-format.py<cr>
Alternatively you can use the git-clang-format tool on the command-line to modify the lines touched by your commits.
Rustfmt
To format your Rust code, run cargo fmt
in the src
directory.
(cd src && cargo fmt)
Clippy
We use Clippy to help
detect errors and non-idiomatic Rust code. You can run clippy
locally with:
(cd src && cargo clippy)
Including headers
Which headers to include
Every source and header file should directly include the headers that export all referenced symbols and macros.
In a C file, includes should be broken into blocks, with the includes sorted alphabetically within each block. The blocks should occur in this order:
- The C file's corresponding header; e.g. foo.h for foo.c. This enforces that the header is self-contained; i.e. doesn't depend on other headers to be included before it.
- System headers are included next to minimize unintentionally exposing any macros we define to them.
- Any other necessary internal headers.
This style is loosely based on that used in glib and supported by the include what you use tool.
Inclusion style
Headers included from within the project should use quote-includes, and should
use paths relative to src/
. e.g. #include "main/utility/byte_queue.h"
, not
#include "byte_queue.h"
(even from within the same directory), and not
#include <main/utility/byte_queue.h>
.
Headers included external to this repository should use angle-bracket includes.
e.g. #include <glib.h>
, not #include "glib.h"
.
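Putting both conventions together, the include block of a hypothetical foo.c living under src/main/utility/ might look like the following sketch (the specific headers are chosen only for illustration):
// main/utility/foo.c
#include "main/utility/foo.h" // this file's corresponding header comes first

#include <glib.h> // system and external headers next, with angle brackets
#include <stdlib.h>

#include "main/utility/byte_queue.h" // other internal headers last, with quotes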
Writing Tests
Tests for Shadow generally fall into four categories:
- system call tests
- regression tests
- application tests
- unit tests
Some of these tests may be marked as "extra" tests, which means they are not run by default.
System call tests
Shadow executes real unmodified applications and co-opts them by intercepting and interposing at the system call API. This means that Shadow must try to emulate Linux system calls. Shadow doesn't always need to emulate every system call exactly as Linux does, but it's usually good to try to emulate Linux as closely as possible. When Shadow deviates from Linux behaviour, Shadow is less likely to accurately represent real-world behaviour in its simulation.
When writing new system call handlers or modifying the behaviour of existing ones, it's important to write tests that verify the correctness of the new behaviour. These system call tests are required in pull requests that add to or modify the behaviour of Shadow's system calls. Usually this means that tests are written which execute the system call with a variety of arguments, and we verify that the system call returns the same values in both Linux and Shadow.
These tests fall into two categories: domain-specific system call tests and fuzzing tests. The domain-specific tests should test the system call under a variety of typical use cases, as well as some edge cases (for example passing NULL pointers, negative lengths, etc). The fuzz tests should test many combinations of the possible argument values. These two types of tests are discussed further below.
Our existing tests are not always consistent in how the tests are organized or designed, so you don't need to follow the exact same design as other tests in the Shadow repository. If you're adding new tests to an existing file, you should try to write the tests in a similar style to the existing tests.
These tests typically use the libc library to test the system calls;
for example libc::listen(fd, 10)
. For the most part the tests assume that the
libc system call wrappers are the same as the kernel system calls themselves,
but this is not always the case. Sometimes they differ and you might want to
make the system call directly (for example the glibc fork()
system call
wrapper usually makes a clone
system call, not a fork
system call), or
there might not be a libc wrapper for the system call that you wish to test
(for example set_tid_address
). In this case you probably want to use the
linux-api library which makes the system call directly without
using a third-party library like glibc. The linux-api library only implements a
handful of system calls, and we've been adding more as we need them. You may
need to add support for the system call you wish to test to linux-api.
These tests are run emulated within Shadow and natively outside of Shadow. This
is done using the CMake add_linux_tests
and add_shadow_tests
macros. The
tests are built by Cargo and then run by CMake. For example the listen
tests
use:
add_linux_tests(BASENAME listen COMMAND sh -c "../../../target/debug/test_listen --libc-passing")
add_shadow_tests(BASENAME listen)
which results in the CMake tests:
1/2 Test #110: listen-shadow .................... Passed 0.56 sec
2/2 Test #109: listen-linux ..................... Passed 10.12 sec
Domain-specific system call tests
Here is an example of an existing test for the
listen
system call:
/// Test listen using a backlog of 0.
fn test_zero_backlog(
    domain: libc::c_int,
    sock_type: libc::c_int,
    flag: libc::c_int,
    bind: Option<SockAddr>,
) -> Result<(), String> {
    let fd = unsafe { libc::socket(domain, sock_type | flag, 0) };
    assert!(fd >= 0);

    if let Some(address) = bind {
        bind_fd(fd, address);
    }

    let args = ListenArguments { fd, backlog: 0 };

    let expected_errno = match (domain, sock_type, bind) {
        (libc::AF_INET, libc::SOCK_STREAM, _) => None,
        (libc::AF_UNIX, libc::SOCK_STREAM | libc::SOCK_SEQPACKET, Some(_)) => None,
        (libc::AF_UNIX, libc::SOCK_STREAM | libc::SOCK_SEQPACKET, None) => Some(libc::EINVAL),
        (_, libc::SOCK_DGRAM, _) => Some(libc::EOPNOTSUPP),
        _ => unimplemented!(),
    };

    test_utils::run_and_close_fds(&[fd], || check_listen_call(&args, expected_errno))
}
There are many `listen` tests including the one above, such as `test_zero_backlog`, `test_negative_backlog`, `test_large_backlog`, `test_listen_twice`, `test_reduced_backlog`, and more.
Fuzz tests
"Fuzz"-style testing is another way we test syscalls: they use some support code to test many various combinations of the possible argument values expected by a syscall, and verify that the return value for each combination of arguments is the same as what Linux returns. Because the developer usually writes these tests to cover most or all possible argument combinations, it ensures that Shadow's emulation of the syscall is highly accurate.
Fuzz tests can be a bit trickier to write, especially for more complicated
syscalls, and sometimes they don't make sense (e.g., when testing what happens
when trying to connect()
to a TCP server with a full accept queue). They
often help us find inconsistent behavior between Shadow and Linux and help us
make Shadow more accurate, so we prefer that fuzz tests are included with pull
requests when possible.
There are some good examples of writing fuzz tests in our time-related test code in `src/test/time`. For example, the `clock_nanosleep` test demonstrates how to test the syscall with all combinations of its arguments, using both valid and invalid values.
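As a rough illustration of the idea (not Shadow's actual test code), a fuzz-style test might sweep argument combinations like this and assert that each result matches the value observed on Linux:

```rust
fn main() {
    // Sweep clock ids and tv_nsec values, including invalid ones, and record
    // the result of each combination. A real fuzz test would assert that the
    // value seen in Shadow equals the value seen when run natively on Linux.
    let clocks = [libc::CLOCK_REALTIME, libc::CLOCK_MONOTONIC];
    let nanos: [libc::c_long; 4] = [-1, 0, 1_000, 1_000_000_000];

    for &clock in &clocks {
        for &nsec in &nanos {
            let request = libc::timespec { tv_sec: 0, tv_nsec: nsec };
            // clock_nanosleep returns the error number directly (it does not set errno).
            let rv = unsafe { libc::clock_nanosleep(clock, 0, &request, std::ptr::null_mut()) };
            println!("clock={clock} tv_nsec={nsec} -> {rv}");
        }
    }
}
```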
Unit tests
Shadow supports unit tests for Rust code, written as standard Rust unit tests. These tests run natively rather than under Shadow, but they are also run under Miri and Loom as "extra" tests.

For example, see the `IntervalMap` tests:

```rust
#[cfg(test)]
mod tests {
    use super::*;
    // ...

    #[test]
    fn test_insert_into_empty() {
        let mut m = IntervalMap::new();
        insert_and_validate(&mut m, 10..20, "x", &[], &[(10..20, "x")]);
    }

    // ...
}
```

```text
1/1 Test #1: rust-unit-tests .................. Passed 149.52 sec
```
Regression tests
Sometimes it's useful to write a regression test that doesn't belong under any specific system call's tests. These tests can be written like the system call tests above, but are stored in the `src/test/regression/` directory.
Application tests
It's often useful to test that applications behave correctly in Shadow. These tests do not replace the need for the system call tests above, but can complement them. For example, we have Tor and TGen tests, which help prevent regressions where we accidentally break Tor behaviour.
We also run our examples as tests. These examples include those in our documentation (for example see the "getting started" example) as well as other application examples.
Extra tests
Any of the tests above may be configured as an "extra" test. These tests are not run by default and require that Shadow is built and tested using the `--extra` flag:

```sh
./setup build --test --extra
./setup test --extra
```
These are usually tests that require extra dependencies, take a long time to build or run, or might be difficult to maintain. These tests may be removed at any time if they become difficult to maintain or if they update to require features that Shadow doesn't or can't support. For example, if an application using epoll updated to use io_uring, which Shadow doesn't support (and would take a lot of effort to support), we would need to remove its test.
Extra tests currently run in the CI environment but only under a single platform, so they're not as well tested as non-"extra" tests.
Pull requests (PRs)
Clean commits
Ideally, every commit in the history of the `main` branch should:

- Be a focused, self-contained change.
- Have a commit message that summarizes the change and explains why the change is being made, if not self-evident.
- Build (`./setup build --test`).
- Pass tests (`./setup test`).
Drafting a PR
PRs should be split into smaller, more focused changes when feasible. However, we also want to avoid polluting the history with commits that don't build or pass tests, or commits within a single PR that fix a mistake made earlier in the PR. While iterating on the PR, the `--fixup` and `--squash` flags are useful for committing changes that should ultimately be merged with one of the earlier commits.
When creating a pull request, we suggest you first create it as a draft. This will still trigger our continuous-integration checks, and gives you a chance to resolve any issues with those (e.g., broken tests) before requesting review.
Once done iterating, first consider using `git rebase -i --autosquash` to clean up your commit history, and then force pushing to update your PR. Finally, take the pull request out of draft mode to signal that you're ready for review.
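For example, a typical iteration loop might look like the following (the commit reference is a placeholder):

```sh
# Attach a follow-up change to an earlier commit in the PR
git commit --fixup=<sha-of-commit-to-fix>

# Once done iterating, fold the fixup commits into their targets...
git rebase -i --autosquash main

# ...and update the PR branch
git push -f
```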
Responding to review feedback
During PR review, please do not rebase or force-push, since this makes it difficult to see what's changed between rounds of review. Consider using `--fixup` and `--squash` for commits responding to review feedback, so that they can be appropriately squashed before the final merge. git autofixup can also be useful for generating `--fixup` commits.
Merging
When the PR is ready to be merged, the reviewer might ask you to `git rebase` and force push to clean up history, or might do it themselves.
For the maintainer doing the merge:
If the PR is relatively small, or if it's not worth the effort of rewriting history into clean commits, use the "squash and merge" strategy.
If the individual commits appear to be useful to keep around in our history, instead use the "create a merge commit" strategy. There's no need to review every individual commit when using this strategy, but if the intermediate commits are obviously low quality consider using the "squash and merge" strategy instead. Note that since this strategy creates a merge commit, we can still later identify and filter out the intermediate commits if desired, e.g. with `git log --first-parent main`.
We've disabled the "Rebase and merge" option, since it does a fast-forward merge, which makes the intermediate commits indistinguishable from the validated and reviewed final state of the PR.
A common task is to rebase a PR on main so that it is up to date, perhaps fix some conflicts or add some changes to the PR, and then push the updated branch to test it in the CI before merging. Suppose a user `contributor` submitted a branch `bugfix` as PR `1234`, and has allowed the maintainers to update the PR. Then you could fetch the branch to perform work on the PR locally:
```sh
git fetch origin pull/1234/head:pr-1234
git checkout pr-1234
git rebase main
<fix conflicts or commit other changes>
git push -f git@github.com:contributor/shadow.git pr-1234:bugfix
git checkout main
git branch -D pr-1234
```
If it passes the tests, you can merge the PR in the GitHub interface as usual.
Coding
Building the guide
```sh
cargo install mdbook
(cd mdbook && mdbook build)
firefox build/guide/index.html
```
Building the rust docs
```sh
(cd src && cargo doc --workspace --exclude shadow-tests)
```
Generating compiler command database
Many tools benefit from a compiler command database, conventionally in a file called `compile_commands.json`. If shadow's `setup` script finds the bear tool on your `PATH`, it will automatically use it to create and update `build/compile_commands.json` when running `setup build`.
Files and descriptors
Shadow currently has two ways of simulating descriptors. The first is `LegacyDescriptor`, which is written in C and is used for most descriptor/file types (IP sockets, epoll, files, etc). With this type, the epoll file / POSIX file description and its descriptor live in the same object. The second way of simulating descriptors is in Rust, where we have a `File` type that can be referenced by many `Descriptor` objects. This allows us to easily implement `dup()` for descriptors implemented with this new code. Our plan is to move existing legacy descriptors over to these new Rust file types.
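As a rough conceptual sketch (these are not Shadow's actual type definitions), the Rust approach lets `dup()` simply hand out another reference to the same underlying file:

```rust
use std::sync::Arc;

// Hypothetical, simplified types for illustration only.
struct File {
    // shared file state: buffers, status flags, ...
}

struct Descriptor {
    file: Arc<File>,
}

impl Descriptor {
    /// dup() just clones the reference; both descriptors see the same File.
    fn dup(&self) -> Descriptor {
        Descriptor { file: Arc::clone(&self.file) }
    }
}

fn main() {
    let d1 = Descriptor { file: Arc::new(File {}) };
    let d2 = d1.dup();
    assert!(Arc::ptr_eq(&d1.file, &d2.file));
}
```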
Platform (libc and Linux) crates
We use several Rust crates for accessing platform functionality and definitions. Roughly from lowest-level to highest-level:
- Our linux-api crate provides fairly low-level bindings over the Linux kernel headers, and a few `nix`-style higher-level wrappers. It does not depend on `std` or `libc`. It also re-exports these definitions as a C library that can be used without creating conflicts with libc headers or linux system headers. Use this when working with the syscall ABI (such as when implementing syscall handlers), for internal parameters and state that are likely to interact with the syscall ABI (such as file states), and for making syscalls when none of the higher-level crates are suitable (see below).
- `libc` provides fairly low-level bindings of system libc standard headers. If you need syscall-level ABI-compatibility, use `linux-api` instead. If you don't, prefer one of the higher-level crates.
- `nix` provides a safer and more Rust-idiomatic layer on top of the `libc` crate, as well as adapters for underlying `libc` definitions. There's currently a lot of usage of this in Shadow, but we're working on moving away from it (see #3345). In most scenarios, one of the other crates mentioned here is a more appropriate choice.
- `rustix` provides a similar API to `nix`, but can be configured not to depend on `std` or `libc`. This is useful in code that's linked into Shadow's shim, where we don't want to depend on `std` or `libc`.
- Rust's `std` crate provides, well, the standard way of interacting with the platform, in a portable and Rust-idiomatic way. This is generally the right choice for code that doesn't run in Shadow's shim, in places where we're not concerned about the precise syscalls that get executed.
When choosing which one to use:
- For code that will be linked into shadow's shim, prefer `rustix`. In cases where `rustix` doesn't provide the desired functionality, or in C code, or when we need precise control over what syscall is made with what parameters, use `linux-api`.

  We want to minimize, and ideally eliminate, usage of `libc` from the shim. `libc` has global state that can easily become corrupted when we use it from the shim, which is `LD_PRELOAD`ed into managed programs. This is especially because much of the shim executes in the context of `SIGSYS` signal handlers, meaning we might already be in a non-reentrant, non-async-signal-safe libc function higher in the stack. See also https://github.com/shadow/shadow/milestone/54.

- For shadow's syscall handler implementations, prefer `linux-api`.

  Since we are intercepting and implementing at the syscall level, the interface we are implementing is the Linux syscall ABI. Therefore we should be careful to use Linux's definitions for the parameters and return values. While types and constants in libc are often equivalent to kernel types and constants with the same names, there are many known cases where they aren't, and in general there's no guarantee that one that is consistent today will remain consistent tomorrow. See also https://github.com/shadow/shadow/issues/3007.

  This also applies when implementing a syscall by delegating to the host system. For example, suppose we implement a `fcntl` syscall by making a native `fcntl` syscall on the native file descriptor. Making the syscall directly is the most straightforward way to "pass through" exactly the original intended semantics. If we use a higher level interface, even `libc`, we have to be careful about translating the parameters and return values back and forth between the two different API layers.

- For code that runs in the shadow process, where we are acting as a "normal" program that wants to interact with the kernel, generally prefer the highest-level interface that provides the necessary functionality. e.g. when creating worker threads in Rust, we generally use `std::thread`; there's no reason to use one of the lower level crates. Occasionally we need some functionality not provided in `std` though, in which case it makes sense to drop down to one of the lower level crates.

- In tests, any of the above can make sense. In places where we're specifically trying to test shadow's emulation of some functionality, making direct syscalls, e.g. with the `linux-api` crate or `libc`'s `syscall` function, is the most direct and precise approach. On the other hand, we often want to test higher level interfaces as a form of integration testing, since those are more typically what managed programs use. We usually focus on testing at the `libc` interface, since most managed programs use that interface, and it's low-level enough to be able to control and understand what's happening at the syscall level. For incidental system functionality in tests (e.g. creating a temp file, in a test that isn't specifically trying to test that functionality) it usually makes sense to use whatever interface is most idiomatic and convenient.
deny(unsafe_op_in_unsafe_fn)
All crates should use `#![deny(unsafe_op_in_unsafe_fn)]`. When adding a new crate, remember to add this to the `lib.rs` or `main.rs`.

From [RFC 2585](https://github.com/rust-lang/rfcs/blob/master/text/2585-unsafe-block-in-unsafe-fn.md):

> No longer treat the body of an unsafe fn as being an unsafe block. To avoid a breaking change, this is a warning now and may become an error in a future edition.
This helps make it clearer where "unsafe" code is being used and can make reviewing code easier.
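For instance, under this lint the body of an `unsafe fn` no longer counts as an unsafe block, so unsafe operations must be wrapped explicitly (a small illustrative example):

```rust
#![deny(unsafe_op_in_unsafe_fn)]

/// # Safety
/// `ptr` must be non-null and point to a valid, readable `u8`.
unsafe fn read_byte(ptr: *const u8) -> u8 {
    // Without this explicit `unsafe` block, the dereference would be
    // rejected by `deny(unsafe_op_in_unsafe_fn)`.
    unsafe { *ptr }
}

fn main() {
    let x = 7u8;
    // Calling an unsafe fn still requires an unsafe block at the call site.
    let y = unsafe { read_byte(&x) };
    assert_eq!(y, 7);
}
```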
Debugging
Debugging the Shadow process
Shadow is currently built with debugging symbols in both debug and release builds, though it may be easier to debug a debug build (generated by passing the `--debug` flag to `setup build`).

Shadow can be run under GDB by prepending `gdb --args` to its command-line. e.g.:

```sh
gdb --args shadow shadow.yaml
```
An alternative is to run Shadow with the `--gdb` flag, which will pause shadow after startup and print its PID. You can then simply attach GDB to Shadow in a new terminal and continue the experiment.
Example:

```sh
# terminal 1
# shadow will print its PID and pause
$ shadow --gdb shadow.yaml > shadow.log
** Starting Shadow
** Pausing with SIGTSTP to enable debugger attachment (pid 1234)

# terminal 2
$ gdb --pid=1234
> continue
```
Troubleshooting
If, when loading the shadow binary in gdb, you see the error:

```text
Reading symbols from /my/binary...
Dwarf Error: DW_FORM_strx1 found in non-DWO CU [in module /my/binary]
(No debugging symbols found in /my/binary)
```
It's likely that the version of Rust that you're building Shadow with is incompatible with your version of GDB. You can read more at Rust issue #98746.
Debugging managed processes
A simulation's managed processes are implemented as native OS processes, with their syscalls interposed by Shadow. Since they are native processes, many normal tools for inspecting native processes can be used on them as well; e.g. `top` will show how much CPU and memory they are using.
Generating a core file
If a managed process is crashing, it is sometimes easiest to let the native process generate a core file, and then use GDB to inspect it afterwards.
```sh
# Enable core dumps.
ulimit -c unlimited

# Ensure core dumps are written to a file.
# e.g. this is sometimes needed in Ubuntu to override the default behavior of
# piping the core file to the system crash handler.
echo core | sudo tee /proc/sys/kernel/core_pattern

# Run the simulation in which a process is crashing.
shadow shadow.yaml

# Tell gdb to inspect the core file. From within gdb you'll be able to
# inspect the state of the process when it was killed.
gdb <path-to-process-executable> <path-to-core-file>
```
Attaching with GDB
You can attach GDB directly to the managed process. To make this easier you can use the `--debug-hosts` option to pause Shadow after it launches each managed process on the given hosts. Shadow will print the native process' PID before stopping. For example, `--debug-hosts client,server` will pause Shadow after launching any managed processes on hosts "client" and "server". This allows you to attach GDB directly to those managed processes before resuming Shadow.
```sh
# terminal 1
$ shadow --debug-hosts client,server shadow.yaml > shadow.log
** Starting Shadow
** Pausing with SIGTSTP to enable debugger attachment to managed process 'server.nginx.1000' (pid 1234)
** If running Shadow under Bash, resume Shadow by pressing Ctrl-Z to background this task and then typing "fg".
** (If you wish to kill Shadow, type "kill %%" instead.)
** If running Shadow under GDB, resume Shadow by typing "signal SIGCONT".

# terminal 2
$ gdb --pid=1234
```
Debugging with GDB
In managed processes, Shadow uses `SIGSYS` and `SIGSEGV` to intercept system calls and some CPU instructions. By default, GDB stops every time these signals are raised. In most cases you'll want to override this behavior to silently continue executing instead:

```
(gdb) handle SIGSYS noprint
(gdb) handle SIGSEGV noprint
```
Once you have reached a point of interest, it's often useful to look at the backtrace for the current stack:

```
(gdb) bt
```

In multi-threaded applications, you can get a backtrace for all stacks:

```
(gdb) thread apply all bt
```
Profiling
Profiling can be useful for improving the performance of experiments, either as improvements to the implementation of Shadow itself, or in altering the configuration of the experiments you are running.
Profiling with `top`/`htop`
Tools like `top` and `htop` will give good first-order approximations for what Shadow is doing. While they can only give system-wide to thread-level granularity, this can often still tell you important details such as whether Shadow, the simulated processes, or the kernel are consuming memory and processor cycles. E.g., if you're running into memory constraints, the `RES` or `MEM` column of these tools can tell you where to start looking for ways to address that. If execution time is too long, sorting by `CPU` or `TIME` can provide insight into where that time is being spent.
One limitation to note is that Shadow relies on spinlocks in barriers for some of its operation. Especially when running with many threads, these spinlocks will show as consuming most of the CPU anytime the simulation is bottlenecked on only a few simulated processes. Telling when this is happening can be difficult in these tools, because no symbol information is available.
Profiling with perf
The `perf` tool is a powerful interface to the Linux kernel's performance counter subsystem. See `man perf` or the perf wiki for full details on how to use it, but some highlights most relevant to Shadow execution time are given here.
Regardless of how you are using `perf`, the aforementioned complication of spinlocks in Shadow applies. Namely, when there is any bottleneck on the barrier, the symbols associated with the spinlocks will dominate the sample counts. Improving the performance of the spinlocks will not improve the performance of the experiment, but improving the performance of whatever is causing the bottleneck (likely something towards the top of the non-spinlock symbols) can.
perf top
The `perf top` command will likely be the most practical mode of `perf` for profiling all parts of a Shadow experiment. It requires one of: root access, appropriately set up Linux capabilities, or a system configured to allow performance monitoring (similar to attaching to processes with `gdb`), so it isn't always available, but is very simple when it is. The interface is similar to `top`'s, but provides information at the granularity of symbols, across the entire system. This means you will be able to tell which specific functions in Shadow, the simulated processes, and the kernel are consuming CPU time.
When `perf top` can't find symbol information for a process, it will display the offset of the instruction as hex instead. (Note this means it will be ranked by instruction, rather than the entire function.) If you know where the respective executable or shared object file is, you can look up the name of the symbol for that instruction's function by opening the file with `gdb` and running `info symbol [ADDRESS]`. If `gdb` can't find the symbols either, you can look it up manually using `readelf -s` and finding the symbol with the largest address smaller than the offset you are looking for (note that `readelf` does not output the symbols in order of address; you can pipe the output to `awk '{$1=""; print $0}' | sort` to get a sorted list).
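For example, the lookup might look like this (the paths and address are illustrative placeholders):

```sh
# Ask gdb which function contains the sampled offset
gdb /path/to/object -batch -ex 'info symbol 0x1a2b3c'

# Or list the symbols sorted by address and find the closest one below the offset
readelf -s /path/to/object | awk '{$1=""; print $0}' | sort
```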
Details on more options (e.g., for filtering the sampled CPUs or processes) can be found in `man perf top`.
perf record
If you know which particular process you wish to profile, `perf record` can give far greater detail than other options. To use it for Shadow, either run it when starting Shadow:

```sh
perf record shadow shadow.config.yaml > shadow.log
```

Or attach to a running Shadow process:

```sh
perf record -p <PID>
```
Attaching to a process requires similar permissions as `perf top`, but can be used to profile any process, including the simulated processes launched by Shadow.

The `perf record` process will write a `perf.data` file when you press Ctrl-C, or when Shadow ends. You can then analyze the report:

```sh
perf report
```

More details are available in `man perf record` and `man perf report`.
Testing for Nondeterminism
If you run Shadow twice with the same seed (the `-s` or `--seed` command line options), then it should produce deterministic results (it's a bug if it doesn't).
If you find non-deterministic behavior in your Shadow experiment, please consider helping to diagnose the problem by opening a new issue.
Comparing strace output (experimental)
Shadow has an experimental feature for logging most system calls made by the managed process in a format similar to the strace tool. You can enable this with the `strace_logging_mode` option. You can compare this strace log from two simulations to look for non-deterministic behaviour. To avoid capturing memory addresses and uninitialized memory in the log, you should use the `deterministic` logging mode.
After running two simulations with `--strace-logging-mode deterministic` where the results are in the `shadow.data.1` and `shadow.data.2` directories, you could run something like the following bash script:
```bash
#!/bin/bash

found_difference=0

for SUFFIX in \
    hosts/fileserver/tgen.1000.strace \
    hosts/client/tgen.1000.strace
do
    diff --brief shadow.data.1/${SUFFIX} shadow.data.2/${SUFFIX}
    exit_code=$?
    if (($exit_code != 0)); then
        found_difference=1
    fi
done

if (($found_difference == 1)); then
    echo -e "\033[0;31mDetected difference in output (Shadow may be non-deterministic).\033[0m"
else
    echo -e "\033[0;32mDid not detect difference in Shadow output (Shadow may be deterministic).\033[0m"
fi
```
Comparing application output
A good way to check this is to compare the log output of an application that was run in Shadow. For example, after running two TGen simulations where the results are in the `shadow.data.1` and `shadow.data.2` directories, you could run something like the following bash script:
```bash
#!/bin/bash

found_difference=0

for SUFFIX in \
    hosts/fileserver/tgen.1000.stdout \
    hosts/client/tgen.1000.stdout
do
    ## ignore memory addresses in the log file with `sed 's/0x[0-9a-f]*/HEX/g' FILENAME`
    sed -i 's/0x[0-9a-f]*/HEX/g' shadow.data.1/${SUFFIX}
    sed -i 's/0x[0-9a-f]*/HEX/g' shadow.data.2/${SUFFIX}

    diff --brief shadow.data.1/${SUFFIX} shadow.data.2/${SUFFIX}
    exit_code=$?
    if (($exit_code != 0)); then
        found_difference=1
    fi
done

if (($found_difference == 1)); then
    echo -e "\033[0;31mDetected difference in output (Shadow may be non-deterministic).\033[0m"
else
    echo -e "\033[0;32mDid not detect difference in Shadow output (Shadow may be deterministic).\033[0m"
fi
```
Extra Tests
Shadow includes tests that require additional dependencies, such as Tor, TGen, networkx, obfs4proxy, and golang. These aren't run by default, but are run as part of the CI tests.
To run them locally, first make sure that both tor and tgen are located on your shell's `PATH`. You should also install all of Shadow's optional dependencies.

To run the golang tests you will need to both install golang, and install a dynamic version of the golang standard library. The latter can be done with `go install -buildmode=shared -linkshared std`.
It is recommended to build Shadow in release mode, otherwise the Tor tests may not complete before the timeout.
```sh
./setup build --test --extra
./setup test --extra

# To exclude the TGen and Tor tests (for example if you built Shadow in debug mode)
./setup test --extra -- --label-exclude "tgen|tor"

# To include only the TGen tests
./setup test --extra tgen

# To run a specific TGen test
./setup test --extra tgen-duration-1mbit_300ms-1000streams-shadow
```
If you change the version of tor located at `~/.local/bin/tor`, make sure to re-run `./setup build --test`.
Miri
```sh
rustup toolchain install nightly
rustup +nightly component add miri

# Disable isolation for some tests that use the current time (Instant::now).
# Disable leak-checking for now. Some tests intentionally panic, causing leaks.
export MIRIFLAGS="-Zmiri-disable-isolation -Zmiri-ignore-leaks"

(cd src && cargo +nightly miri test --workspace)
```
Continuous integration tests
On GitHub
Our continuous integration tests build and test Shadow on every supported platform and configuration. GitHub runs these tests automatically when making or modifying a pull request, in the build and test workflow. Pull requests without passing integration tests are blocked from merging.
Running locally
We also have scripts for running the continuous integration tests locally, inside Docker containers. This can be useful for debugging and for quickly iterating on a test that's failing in GitHub's test runs.
The `run.sh` script builds shadow inside a Docker image, and runs our tests in it.

By default, the script will attempt to use a Docker image with shadow already built, perform an incremental build on top of that, and then run shadow's tests. If you don't already have a local image, the script will implicitly try to pull from the shadowsim/shadow-ci repository on dockerhub. You can override this repo with `-r`, or force the script to build a new image locally with `-i`.
For example, to perform an incremental build and test on ubuntu 24.04, with the gcc compiler in debug mode:
```sh
ci/run.sh -c ubuntu:24.04 -C gcc -b debug
```
If the tests fail, shadow's build directory, including test outputs, will be copied from the ephemeral Docker container into `ci/build`.

For additional options, run `ci/run.sh -h`.
Debugging locally
After a local run fails, you can use Docker to help debug it. If you previously ran the tests without the `-i` option, re-run with the `-i` option to rebuild the Docker image(s). If Shadow was built successfully and the failure happened at the testing step, then the Docker image was built and tagged, and you can run an interactive shell in a container built from that image. e.g.:

```sh
docker run --shm-size=1024g --security-opt=seccomp=unconfined -it shadowsim/shadow-ci:ubuntu-24.04-gcc-debug /bin/bash
```
If the failure happened in the middle of building the Docker image, you can do the same with the last intermediate layer that was built successfully. e.g. given the output:
```text
$ ci/run.sh -i -c ubuntu:24.04 -C gcc -b debug
<snip>
Step 13/13 : RUN . ci/container_scripts/build_and_install.sh
 ---> Running in a11c4a554ef8
<snip>
516 [ERROR] Non - zero return code from make.
```
You can start a container from the image in which Docker tried (and failed) to run the `ci/container_scripts/build_and_install.sh` script with:

```sh
docker run --shm-size=1024g --security-opt=seccomp=unconfined -it a11c4a554ef8 /bin/bash
```
Maintainer playbook
Tagging Shadow releases
Before creating a new release, be sure to handle all issues in its GitHub Project. Issues that can wait until the next release can be moved to the next release's project (which you may need to create). Remaining issues should be resolved before continuing with the release process.
We use Semantic Versioning, and increment version numbers with the bumpversion tool.
The following commands can be used to tag a new version of Shadow, after which an archive will be available on github's releases page.
Install bumpversion if needed:
```sh
python3 -m venv bumpenv
source bumpenv/bin/activate
pip install -U pip
pip install bumpversion
```
Make sure main is up to date:
```sh
git checkout main
git pull
```
The bumpversion command is run like this (it is recommended to add `--dry-run --verbose` until you are confident in the result):

```sh
bumpversion --dry-run --verbose <major|minor|patch|release|build>
```
Decide which part of the version you are bumping. Our format is `{major}.{minor}.{patch}-{release}.{build}`. Bumping earlier parts of the version will cause later parts to get reset to 0 (or 'pre' for the release part). For example, if you are at `2.0.0`, going to `2.1.0-pre` is easy:

```sh
bumpversion minor --tag --commit
```
In the above case, we can just tag and commit immediately. But if you are going from `2.0.0` to `2.1.0`, you'll need to either run bumpversion twice (first to bump the minor from 0 to 1, then to bump the release from 'pre' to the invisible 'stable'):

```sh
bumpversion minor
bumpversion --allow-dirty release --commit --tag
```

or use the serialize option to specify the intended format of the next version:

```sh
bumpversion minor --serialize '{major}.{minor}.{patch}' --commit --tag
```
Now check that things worked and get the new version number:

```sh
git log -1 --stat
git describe --tags
VERSION=`awk -F "=" '/current_version/ {print $2}' .bumpversion.cfg | tr -d ' '`
```
Update the Cargo lock file, then amend the commit and tag to include the update (closely check and update the `Bump version: from → to` messages as needed):

```sh
(cd src && cargo update --workspace)
git add src/Cargo.lock
git commit --amend
git tag -f -a "v$VERSION"
```
Check again:

```sh
git log -1 --stat
git describe --tags
```
Now if everything looks good, push to GitHub:

```sh
git push origin "v$VERSION"
```
Our releases will then be tagged off of the main branch.
You probably want to also reset the `CHANGELOG.md` file in a new commit after tagging/pushing the release.
Format of Shadow Log Messages
| ❗ Warning |
|---|
| The format of the log messages is not stable and may change at any time. |
Log Line Prefix
Shadow produces simulator log messages in the following format:
```text
real-time [thread-id:thread-name] virtual-time [loglevel] [hostname:ip] [src-file:line-number] [function-name] MESSAGE
```
- `real-time`: the wall clock time since the start of the experiment, represented as `hours:minutes:seconds`
- `thread-id`: the thread id (as returned by `gettid`) of the system thread that generated the message
- `thread-name`: the name of the system thread that generated the message
- `virtual-time`: the simulated time since the start of the experiment, represented as `hours:minutes:seconds`
- `loglevel`: one of `ERROR` < `WARN` < `INFO` < `DEBUG` < `TRACE`, in that order
- `hostname`: the name of the host as specified in `hosts.<hostname>` of the simulation config
- `ip`: the IP address of the host as specified in `hosts.<hostname>.ip_address_hint` of the simulation config, or a random IP address if one is not specified
- `src-file`: the name of the source code file where the message is logged
- `line-number`: the line number in the source code file where the message is logged
- `function-name`: the name of the function logging the message
- `MESSAGE`: the actual message to be logged
By default, Shadow only prints core messages at or below the `info` log level. This behavior can be changed using the Shadow option `-l` or `--log-level` to increase or decrease the verbosity of the output. As mentioned in the example from the previous section, the output from each application process is stored in separate log files beneath the `shadow.data` directory, and the format of those log files is application-specific (i.e., Shadow writes application output directly to file).
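For example, to also include debug-level messages (a sketch; assumes a config file named `shadow.yaml`):

```sh
shadow --log-level debug shadow.yaml > shadow.log
```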
Heartbeat Messages
Shadow logs simulator heartbeat messages that contain useful system information for each virtual node in the experiment, in messages containing the string `shadow-heartbeat`. By default, these heartbeats are logged once per second, but the frequency can be changed using the `--heartbeat-frequency` option to Shadow (see `shadow --help`).
There are currently three heartbeat statistic subsystems: `node`, `socket`, and `ram`. For each subsystem that is enabled, Shadow will print a 'header' message followed by a regular message every frequency interval. The 'header' messages generally describe the statistics that are printed in the regular messages for that subsystem.
The following are examples of the statistics that are available for each subsystem:
Node:

```text
[node-header] interval-seconds,recv-bytes,send-bytes,cpu-percent,delayed-count,avgdelay-milliseconds;inbound-localhost-counters;outbound-localhost-counters;inbound-remote-counters;outbound-remote-counters
where counters are: packets-total,bytes-total,packets-control,bytes-control-header,packets-control-retrans,bytes-control-header-retrans,packets-data,bytes-data-header,bytes-data-payload,packets-data-retrans,bytes-data-header-retrans,bytes-data-payload-retrans
```

Socket:

```text
[socket-header] descriptor-number,protocol-string,hostname:port-peer;inbuflen-bytes,inbufsize-bytes,outbuflen-bytes,outbufsize-bytes;recv-bytes,send-bytes;inbound-localhost-counters;outbound-localhost-counters;inbound-remote-counters;outbound-remote-counters|...
where counters are: packets-total,bytes-total,packets-control,bytes-control-header,packets-control-retrans,bytes-control-header-retrans,packets-data,bytes-data-header,bytes-data-payload,packets-data-retrans,bytes-data-header-retrans,bytes-data-payload-retrans
```

Ram:

```text
[ram-header] interval-seconds,alloc-bytes,dealloc-bytes,total-bytes,pointers-count,failfree-count
```
Parsing Shadow Log Messages
| ❗ Warning |
|---|
| The heartbeat/tracker log messages are considered experimental and may change or be removed at any time. |
Shadow logs simulator heartbeat messages that contain useful system information for each virtual host in the experiment. For example, Shadow logs the number of bytes sent/received, number of bytes allocated/deallocated, CPU usage, etc. You can parse these heartbeat log messages to get insight into the simulation. Details of these heartbeat messages can be found here, and they can be enabled by setting the experimental `experimental.host_heartbeat_interval` configuration option.
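Since the heartbeat lines contain the string `shadow-heartbeat`, a simple first step before parsing is often just to pull them out of the main log (illustrative; assumes the log was written to `shadow.log`):

```sh
grep 'shadow-heartbeat' shadow.log > heartbeats.log
```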
Example Simulation Data
The methods we describe below can be used on the output from any Shadow simulation. Here, we use the output from the Traffic Generation example simulation for illustrative purposes.
Parsing and Plotting Results
Shadow includes some Python scripts that can parse important statistics from the Shadow log messages, including network throughput over time, client download statistics, and client load statistics, and then visualize the results. The following will parse and plot the output produced from the above experiment:
```sh
# parse the shadow output file
src/tools/parse-shadow.py --help
src/tools/parse-shadow.py --prefix results shadow.log

# plot the results
src/tools/plot-shadow.py --help
src/tools/plot-shadow.py --data results "example-plots"
```
The `parse-*.py` scripts generate `stats.*.json.xz` files. The (heavily trimmed) contents of `stats.shadow.json` look like the following:
```text
$ xzcat results/stats.shadow.json.xz
{
  "nodes": {
    "client:11.0.0.1": {
      "recv": {
        "bytes_control_header": {
          "0": 0,
          "1": 0,
          "2": 0,
          ...
          "599": 0
        },
        "bytes_control_header_retrans": { ... },
        "bytes_data_header": { ... },
        "bytes_data_header_retrans": { ... },
        "bytes_data_payload": { ... },
        "bytes_data_payload_retrans": { ... },
        "bytes_total": { ... }
      },
      "send": { ... }
    },
    "server:11.0.0.2": { ... }
  },
  "ticks": {
    "2": {
      "maxrss_gib": 0.162216,
      "time_seconds": 0.070114
    },
    "3": { ... },
    ...
    "599": { ... }
  }
}
```
The `plot-*.py` scripts generate graphs. Open the PDF file that was created to see the graphed results.
Comparing Data from Multiple Simulations
Consider a set of experiments where we would like to analyze the effect of changing our hosts' socket receive buffer sizes. We run the following 2 experiments:
```sh
# delete any existing simulation data and post-processing
rm -rf shadow.{data,log} 10KiB.{data,results,log} 100KiB.{data,results,log} *.results.pdf

shadow --socket-recv-buffer 10KiB --socket-recv-autotune false \
    --data-directory 10KiB.data shadow.yaml > 10KiB.log
shadow --socket-recv-buffer 100KiB --socket-recv-autotune false \
    --data-directory 100KiB.data shadow.yaml > 100KiB.log
```
To parse these log files, we use the following scripts:
```sh
src/tools/parse-shadow.py --prefix=10KiB.results 10KiB.log
src/tools/parse-shadow.py --prefix=100KiB.results 100KiB.log
```
Each of the directories `10KiB.results/` and `100KiB.results/` now contains data statistics files extracted from the log files. We can now combine and visualize these results with the `plot-shadow.py` script:

```sh
src/tools/plot-shadow.py --prefix "recv-buffer" --data 10KiB.results/ "10 KiB" --data 100KiB.results/ "100 KiB"
```
Open the PDF file that was created to compare results from the experiments.
National Science Foundation Sponsorship
Project Title: Expanding Research Frontiers with a Next-Generation Anonymous Communication Experimentation (ACE) Framework
Project Period: October 1, 2019 - September 30, 2023 (extended from September 30, 2022)
Abstract: NSF Award Abstract #1925497
The goal of this project is to develop a scalable and mature deterministic network simulator, capable of quickly and accurately simulating large networks such as Tor. This project builds on the Shadow Simulator.
NSF Project Overview
ACE will be developed with the following features:
- Application Emulation. Learning from the community’s experience, ACE will directly execute software and run applications as normal operating system processes. By supporting the general execution of applications (i.e., anything that can be executed as a process: network servers, web browsers, scripts, etc.), ACE will support software independent of the programming language chosen by developers, and ACE will maximize its applicability to a large range of evaluation approaches that CISE researchers choose to utilize. As a result, ACE will be well-suited to website fingerprinting and censorship circumvention research focus areas, which typically require running a variety of tools written in a variety of languages.
- Network Simulation. ACE will feature a light-weight network simulation component that will allow applications to communicate with each other through the ACE framework rather than over the Internet. ACE will simulate common transport protocols, such as TCP and UDP. ACE will also simulate virtual network routers and other network path components between end-hosts, and support evaluation under dynamic changes to timing, congestion, latency, bandwidth, network location, and network path elements. Therefore, ACE will support both network-aware and location-aware anonymous communication research and allow researchers to continue to advance this research agenda in current and future Internet architectures.
- Function Interposition. ACE will utilize function interposition in order to connect the processes being run by the operating system to the core network simulation component. ACE will support an API of common system calls that are used to, e.g., send and receive data to and from the network. Therefore, all processes executed in ACE will be isolated from the Internet and connected through ACE’s simulated network, and the simulation component will drive process execution.
- Controlled, Deterministic Execution. ACE features a deterministic discrete-event engine, and will therefore control time and operate in simulated timescales. As a result, ACE will be disconnected from the computational abilities of the host machine: ACE will run as-fast-as-possible, which could be faster or slower than real time depending on experiment load. ACE is deterministic so that research results can be independently and identically reproduced and verified across research labs.
- Parallel and Distributed Execution. ACE will rely on the operating system kernel to run and manage processes. Operating system kernels have been optimized for this task, and ACE will benefit in terms of better performance and a smaller code base. Moreover, ACE will be embarrassingly parallel: the Linux kernel generally scales to millions of processes that can be run in parallel, and we will design ACE such that any number of processes can be executed across multiple distinct machines. Therefore, ACE will scale to realistically-sized anonymous communication networks containing millions of virtual hosts, and can be deployed on whatever existing infrastructure is available at community members' institutions.
As part of the ACE framework, we will also develop a user interface to control and monitor the experimental process, a toolkit to help users set up and configure experiments (including network, mobility, and traffic characteristics and models) and to visualize results, and a data repository where researchers can share and archive experimental results.
Project Goals/Activities
Here we outline some high level tasks that we are completing or plan to complete under this project. We are using Github for project development, including for tracking progress on major milestones and development tasks. We provide an outline of our agenda here, and link to the appropriate Github page where appropriate. Tasks without corresponding Github links mean we don't yet have progress to share at this time.
-
Task 0: Investigate Architectural Improvements
- Build prototype of a process-based simulation architecture - milestone
- Evaluate and compare against a plugin-based simulation architecture
- Decide which architecture is right for ACE
-
Task 1: Develop Core ACE System
- Improve test coverage and infrastructure - shadow milestone, shadow-plugin-tor milestone
- Enable new code to be written in Rust - milestone
- Improve consistency of simulation options and configuration
- Improve maintainability and accuracy of TCP implementation - milestone
- Simplify event scheduler, implement continuous event execution model
- Build a distributed core simulation engine
- Develop CPU usage model to ensure virtual process CPU utilization consumes simulation time
-
Task 2: Develop User Interface and Visualizations
- Design control protocol and API for interacting with Shadow
- Specify/document protocol
- Develop user interface that uses the control API
- Improve tools for analyzing and understanding simulation results
-
Task 3: Develop Simulation Models for ACE
- Improve tools for generating and configuring private Tor networks
- Improve tools for generating and configuring background traffic models
- Improve tools for modeling Internet paths and latency
- Develop support for mobile hosts
- Create realistic host mobility models
-
Task 4: Engage Community
- Create data repository where users can share configs and results
- Create user outreach material and surveys to collect feedback
- Improve user documentation and usage instructions
Over all tasks, we plan to significantly improve documentation, test coverage, and code maintainability.
People
- Rob Jansen - Project Leader, Principal Investigator, U.S. Naval Research Laboratory
- Roger Dingledine - Principal Investigator, The Tor Project
- Micah Sherr - Principal Investigator, Georgetown University
- Jim Newsome - Developer, The Tor Project
- Steven Engler - Developer, Georgetown University / The Tor Project