The Shadow Simulator

What is Shadow?

Shadow is a discrete-event network simulator that directly executes real application code, enabling you to simulate distributed systems with thousands of network-connected processes in realistic and scalable private network experiments using your laptop, desktop, or server running Linux.

Shadow experiments can be scientifically controlled and deterministically replicated, making it easier for you to reproduce bugs and eliminate confounding factors in your experiments.

How Does Shadow Work?

Shadow directly executes real applications:

  • Shadow directly executes unmodified, real application code using native OS (Linux) processes.
  • Shadow co-opts the native processes into a discrete-event simulation by interposing at the system call API.
  • The necessary system calls are emulated such that the applications need not be aware that they are running in a Shadow simulation.

Shadow connects the applications in a simulated network:

  • Shadow constructs a private, virtual network through which the managed processes can communicate.
  • Shadow internally implements simulated versions of common network protocols (e.g., TCP and UDP).
  • Shadow internally models network routing characteristics (e.g., path latency and packet loss) using a configurable network graph.

Why is Shadow Needed?

Network emulators (e.g., mininet) run real application code on top of real OS kernels in real time, but are non-deterministic and have limited scalability: time distortion can occur if emulated processes exceed an unknown computational threshold, leading to undefined behavior.

Network simulators (e.g., ns-3) offer more experimental control and scalability, but have limited application-layer realism because they run application abstractions in place of real application code.

Shadow offers a novel, hybrid emulation/simulation architecture: it directly executes real applications as native OS processes in order to faithfully reproduce application-layer behavior while also co-opting the processes into a high-performance network simulation that can scale to large distributed systems with hundreds of thousands of processes.

Caveats

Shadow implements over 150 functions from the system call API, but does not yet fully support all API features. Although applications that make basic use of the supported system calls should work out of the box, those that use more complex features or functions may not yet function correctly when running in Shadow. Extending support for the API is a work-in-progress.

That being said, we are particularly motivated to run large-scale Tor Network simulations. This use-case is already fairly well-supported and we are eager to continue extending support for it.

More Information

Homepage:

Documentation:

Community Support:

Bug Reports:

Shadow Design Overview

Shadow is a multi-threaded network experimentation tool that is designed as a hybrid between simulation and emulation architectures: it directly executes applications as Linux processes, but runs them in the context of a discrete-event network simulation.

Shadow's version 2 design is summarized in the following sections. Please see the end of this document for references to published design articles with more details.

Executing Applications

Shadow directly executes real, unmodified application binaries natively in Linux as standard OS processes (using vfork() and execvpe()): we call these processes executed by Shadow managed processes. When executing each managed process, Shadow dynamically injects a shim library using preloading (via the LD_PRELOAD environment variable) and establishes an inter-process control channel using shared memory and semaphores. The control channel enables Shadow to exchange messages with the shim and to instruct the shim to perform actions in the managed process space.
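
As a rough illustration of how preloading works in general (this is not how Shadow itself is invoked, and the library and program paths below are hypothetical):

# The dynamic linker loads the preloaded library before all others, so any
# symbols it defines take precedence when function calls are dynamically
# resolved.
LD_PRELOAD=/path/to/shim.so ./myprogram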

Intercepting System Calls

The shim co-opts each running managed process into the simulation environment by intercepting all of the system calls it makes rather than allowing them to be handled by the Linux kernel. System call interception happens through two methods: first via preloading and second via a seccomp filter.

  • Preloading: Because the shim is preloaded, the shim will be the first library that is searched when attempting to dynamically resolve symbols. We use the shim to override functions in other shared libraries (e.g., system call wrapper functions from libc) by supplying identically named functions with alternative implementations inside the shim. Note that preloading works on dynamically linked function calls (e.g., to libc system call wrappers), but not on statically linked function calls (e.g. those made from inside of libc) or system calls made using a syscall instruction.

  • seccomp: System calls that are not interceptable via preloading are intercepted using the kernel's seccomp facility. The shim of each managed process installs a seccomp filter that traps all system calls (except those made from the shim) and registers a handler function for the trapped system calls. This facility incurs a small overhead because the installed filter runs in kernel mode, but in practice we incur this overhead infrequently since most system calls are interceptable via the more efficient preloading method.

Emulating System Calls

System calls that are intercepted by the shim (using either preloading or seccomp) are emulated by Shadow. Hot-path system calls (e.g., time-related system calls) are handled directly in the shim by using state that is stored in shared memory. Other system calls are sent from the shim to Shadow via the control channel and handled in Shadow (the shim sends the system call number and argument registers). While the shim is waiting for a system call to be serviced by Shadow, the managed process is blocked; this allows Shadow to precisely control the running state of each process.

Shadow emulates system calls using its simulated kernel. The simulated kernel (re)implements (i.e., simulates) important system functionality, including: the passage of time; input and output operations on file, socket, pipe, timer, and event descriptors; signals; packet transmissions with respect to transport layer protocols such as TCP and UDP; and aspects of computer networking including routing, queuing, and bandwidth limits. Thus, Shadow establishes a private, simulated network environment that is completely isolated from the real network, but is internally interoperable and entirely controllable.

Care is taken to ensure that all random bytes needed during the simulation originate from a seeded pseudorandom source, including during the emulation of system calls such as getrandom() and when emulating reads from files like /dev/*random. This enables Shadow to produce deterministic simulations, i.e., running a simulation twice using the same inputs and the same seed should produce the same sequence of operations in the managed processes.
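
For example, one rough way to spot-check determinism is to run the same simulation twice into different data directories (the -d flag sets the data directory) and compare the per-host output. This is only a sketch; the configuration file name is a placeholder:

rm -rf run1.data run2.data
shadow -d run1.data shadow.yaml > run1.log
shadow -d run2.data shadow.yaml > run2.log
diff -r run1.data/hosts run2.data/hosts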

Managing Memory

Some system calls pass dynamically allocated memory addresses (e.g., the buffer address in the sendto() system call). To handle such a system call in Shadow, the shim sends the buffer address but not the buffer contents to Shadow. Shadow uses an inter-process memory access manager to directly and efficiently read and write the memory of each managed process without extraneous data copies or control messages. Briefly, the memory manager (re)maps the memory of each managed process into a shared memory file that is accessible by both Shadow and the managed process. When Shadow needs to copy data from a memory address passed to it by the shim, the memory manager translates the managed process's memory address to a shared memory address and brokers requested data copies. This approach minimizes the number of data copies and system calls needed to transfer the buffer contents from the managed process to Shadow.

Scheduling

Shadow is designed to be high performance: it uses a thread for every virtual host configured in an experiment, but only allows a number of threads equal to the number of available CPU cores to run in parallel, which avoids performance degradation caused by CPU oversubscription. Work stealing is used to ensure that each core is always running a worker thread as long as remaining work exists. Shadow also uses CPU pinning to reduce the frequency of cache misses, CPU migrations, and context switches.

Research

Shadow's design is based on the following published research articles. Please cite our work when using Shadow in your projects.

Shadow version 2 (latest)

This is the latest v2 design described above:

Co-opting Linux Processes for High-Performance Network Simulation
by Rob Jansen, Jim Newsome, and Ryan Wails
in the 2022 USENIX Annual Technical Conference, 2022.

@inproceedings{netsim-atc2022,
  author = {Rob Jansen and Jim Newsome and Ryan Wails},
  title = {Co-opting Linux Processes for High-Performance Network Simulation},
  booktitle = {USENIX Annual Technical Conference},
  year = {2022},
  note = {See also \url{https://netsim-atc2022.github.io}},
}

Shadow version 1 (original)

This is the original v1 design, using plugins loaded into the Shadow process rather than independent processes:

Shadow: Running Tor in a Box for Accurate and Efficient Experimentation
by Rob Jansen and Nicholas Hopper
in the Symposium on Network and Distributed System Security, 2012.

@inproceedings{shadow-ndss12,
  title = {Shadow: Running Tor in a Box for Accurate and Efficient Experimentation},
  author = {Rob Jansen and Nicholas Hopper},
  booktitle = {Symposium on Network and Distributed System Security},
  year = {2012},
  note = {See also \url{https://shadow.github.io}},
}

Supported Platforms

Officially supported platforms

We support the following Linux x86-64 distributions:

  • Ubuntu 20.04, 22.04, 24.04
  • Debian 10, 11, and 12
  • Fedora 40

We do not provide official support for other platforms. This means that we do not ensure that Shadow successfully builds and passes tests on other platforms. However, we will review pull requests that allow Shadow to build and run on unsupported platforms.

Our policy regarding supported platforms can be found in our "stability guarantees".

Supported Linux kernel versions

Some Linux distributions support multiple kernel versions, for example an older General Availability (GA) kernel and newer hardware-enablement (HWE) kernels. We try to allow Shadow to run on the oldest kernel supported on each distribution (the GA kernel). However:

  • On Debian 10 (buster), we do not support the GA kernel. We do support the HWE kernel (e.g. installed via backports).
  • We are currently only able to regularly test on the latest Ubuntu kernel, since that's what GitHub Actions provides.

By these criteria, Shadow's oldest supported kernel version is currently 5.4 (the GA kernel in Ubuntu 20.04.0).

Docker

If you are installing Shadow within a Docker container, you must increase the size of the container's /dev/shm mount and disable the seccomp security profile. You can do this by passing additional flags to docker run.

Example:

docker run -it --shm-size=1024g --security-opt seccomp=unconfined ubuntu:24.04

If you are having difficulty installing Shadow on any supported platforms, you may find the continuous integration build steps helpful.

Installing Dependencies

Required:

  • gcc, gcc-c++
  • python (version >= 3.6)
  • glib (version >= 2.58.0)
  • cmake (version >= 3.13.4)
  • make
  • pkg-config
  • xz-utils
  • lscpu
  • rustup (version ~ latest)
  • libclang (version >= 9)

APT (Debian/Ubuntu):

# required dependencies
sudo apt-get install -y \
    cmake \
    findutils \
    libclang-dev \
    libc-dbg \
    libglib2.0-0 \
    libglib2.0-dev \
    make \
    netbase \
    python3 \
    python3-networkx \
    xz-utils \
    util-linux \
    gcc \
    g++

# rustup: https://rustup.rs
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

On older versions of Debian or Ubuntu, the default version of libclang is too old, which may cause bindgen to have errors finding system header files, particularly when compiling with gcc. In this case you will need to explicitly install a newer-than-default version of libclang; e.g., on Debian 10, install libclang-13-dev.
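
For example, on Debian 10:

sudo apt-get install -y libclang-13-dev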

DNF (Fedora):

Warning: dnf often installs 32-bit (i686) versions of libraries. You may want to use the --best option to make sure you're installing the 64-bit (x86_64) versions, which are required by Shadow.

# required dependencies
sudo dnf install -y \
    cmake \
    findutils \
    clang-devel \
    glib2 \
    glib2-devel \
    make \
    python3 \
    python3-networkx \
    xz \
    xz-devel \
    yum-utils \
    diffutils \
    util-linux \
    gcc \
    gcc-c++

# rustup: https://rustup.rs
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Shadow Setup

After building and testing Shadow, the install step is optional: if you do not wish to install Shadow, you can run it directly from the build directory (./build/src/main/shadow). Note that Shadow only supports building from directories whose paths do not contain whitespace characters.

git clone https://github.com/shadow/shadow.git
cd shadow
./setup build --clean --test
./setup test
# Optionally install (to ~/.local/bin by default). Can otherwise run the binary
# directly at build/src/main/shadow.
./setup install

For the remainder of this documentation, we assume the Shadow binary is in your PATH. The default installed location of /home/${USER}/.local/bin is probably already in your PATH. If it isn't, you can add it by running:

echo 'export PATH="${PATH}:/home/${USER}/.local/bin"' >> ~/.bashrc && source ~/.bashrc

The path that Shadow is installed to must not contain any space characters as they are not supported by the dynamic linker's LD_PRELOAD mechanism.

Check that Shadow is installed and runs:

shadow --version
shadow --help

Uninstall Shadow

After running ./setup install, you can find the list of installed files in ./build/install_manifest.txt. To uninstall Shadow, remove any files listed.
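
For example, one possible way to remove them (a sketch; review the manifest before deleting anything):

xargs rm -v < ./build/install_manifest.txt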

Setup Notes

  • All build output is generated to the ./build directory.

  • Use ./setup build --help to see all build options; some useful build options are:

    • -g or --debug to build Shadow with debugging symbols and additional runtime checks. This option will significantly reduce the simulator performance.
    • --search if you installed dependencies to non-standard locations. Used when searching for libraries, headers, and pkg-config files. Appropriate suffixes like /lib and /include of the provided path are also searched when looking for files of the corresponding type.
    • --prefix if you want to install Shadow somewhere besides ~/.local.
  • The setup script is a wrapper around cmake and make. Using cmake and make directly is also possible, but unsupported. For example:

    # alternative installation method
    rm -r build && mkdir build && cd build
    cmake -DCMAKE_INSTALL_PREFIX="~/.local" -DSHADOW_TEST=ON ..
    make
    ctest
    make install
    

System Configs and Limits

Some Linux system configuration changes are needed to run large-scale Shadow simulations (more than about 1000 processes). If you're just trying Shadow or running small simulations, you can skip these steps.

Number of Open Files

There is a default Linux system limit on the total number of open files. Since Shadow opens files from within its own process space and not from within the managed processes, both the system limit and the per-process limit must be greater than the combined total number of files opened by all managed processes. If each managed process in your simulation opens many files, you'll likely want to increase the limit so that your application doesn't receive EMFILE errors when calling open().

System-wide Limits

Check the system-wide limits with:

sysctl fs.nr_open # per-process open file limit
sysctl fs.file-max # system-wide open file limit

Use cat /proc/sys/fs/file-nr to find:

  1. the current, system-wide number of used file handles
  2. the current, system-wide number of free file handles
  3. and the system-wide limit on the maximum number of open files for all processes

Change the limits, make the change persistent across reboots, and apply it now:

sudo sysctl -w fs.nr_open=10485760
echo "fs.nr_open = 10485760" | sudo tee -a /etc/sysctl.conf
sudo sysctl -w fs.file-max=10485760
echo "fs.file-max = 10485760" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p

User Limits

Check the maximum number of open file descriptors currently allowed in your session:

ulimit -n

Check the number of files currently used in a process with pid=PID:

/bin/ls -l /proc/PID/fd/ | wc -l

You will almost certainly want to raise the user file limit by modifying /etc/security/limits.conf. For example:

rjansen soft nofile 10485760
rjansen hard nofile 10485760

The max you can use is your fs.nr_open system-wide limit setting from above. You need to either log out and back in or reboot for the changes to take effect. You can watch /proc/sys/fs/file-nr and reduce the limit according to your usage, if you'd like.

systemd Limits

systemd may place a limit on the number of tasks that a user can run in its slice. You can check to see if a limit is in place by running

$ systemctl status user-$UID.slice

Here's a listing of an example response:

● user-1027.slice - User Slice of <user>
   Loaded: loaded
Transient: yes
  Drop-In: /run/systemd/system/user-1027.slice.d
           └─50-After-systemd-logind\x2eservice.conf, 50-After-systemd-user-sessions\x2eservice.conf, 50-Description.conf, 50-TasksMax.conf
   Active: active since Wed 2020-05-06 21:20:08 EDT; 1 years 2 months ago
    Tasks: 81 (limit: 12288)

The last line of the listing shows that this user has a task limit of 12288 tasks.

If this task limit is too small, it can be removed with the following command:

$ sudo systemctl set-property user-$UID.slice TasksMax=infinity

Number of Maps

There is a system limit on the number of mmap() mappings per process. Most users will not have to modify these settings. However, if an application running in Shadow makes extensive use of mmap(), you may need to increase the limit.

Process Limit

The per-process map limit can be queried in these ways:

sysctl vm.max_map_count
cat /proc/sys/vm/max_map_count

You can check the number of maps currently used in a process with pid=PID like this:

wc -l /proc/PID/maps

Set a new limit, make it persistent, apply it now:

sudo sysctl -w vm.max_map_count=1073741824
echo "vm.max_map_count = 1073741824" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p

Process / Thread Count Limits

System-Wide Limits

The kernel may limit kernel.pid_max to a small value, which limits the total number of processes that can run on the machine. This limit can be raised with the following commands:

sudo sysctl -w kernel.pid_max=4194304
echo "kernel.pid_max = 4194304" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p

The kernel may also limit the total number of threads running on the machine. This limit can also be raised:

sudo sysctl -w kernel.threads-max=4194304
echo "kernel.threads-max = 4194304" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p

The kernel may automatically cap the kernel.threads-max value so that the memory consumed by kernel thread control structures does not exceed approximately 1/8th of system memory (see https://stackoverflow.com/a/21926745).

User Limits

You may need to raise the maximum number of user processes allowed in /etc/security/limits.conf. For example, user limits can be removed with the lines:

rjansen soft nproc unlimited
rjansen hard nproc unlimited

For more information

https://www.kernel.org/doc/Documentation/sysctl/fs.txt
https://www.kernel.org/doc/Documentation/sysctl/vm.txt

man proc
man ulimit
cat /proc/sys/fs/file-max
cat /proc/sys/fs/inode-max

Running Shadow

When Shadow is installed, the main executable is placed in bin/ under your install prefix (~/.local/bin by default). As a reminder, it is helpful if this location is included in your PATH environment variable.

The main Shadow binary executable, shadow, contains most of the simulator's code, including events and the event engine, the network stack, and the routing logic. Shadow's event engine supports multi-threading using the -p or --parallelism flags (or their corresponding configuration file option) to simulate multiple hosts in parallel.
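
For example, to run a simulation using 4 worker threads (a sketch; the configuration file name is a placeholder):

shadow --parallelism 4 shadow.yaml > shadow.log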

In the following sections we provide some examples to help you get started, but Shadow's configuration format is entirely specified in the "Shadow Config Specification" and "Network Graph Specification" documents. You will find these useful once you begin writing your own simulations.

Basic File Transfer Example

Here we present a basic example that simulates the network traffic of an HTTP server with 3 clients, each running on different virtual hosts. If you do not have Python or cURL installed, you can install them through your distribution's package manager.

Configuring the Simulation

Each client uses cURL to make an HTTP request to a basic Python HTTP server.

Shadow requires a configuration file that specifies information about the network graph and the processes to run within the simulation. This example uses a built-in network graph for simplicity.

shadow.yaml:

general:
  # stop after 10 simulated seconds
  stop_time: 10s
  # old versions of cURL use a busy loop, so to avoid spinning in this busy
  # loop indefinitely, we add a system call latency to advance the simulated
  # time when running non-blocking system calls
  model_unblocked_syscall_latency: true

network:
  graph:
    # use a built-in network graph containing
    # a single vertex with a bandwidth of 1 Gbit
    type: 1_gbit_switch

hosts:
  # a host with the hostname 'server'
  server:
    network_node_id: 0
    processes:
    - path: python3
      args: -m http.server 80
      start_time: 3s
      # tell shadow to expect this process to still be running at the end of the
      # simulation
      expected_final_state: running
  # three hosts with hostnames 'client1', 'client2', and 'client3' using a yaml
  # anchor to avoid duplicating the options for each host
  client1: &client_host
    network_node_id: 0
    processes:
    - path: curl
      args: -s server
      start_time: 5s
  client2: *client_host
  client3: *client_host

Running the Simulation

Shadow stores simulation data to the shadow.data/ directory by default. We first remove this directory if it already exists, and then run Shadow.

# delete any existing simulation data
rm -rf shadow.data/
shadow shadow.yaml > shadow.log

This small Shadow simulation should complete almost immediately.

Viewing the Simulation Output

Shadow will write simulation output to the data directory shadow.data/. Each host has its own directory under shadow.data/hosts/. For example:

$ ls -l shadow.data/hosts/
drwxrwxr-x 2 user user 4096 Jun  2 16:54 client1
drwxrwxr-x 2 user user 4096 Jun  2 16:54 client2
drwxrwxr-x 2 user user 4096 Jun  2 16:54 client3
drwxrwxr-x 2 user user 4096 Jun  2 16:54 server

Each host directory contains the output for each process running on that host. For example:

$ ls -l shadow.data/hosts/client1/
-rw-rw-r-- 1 user user   0 Jun  2 16:54 curl.1000.shimlog
-rw-r--r-- 1 user user   0 Jun  2 16:54 curl.1000.stderr
-rw-r--r-- 1 user user 542 Jun  2 16:54 curl.1000.stdout

$ cat shadow.data/hosts/client1/curl.1000.stdout
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Directory listing for /</title>
</head>
<body>
<h1>Directory listing for /</h1>
...

Traffic Generation Example

We recommend getting started with the basic file transfer before running this example. It contains some basics about running Shadow simulations that are not covered here.

During Shadow simulations, it is often useful to generate background traffic flows between your simulated hosts. This example uses the TGen traffic generator for this purpose.

TGen is capable of generating basic file transfers, where you can configure how much data is transferred in each direction, how long to wait in between each transfer, and how many transfers to perform. TGen also supports more complex behavior models: you can use Markov models to configure a state machine with precise inter-packet timing characteristics. We only make use of its basic features in this example.

If you don't have TGen installed, you can follow the instructions in the TGen documentation. The following example runs TGen with 5 clients that each download 10 files from a server over a simple network graph.

A Shadow Simulation using TGen

The following example simulates a network with 1 TGen server and 5 TGen clients that generate TCP traffic to and from the server.

Configuring Shadow

The shadow.yaml file tells Shadow how to model the network that carries traffic between the hosts and how much bandwidth is available to each host. It also specifies how many processes to run in the simulation, and the configuration options for those applications.

shadow.yaml:

general:
  stop_time: 10m
  # Needed to avoid deadlock in some configurations of tgen.
  # See below.
  model_unblocked_syscall_latency: true

network:
  graph:
    # a custom single-node graph
    type: gml
    inline: |
      graph [
        node [
          id 0
          host_bandwidth_down "140 Mbit"
          host_bandwidth_up "18 Mbit"
        ]
        edge [
          source 0
          target 0
          latency "50 ms"
          packet_loss 0.01
        ]
      ]
hosts:
  server:
    network_node_id: 0
    processes:
    # Assumes `tgen` is on your shell's `PATH`.
    # Otherwise use an absolute path here.
    - path: tgen
      # The ../../../ prefix assumes that tgen.server.graphml.xml is in the same
      # directory as the data directory (specified with the -d CLI argument).
      # See notes below explaining Shadow's directory structure.
      args: ../../../tgen.server.graphml.xml
      start_time: 1s
      # Tell shadow to expect this process to still be running at the end of the
      # simulation.
      expected_final_state: running
  client1: &client_host
    network_node_id: 0
    processes:
    - path: tgen
      args: ../../../tgen.client.graphml.xml
      start_time: 2s
  client2: *client_host
  client3: *client_host
  client4: *client_host
  client5: *client_host

We can see that Shadow will be running 6 processes in total, and that those processes are configured using graphml.xml files (the configuration file format for TGen) as arguments.

Each host directory is also the working directory for the host's processes, which is why we specified ../../../tgen.server.graphml.xml as the path to the TGen configuration in our Shadow configuration file (./shadow.data/hosts/server/../../../tgen.server.graphml.xml resolves to ./tgen.server.graphml.xml). The host directory structure is stable---it is guaranteed not to change between minor releases, so the ../../../ prefix may reliably be used to refer to files in the same directory as the data directory.

model_unblocked_syscall_latency is used to avoid deadlock in case tgen was compiled with libopenblas.

Configuring TGen

Each TGen process requires an action-dependency graph in order to configure the behavior of the clients and server. See the TGen documentation for more information about customizing TGen behaviors.

Our TGen Server

The main configuration here is the port number on which the server will listen.

tgen.server.graphml.xml:

<?xml version="1.0" encoding="utf-8"?><graphml xmlns="http://graphml.graphdrawing.org/xmlns" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
  <key attr.name="serverport" attr.type="string" for="node" id="d1" />
  <key attr.name="loglevel" attr.type="string" for="node" id="d0" />
  <graph edgedefault="directed">
    <node id="start">
      <data key="d0">info</data>
      <data key="d1">8888</data>
    </node>
  </graph>
</graphml>

Our TGen Clients

The client config specifies that we connect to the server using its name and port server:8888, and that we download and upload 1 MiB 10 times, pausing 1, 2, or 3 seconds between each transfer.

tgen.client.graphml.xml:

<?xml version="1.0" encoding="utf-8"?><graphml xmlns="http://graphml.graphdrawing.org/xmlns" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
  <key attr.name="recvsize" attr.type="string" for="node" id="d5" />
  <key attr.name="sendsize" attr.type="string" for="node" id="d4" />
  <key attr.name="count" attr.type="string" for="node" id="d3" />
  <key attr.name="time" attr.type="string" for="node" id="d2" />
  <key attr.name="peers" attr.type="string" for="node" id="d1" />
  <key attr.name="loglevel" attr.type="string" for="node" id="d0" />
  <graph edgedefault="directed">
    <node id="start">
      <data key="d0">info</data>
      <data key="d1">server:8888</data>
    </node>
    <node id="pause">
      <data key="d2">1,2,3</data>
    </node>
    <node id="end">
      <data key="d3">10</data>
    </node>
    <node id="stream">
      <data key="d4">1 MiB</data>
      <data key="d5">1 MiB</data>
    </node>
    <edge source="start" target="stream" />
    <edge source="pause" target="start" />
    <edge source="end" target="pause" />
    <edge source="stream" target="end" />
  </graph>
</graphml>

Running the Simulation

With the above three files saved in the same directory, you can start a simulation. Shadow stores simulation data to the shadow.data/ directory by default. We first remove this directory if it already exists, and then run Shadow. This example may take a few minutes.

# delete any existing simulation data
rm -rf shadow.data/
shadow shadow.yaml > shadow.log

Simulation Output

Shadow will write simulation output to the data directory shadow.data/. Each host has its own directory under shadow.data/hosts/.

In the TGen process output, lines containing stream-success represent completed downloads and contain useful timing statistics. From these lines we should see that clients have completed a total of 50 streams:

$ for d in shadow.data/hosts/client*; do grep "stream-success" "${d}"/*.stdout ; done | wc -l
50

We can also look at the transfers from the servers' perspective:

$ for d in shadow.data/hosts/server*; do grep "stream-success" "${d}"/*.stdout ; done | wc -l
50

You can also parse the TGen output logged to the stdout files using the tgentools program from the TGen repo, and plot the data in graphical format to visualize the performance characteristics of the transfers. This page describes how to get started.

Simple Tor Network Example

We recommend getting started with the basic file transfer and traffic generation examples to orient yourself with Shadow before running this slightly more complex Tor simulation.

This example requires that you have installed:

  • tor; can typically be installed via your system package manager.
  • tgen; will most likely need to be built from source.

Configuring Shadow

This simulation again uses tgen as both client and server. In addition to a tor-oblivious client and server, we add a tor network and a client that uses tor to connect to the server.

shadow.yaml:

general:
  stop_time: 30 min
network:
  graph:
    type: gml
    inline: |
      graph [
        directed 0
        node [
          id 0
          host_bandwidth_down "1 Gbit"
          host_bandwidth_up "1 Gbit"
        ]
        edge [
          source 0
          target 0
          latency "50 ms"
          jitter "0 ms"
          packet_loss 0.0
        ]
      ]
hosts:
  fileserver:
    network_node_id: 0
    processes:
    - path: tgen
      # See https://shadow.github.io/docs/guide/compatibility_notes.html#libopenblas
      environment: { OPENBLAS_NUM_THREADS: "1" }
      args: ../../../conf/tgen.server.graphml.xml
      start_time: 1
      expected_final_state: running
  4uthority:
    network_node_id: 0
    ip_addr: 100.0.0.1
    processes:
    - path: tor
      args: --Address 4uthority --Nickname 4uthority
            --defaults-torrc torrc-defaults -f torrc
      start_time: 1
      expected_final_state: running
  exit1:
    network_node_id: 0
    processes:
    - path: tor
      args: --Address exit1 --Nickname exit1
            --defaults-torrc torrc-defaults -f torrc
      start_time: 60
      expected_final_state: running
  exit2:
    network_node_id: 0
    processes:
    - path: tor
      args: --Address exit2 --Nickname exit2
            --defaults-torrc torrc-defaults -f torrc
      start_time: 60
      expected_final_state: running
  relay1:
    network_node_id: 0
    processes:
    - path: tor
      args: --Address relay1 --Nickname relay1
            --defaults-torrc torrc-defaults -f torrc
      start_time: 60
      expected_final_state: running
  relay2:
    network_node_id: 0
    processes:
    - path: tor
      args: --Address relay2 --Nickname relay2
            --defaults-torrc torrc-defaults -f torrc
      start_time: 60
      expected_final_state: running
  relay3:
    network_node_id: 0
    processes:
    - path: tor
      args: --Address relay3 --Nickname relay3
            --defaults-torrc torrc-defaults -f torrc
      start_time: 60
      expected_final_state: running
  relay4:
    network_node_id: 0
    processes:
    - path: tor
      args: --Address relay4 --Nickname relay4
            --defaults-torrc torrc-defaults -f torrc
      start_time: 60
      expected_final_state: running
  client:
    network_node_id: 0
    processes:
    - path: tgen
      # See https://shadow.github.io/docs/guide/compatibility_notes.html#libopenblas
      environment: { OPENBLAS_NUM_THREADS: "1" }
      args: ../../../conf/tgen.client.graphml.xml
      start_time: 600
  torclient:
    network_node_id: 0
    processes:
    - path: tor
      args: --Address torclient --Nickname torclient
            --defaults-torrc torrc-defaults -f torrc
      start_time: 900
      expected_final_state: running
    - path: tgen
      # See https://shadow.github.io/docs/guide/compatibility_notes.html#libopenblas
      environment: { OPENBLAS_NUM_THREADS: "1" }
      args: ../../../conf/tgen.torclient.graphml.xml
      start_time: 1500

Running the Simulation

We run this example similarly to the previous examples. Here we use an additional command-line flag, --template-directory, to copy a template directory layout containing each host's tor configuration files into its host directory before the simulation begins.

For brevity we omit the contents of our template directory, and configuration files that are referenced from it, but you can find them at examples/docs/tor/shadow.data.template/ and examples/docs/tor/conf/.

# delete any existing simulation data
rm -rf shadow.data/
shadow --template-directory shadow.data.template shadow.yaml > shadow.log

Simulation Output

As before, Shadow will write simulation output to the data directory shadow.data/. Each host has its own directory under shadow.data/hosts/.

In the TGen process output, lines containing stream-success represent completed downloads and contain useful timing statistics. From these lines we should see that clients have completed a total of 20 streams:

$ for d in shadow.data/hosts/*client*; do grep "stream-success" "${d}"/*.stdout ; done | wc -l
20

We can also look at the transfers from the servers' perspective:

$ for d in shadow.data/hosts/fileserver*; do grep "stream-success" "${d}"/*.stdout ; done | wc -l
20

You can also parse the TGen output logged to the stdout files using the tgentools program from the TGen repo, and plot the data in graphical format to visualize the performance characteristics of the transfers. This page describes how to get started.

More Realistic Simulations

You can use the tornettools toolkit to run larger, more complex Tor networks that are meant to more accurately resemble the characteristics and state of the public Tor network.

Determinism

To improve determinism in your simulation, Shadow preloads an auxiliary library, libshadow_openssl_rng, which overrides some of OpenSSL's RNG routines. This is enabled by default, but can be controlled using the experimental use_preload_openssl_rng option.

Shadow Configuration Overview

Shadow requires a configuration file that provides a network graph and information about the processes to run during the simulation. This configuration file uses the YAML format. The options and their effect on the simulation are described in more detail (alongside a simple example configuration file) on the configuration options page.

Many of the configuration file options can also be overridden using command-line options. For example, the configuration option general.stop_time can be overridden with shadow's --stop-time option, and general.log_level can be overridden with --log-level. See shadow --help for other command-line options.
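
For example, using the two overrides mentioned above (a sketch; the configuration file name and values are placeholders):

shadow --log-level debug --stop-time 10min shadow.yaml > shadow.log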

The configuration file does not perform any shell expansion, other than home directory ~/ expansion on some specific options.

Quantities with Units

Some options such as hosts.<hostname>.bandwidth_down accept quantity values containing a magnitude and a unit. For example bandwidth values can be expressed as 1 Mbit, 1000 Kbit, 977 Kibit, etc. The space between the magnitude and unit is optional (for example 5Mbit), and the unit can be pluralized (for example 5 Mbits). Units are case-sensitive.

Time

Time values are expressed as either sub-second units, seconds, minutes, or hours.

Acceptable units are:

  • nanosecond / ns
  • microsecond / us / μs
  • millisecond / ms
  • second / sec / s
  • minute / min / m
  • hour / hr / h

Examples: 30 s, 2 hr, 10 minutes, 100 ms

Bandwidth

Bandwidth values are expressed in bits-per-second with the unit bit. All bandwidth values should be divisible by 8 bits-per-second (for example 30 bit is invalid, but 30 Kbit is valid).

Acceptable unit prefixes are:

  • kilo / K
  • kibi / Ki
  • mega / M
  • mebi / Mi
  • giga / G
  • gibi / Gi
  • tera / T
  • tebi / Ti

Examples: 100 Mbit, 100 Mbits, 10 kilobits, 128 bits

Byte Sizes

Byte size values are expressed with the unit byte or B.

Acceptable unit prefixes are:

  • kilo / K
  • kibi / Ki
  • mega / M
  • mebi / Mi
  • giga / G
  • gibi / Gi
  • tera / T
  • tebi / Ti

Examples: 20 B, 100 MB, 100 megabyte, 10 kibibytes, 30 MiB, 1024 Mbytes

Unix Signals

Several options allow the user to specify a Unix Signal. These can be specified either as a string signal name (e.g. SIGKILL), or an integer signal number (e.g. 9). String signal names must be capitalized and include the SIG prefix.

Realtime signals (signal numbers 32+) are not supported.
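
For example, a signal can be given by name or by number in a process entry (a minimal sketch; the shutdown_signal and shutdown_time options are described in the configuration specification below, and the process path is hypothetical):

processes:
- path: my_server           # hypothetical long-running process
  shutdown_time: 5min
  shutdown_signal: SIGINT   # equivalently, the integer 2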

YAML Extensions

Shadow supports the extended YAML conventions for merge keys and extension fields.

For examples, see Managing Complex Configurations.

Shadow Configuration Specification

Shadow uses the standard YAML 1.2 format for its configuration file, extended with support for merge keys and extension fields (see "YAML Extensions" above).

The following describes Shadow's YAML format and all of the options that can be used to customize a simulation.

Example:

general:
  stop_time: 2 min
network:
  graph:
    type: gml
    inline: |
      graph [
        node [
          id 0
          host_bandwidth_down "140 Mbit"
          host_bandwidth_up "18 Mbit"
        ]
        edge [
          source 0
          target 0
          latency "50 ms"
          packet_loss 0.01
        ]
      ]
hosts:
  server:
    network_node_id: 0
    processes:
    - path: /usr/sbin/nginx
      args: -c ../../../nginx.conf -p .
      start_time: 1
      expected_final_state: running
  client1: &client_host
    network_node_id: 0
    host_options:
      log_level: debug
    processes:
    - path: /usr/bin/curl
      args: server --silent
      start_time: 5
  client2: *client_host
  client3: *client_host

general

Required

General experiment settings.

general.bootstrap_end_time

Default: "0 sec"
Type: String OR Integer

The simulated time that ends Shadow's high network bandwidth/reliability bootstrap period.

If the bootstrap end time is greater than 0, Shadow uses a simulation bootstrapping period where hosts have unrestricted network bandwidth and no packet drop. This can help to bootstrap large networks quickly when the network hosts have low network bandwidth or low network reliability.
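
For example, a minimal sketch (the times are arbitrary):

general:
  stop_time: 60m
  # hosts get unrestricted bandwidth and no packet drops during the first
  # 5 simulated minutes
  bootstrap_end_time: 5m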

general.data_directory

Default: "shadow.data"
Type: String

Path to store simulation output.

general.heartbeat_interval

Default: "1 sec"
Type: String OR Integer OR null

Interval at which to print simulation heartbeat messages.

general.log_level

Default: "info"
Type: "error" OR "warning" OR "info" OR "debug" OR "trace"

Log level of output written on stdout. If Shadow was built in release mode, then messages at level 'trace' will always be dropped.

general.model_unblocked_syscall_latency

Default: false
Type: Bool

Whether to model syscalls and VDSO functions that don't block as having some latency. This should have minimal effect on typical simulations, but can be helpful for programs with "busy loops" that otherwise deadlock under Shadow.

general.parallelism

Default: 0
Type: Integer

How many parallel threads to use to run the simulation. Optimal performance is usually obtained with the number of physical CPU cores (nproc without hyperthreading or nproc/2 with hyperthreading).

A value of 0 will allow Shadow to choose the number of threads, typically the number of physical CPU cores available in the current CPU affinity mask and cgroup.

Virtual hosts depend on network packets that can potentially arrive from other virtual hosts, so each worker can only advance according to the propagation delay to avoid dependency violations. Therefore, not all threads will have 100% CPU utilization.

general.progress

Default: false
Type: Bool

Show the simulation progress on stderr.

When running in a tty, the progress will be updated every second and shown at the bottom of the terminal. Otherwise the progress will be printed without ANSI escape codes at intervals which increase as the simulation progresses.

general.seed

Default: 1
Type: Integer

Initialize randomness using seed N.

general.stop_time

Required
Type: String OR Integer

The simulated time at which the simulation ends.

general.template_directory

Default: null
Type: String OR null

Path to recursively copy during startup and use as the data-directory.

network

Required

Network settings.

network.graph

Required

The network topology graph.

A network topology represented by a connected graph with certain attributes specified on the network nodes and edges. For more information on how to structure this data, see the Network Graph Overview.

Example:

network:
  graph:
    type: gml
    inline: |
      graph [
        ...
      ]

network.graph.type

Required
Type: "gml" OR "1_gbit_switch"

The network graph can be specified in the GML format, or a built-in "1_gbit_switch" graph with a single network node can be used instead.

The built-in "1_gbit_switch" graph contains the following:

graph [
  directed 0
  node [
    id 0
    host_bandwidth_up "1 Gbit"
    host_bandwidth_down "1 Gbit"
  ]
  edge [
    source 0
    target 0
    latency "1 ms"
    packet_loss 0.0
  ]
]

network.graph.<file|inline>

Required if network.graph.type is "gml"
Type: Object OR String

If the network graph type is not a built-in network graph, the graph data can be specified as a path to an external file, or as an inline string.

network.graph.file.path

Required
Type: String

The path to the file.

If the path begins with ~/, it will be considered relative to the current user's home directory. No other shell expansion is performed on the path.

network.graph.file.compression

Default: null
Type: "xz" OR null

The file's compression format.
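
For example, a minimal sketch loading an xz-compressed GML graph from an external file (the path is hypothetical):

network:
  graph:
    type: gml
    file:
      path: ~/graphs/topology.gml.xz
      compression: xz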

network.use_shortest_path

Default: true
Type: Bool

When routing packets, follow the shortest path rather than following a direct edge between network nodes. If false, the network graph is required to be complete (including self-loops) and to have exactly one edge between any two nodes.

experimental

Experimental experiment settings. Unstable and may change or be removed at any time, regardless of Shadow version.

experimental.host_heartbeat_interval

Default: "1 sec"
Type: String OR Integer OR null

Amount of time between host heartbeat messages.

experimental.host_heartbeat_log_info

Default: ["node"]
Type: Array of ("node" OR "socket" OR "ram")

List of information to show in the host's heartbeat message.

experimental.host_heartbeat_log_level

Default: "info"
Type: "error" OR "warning" OR "info" OR "debug" OR "trace"

Log level at which to print host heartbeat messages.

experimental.interface_qdisc

Default: "fifo"
Type: "fifo" OR "round-robin"

The queueing discipline to use at the network interface.

experimental.max_unapplied_cpu_latency

Default: "1 microsecond"
Type: String

Max amount of execution-time latency allowed to accumulate before the clock is moved forward. Moving the clock forward is a potentially expensive operation, so larger values reduce simulation overhead, at the cost of coarser time jumps.

Note also that accumulated-but-unapplied latency is discarded when a thread is blocked on a syscall.

Ignored when general.model_unblocked_syscall_latency is false.

experimental.runahead

Default: "1 ms"
Type: String OR null

If set, overrides the automatically calculated minimum time workers may run ahead when sending events between virtual hosts.

experimental.scheduler

Default: "thread-per-core"
Type: "thread-per-core" OR "thread-per-host"

The host scheduler implementation, which decides how to assign hosts to threads and threads to CPU cores.

experimental.socket_recv_autotune

Default: true
Type: Bool

Enable receive window autotuning.

experimental.socket_recv_buffer

Default: "174760 B"
Type: String OR Integer

Initial size of the socket's receive buffer.

experimental.socket_send_autotune

Default: true
Type: Bool

Enable send window autotuning.

experimental.socket_send_buffer

Default: "131072 B"
Type: String OR Integer

Initial size of the socket's send buffer.

experimental.strace_logging_mode

Default: "off"
Type: "off" OR "standard" OR "deterministic"

Log the syscalls for each process to individual "strace" files.

The mode determines the format that the syscalls are logged in. For example, the "deterministic" mode will avoid logging memory addresses or potentially uninitialized memory.

The logs will be stored at shadow.data/hosts/<hostname>/<procname>.<pid>.strace.

Limitations:

  • Syscalls run natively will not log the syscall arguments or return value (for example SYS_getcwd).
  • Syscalls processed within Shadow's C code will not log the syscall arguments.
  • Syscalls that are interrupted by a signal may not be logged (for example SYS_read).
  • Syscalls that are interrupted by a signal may be logged inaccurately. For example, the log may show syscall(...) = -1 (EINTR), but the managed process may not actually see this return value. Instead the syscall may be restarted.
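
For example, to enable deterministic strace logging for all managed processes (a minimal sketch):

experimental:
  strace_logging_mode: deterministic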

experimental.unblocked_syscall_latency

Default: "1 microseconds"
Type: String

The simulated latency of an unblocked syscall. For simulation efficiency, this latency is only added when max_unapplied_cpu_latency is reached.

Ignored when general.model_unblocked_syscall_latency is false.

experimental.unblocked_vdso_latency

Default: "10 nanoseconds"
Type: String

The simulated latency of an unblocked vdso function. For simulation efficiency, this latency is only added when max_unapplied_cpu_latency is reached.

Ignored when general.model_unblocked_syscall_latency is false.

experimental.use_cpu_pinning

Default: true
Type: Bool

Pin each thread and any processes it executes to the same logical CPU Core to improve cache affinity.

experimental.use_dynamic_runahead

Default: false
Type: Bool

Update the minimum runahead dynamically throughout the simulation.

experimental.use_memory_manager

Default: false
Type: Bool

Use the MemoryManager in memory-mapping mode. This can improve performance, but disables support for dynamically spawning processes inside the simulation (e.g. the fork syscall).

experimental.use_new_tcp

Default: false
Type: Bool

Use the rust TCP implementation.

experimental.use_object_counters

Default: true
Type: Bool

Count object allocations and deallocations. If disabled, we will not be able to detect object memory leaks.

experimental.use_preload_libc

Default: true
Type: Bool

Preload our libc library for all managed processes for fast syscall interposition when possible.

experimental.use_preload_openssl_crypto

Default: false
Type: Bool

Preload our OpenSSL crypto library for all managed processes to skip some AES crypto operations, which may speed up simulation if your CPU lacks AES-NI support. However, it changes the behavior of your application and can cause bugs in OpenSSL that are hard to notice. You should probably not use this option unless you really know what you're doing.

experimental.use_preload_openssl_rng

Default: true
Type: Bool

Preload our OpenSSL RNG library for all managed processes to mitigate non-deterministic use of OpenSSL.

experimental.use_sched_fifo

Default: false
Type: Bool

Use the SCHED_FIFO scheduler. Requires CAP_SYS_NICE. See sched(7), capabilities(7).

experimental.use_syscall_counters

Default: true
Type: Bool

Count the number of occurrences for individual syscalls.

experimental.use_worker_spinning

Default: true
Type: Bool

Each worker thread will spin in a sched_yield loop while waiting for a new task. This is ignored if not using the thread-per-core scheduler.

This may improve runtime performance in some environments.

experimental.log_errors_to_tty

Default: true
Type: Bool

Log Error-level log lines to shadow's stderr in addition to stdout, if stdout is not a tty but stderr is.

host_option_defaults

Default options for all hosts. These options can also be overridden for each host individually in the host's hosts.<hostname>.host_options section.

host_option_defaults.log_level

Default: null
Type: "error" OR "warning" OR "info" OR "debug" OR "trace" OR null

Log level at which to print host log messages.

host_option_defaults.pcap_capture_size

Default: "65535 B"
Type: String OR Integer

How much data to capture per packet (header and payload) if pcap logging is enabled.

The default of 65535 bytes is the maximum length of an IP packet.

host_option_defaults.pcap_enabled

Default: false
Type: Bool

Should Shadow generate pcap files?

Logs all network input and output for this host in PCAP format (for viewing in e.g. wireshark). The pcap files will be stored in the host's data directory, for example shadow.data/hosts/myhost/eth0.pcap.
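
For example, a minimal sketch that enables pcap capture for all hosts while limiting how much of each packet is captured (the capture size is arbitrary):

host_option_defaults:
  pcap_enabled: true
  pcap_capture_size: "200 B"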

hosts

Required
Type: Object

The simulated hosts which execute processes. Each field corresponds to a host configuration, with the field name being used as the network hostname. A hostname must follow the character requirements of hostname(7).

Shadow assigns each host to a network node in the network graph.

In Shadow, each host is given an RNG whose seed is derived from the global seed (general.seed) and the hostname. This means that changing a host's name will change that host's RNG seed, subtly affecting the simulation results.

hosts.<hostname>.bandwidth_down

Default: null
Type: String OR Integer OR null

Downstream bandwidth capacity of the host.

Overrides any default bandwidth values set in the assigned network graph node.

hosts.<hostname>.bandwidth_up

Default: null
Type: String OR Integer OR null

Upstream bandwidth capacity of the host.

Overrides any default bandwidth values set in the assigned network graph node.

hosts.<hostname>.ip_addr

Default: null
Type: String OR null

IP address to assign to the host.

This IP address must not conflict with the address of any other host (two hosts must not have the same IP address).

hosts.<hostname>.network_node_id

Required
Type: Integer

Network graph node ID to assign the host to.

hosts.<hostname>.host_options

See host_option_defaults for supported fields.

Example:

hosts:
  client:
    ...
    host_options:
      log_level: debug

hosts.<hostname>.processes

Required
Type: Array

Virtual software processes that the host will run.

hosts.<hostname>.processes[*].args

Default: ""
Type: String OR Array of String

Process arguments.

The arguments can be specified as a string in a shell command-line format:

args: "--user-agent 'Mozilla/5.0 (compatible; ...)' http://myserver:8080"

Or as an array of strings:

args: ['--user-agent', 'Mozilla/5.0 (compatible; ...)', 'http://myserver:8080']

Shell expansion (which includes ~/ expansion) is not performed on either format. In the command-line format, the string is parsed as an argument vector following typical shell quotation parsing rules.

hosts.<hostname>.processes[*].environment

Default: ""
Type: Object

Environment variables passed when executing this process.

Shell expansion (which includes ~/ expansion) is not performed on any fields.

Examples:

environment:
  ENV_A: "1"
  ENV_B: foo
environment: { ENV_A: "1", ENV_B: foo }

hosts.<hostname>.processes[*].expected_final_state

Default: {exited: 0}
Type: {"exited": <Integer>} OR {"signaled": Unix Signal} OR "running"

The expected state of the process at the end of the simulation. If the process exits before the end of the simulation with an unexpected state, or is still running at the end of the simulation when this option is not set to running, shadow will log an error and return a non-zero status for the simulation.

Use exited to indicate that a process should have exited normally; e.g. by returning from main or calling exit.

Use signaled to indicate that a process should have been killed by a signal.

Use running for a process expected to still be running at the end of the simulation, such as a server process that you didn't arrange to shutdown before the end of the simulation. (All processes will be killed by Shadow when the simulation ends).

Examples:

  • {exited: 0}
  • {exited: 1}
  • {signaled: SIGINT}
  • {signaled: 9}
  • running

Only processes started directly from the configuration have an expected_final_state. Processes that those processes start (e.g. via fork in C, or running an executable in a shell script) don't have one. Generally it's the parent process's responsibility to do any necessary validation of the exit status of its children (e.g. via waitpid in C, or checking $? in a bash script).

hosts.<hostname>.processes[*].path

Required
Type: String

If the path begins with ~/, it will be considered relative to the current user's home directory. No other shell expansion is performed on the path.

Bare file basenames like sleep will be located using Shadow's PATH environment variable (e.g. to /usr/bin/sleep).

hosts.<hostname>.processes[*].shutdown_signal

Default: "SIGTERM"
Type: Unix Signal

The signal that will be sent to the process at hosts.<hostname>.processes[*].shutdown_time. Signals specified by name should be all-caps and include the SIG prefix; e.g. "SIGTERM".

Many long-running processes support exiting cleanly when sent SIGTERM or SIGINT.

If the process is expected to be killed directly by the signal instead of catching it and exiting cleanly, you can set expected_final_state to prevent Shadow from interpreting this as an error. e.g. SIGKILL cannot be caught, so will always result in an end state of {signaled: SIGKILL} if the process didn't already exit before the signal was sent.

path: sleep
args: "1000"
start_time: 1s
shutdown_time: 2s
shutdown_signal: SIGKILL
expected_final_state: {signaled: SIGKILL}

hosts.<hostname>.processes[*].shutdown_time

Default: null
Type: String OR Integer OR null

The simulated time at which to send hosts.<hostname>.processes[*].shutdown_signal to the process. This must be before general.stop_time.

hosts.<hostname>.processes[*].start_time

Default: "0 sec"
Type: String OR Integer

The simulated time at which to execute the process. This must be before general.stop_time.

Managing Complex Configurations

It is sometimes useful to generate shadow configuration files dynamically. Since Shadow accepts configuration files in YAML 1.2 format, there are many options available; even more so since JSON is also valid YAML 1.2.

YAML templating

YAML itself has some features to help avoid repetition. When using these features, it can be helpful to use shadow's --show-config flag to examine the "flat" generated config.

An individual node can be made into an anchor (&AnchorName x), and referenced via an alias (*AnchorName). For example, here we create and use the anchors Node, Fast, Slow, ClientPath, and ServerPath:

general:
  stop_time: 10s
network:
  graph:
    type: 1_gbit_switch
hosts:
  fast_client:
    network_node_id: &Node 0
    bandwidth_up: &Fast "100 Mbit"
    bandwidth_down: *Fast
    processes:
    - path: &ClientPath "/path/to/client"
    # ...
  slow_client:
    network_node_id: *Node
    bandwidth_up: &Slow "1 Mbit"
    bandwidth_down: *Slow
    processes:
    - path: *ClientPath
    # ...
  fast_server:
    network_node_id: *Node
    bandwidth_up: *Fast
    bandwidth_down: *Fast
    processes:
    - path: &ServerPath "/path/to/server"
    # ...
  slow_server:
    network_node_id: *Node
    bandwidth_up: *Slow
    bandwidth_down: *Slow
    processes:
    - path: *ServerPath

We can use extension fields to move our constants into one place:

x-constants:
  - &Node 0
  - &Fast "100 Mbit"
  - &Slow "1 Mbit"
  - &ClientPath "/path/to/client"
  - &ServerPath "/path/to/server"
general:
  stop_time: 10s
network:
  graph:
    type: 1_gbit_switch
hosts:
  fast_client:
    network_node_id: *Node
    bandwidth_up: *Fast
    bandwidth_down: *Fast
    processes:
    - path: *ClientPath
  slow_client:
    network_node_id: *Node
    bandwidth_up: *Slow
    bandwidth_down: *Slow
    processes:
    - path: *ClientPath
  fast_server:
    network_node_id: *Node
    bandwidth_up: *Fast
    bandwidth_down: *Fast
    processes:
    - path: *ServerPath
  slow_server:
    network_node_id: *Node
    bandwidth_up: *Slow
    bandwidth_down: *Slow
    processes:
    - path: *ServerPath

We can also use merge keys to make extendable templates for fast and slow hosts:

x-constants:
  - &Node 0
  - &Fast "100 Mbit"
  - &Slow "1 Mbit"
  - &ClientPath "/path/to/client"
  - &ServerPath "/path/to/server"
  - &FastHost
    network_node_id: *Node
    bandwidth_up: *Fast
    bandwidth_down: *Fast
  - &SlowHost
    network_node_id: *Node
    bandwidth_up: *Slow
    bandwidth_down: *Slow
general:
  stop_time: 10s
network:
  graph:
    type: 1_gbit_switch
hosts:
  fast_client:
    <<: *FastHost
    processes:
    - path: *ClientPath
  slow_client:
    <<: *SlowHost
    processes:
    - path: *ClientPath
  fast_server:
    <<: *FastHost
    processes:
    - path: *ServerPath
  slow_server:
    <<: *SlowHost
    processes:
    - path: *ServerPath

Dynamic Generation

There are many tools and libraries for generating YAML and JSON. These can be helpful for representing more complex relationships between parameter values.

Suppose we want to add a cleanup process to each host that runs one second before the simulation ends. Since YAML doesn't support arithmetic, the following doesn't work:

x-constants:
  - &StopTimeSec 10
  - &CleanupProcess
    # This will evaluate to the invalid time string "10 - 1"; not "9"
    start_time: *StopTimeSec - 1
    ...
# ...

In such cases it may be helpful to write your configuration in a language that does support more advanced features that can generate YAML or JSON.

Python example

We can achieve the desired effect in Python like so:

#!/usr/bin/env python3

Node = 0
StopTimeSec = 10
Fast = "100 Mbit"
Slow = "1 Mbit"
ClientPath = "/path/to/client"
ServerPath = "/path/to/server"
FastHost = {
  'network_node_id': Node,
  'bandwidth_up': Fast,
  'bandwidth_down': Fast,
}
SlowHost = {
  'network_node_id': Node,
  'bandwidth_up': Slow,
  'bandwidth_down': Slow,
}
CleanupProcess = {
  'start_time': f'{StopTimeSec - 1}s',
  'path': '/path/to/cleanup',
}
config = {
  'general': {
    'stop_time': '10s',
  },
  'network': {
    'graph': {
      'type': '1_gbit_switch'
    },
  },
  'hosts': {
    'fast_client': {
      **FastHost,
      'processes': [
        {'path': ClientPath},
        CleanupProcess,
      ],
    },
    'slow_client': {
      **SlowHost,
      'processes': [
        {'path': ClientPath},
        CleanupProcess,
      ],
    },
    'fast_server': {
      **FastHost,
      'processes': [
        {'path': ServerPath},
        CleanupProcess,
      ],
    },
    'slow_server': {
      **SlowHost,
      'processes': [
        {'path': ServerPath},
        CleanupProcess,
      ],
    },
  },
}

import yaml
print(yaml.safe_dump(config))

Nix example

There are also languages that specialize in this kind of advanced configuration generation. For example, using Nix, the configuration language used by NixOS:

let
  Node = 0;
  StopTimeSec = 10;
  Fast = "100 Mbit";
  Slow = "1 Mbit";
  ClientPath = "/path/to/client";
  ServerPath = "/path/to/server";
  FastHost = {
    network_node_id = Node;
    bandwidth_up = Fast;
    bandwidth_down = Fast;
  };
  SlowHost = {
    network_node_id = Node;
    bandwidth_up = Slow;
    bandwidth_down = Slow;
  };
  CleanupProcess = {
    start_time = (toString (StopTimeSec - 1)) + "s";
    path = "/path/to/cleanup";
  };
in
{
  general = {
    stop_time = (toString StopTimeSec) + "s";
  };
  network = {
    graph = {
      type = "1_gbit_switch";
    };
  };
  hosts = {
    fast_client = FastHost // {
      processes = [
        {path = ClientPath;}
        CleanupProcess
      ];
    };
    slow_client = SlowHost // {
      processes = [
        {path = ClientPath;}
        CleanupProcess
      ];
    };
    fast_server = FastHost // {
      processes = [
        {path = ServerPath;}
        CleanupProcess
      ];
    };
    slow_server = SlowHost // {
      processes = [
        {path = ServerPath;}
        CleanupProcess
      ];
    };
  };
}

This can be converted to JSON, which is also valid YAML, with:

nix eval -f example.nix --json

Network Graph Overview

Processes running in Shadow do not have access to the internet; instead, processes running on Shadow virtual hosts utilize an internal routing module to communicate with other processes running on other virtual hosts in the simulation. The routing module is used to position virtual hosts within a network topology, to compute communication paths between virtual hosts, and to enforce network path characteristics like latency and packet loss.

Importantly, the routing module is currently used to model the performance characteristics of internet paths; we do not simulate the behavior of network routers (we do not run routing protocols like BGP).

This page describes the routing module and how it can be configured.

Graph

Shadow represents a network topology over which processes can communicate using a weighted graph. The graph contains vertices that abstractly represent network locations, and edges representing network paths between those locations.

When referring to a network graph, the terms vertices and nodes are interchangeable. In our documentation, we refer to these as nodes. Note that nodes in the network graph are distinct from virtual hosts in the Shadow config file: a virtual host models an end-host machine, whereas a network node represents a location at which a host can connect to the simulated network.

Shadow requires that the network graph is connected such that there exists at least one path (a series of one or more edges) between every pair of nodes.

Behavior

The graph encodes network positioning and path characteristics as attributes on the nodes and edges. Shadow uses the connectivity graph along with the information encoded in node and edge attributes to:

  • attach virtual hosts to specific nodes (i.e., locations) in the network graph;
  • assign the bandwidth allowed for each attached virtual host;
  • compute the shortest path (weighted by edge latency) between two virtual hosts using Dijkstra's algorithm; and
  • compute the end-to-end latency and packet loss for the shortest path.

The bandwidth of the virtual hosts and the end-to-end latency and packet loss for a shortest path between two virtual hosts are then enforced for all network communication.

Important Notes

  • The network graph may be directed or undirected, as long as the graph is structured such that every node can reach every other node through a series of edges.
  • If the network graph is a complete graph (there exists a single unique edge between every pair of nodes), then we can avoid running the shortest path algorithm as a performance optimization by setting the use_shortest_path option to False.
  • Each node in the graph must have a self-loop (an edge from the node to itself). This edge will be used for communication between two hosts attached to the same node, regardless of whether a shorter path exists.

Network Graph Attributes

We encode attributes on the nodes and edges that allow for configuring the simulated network characteristics. The attributes and their effect on the simulated network are described in more detail (alongside a simple example graph) on the network graph specification page.

Using an Existing Graph

We created a large network graph representing worldwide latencies and bandwidths as of 2018 using the RIPE Atlas measurement platform. The graph contains network bandwidths and latencies in and between major cities around the world, and is suitable for general use in most types of Shadow simulations. The graph (updated for Shadow version 2.x) is available for download as a research artifact, and more details about the measurement methodology are available on the research artifacts site.

Note: the scripts we used to create the graph are also available, but are not recommended for general use. The scripts require advanced knowledge of RIPE Atlas and also require that you possess RIPE Atlas credits to conduct the measurements needed to create a new graph. We recommend using our existing graph linked above instead, which we may periodically update.

Creating Your Own Graph

The python module networkx can be used to create and manipulate more complicated graphs.
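
For instance, the following sketch builds a small two-node graph using the attributes described in the network graph specification below and writes it in GML format; the bandwidth, latency, and packet loss values are arbitrary placeholders.

#!/usr/bin/env python3
# Sketch: build a minimal two-node Shadow network graph and write it as GML.
import networkx as nx

G = nx.Graph()  # undirected, so each edge is symmetric (graph.directed defaults to 0)

for node_id in (0, 1):
    G.add_node(
        node_id,
        label=f"node {node_id}",
        host_bandwidth_down="100 Mbit",
        host_bandwidth_up="100 Mbit",
    )
    # Every node needs a self-loop; it is used when two hosts attached to the
    # same node communicate.
    G.add_edge(node_id, node_id, latency="10 ms", packet_loss=0.0)

# A path between the two network locations.
G.add_edge(0, 1, latency="50 ms", packet_loss=0.01)

nx.write_gml(G, "mygraph.gml")

The resulting GML can then be used as a gml-type graph in the Shadow config (the etcd example later in this document shows the inline form).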

Network Graph Specification

The network graph overview provides a general summary of Shadow's use of a network graph to abstractly model network position and to connect virtual hosts in a network topology while enforcing network characteristics on paths between hosts. This page describes the specific attributes that can be configured in the network graph, and the effect that each attribute has on the simulation.

Example Graph

Below is an example of a simple network graph in the Shadow-supported GML format (note that GML refers to graph vertices as nodes, but the terms are generally interchangeable).

graph [
  directed 0
  node [
    id 0
    label "node at 1.2.3.4"
    host_bandwidth_down "100 Mbit"
    host_bandwidth_up "100 Mbit"
  ]
  edge [
    source 0
    target 0
    label "path from 1.2.3.4 to 1.2.3.4"
    latency "10 ms"
    jitter "0 ms"
    packet_loss 0.0
  ]
]

Configurable Attributes

graph.directed

Required: False
Default: 0
Type: Integer

Specifies the symmetry of the edges in the graph. If set to 0 (the default), the graph is an undirected graph: an edge between node u and node v is symmetric and can be used to construct a path both from u to v and from v to u. If set to 1, the graph is a directed graph: an edge from node u to node v is asymmetric and can only be used to construct a path from u to v (a separate edge from v to u must be specified to compose a path in the reverse direction).

node.id

Required: True
Type: Integer

A unique integer identifier for a given node.

node.label

Required: False
Default: n/a
Type: String

An optional, human-meaningful string description of the node. The string may be used in log messages printed by Shadow.

node.host_bandwidth_down

Required: True
Type: String

A string defining the downstream (receive) bandwidth that will be allowed for any host attached to this node. Hosts may individually override this value in the Shadow config file. The format of the string specifies the bandwidth and its unit as described in the config documentation, e.g., 10 Mbit. Note that this bandwidth is allowed for every host that is attached to this node; it is not the total bandwidth logically available at the node (which is not defined).

node.host_bandwidth_up

Required: True
Type: String

A string defining the upstream (send) bandwidth that will be allowed for any host attached to this node. Hosts may individually override this value in the Shadow config file. The format of the string specifies the bandwidth and its unit as described in the config documentation, e.g., 10 Mbit. Note that this bandwidth is allowed for every host that is attached to this node; it is not the total bandwidth logically available at the node (which is not defined).

edge.source

Required: True
Type: Integer

The unique integer identifier of the first of two nodes of the edge. The node must exist in the graph. If the graph is directed, this node is treated as the source or start of the edge.

edge.target

Required: True
Type: Integer

The unique integer identifier of the second of two nodes of the edge. The node must exist in the graph. If the graph is directed, this node is treated as the target or end of the edge.

edge.label

Required: False
Default: n/a
Type: String

An optional, human-meaningful string description of the edge. The string may be used in log messages printed by Shadow.

edge.latency

Required: True
Type: String

The latency that will be added to packets traversing this edge. This value is used as the weight when running Dijkstra's shortest path algorithm. The format of the string specifies the latency and its unit, e.g., 10 ms. If a unit is not specified, the value is assumed to be in the base unit of seconds. The latency must not be 0.

edge.jitter

Required: False
Default: n/a
Type: String

This keyword is allowed but currently nonfunctional; it is reserved for future use.

edge.packet_loss

Required: True
Type: Float

A fractional value between 0 and 1 representing the chance that a packet traversing this edge will get dropped.

Disabling Sidechannel Mitigations

Sidechannel attacks in the style of Spectre and Meltdown allow malicious code to access data it otherwise wouldn't be able to. Modern systems employ countermeasures to prevent these attacks, which typically incur some performance cost and may not be necessary when running Shadow simulations; i.e., Shadow's performance can be improved by disabling these mitigations.

Keep in mind that Shadow already isn't designed to protect itself or its host system from malicious software. See Security.

Speculative Store Bypass

The Speculative Store Bypass attack allows malicious code to read data it otherwise wouldn't be able to, e.g. due to software sandboxing such as in a javascript engine. For a high-level overview of this attack and mitigations, see: https://www.redhat.com/en/blog/speculative-store-bypass-explained-what-it-how-it-works. For a more technical overview, see https://software.intel.com/content/dam/develop/external/us/en/documents/336996-speculative-execution-side-channel-mitigations.pdf.

We have observed the mitigation for this vulnerability to add roughly 30% performance overhead to Shadow simulations. Process isolation is already sufficient to mitigate this vulnerability (see "Process Isolation"), Shadow makes no attempt to protect itself from malicious code within its own processes, and Shadow isn't designed to run in a managed-code environment itself, so enabling this mitigation in Shadow and its managed processes has no clear benefit.

Shadow itself makes use of seccomp, but uses the SECCOMP_FILTER_FLAG_SPEC_ALLOW flag to avoid turning on this mitigation. It also logs a warning if it detects this mitigation is already enabled.

One common way this mitigation can be turned on inadvertently is by running inside a Docker container with seccomp enabled (which is the default). You can avoid this by turning off seccomp entirely (using --security-opt seccomp=unconfined), but this might not be an option when running in a shared environment. Unfortunately, Docker currently doesn't expose an option to use its seccomp functionality without turning on this mitigation.

Another way to avoid enabling this mitigation is by changing the kernel parameter spec_store_bypass_disable. Overriding its default value of seccomp to prctl will still allow software sandboxes such as javascript engines to enable this mitigation, but will no longer enable it by default when installing a seccomp filter. In principle this could create a vulnerability if there's code running on the system that relies on the default behavior without explicitly opting in via prctl, so use some caution. For more discussion on this parameter, see this discussion on the kernel mailing list about whether the kernel default ought to be changed from seccomp to prctl: https://lore.kernel.org/lkml/20201104215702.GG24993@redhat.com/

Other mitigations

In some ad-hoc measurements we've found that disabling all sidechannel mitigations with mitigations=off also provides a significant performance boost. We haven't thoroughly evaluated the exact benefits though, and this setting could expose your system to attack. At a minimum, this isn't advised on a system that runs any untrusted code at any privilege level, including in managed environments such as running javascript in a web browser.

Parallel simulations

Some care must be taken when running multiple Shadow simulations on the same hardware at the same time. By default, Shadow pins threads to specific CPUs to avoid CPU migrations. The CPU selection logic isn't aware of other processes that may be using substantial CPU time and/or pinning, including other Shadow simulations. That is, without some care, multiple Shadow simulations running on the same machine at the same time will generally end up trying to use the same set of CPUs, even if other CPUs on the machine are idle.

Disabling pinning

The simplest solution is to disable CPU pinning entirely. This has a substantial performance penalty (with some reports as high as 3x), but can be a reasonable solution for small simulations. Pinning can be disabled by passing --use-cpu-pinning=false to Shadow.

Setting an initial CPU affinity

Shadow checks the initial CPU affinity assigned to it and only pins threads to CPUs within that set. The easiest way to run Shadow with a subset of CPUs is with the taskset utility. For example, to start one Shadow simulation using CPUs 0-9 and another using CPUs 10-19, you could use:

$ (cd sim1 && taskset --cpu-list 0-9 shadow sim1config.yml) &
$ (cd sim2 && taskset --cpu-list 10-19 shadow sim2config.yml) &

Shadow similarly avoids trying to pin to CPUs outside of its cgroup cpuset (see cpuset(7)). This allows Shadow to work correctly in scenarios such as running in a container on a shared machine that only has access to some CPUs, but configuring a cpuset is generally more complex and requires more privileges than setting the CPU affinity with taskset.

Choosing a CPU set

When assigning Shadow a subset of CPUs, some care must be taken to get optimal performance. You can use the lscpu utility to see the layout of the CPUs on your machine.

  • Avoid using multiple CPUs on the same core (aka hyperthreading). Such CPUs compete with each other for resources.
  • Prefer CPUs on the same socket and (NUMA) node. Such CPUs share cache, which is typically beneficial in Shadow simulations.

For example, given the lscpu output:

$ lscpu --parse=cpu,core,socket,node
# The following is the parsable format, which can be fed to other
# programs. Each different item in every column has an unique ID
# starting from zero.
# CPU,Core,Socket,Node
0,0,0,0
1,1,1,1
2,2,0,0
3,3,1,1
4,4,0,0
5,5,1,1
6,6,0,0
7,7,1,1
8,8,0,0
9,9,1,1
10,10,0,0
11,11,1,1
12,12,0,0
13,13,1,1
14,14,0,0
15,15,1,1
16,16,0,0
17,17,1,1
18,18,0,0
19,19,1,1
20,20,0,0
21,21,1,1
22,22,0,0
23,23,1,1
24,24,0,0
25,25,1,1
26,26,0,0
27,27,1,1
28,28,0,0
29,29,1,1
30,30,0,0
31,31,1,1
32,32,0,0
33,33,1,1
34,34,0,0
35,35,1,1
36,36,0,0
37,37,1,1
38,38,0,0
39,39,1,1
40,0,0,0
41,1,1,1
42,2,0,0
43,3,1,1
44,4,0,0
45,5,1,1
46,6,0,0
47,7,1,1
48,8,0,0
49,9,1,1
50,10,0,0
51,11,1,1
52,12,0,0
53,13,1,1
54,14,0,0
55,15,1,1
56,16,0,0
57,17,1,1
58,18,0,0
59,19,1,1
60,20,0,0
61,21,1,1
62,22,0,0
63,23,1,1
64,24,0,0
65,25,1,1
66,26,0,0
67,27,1,1
68,28,0,0
69,29,1,1
70,30,0,0
71,31,1,1
72,32,0,0
73,33,1,1
74,34,0,0
75,35,1,1
76,36,0,0
77,37,1,1
78,38,0,0
79,39,1,1

A reasonable configuration for two simulations might be taskset --cpu-list 0-39:2 (CPUs 0,2,...,38) and taskset --cpu-list 1-39:2 (CPUs 1,3,...,39). This assignment leaves CPUs 40-79 idle (since they share physical cores with CPUs 0-39), puts the first simulation on socket 0 and NUMA node 0, and puts the second simulation on socket 1 and NUMA node 1.
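
If you'd rather not work out such a CPU list by hand, a small script can derive one from the parseable lscpu output. The following sketch simply applies the guidance above (one CPU per physical core, restricted to a single socket) and prints a list suitable for taskset --cpu-list; the script name in the usage line below is just a placeholder.

#!/usr/bin/env python3
# Sketch: pick one CPU per physical core on a single socket, for use with
# `taskset --cpu-list`. Reads `lscpu --parse=cpu,core,socket,node` from stdin;
# the socket to use is an optional command-line argument (default: 0).
import sys

wanted_socket = int(sys.argv[1]) if len(sys.argv) > 1 else 0
chosen = {}  # physical core id -> first CPU seen on that core

for line in sys.stdin:
    line = line.strip()
    if not line or line.startswith("#"):
        continue  # skip lscpu's comment header
    cpu, core, socket = (int(field) for field in line.split(",")[:3])
    if socket == wanted_socket and core not in chosen:
        chosen[core] = cpu  # hyperthread siblings of an already-chosen core are skipped

print(",".join(str(cpu) for cpu in sorted(chosen.values())))

For the lscpu output above, lscpu --parse=cpu,core,socket,node | python3 pick_cpus.py 0 prints the even CPUs 0,2,...,38, matching the taskset --cpu-list 0-39:2 assignment.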

Configuration options

Shadow's configuration options are generally tuned for optimal performance using Tor benchmarks, but not all system architectures and simulation workloads are the same. Shadow has several configuration options that may improve the simulation performance. Many of these options are considered "experimental", which means that they may be changed or removed at any time. If you find any of these options useful, let us know.

Be careful as these options may also worsen the simulation performance.

bootstrap_end_time

Shadow supports an optional "bootstrapping period" of high network bandwidth and reliability for simulations which require network-related bootstrapping (for example Tor). While the network performance characteristics will be unrealistic during this time period, it can significantly reduce the simulation's wall clock time. After this bootstrapping period ends, the network bandwidth/reliability is reverted back to the values specified in the simulation and network configuration.

heartbeat_interval and host_heartbeat_interval

Shadow logs simulation statistics at given simulation time intervals. If any of these time intervals are small relative to the total time of the simulation, a large number of log lines will be written. If the log is being written to disk, this increased disk I/O may slow down the simulation dramatically.

parallelism

Simulations with multiple hosts can be parallelized across multiple threads. By default Shadow tries to choose an optimal number of threads to run in parallel, but a different number of threads may yield better run time performance.

use_cpu_pinning

CPU pinning is enabled by default and should improve the simulation performance, but in shared computing environments it might be beneficial to disable this option.

scheduler

Shadow supports two different types of work schedulers. The default thread_per_core scheduler has been found to be significantly faster on most machines, but may perform worse than the thread_per_host scheduler in rare circumstances.

use_memory_manager

Shadow supports a memory manager that uses shared memory maps to reduce the overhead of accessing a managed process' data from Shadow's main process, but this is disabled by default as it does not support other Shadow features such as emulating the fork/exec syscalls. If you do not need support for these features, enabling this memory manager may slightly improve simulation performance.

use_worker_spinning

Shadow's thread-per-core scheduler uses a spinloop by default. While this results in significant performance improvements in our benchmarks, it may be worth testing Shadow's performance with this disabled.

max_unapplied_cpu_latency

If model_unblocked_syscall_latency is enabled, increasing the max unapplied CPU latency may improve the simulation run time performance.

runahead

This option effectively sets a minimum network latency. Increasing this value will allow for better simulation parallelisation and possibly better run time performance, but will affect the network characteristics of the simulation.

Profiling

Profiling can be useful for improving the performance of experiments, either as improvements to the implementation of Shadow itself, or in altering the configuration of the experiments you are running.

Profiling with top/htop

Tools like top and htop will give good first-order approximations for what Shadow is doing. While they can only give system-wide to thread-level granularity, this can often still tell you important details such as whether Shadow, the simulated processes, or the kernel are consuming memory and processor cycles. E.g., if you're running into memory constraints, the RES or MEM column of these tools can tell you where to start looking for ways to address that. If execution time is too long, sorting by CPU or TIME can provide insight into where that time is being spent.

One limitation to note is that Shadow relies on spinlocks in barriers for some of its operation. Especially when running with many threads, these spinlocks will show as consuming most of the CPU any time the simulation is bottlenecked on a few simulated processes. It can be difficult to tell when this is happening in these tools, because no symbol information is available.

Profiling with perf

The perf tool is a powerful interface to the Linux kernel's performance counter subsystem. See man perf or the perf wiki for full details on how to use it, but some highlights most relevant to Shadow execution time are given here.

Regardless of how you are using perf, the aforementioned complication of spinlocks in Shadow applies. Namely, when there is any bottleneck on the barrier, the symbols associated with the spinlocks will dominate the sample counts. Improving the performance of the spinlocks will not improve the performance of the experiment, but improving the performance of whatever is causing the bottleneck (likely something towards the top of the non-spinlock symbols) can.

perf top

The perf top command will likely be the most practical mode of perf for profiling all parts of a Shadow experiment. It requires one of: root access, appropriately configured Linux capabilities, or a system configured to allow performance monitoring (similar to attaching to processes with gdb), so it isn't always available, but it is very simple when it is. The interface is similar to top's, but provides information at the granularity of symbols, across the entire system. This means you will be able to tell which specific functions in Shadow, the simulated processes, and the kernel are consuming CPU time.

When perf top can't find symbol information for a process, it will display the offset of the instruction as hex instead. (Note this means it will be ranked by instruction, rather than the entire function.) If you know where the respective executable or shared object file is, you can look up the name of the symbol for that instruction's function by opening the file with gdb and running info symbol [ADDRESS]. If gdb can't find the symbols either, you can look it up manually using readelf -s and finding the symbol with the largest address smaller than the offset you are looking for (note that readelf does not output the symbols in order of address; you can pipe the output to awk '{$1=""; print $0}' | sort to get a sorted list).
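
If you need to do that lookup often, it can be scripted. The following sketch reads readelf -s output from stdin and prints the symbol whose address most closely precedes a given offset; it assumes readelf's usual column layout (Num: Value Size Type Bind Vis Ndx Name), and the script and binary names in the usage line are placeholders.

#!/usr/bin/env python3
# Sketch: given `readelf -s BINARY` on stdin and an offset as the first
# argument, print the symbol with the largest address <= that offset.
import sys

target = int(sys.argv[1], 16)  # e.g. the hex offset displayed by perf top
best = None  # (address, symbol name)

for line in sys.stdin:
    fields = line.split()
    # Symbol rows look like: "45: 0000000000401130 25 FUNC GLOBAL DEFAULT 14 main"
    if len(fields) < 8 or not fields[0].endswith(":"):
        continue
    try:
        address = int(fields[1], 16)
    except ValueError:
        continue
    name = fields[7]
    if address <= target and (best is None or address > best[0]):
        best = (address, name)

if best is None:
    sys.exit(f"no symbol found at or below {target:#x}")
print(f"{best[1]} (starts at {best[0]:#x}, {target - best[0]:#x} bytes in)")

Example usage: readelf -s ./myprogram | python3 find_symbol.py 1a2b0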

Details on more options (e.g., for filtering the sampled CPUs or processes) can be found in man perf top.

perf record

If you know which particular process you wish to profile, perf record can give far greater detail than other options. To use it for Shadow, either run it when starting Shadow:

perf record shadow shadow.config.yaml > shadow.log

Or, attach to a running Shadow process:

perf record -p <PID>

Attaching to a process requires similar permissions as perf top, but can be used to profile any process, including the simulated processes launched by Shadow.

The perf record process will write a perf.data file when you press Ctrl-c or when Shadow ends. You can then analyze the report:

perf report

More details are available in man perf record and man perf report.

Stability Guarantees

Shadow generally follows the semantic versioning principles:

  • PATCH version increases (ex: 2.0.1 to 2.0.2) are intended for bug fixes.
  • MINOR version increases (ex: 2.0.2 to 2.1.0) are intended for new backwards-compatible features and changes.
  • MAJOR version increases (ex: 1.2.2 to 2.0.0) are intended for incompatible changes.

More specifically, we aim to provide the following guarantees between MINOR versions:

  • Command line and configuration option changes and additions will be backwards-compatible.
    • Default values for existing options will not change.
  • File and directory names in Shadow's data directory (general.data_directory) will not change.
  • Support for any of Shadow's supported platforms will not be dropped, unless those platforms no longer receive free updates and support from the distribution's developer.
  • We will not change the criteria for the minimum supported Linux kernel version as documented in supported platforms. (Note though that this still allows us to increase the minimum kernel version as a result of dropping support for a platform, which we may do as noted above).

The following may change between ANY versions (MAJOR, MINOR, or PATCH):

  • The log format and messages.
  • Experimental options may change or be removed.
  • The simulation may produce different results.
  • New files may be added in Shadow's data directory (general.data_directory).
    • If new files are added in Shadow's host-data directories, they will begin with the prefix <process name>.<pid>.

Non-goal: Security

Never run code under Shadow that you wouldn't trust enough to run outside of Shadow on the same system at the same level of privilege.

While Shadow uses some of the same techniques used by other systems to isolate potentially vulnerable or malicious software, this is not a design goal of Shadow. A managed program in a Shadow simulation can, if it tries to, detect that it's running under such a simulation and break out of the "sandbox" to issue native system calls.

For example:

  • Shadow currently doesn't restrict access to the host file system. A malicious managed program can read and modify the same files that Shadow itself can.
  • Shadow inserts some code via LD_PRELOAD into managed processes. This code intentionally has the ability to make non-interposed system calls (which it uses to communicate with the Shadow process), and makes no effort to protect itself from the managed code running in the same process.

Reporting security issues

Security issues can be reported to unique_halberd_0m@icloud.com .

Limitations and workarounds

Shadow can typically run applications without modification, but there are a few limitations to be aware of.

If you are severely affected by one of these limitations (or another not listed here) let us know, as this can help us prioritize our improvements to Shadow. You may reach out in our discussions or issues.

Unimplemented system calls and options

When Shadow encounters a syscall or a syscall option that it hasn't implemented, it will generally return ENOSYS and log at warn level or higher. In many such cases the application is able to recover, and this has little or no effect on the ultimate results of the simulation.

There are some syscalls that Shadow doesn't emulate entirely faithfully, but for which it has a "best effort" implementation. As with unimplemented syscalls, Shadow logs at warn level when encountering such a syscall.

vfork

A notable example of a not-quite-faithfully implemented syscall is vfork, which Shadow effectively implements as a synonym for fork. Usage of vfork that complies with the POSIX.1 specification ("behavior is undefined if the process created by vfork() either modifies any data other than a variable of type pid_t used to store the return value...") should work correctly. However, usage that relies on specific Linux implementation details of vfork (e.g. that a write to a global variable from the child will be observed by the parent) won't work correctly.

As in other such cases, shadow logs a warning when it encounters vfork, so that users can identify it as the potential source of problems if a simulation doesn't work as expected.

IPv6

Shadow does not yet implement IPv6. Most applications can be configured to use IPv4 instead. Tracking issue: #2216.

Statically linked executables

Shadow relies on LD_PRELOAD to inject code into the managed processes. This doesn't work for statically linked executables. Tracking issue: #1839.

Most applications can be dynamically linked, though occasionally you may need to edit build scripts and/or recompile.

golang

golang typically defaults to producing statically linked executables, unless the application uses cgo. Using the networking functionality of golang's standard library usually pulls in cgo by default and thus results in a dynamically linked executable.

You can also explicitly force go to produce a dynamically linked executable. e.g.

# Install a dynamically linked `std`
go install -buildmode=shared std
# Build your application with dynamic linking
go build -linkshared myapp.go

Busy loops

By default, Shadow runs each thread of a managed process until it's blocked by a syscall such as nanosleep, read, select, futex, or epoll. Likewise, simulated time only moves forward while the managed threads are blocked on such calls - Shadow effectively models the CPU as being infinitely fast. This model is generally sufficient for modeling non-CPU-bound network applications.

Unfortunately, this model can lead to deadlock in the case of "busy loops", where a thread repeatedly checks for something to happen, either indefinitely or until some amount of wall-clock time has passed. For example, a worker thread might repeatedly check whether work is available for some amount of time before going to sleep on a futex, to avoid the latency of going to sleep and waking back up in cases where work arrives quickly. However, since Shadow normally neither advances time on non-blocking syscalls nor allows other threads to run in the meantime, such a loop can run indefinitely, deadlocking the whole simulation.

When feasible, it's usually good practice to modify such loops to have a bound on the number of iterations instead of or in addition to a bound on wallclock time.
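
As an illustration of that guidance, the sketch below (ordinary application code in Python, nothing Shadow-specific) contrasts a spin bounded by wall-clock time with one bounded by an iteration count before falling back to a blocking wait.

import queue
import time

work_queue = queue.Queue()

def get_work_time_bounded(spin_seconds=0.001):
    # Bounded by wall-clock time: under Shadow's default time model the clock
    # does not advance during the non-blocking polls, so the deadline may never
    # be reached and the loop can spin forever, deadlocking the simulation.
    deadline = time.monotonic() + spin_seconds
    while time.monotonic() < deadline:
        try:
            return work_queue.get_nowait()
        except queue.Empty:
            pass
    return work_queue.get()  # blocking wait

def get_work_iteration_bounded(max_spins=1000):
    # Bounded by iteration count: the loop always terminates regardless of how
    # time advances; the thread then blocks, letting Shadow move time forward
    # and schedule other threads.
    for _ in range(max_spins):
        try:
            return work_queue.get_nowait()
        except queue.Empty:
            pass
    return work_queue.get()  # blocking wait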

For cases where modifying the loop is infeasible, Shadow provides the option --model-unblocked-syscall-latency. When this option is enabled, Shadow moves time forward a small amount on every syscall (and VDSO function call), and switches to another thread if one becomes runnable in the meantime (e.g. because network data arrived when the clock moved forward, unblocking it).

This feature should only be used when it's needed to get around such loops. Some limitations:

  • It may cause the simulation to run slower.

    • Enabling this feature forces Shadow to switch between threads more frequently, which is costly and hurts cache performance. We have minimized this effect to the extent that we can, but it can especially hurt performance when there are multiple unblocked threads on a single simulated Host, forcing Shadow to keep switching between them to keep the simulated time synchronized.

    • Busy loops intrinsically waste some CPU cycles. Outside of Shadow this can be a tradeoff for improved latency by avoiding a thread switch. However, in a Shadow simulation this latency isn't modeled, so busy-looping instead of blocking immediately has no benefit to simulated performance; only cost to simulation performance. If feasible, changing the busy-loop to block immediately instead of spinning should improve simulation performance without substantially affecting simulation results.

  • It's not meant as an accurate model of syscall latency. It generally models syscalls as being somewhat faster than they would be on a real system to minimize the impact on simulation results.

  • Nonetheless it does affect simulation results. Arguably this model is more accurate, since syscalls on real systems do take non-zero time, but it makes the time model more complex to understand and reason about.

  • It still doesn't account for time spent by the CPU executing code, which also means that a busy-loop that makes no syscalls at all can still lead to deadlock. Fortunately such busy loops are rare and are generally agreed upon to be bugs, since they'd also potentially monopolize a CPU indefinitely when run natively.

For more about this topic, see #1792.

Compatibility Notes

libopenblas

libopenblas is a fairly low-level library, and can get pulled in transitively via dependencies. e.g., tgen uses libigraph, which links against liblapack, which links against blas.

libopenblas, when compiled with pthread support, uses busy-loops in its worker threads.

There are several known workarounds:

  • Use Shadow's --model-unblocked-syscall-latency feature. See busy-loops for details and caveats.

  • Use a different implementation of libblas. e.g. on Ubuntu, there are several alternative packages that can provide libblas. In particular, libblas3 doesn't have this issue.

  • Install libopenblas compiled without pthread support. e.g. on Ubuntu this can be obtained by installing libopenblas0-serial instead of libopenblas0-pthread.

  • Configure libopenblas to not use threads at runtime. This can be done by setting the environment variable OPENBLAS_NUM_THREADS=1 in the process's environment attribute in the Shadow config (see the sketch below). Example: tor-minimal.yaml:109
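
For example, in a generated configuration (see "Managing Complex Configurations"), the process entry might look like the following sketch. The application path and arguments are placeholders, and the map form of the environment option is an assumption; check the configuration documentation for the exact syntax your Shadow version expects.

# Sketch: a generated process entry that pins OpenBLAS to a single thread.
tgen_process = {
    'path': '/path/to/tgen',
    'args': 'tgen.config.graphml',
    # Assumed map form of the `environment` option.
    'environment': {'OPENBLAS_NUM_THREADS': '1'},
}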

cURL

Example

general:
  stop_time: 10s
  model_unblocked_syscall_latency: true

network:
  graph:
    type: 1_gbit_switch

hosts:
  server:
    network_node_id: 0
    processes:
    - path: python3
      args: -m http.server 80
      start_time: 0s
      expected_final_state: running
  client1: &client_host
    network_node_id: 0
    processes:
    - path: curl
      args: -s server
      start_time: 2s
  client2: *client_host
  client3: *client_host
rm -rf shadow.data; shadow shadow.yaml > shadow.log
cat shadow.data/hosts/client1/curl.1000.stdout

Notes

  1. Older versions of cURL use a busy loop that is incompatible with Shadow and will cause Shadow to deadlock. model_unblocked_syscall_latency works around this (see busy-loops). Newer versions of cURL, such as the version provided in Ubuntu 20.04, don't have this issue. See issue #1794 for details.

Wget2

Example

general:
  stop_time: 10s

network:
  graph:
    type: 1_gbit_switch

hosts:
  server:
    network_node_id: 0
    processes:
    - path: python3
      args: -m http.server 80
      start_time: 0s
      expected_final_state: running
  client1: &client_host
    network_node_id: 0
    processes:
    - path: wget2
      args: --no-tcp-fastopen server
      start_time: 2s
  client2: *client_host
  client3: *client_host
rm -rf shadow.data; shadow shadow.yaml > shadow.log
cat shadow.data/hosts/client1/index.html

Notes

  1. Shadow doesn't support TCP_FASTOPEN so you must run Wget2 using the --no-tcp-fastopen option.

Nginx

Example

shadow.yaml

general:
  stop_time: 10s

network:
  graph:
    type: 1_gbit_switch

hosts:
  server:
    network_node_id: 0
    processes:
    - path: nginx
      args: -c ../../../nginx.conf -p .
      start_time: 0s
      expected_final_state: running
  client1: &client_host
    network_node_id: 0
    processes:
    - path: curl
      args: -s server
      start_time: 2s
  client2: *client_host
  client3: *client_host

nginx.conf

error_log stderr;

# shadow wants to run nginx in the foreground
daemon off;

# shadow doesn't support some syscalls that nginx uses to set up and control
# worker child processes.
# https://github.com/shadow/shadow/issues/3174
master_process off;
worker_processes 0;

# don't use the system pid file
pid nginx.pid;

events {
  # we're not using any workers, so this is the maximum number
  # of simultaneous connections we can support
  worker_connections 1024;
}

http {
  include             /etc/nginx/mime.types;
  default_type        application/octet-stream;

  # shadow does not support sendfile()
  sendfile off;

  access_log off;

  server {
    listen 80;

    location / {
      root /var/www/html;
      index index.nginx-debian.html;
    }
  }
}
rm -rf shadow.data; shadow shadow.yaml > shadow.log
cat shadow.data/hosts/client1/curl.1000.stdout

Notes

  1. Shadow currently doesn't support some syscalls that nginx uses to set up and control worker child processes, so you must disable additional processes using master_process off and worker_processes 0. See https://github.com/shadow/shadow/issues/3174.

  2. Shadow doesn't support sendfile() so you must disable it using sendfile off.

iPerf 2

Example

general:
  stop_time: 10s

network:
  graph:
    type: 1_gbit_switch

hosts:
  server:
    network_node_id: 0
    processes:
    - path: iperf
      args: -s
      start_time: 0s
      expected_final_state: running
  client:
    network_node_id: 0
    processes:
    - path: iperf
      args: -c server -t 5
      start_time: 2s
rm -rf shadow.data; shadow shadow.yaml > shadow.log

Notes

  1. You must use an iPerf 2 version >= 2.1.1. Older versions of iPerf have a no-syscall busy loop that is incompatible with Shadow.

iPerf 3

Example

general:
  stop_time: 10s
  model_unblocked_syscall_latency: true

network:
  graph:
    type: 1_gbit_switch

hosts:
  server:
    network_node_id: 0
    processes:
    - path: iperf3
      args: -s --bind 0.0.0.0
      start_time: 0s
      # Tell shadow to expect this process to still be running at the end of the
      # simulation.
      expected_final_state: running
  client:
    network_node_id: 0
    processes:
    - path: iperf3
      args: -c server -t 5
      start_time: 2s
rm -rf shadow.data; shadow shadow.yaml > shadow.log

Notes

  1. By default iPerf 3 servers bind to an IPv6 address, but Shadow doesn't support IPv6. Instead you need to bind the server to an IPv4 address such as 0.0.0.0.

  2. The iPerf 3 server exits with a non-zero error code and the message "unable to start listener for connections: Address already in use" after the client disconnects. This is likely due to Shadow not supporting the SO_REUSEADDR socket option.

  3. iPerf 3 uses a busy loop that is incompatible with Shadow and will cause Shadow to deadlock. A workaround is to use the model_unblocked_syscall_latency option.

etcd (distributed key-value store)

Example

Example for etcd version 3.3.x.

general:
  stop_time: 30s

network:
  graph:
    type: gml
    inline: |
      graph [
        node [
          id 0
          host_bandwidth_down "20 Mbit"
          host_bandwidth_up "20 Mbit"
        ]
        edge [
          source 0
          target 0
          latency "150 ms"
          packet_loss 0.01
        ]
      ]

hosts:
  server1:
    network_node_id: 0
    processes:
    - path: etcd
      args:
        --name server1
        --log-output=stdout
        --initial-cluster-token etcd-cluster-1
        --initial-cluster 'server1=http://server1:2380,server2=http://server2:2380,server3=http://server3:2380'
        --listen-client-urls http://0.0.0.0:2379
        --advertise-client-urls http://server1:2379
        --listen-peer-urls http://0.0.0.0:2380
        --initial-advertise-peer-urls http://server1:2380
      expected_final_state: running
    - path: etcdctl
      args: set my-key my-value
      start_time: 10s
  server2:
    network_node_id: 0
    processes:
    - path: etcd
      # each etcd peer must have a different start time
      # https://github.com/shadow/shadow/issues/2858
      start_time: 1ms
      args:
        --name server2
        --log-output=stdout
        --initial-cluster-token etcd-cluster-1
        --initial-cluster 'server1=http://server1:2380,server2=http://server2:2380,server3=http://server3:2380'
        --listen-client-urls http://0.0.0.0:2379
        --advertise-client-urls http://server2:2379
        --listen-peer-urls http://0.0.0.0:2380
        --initial-advertise-peer-urls http://server2:2380
      expected_final_state: running
    - path: etcdctl
      args: get my-key
      start_time: 12s
  server3:
    network_node_id: 0
    processes:
    - path: etcd
      start_time: 2ms
      args:
        --name server3
        --log-output=stdout
        --initial-cluster-token etcd-cluster-1
        --initial-cluster 'server1=http://server1:2380,server2=http://server2:2380,server3=http://server3:2380'
        --listen-client-urls http://0.0.0.0:2379
        --advertise-client-urls http://server3:2379
        --listen-peer-urls http://0.0.0.0:2380
        --initial-advertise-peer-urls http://server3:2380
      expected_final_state: running
    - path: etcdctl
      args: get my-key
      start_time: 12s
rm -rf shadow.data; shadow shadow.yaml > shadow.log
cat shadow.data/hosts/*/etcdctl.*.stdout

Notes

  1. The etcd binary must not be statically linked. You can build a dynamically linked version by replacing CGO_ENABLED=0 with CGO_ENABLED=1 in etcd's scripts/build.sh and scripts/build_lib.sh scripts. The etcd packages included in the Debian and Ubuntu APT repositories are dynamically linked, so they can be used directly.

  2. Each etcd peer must be started at a different time since etcd uses the current time as an RNG seed. See issue #2858 for details.

  3. If using etcd version greater than 3.5.4, you must build etcd from source and comment out the keepalive period assignment as Shadow does not support this.

CTorrent and opentracker

Example

general:
  stop_time: 60s

network:
  graph:
    type: 1_gbit_switch

hosts:
  tracker:
    network_node_id: 0
    processes:
    - path: opentracker
      # Tell shadow to expect this process to still be running at the end of the
      # simulation.
      expected_final_state: running
  uploader:
    network_node_id: 0
    processes:
    - path: cp
      args: ../../../foo .
      start_time: 10s
    # Create the torrent file
    - path: ctorrent
      args: -t foo -s example.torrent -u http://tracker:6969/announce
      start_time: 11s
    # Serve the torrent
    - path: ctorrent
      args: example.torrent
      start_time: 12s
      expected_final_state: running
  downloader1: &downloader_host
    network_node_id: 0
    processes:
    # Download and share the torrent
    - path: ctorrent
      args: ../uploader/example.torrent
      start_time: 30s
      expected_final_state: running
  downloader2: *downloader_host
  downloader3: *downloader_host
  downloader4: *downloader_host
  downloader5: *downloader_host
echo "bar" > foo
rm -rf shadow.data; shadow shadow.yaml > shadow.log
cat shadow.data/hosts/downloader1/foo

Notes

  1. Shadow must be run as a non-root user since opentracker will attempt to drop privileges if it detects that the effective user is root.

http-server

Example

general:
  stop_time: 10s
  model_unblocked_syscall_latency: true

network:
  graph:
    type: 1_gbit_switch

hosts:
  server:
    network_node_id: 0
    processes:
    - path: node
      args: /usr/local/bin/http-server -p 80 -d
      start_time: 3s
      expected_final_state: running
  client:
    network_node_id: 0
    processes:
    - path: curl
      args: -s server
      start_time: 5s
rm -rf shadow.data; shadow shadow.yaml > shadow.log
cat shadow.data/hosts/client/curl.1000.stdout

Notes

  1. Either the Node.js runtime or http-server uses a busy loop that is incompatible with Shadow and will cause Shadow to deadlock. model_unblocked_syscall_latency works around this (see busy-loops).

Contributing

Summary:

  • contribute changes through pull requests
  • encouraged to use issue and discussion posts to notify us beforehand
  • changes must include tests
    • System call tests (domain-specific and fuzz tests) (required)
    • Unit tests (preferred when possible)
    • Regression tests (as needed)
    • Application tests (as needed)
  • pull requests should be easy to review
  • changes should be easy to maintain
  • new code should be written in Rust
  • see our coding guide and best practices for Shadow

New features, bug fixes, documentation changes, etc. can be submitted through a GitHub pull request. For large changes in particular, we encourage you to post an issue or discussion before submitting your pull request so that you can make sure your changes fit well with the direction of the project; this way you won't spend time writing a pull request that we can't merge into Shadow. For details about how to draft pull requests and respond to reviewer feedback, see our additional documentation.

All pull requests with new or changed features should contain tests to validate that they work as expected and that they mirror similar behaviour in Linux. If changes or additions are made that affect Shadow's system call support, the pull request must also include system call tests that exercise the new or changed behaviour. The more tests you include, the more confident we'll be that the changes are correct, and the more likely it is that your changes can be merged. We know that tests aren't very exciting to write, but Shadow relies heavily on tests to catch broken features and discrepancies with Linux. For more information about writing tests for Shadow, see our "Writing Tests" documentation.

Shadow is a community-supported project and the maintainers might not have a lot of time to review pull requests. Submitting pull requests with good documentation, tests, clear commit messages, and concise changes will help the maintainers with their reviews, and also help increase the likelihood that we will be able to merge your changes.

A core principle of Shadow development is that the project should be easy to maintain. This means that we try to reduce the number of dependencies when possible, and when we do need to add new dependencies, they should be popular, well-maintained dependencies with community support. It also means that it is unlikely that we will add new dependencies on non-Rust packages (for example, distro packages). Shadow is supported on multiple Linux platforms with different packaging styles (APT and DNF) and different package versions, so distro packages are difficult to support and maintain across all of our supported platforms.

The main Shadow code base currently consists of both Rust and C code. We have been migrating our C code to Rust, but this migration is still in progress. All new code should be written in Rust. This includes the main Shadow application, the shim, and tests. Exceptions may be made for bug fixes or when the change is small and is in existing C code.

While we've been moving Shadow to Rust, we've learned a lot and have changed some designs. This means that the existing Shadow code is not always consistent in the way that it designs features or uses third-party libraries. For best practices and details about writing new code for Shadow, see our coding documentation.

If you have any questions about contributing to Shadow, feel free to ask us by making a new discussion post.

Coding style

Logging

In Rust code, we use the log framework for logging. In C code we use a wrapper library that also uses Rust's log framework internally.

For general guidance on what levels to log at, see log::Level.

Some shadow-specific log level policies:

  • We reserve the Error level for situations in which the shadow process as a whole will exit with a non-zero code. Conversely, when shadow exits with a non-zero code, the user should be able to get some idea of what caused it by looking at the Error-level log entries.

  • Warning should be used for messages that ought to be checked by the user before trusting the results of a simulation. For example, we use these in syscall handlers when an unimplemented syscall or option is used, and shadow is forced to return something like ENOTSUP, EINVAL or ENOSYS. In such cases the simulation is able to continue, and might still be representative of what would happen on a real Linux system; e.g. libc will often fall back to an older syscall, resulting in minimal impact on the simulated behavior of the managed process.

Clang-format

Our C code formatting style is defined in our clang-format configuration file. We try to avoid mass re-formatting, but generally any lines you modify should be reformatted using clang-format.

To add Ctrl-k as a "format region" in visual and select modes of vim, add the following to your .vimrc:

vmap <C-K> :py3f /usr/share/vim/addons/syntax/clang-format.py<cr>

Alternatively you can use the git-clang-format tool on the command-line to modify the lines touched by your commits.

Rustfmt

To format your Rust code, run cargo fmt in the src directory.

(cd src && cargo fmt)

Clippy

We use Clippy to help detect errors and non-idiomatic Rust code. You can run clippy locally with:

(cd src && cargo clippy)

Including headers

Which headers to include

Every source and header file should directly include the headers that export all referenced symbols and macros.

In a C file, includes should be broken into blocks, with the includes sorted alphabetically within each block. The blocks should occur in this order:

  • The C file's corresponding header; e.g. foo.h for foo.c. This enforces that the header is self-contained; i.e. doesn't depend on other headers to be included before it.
  • System headers are included next to minimize unintentionally exposing any macros we define to them.
  • Any other necessary internal headers.

This style is loosely based on that used in glib and supported by the include what you use tool.

Inclusion style

Headers included from within the project should use quote-includes, and should use paths relative to src/. e.g. #include "main/utility/byte_queue.h", not #include "byte_queue.h" (even from within the same directory), and not #include <main/utility/byte_queue.h>.

Headers included external to this repository should use angle-bracket includes. e.g. #include <glib.h>, not #include "glib.h".

Writing Tests

Tests for Shadow generally fall into four categories:

  • system call tests
  • regression tests
  • application tests
  • unit tests

Some of these tests may be marked as "extra" tests, which means they are not run by default.

System call tests

Shadow executes real unmodified applications and co-opts them by intercepting and interposing at the system call API. This means that Shadow must try to emulate Linux system calls. Shadow doesn't always need to emulate every system call exactly as Linux does, but it's usually good to try to emulate Linux as closely as possible. When Shadow deviates from Linux behaviour, Shadow is less likely to accurately represent real-world behaviour in its simulation.

When writing new system call handlers or modifying the behaviour of existing ones, it's important to write tests that verify the correctness of the new behaviour. These system call tests are required in pull requests that add to or modify the behaviour of Shadow's system calls. Usually this means that tests are written which execute the system call with a variety of arguments, and we verify that the system call returns the same values in both Linux and Shadow.

These tests fall into two categories: domain-specific system call tests and fuzzing tests. The domain-specific tests should test the system call under a variety of typical use cases, as well as some edge cases (for example passing NULL pointers, negative lengths, etc). The fuzz tests should test many various combinations of the possible argument values. These two types of tests are discussed further below.

Our existing tests are not always consistent in how the tests are organized or designed, so you don't need to follow the exact same design as other tests in the Shadow repository. If you're adding new tests to an existing file, you should try to write the tests in a similar style to the existing tests.

These tests typically use the libc library to test the system calls; for example libc::listen(fd, 10). For the most part the tests assume that the libc system call wrappers are the same as the kernel system calls themselves, but this is not always the case. Sometimes they differ and you might want to make the system call directly (for example the glibc fork() system call wrapper usually makes a clone system call, not a fork system call), or there might not be a libc wrapper for the system call that you wish to test (for example set_tid_address). In this case you probably want to use the linux-api library which makes the system call directly without using a third-party library like glibc. The linux-api library only implements a handful of system calls, and we've been adding more as we need them. You may need to add support for the system call you wish to test to linux-api.

These tests are run emulated within Shadow and natively outside of Shadow. This is done using the CMake add_linux_tests and add_shadow_tests macros. The tests are built by Cargo and then run by CMake. For example the listen tests use:

add_linux_tests(BASENAME listen COMMAND sh -c "../../../target/debug/test_listen --libc-passing")
add_shadow_tests(BASENAME listen)

which results in the CMake tests:

1/2 Test #110: listen-shadow ....................   Passed    0.56 sec
2/2 Test #109: listen-linux .....................   Passed   10.12 sec

Domain-specific system call tests

Here is an example of an existing test for the listen system call:

/// Test listen using a backlog of 0.
fn test_zero_backlog(
    domain: libc::c_int,
    sock_type: libc::c_int,
    flag: libc::c_int,
    bind: Option<SockAddr>,
) -> Result<(), String> {
    let fd = unsafe { libc::socket(domain, sock_type | flag, 0) };
    assert!(fd >= 0);

    if let Some(address) = bind {
        bind_fd(fd, address);
    }

    let args = ListenArguments { fd, backlog: 0 };

    let expected_errno = match (domain, sock_type, bind) {
        (libc::AF_INET, libc::SOCK_STREAM, _) => None,
        (libc::AF_UNIX, libc::SOCK_STREAM | libc::SOCK_SEQPACKET, Some(_)) => None,
        (libc::AF_UNIX, libc::SOCK_STREAM | libc::SOCK_SEQPACKET, None) => Some(libc::EINVAL),
        (_, libc::SOCK_DGRAM, _) => Some(libc::EOPNOTSUPP),
        _ => unimplemented!(),
    };

    test_utils::run_and_close_fds(&[fd], || check_listen_call(&args, expected_errno))
}

There are many listen tests like the test_zero_backlog test above, including test_negative_backlog, test_large_backlog, test_listen_twice, test_reduced_backlog, and more.

Fuzz tests

"Fuzz"-style testing is another way we test syscalls: they use some support code to test many various combinations of the possible argument values expected by a syscall, and verify that the return value for each combination of arguments is the same as what Linux returns. Because the developer usually writes these tests to cover most or all possible argument combinations, it ensures that Shadow's emulation of the syscall is highly accurate.

Fuzz tests can be a bit trickier to write, especially for more complicated syscalls, and sometimes they don't make sense (e.g., when testing what happens when trying to connect() to a TCP server with a full accept queue). However, they often help us find inconsistent behaviour between Shadow and Linux and make Shadow more accurate, so we prefer that fuzz tests are included with pull requests when possible.

There are some good examples of fuzz tests in our time-related test code in src/test/time. For example, the clock_nanosleep test demonstrates how to test the syscall with all combinations of its arguments, using both valid and invalid values.
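
As a minimal sketch of the idea (this is not the support code used by the tests in src/test/time), a fuzz-style test loops over combinations of argument values and checks that each result matches what Linux returns; because the same test runs both natively and under Shadow, any combination where Shadow's emulation deviates shows up as a failure in the Shadow run:

// Sketch only: exercise listen() with several values per argument and verify
// the errno expected from Linux for each combination.
fn fuzz_listen() -> Result<(), String> {
    let fds = [-1, 0, 1000];
    let backlogs = [-1, 0, 1, 128, libc::c_int::MAX];

    for &fd in &fds {
        for &backlog in &backlogs {
            let rv = unsafe { libc::listen(fd, backlog) };
            let errno =
                (rv == -1).then(|| std::io::Error::last_os_error().raw_os_error().unwrap());

            // Expected results on Linux for these (deliberately invalid) fds.
            let expected = match fd {
                0 => Some(libc::ENOTSOCK), // stdin is open, but it is not a socket
                _ => Some(libc::EBADF),    // -1 and 1000 are not open descriptors
            };

            if errno != expected {
                return Err(format!(
                    "listen({fd}, {backlog}) returned {rv} with errno {errno:?}, expected {expected:?}"
                ));
            }
        }
    }
    Ok(())
}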

Unit tests

Shadow supports unit tests for Rust code, which can be written as standard Rust unit tests. These tests run natively rather than under Shadow, and are also run under Miri and Loom as "extra" tests.

For example see the IntervalMap tests.

#[cfg(test)]
mod tests {
    use super::*;

    // ...

    #[test]
    fn test_insert_into_empty() {
        let mut m = IntervalMap::new();
        insert_and_validate(&mut m, 10..20, "x", &[], &[(10..20, "x")]);
    }

    // ...
}

In the CMake test output, the Rust unit tests run as a single test:

1/1 Test #1: rust-unit-tests ..................   Passed  149.52 sec

Regression tests

Sometimes it's useful to write a regression test that doesn't belong with any specific system call's tests. These tests can be written like the system call tests above, but are stored in the src/test/regression/ directory.

Application tests

It's often useful to test that applications behave correctly in Shadow. These tests do not replace the need for the system call tests above, but can complement them. For example, we have tor and tgen tests, which help prevent regressions where we accidentally break Tor behaviour.

We also run our examples as tests. These examples include those in our documentation (for example see the "getting started" example) as well as other application examples.

Extra tests

Any of the tests above may be configured as an "extra" test. These tests are not run by default and require that Shadow is built and tested using the --extra flag.

./setup build --test --extra
./setup test --extra

These are usually tests that require extra dependencies, take a long time to build or run, or might be difficult to maintain. These tests may be removed at any time if they become difficult to maintain or if they are updated to require features that Shadow doesn't or can't support. For example, if an application that uses epoll were updated to use io_uring, which Shadow doesn't support (and would take a lot of effort to support), we would need to remove the test.

Extra tests currently run in the CI environment but only under a single platform, so they're not as well tested as non-"extra" tests.

Pull requests (PRs)

Clean commits

Ideally, every commit in the history of the main branch should:

  • Be a focused, self-contained change.
  • Have a commit message that summarizes the change and explains why the change is being made, if not self-evident.
  • Build (./setup build --test).
  • Pass tests (./setup test).

Drafting a PR

PRs should be split into smaller, more focused changes when feasible. However, we also want to avoid polluting the history with commits that don't build or pass tests, or with commits within a single PR that fix a mistake made earlier in the same PR. While iterating on the PR, git commit's --fixup and --squash flags are useful for committing changes that should ultimately be merged with one of the earlier commits.

When creating a pull request, we suggest you first create it as a draft. This will still trigger our continuous-integration checks, and gives you a chance to resolve any issues with them (e.g. broken tests) before requesting review.

Once done iterating, first consider using git rebase -i --autosquash to clean up your commit history, and then force pushing to update your PR. Finally, take the pull request out of draft mode to signal that you're ready for review.

Responding to review feedback

During PR review, please do not rebase or force-push, since this makes it difficult to see what's changed between rounds of review. Consider using --fixup and --squash for commits responding to review feedback, so that they can be appropriately squashed before the final merge. git autofixup can also be useful for generating --fixup commits.

Merging

When the PR is ready to be merged, the reviewer might ask you to git rebase and force push to clean up history, or might do it themselves.

For the maintainer doing the merge:

If the PR is relatively small, or if it's not worth the effort of rewriting history into clean commits, use the "squash and merge" strategy.

If the individual commits appear to be useful to keep around in our history, instead use the "create a merge commit" strategy. There's no need to review every individual commit when using this strategy, but if the intermediate commits are obviously low quality, consider using the "squash and merge" strategy instead. Note that since this strategy creates a merge commit, we can still later identify and filter out the intermediate commits if desired, e.g. with git log --first-parent main.

We've disabled the "Rebase and merge" option, since it does a fast-forward merge, which makes the intermediate commits indistinguishable from the validated and reviewed final state of the PR.

A common task is to rebase a PR on main so that it is up to date, perhaps fixing some conflicts or adding some changes to the PR, and then pushing the updated branch to test it in the CI before merging. Suppose a contributor submitted a branch bugfix as PR 1234, and has allowed the maintainers to update the PR. You could then fetch the branch to work on the PR locally:

git fetch origin pull/1234/head:pr-1234
git checkout pr-1234
git rebase main
<fix conflicts or commit other changes>
git push -f git@github.com:contributor/shadow.git pr-1234:bugfix
git checkout main
git branch -D pr-1234

If it passes the tests, you can merge the PR in the GitHub interface as usual.

Coding

Building the guide

cargo install mdbook
(cd mdbook && mdbook build)
firefox build/guide/index.html

Building the rust docs

(cd src && cargo doc --workspace --exclude shadow-tests)

Generating compiler command database

Many tools benefit from a compiler command database, conventionally in a file called compile_commands.json. If shadow's setup script finds the bear tool on your PATH, it will automatically use it to create and update build/compile_commands.json when running setup build.

Files and descriptors

Shadow currently has two ways of simulating descriptors. The first is LegacyDescriptor, which is written in C and is used for most descriptor/file types (IP sockets, epoll, files, etc.). With this type, the file (the POSIX file description, e.g. an epoll file) and its descriptor live in the same object. The second way of simulating descriptors is in Rust, where we have a File type that can be referenced by many Descriptor objects. This allows us to easily implement dup() for descriptors implemented with this new code. Our plan is to move the existing legacy descriptors over to these new Rust file types.
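
As a simplified sketch of the Rust approach (illustrative names only, not Shadow's actual types), many Descriptor objects can hold a shared reference to the same File, so dup() only has to clone that reference:

use std::sync::Arc;

// Sketch only: the file's state is shared, while per-descriptor state (such
// as the close-on-exec flag) lives in each Descriptor.
struct File {
    path: String,
}

struct Descriptor {
    file: Arc<File>,
    cloexec: bool,
}

impl Descriptor {
    fn dup(&self) -> Descriptor {
        Descriptor {
            // the new descriptor refers to the same underlying file
            file: Arc::clone(&self.file),
            // dup() does not copy the close-on-exec flag
            cloexec: false,
        }
    }
}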

Platform (libc and Linux) crates

We use several Rust crates for accessing platform functionality and definitions. Roughly from lowest-level to highest-level:

  • Our linux-api crate provides fairly low-level bindings over the Linux kernel headers, and a few nix-style higher-level wrappers. It does not depend on std or libc. It also re-exports these definitions as a C library that can be used without creating conflicts with libc headers or linux system headers. Use this when working with the syscall ABI (such as when implementing syscall handlers), and for making syscalls when none of the higher-level crates are suitable (see below).

  • libc provides fairly low-level bindings of system libc standard headers. If you need syscall-level ABI-compatibility, use linux-api instead. If you don't, prefer one of the higher-level crates.

  • nix provides a safer and more Rust-idiomatic layer on top of the libc crate, as well as adapters for underlying libc definitions. There's currently a lot of usage of this in Shadow, but we're trying to move away from it. In most scenarios, one of the other crates mentioned here is a more appropriate choice.

  • rustix provides a similar API to nix, but can be configured not to depend on std or libc. This is useful in code that's linked into Shadow's shim, where we don't want to depend on std or libc.

  • Rust's std crate provides, well, the standard way of interacting with the platform, in a portable and Rust-idiomatic way. This is generally the right choice for code that doesn't run in Shadow's shim, in places where we're not concerned about the precise syscalls that get executed.

When choosing which one to use:

  • For code that will be linked into shadow's shim, prefer rustix. In cases where rustix doesn't provide the desired functionality, or in C code, or when we need precise control over what syscall is made with what parameters, use linux-api.

    We want to minimize, and ideally eliminate, usage of libc from the shim. libc has global state that can easily become corrupted when we use it from the shim, which is LD_PRELOADed into managed programs. This is especially true because much of the shim executes in the context of SIGSYS signal handlers, meaning we might already be in a non-reentrant, non-async-signal-safe libc function higher in the stack. See also https://github.com/shadow/shadow/milestone/54.

  • For shadow's syscall handler implementations, prefer linux-api.

    Since we are intercepting and implementing at the syscall level, the interface we are implementing is the Linux syscall ABI. Therefore we should be careful to use Linux's definitions for the parameters and return values. While types and constants in libc are often equivalent to kernel types and constants with the same names, there are many known cases where they aren't, and in general there's no guarantee that one which is consistent today will remain consistent tomorrow. See also https://github.com/shadow/shadow/issues/3007.

    This also applies when implementing a syscall by delegating to the host system. For example, suppose we implement a fcntl syscall by making a native fcntl syscall on the native file descriptor. Making the syscall directly is the most straightforward way to "pass through" exactly the original intended semantics. If we use a higher-level interface, even libc, we have to be careful about translating the parameters and return values back and forth between the two different API layers.

  • For code that runs in the shadow process, where we are acting as a "normal" program that wants to interact with the kernel, generally prefer the highest-level interface that provides the necessary functionality. e.g. when creating worker threads in Rust, we generally use std::thread; there's no reason to use one of the lower level crates. Occasionally we need some functionality not provided in std though, in which case it makes sense to drop down to one of the lower level crates.

  • In tests, any of the above can make sense. In places where we're specifically trying to test Shadow's emulation of some functionality, making direct syscalls, e.g. with the linux-api crate or libc's syscall function, is the most direct and precise approach. On the other hand, we often want to test higher-level interfaces as a form of integration testing, since those are what managed programs more typically use. We usually focus on testing at the libc interface, since most managed programs use that interface, and it's low-level enough to let us control and understand what's happening at the syscall level. For incidental system functionality in tests (e.g. creating a temp file in a test that isn't specifically trying to test that functionality), it usually makes sense to use whatever interface is most idiomatic and convenient; a sketch of this split follows below.
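
A hypothetical example of that split (a sketch, not an existing Shadow test): std handles the incidental setup, while the behaviour under test is exercised through the libc interface.

// Sketch only: std builds the path; libc exercises the syscall under test.
fn test_open_nonexistent() -> Result<(), String> {
    // Incidental setup: std is the most convenient way to build the path.
    let path = std::env::temp_dir().join("shadow-test-does-not-exist");
    let c_path = std::ffi::CString::new(path.to_str().unwrap()).unwrap();

    // Behaviour under test, exercised at the libc interface: open() on a
    // missing file should fail with ENOENT.
    let fd = unsafe { libc::open(c_path.as_ptr(), libc::O_RDONLY) };
    if fd >= 0 {
        unsafe { libc::close(fd) };
        return Err("expected open() to fail".to_string());
    }
    let errno = std::io::Error::last_os_error().raw_os_error().unwrap();
    if errno != libc::ENOENT {
        return Err(format!("expected ENOENT, got errno {errno}"));
    }
    Ok(())
}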

deny(unsafe_op_in_unsafe_fn)

All crates should use #![deny(unsafe_op_in_unsafe_fn)]. When adding a new crate, remember to add this to the lib.rs or main.rs.

From the unsafe_op_in_unsafe_fn RFC (https://github.com/rust-lang/rfcs/blob/master/text/2585-unsafe-block-in-unsafe-fn.md):

No longer treat the body of an unsafe fn as being an unsafe block. To avoid a breaking change, this is a warning now and may become an error in a future edition.

This helps make it clearer where "unsafe" code is being used and can make reviewing code easier.
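
A minimal illustration of what the lint requires (a sketch, not code from the Shadow repository):

// With the crate-level attribute, the body of an unsafe fn is no longer an
// implicit unsafe block, so each unsafe operation must be wrapped explicitly.
#![deny(unsafe_op_in_unsafe_fn)]

/// # Safety
///
/// `ptr` must be non-null, aligned, and point to a valid `u32`.
pub unsafe fn read_value(ptr: *const u32) -> u32 {
    // SAFETY: the caller guarantees `ptr` is valid (see the doc comment above).
    // Without this explicit unsafe block, the lint rejects the dereference.
    unsafe { *ptr }
}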

Debugging

Debugging the Shadow process

Shadow is currently built with debugging symbols in both debug and release builds, though it may be easier to debug a debug build (generated by passing the --debug flag to setup build).

Shadow can be run under GDB by prepending gdb --args to its command-line. e.g.:

gdb --args shadow shadow.yaml

An alternative is to run Shadow with the --gdb flag, which will pause shadow after startup and print its PID. You can then simply attach GDB to Shadow in a new terminal and continue the experiment.

Example:

# terminal 1
# shadow will print its PID and pause
$ shadow --gdb shadow.yaml > shadow.log
** Starting Shadow
** Pausing with SIGTSTP to enable debugger attachment (pid 1234)

# terminal 2
$ gdb --pid=1234
> continue

Troubleshooting

If you see the following error when loading the shadow binary in gdb:

Reading symbols from /my/binary...
Dwarf Error: DW_FORM_strx1 found in non-DWO CU [in module /my/binary]
(No debugging symbols found in /my/binary)

It's likely that the version of Rust that you're building Shadow with is incompatible with your version of GDB. You can read more at Rust issue #98746.

Debugging managed processes

A simulation's managed processes are implemented as native OS processes, with their syscalls interposed by Shadow. Since they are native processes, many of the normal tools for inspecting native processes can be used on them as well; e.g. top will show how much CPU and memory they are using.

Generating a core file

If a managed process is crashing, it is sometimes easiest to let the native process generate a core file, and then use GDB to inspect it afterwards.

# Enable core dumps.
ulimit -c unlimited

# Ensure core dumps are written to a file.
# e.g. this is sometimes needed in Ubuntu to override the default behavior of
# piping the core file to the system crash handler.
echo core | sudo tee /proc/sys/kernel/core_pattern

# Run the simulation in which a process is crashing.
shadow shadow.yaml

# Tell gdb to inspect the core file. From within gdb you'll be able to
# inspect the state of the process when it was killed.
gdb <path-to-process-executable> <path-to-core-file>

Attaching with GDB

You can attach GDB directly to the managed process. To make this easier you can use the --debug-hosts option to pause Shadow after it launches each managed process on the given hosts. Shadow will print the native process's PID before stopping. For example, --debug-hosts client,server will pause Shadow after launching any managed processes on hosts "client" and "server". This allows you to attach GDB directly to those managed processes before resuming Shadow.

# terminal 1
$ shadow --debug-hosts client,server shadow.yaml > shadow.log
** Starting Shadow
** Pausing with SIGTSTP to enable debugger attachment to managed process 'server.nginx.1000' (pid 1234)
** If running Shadow under Bash, resume Shadow by pressing Ctrl-Z to background this task and then typing "fg".
** (If you wish to kill Shadow, type "kill %%" instead.)
** If running Shadow under GDB, resume Shadow by typing "signal SIGCONT".

# terminal 2
$ gdb --pid=1234

Debugging with GDB

In managed processes, Shadow uses SIGSYS and SIGSEGV to intercept system calls and some CPU instructions. By default, GDB stops every time these signals are raised. In most cases you'll want to override this behavior to silently continue executing instead:

(gdb) handle SIGSYS noprint
(gdb) handle SIGSEGV noprint

Once you have reached a point of interest, it's often useful to look at the backtrace for the current stack:

(gdb) bt

In multi-threaded applications, you can get a backtrace for all stacks:

(gdb) thread apply all bt

Profiling

Profiling can be useful for improving the performance of experiments, either as improvements to the implementation of Shadow itself, or in altering the configuration of the experiments you are running.

Profiling with top/htop

Tools like top and htop will give good first-order approximations for what Shadow is doing. While they can only give system-wide to thread-level granularity, this can often still tell you important details such as whether Shadow, the simulated processes, or the kernel are consuming memory and processor cycles. E.g., if you're running into memory constraints, the RES or MEM column of these tools can tell you where to start looking for ways to address that. If execution time is too long, sorting by CPU or TIME can provide insight into where that time is being spent.

One limitation to note is that Shadow relies on spinlocks in barriers for some of its operation. Especially when running with many threads, these spinlocks will show as consuming most of the CPU any time the simulation is bottlenecked on a few simulated processes. Telling when this is happening can be difficult in these tools, because no symbol information is available.

Profiling with perf

The perf tool is a powerful interface to the Linux kernel's performance counter subsystem. See man perf or the perf wiki for full details on how to use it, but some highlights most relevant to Shadow execution time are given here.

Regardless of how you are using perf, the aforementioned complication of spinlocks in Shadow applies. Namely, when there is any bottleneck on the barrier, the symbols associated with the spinlocks will dominate the sample counts. Improving the performance of the spinlocks will not improve the performance of the experiment, but improving the performance of whatever is causing the bottleneck (likely something towards the top of the non-spinlock symbols) can.

perf top

The perf top command will likely be the most practical mode of perf for profiling all parts of a Shadow experiment. It requires one of: root access, appropriately configured Linux capabilities, or a system configured to allow performance monitoring (similar to attaching to processes with gdb), so it isn't always available, but it is very simple when it is. The interface is similar to top's, but provides information at the granularity of symbols, across the entire system. This means you will be able to tell which specific functions in Shadow, the simulated processes, and the kernel are consuming CPU time.

When perf top can't find symbol information for a process, it will display the offset of the instruction as hex instead. (Note this means it will be ranked by instruction, rather than the entire function.) If you know where the respective executable or shared object file is, you can look up the name of the symbol for that instruction's function by opening the file with gdb and running info symbol [ADDRESS]. If gdb can't find the symbols either, you can look it up manually using readelf -s and finding the symbol with the largest address smaller than the offset you are looking for (note that readelf does not output the symbols in order of address; you can pipe the output to awk '{$1=""; print $0}' | sort to get a sorted list).

Details on more options (e.g., for filtering the sampled CPUs or processes) can be found in man perf top.

perf record

If you know which particular process you wish to profile, perf record can give far greater detail than other options. To use it for Shadow, either run it when starting Shadow:

perf record shadow shadow.config.yaml > shadow.log

Or, attach to a running Shadow process:

perf record -p <PID>

Attaching to a process requires similar permissions as perf top, but can be used to profile any process, including the simulated processes launched by Shadow.

The perf record process will write a perf.data file when you press Ctrl-C or when Shadow ends. You can then analyze the report:

perf report

More details are available in man perf record and man perf report.

Testing for Nondeterminism

If you run Shadow twice with the same seed (the -s or --seed command line options), then it should produce deterministic results (it's a bug if it doesn't).

If you find non-deterministic behavior in your Shadow experiment, please consider helping to diagnose the problem by opening a new issue.

Comparing strace output (experimental)

Shadow has an experimental feature for logging most system calls made by the managed processes in a format similar to the strace tool. You can enable this with the strace_logging_mode option, and then compare the strace logs from two simulations to look for non-deterministic behaviour. To avoid capturing memory addresses and uninitialized memory in the log, you should use the deterministic logging mode.

For example, after running two simulations with --strace-logging-mode deterministic where the results are in the shadow.data.1 and shadow.data.2 directories, you could run something like the following bash script:

#!/bin/bash

found_difference=0

for SUFFIX in \
    hosts/fileserver/tgen.1000.strace \
    hosts/client/tgen.1000.strace
do
    diff --brief shadow.data.1/${SUFFIX} shadow.data.2/${SUFFIX}
    exit_code=$?

    if (($exit_code != 0)); then
      found_difference=1
    fi
done

if (($found_difference == 1)); then
  echo -e "\033[0;31mDetected difference in output (Shadow may be non-deterministic).\033[0m"
else
  echo -e "\033[0;32mDid not detect difference in Shadow output (Shadow may be deterministic).\033[0m"
fi

Comparing application output

A good way to check this is to compare the log output of an application that was run in Shadow. For example, after running two TGen simulations where the results are in the shadow.data.1 and shadow.data.2 directories, you could run something like the following bash script:

#!/bin/bash

found_difference=0

for SUFFIX in \
    hosts/fileserver/tgen.1000.stdout \
    hosts/client/tgen.1000.stdout
do
    ## ignore memory addresses in log file with `sed 's/0x[0-9a-f]*/HEX/g' FILENAME`
    sed -i 's/0x[0-9a-f]*/HEX/g' shadow.data.1/${SUFFIX}
    sed -i 's/0x[0-9a-f]*/HEX/g' shadow.data.2/${SUFFIX}

    diff --brief shadow.data.1/${SUFFIX} shadow.data.2/${SUFFIX}
    exit_code=$?

    if (($exit_code != 0)); then
      found_difference=1
    fi
done

if (($found_difference == 1)); then
  echo -e "\033[0;31mDetected difference in output (Shadow may be non-deterministic).\033[0m"
else
  echo -e "\033[0;32mDid not detect difference in Shadow output (Shadow may be deterministic).\033[0m"
fi

Extra Tests

Shadow includes tests that require additional dependencies, such as Tor, TGen, networkx, obfs4proxy, and golang. These aren't run by default, but are run as part of the CI tests.

To run them locally, first make sure that both tor and tgen are located on your shell's PATH. You should also install all of Shadow's optional dependencies.

To run the golang tests you will need to both install golang, and install a dynamic version of the golang standard library. The latter can be done with go install -buildmode=shared -linkshared std.

It is recommended to build Shadow in release mode, otherwise the Tor tests may not complete before the timeout.

./setup build --test --extra
./setup test --extra
# To exclude the TGen and Tor tests (for example if you built Shadow in debug mode)
./setup test --extra -- --label-exclude "tgen|tor"
# To include only the TGen tests
./setup test --extra tgen
# To run a specific TGen test
./setup test --extra tgen-duration-1mbit_300ms-1000streams-shadow

If you change the version of tor located at ~/.local/bin/tor, make sure to re-run ./setup build --test.

Miri

rustup toolchain install nightly
rustup +nightly component add miri

# Disable isolation for some tests that use the current time (Instant::now).
# Disable leak-checking for now. Some tests intentionally panic, causing leaks.
export MIRIFLAGS="-Zmiri-disable-isolation -Zmiri-ignore-leaks"

(cd src && cargo +nightly miri test --workspace)

Continuous integration tests

On GitHub

Our continuous integration tests build and test Shadow on every supported platform and configuration. GitHub runs these tests automatically when making or modifying a pull request, in the build and test workflow. Pull requests without passing integration tests are blocked from merging.

Running locally

We also have scripts for running the continuous integration tests locally, inside Docker containers. This can be useful for debugging and for quickly iterating on a test that's failing in GitHub's test runs.

The run.sh script builds shadow inside a Docker image, and runs our tests in it.

By default, the script will attempt to use a Docker image with shadow already built, perform an incremental build on top of that, and then run shadow's tests. If you don't already have a local image, the script will implicitly try to pull one from the shadowsim/shadow-ci repository on Docker Hub. You can override this repository with -r, or force the script to build a new image locally with -i.

For example, to perform an incremental build and test on ubuntu 24.04, with the gcc compiler in debug mode:

ci/run.sh -c ubuntu:24.04 -C gcc -b debug

If the tests fail, shadow's build directory, including test outputs, will be copied from the ephemeral Docker container into ci/build.

For additional options, run ci/run.sh -h.

Debugging locally

After a local run fails, you can use Docker to help debug it. If you previously ran the tests without the -i option, re-run with the -i option to rebuild the Docker image(s). If Shadow was built successfully and the failure happened at the testing step, then the Docker image was built and tagged, and you can run an interactive shell in a container built from that image.

e.g.:

docker run --shm-size=1024g --security-opt=seccomp=unconfined -it shadowsim/shadow-ci:ubuntu-24.04-gcc-debug /bin/bash

If the failure happened in the middle of building the Docker image, you can do the same with the last intermediate layer that was built successfully. e.g. given the output:

$ ci/run.sh -i -c ubuntu:24.04 -C gcc -b debug
<snip>
Step 13/13 : RUN . ci/container_scripts/build_and_install.sh
 ---> Running in a11c4a554ef8
<snip>
    516 [ERROR] Non - zero return code from make.

You can start a container from the last intermediate image, in which Docker tried (and failed) to run the ci/container_scripts/build_and_install.sh script, with:

docker run --shm-size=1024g --security-opt=seccomp=unconfined -it a11c4a554ef8 /bin/bash

Maintainer playbook

Tagging Shadow releases

Before creating a new release, be sure to handle all issues in its GitHub Project. Issues that can wait until the next release can be moved to the next release's project (which you may need to create). Remaining issues should be resolved before continuing with the release process.

We use Semantic Versioning, and increment version numbers with the bumpversion tool.

The following commands can be used to tag a new version of Shadow, after which an archive will be available on GitHub's releases page.

Install bumpversion if needed:

python3 -m venv bumpenv
source bumpenv/bin/activate
pip install -U pip
pip install bumpversion

Make sure main is up to date:

git checkout main
git pull

The bumpversion command is run like this (it is recommended to add --dry-run --verbose until you are confident in the result):

bumpversion --dry-run --verbose <major|minor|patch|release|build>

Decide which part of the version you are bumping. Our format is {major}.{minor}.{patch}-{release}.{build}. Bumping earlier parts of the version will cause later parts to get reset to 0 (or 'pre' for the release part). For example, if you are at 2.0.0, going to 2.1.0-pre is easy:

bumpversion minor --tag --commit

In the above case, we can just tag and commit immediately. But if you are going from 2.0.0 to 2.1.0, you'll need to either run bumpversion twice (first to bump the minor from 0 to 1, then to bump the release from 'pre' to the invisible 'stable'):

bumpversion minor
bumpversion --allow-dirty release --commit --tag

or use the serialize option to specify the intended format of the next version:

bumpversion minor --serialize '{major}.{minor}.{patch}' --commit --tag

Now check that things worked and get the new version number:

git log -1 --stat
git describe --tags
VERSION=`awk -F "=" '/current_version/ {print $2}' .bumpversion.cfg | tr -d ' '`

Update the Cargo lock file, then amend the commit and tag to include the update (closely check and update the "Bump version: from → to" messages as needed):

(cd src && cargo update --workspace)
git add src/Cargo.lock
git commit --amend
git tag -f -a "v$VERSION"

Check again:

git log -1 --stat
git describe --tags

Now if everything looks good, push to GitHub:

git push origin "v$VERSION"

Our releases will then be tagged off of the main branch.

You probably want to also reset the CHANGELOG.md file in a new commit after tagging/pushing the release.

Format of Shadow Log Messages

❗ Warning
The format of the log messages is not
stable and may change at any time.

Log Line Prefix

Shadow produces simulator log messages in the following format:

real-time [thread-id:thread-name] virtual-time [loglevel] [hostname:ip] [src-file:line-number] [function-name] MESSAGE
  • real-time:
    the wall clock time since the start of the experiment, represented as hours:minutes:seconds
  • thread-id:
    the thread id (as returned by gettid) of the system thread that generated the message.
  • thread-name:
    the name of the system thread that generated the message
  • virtual-time:
    the simulated time since the start of the experiment, represented as hours:minutes:seconds
  • loglevel:
    one of ERROR < WARN < INFO < DEBUG < TRACE, in that order
  • hostname:
    the name of the host as specified in hosts.<hostname> of the simulation config
  • ip:
    the IP address of the host as specified in hosts.<hostname>.ip_address_hint of the simulation config, or a random IP address if one is not specified
  • src-file:
    the name of the source code file where the message is logged
  • line-number:
    the line number in the source code file where the message is logged
  • function-name:
    the name of the function logging the message
  • MESSAGE:
    the actual message to be logged

By default, Shadow only prints core messages at or below the info log level. This behavior can be changed using the Shadow option -l or --log-level to increase or decrease the verbosity of the output. As mentioned in the example from the previous section, the output from each application process is stored in separate log files beneath the shadow.data directory, and the format of those log files is application-specific (i.e., Shadow writes application output directly to file).

Heartbeat Messages

Shadow logs simulator heartbeat messages that contain useful system information for each virtual node in the experiment, in messages containing the string shadow-heartbeat. By default, these heartbeats are logged once per second, but the frequency can be changed using the --heartbeat-frequency option to Shadow (see shadow --help).

There are currently three heartbeat statistic subsystems: node, socket, and ram. For each subsystem that is enabled, Shadow will print a 'header' message followed by a regular message every frequency interval. The 'header' messages generally describe the statistics that are printed in the regular messages for that subsystem.

The following are examples of the statistics that are available for each subsystem:

Node:

[node-header] interval-seconds,recv-bytes,send-bytes,cpu-percent,delayed-count,avgdelay-milliseconds;inbound-localhost-counters;outbound-localhost-counters;inbound-remote-counters;outbound-remote-counters where counters are: packets-total,bytes-total,packets-control,bytes-control-header,packets-control-retrans,bytes-control-header-retrans,packets-data,bytes-data-header,bytes-data-payload,packets-data-retrans,bytes-data-header-retrans,bytes-data-payload-retrans

Socket:

[socket-header] descriptor-number,protocol-string,hostname:port-peer;inbuflen-bytes,inbufsize-bytes,outbuflen-bytes,outbufsize-bytes;recv-bytes,send-bytes;inbound-localhost-counters;outbound-localhost-counters;inbound-remote-counters;outbound-remote-counters|...where counters are: packets-total,bytes-total,packets-control,bytes-control-header,packets-control-retrans,bytes-control-header-retrans,packets-data,bytes-data-header,bytes-data-payload,packets-data-retrans,bytes-data-header-retrans,bytes-data-payload-retrans

Ram:

[ram-header] interval-seconds,alloc-bytes,dealloc-bytes,total-bytes,pointers-count,failfree-count

Parsing Shadow Log Messages

❗ Warning
The heartbeat/tracker log messages are considered experimental
and may change or be removed at any time.

Shadow logs simulator heartbeat messages that contain useful system information for each virtual host in the experiment. For example, Shadow logs the number of bytes sent/received, number of bytes allocated/deallocated, CPU usage, etc. You can parse these heartbeat log messages to get insight into the simulation. Details of these heartbeat messages can be found here, and they can be enabled by setting the experimental experimental.host_heartbeat_interval configuration option.

Example Simulation Data

The methods we describe below can be used on the output from any Shadow simulation. Here, we use the output from the Traffic Generation example simulation for illustrative purposes.

Parsing and Plotting Results

Shadow includes some Python scripts that can parse important statistics from the Shadow log messages, including network throughput over time, client download statistics, and client load statistics, and then visualize the results. The following will parse and plot the output produced from the above experiment:

# parse the shadow output file
src/tools/parse-shadow.py --help
src/tools/parse-shadow.py --prefix results shadow.log
# plot the results
src/tools/plot-shadow.py --help
src/tools/plot-shadow.py --data results "example-plots"

The parse-*.py scripts generate stats.*.json.xz files. The (heavily trimmed) contents of stats.shadow.json look like the following:

$ xzcat results/stats.shadow.json.xz
{
  "nodes": {
    "client:11.0.0.1": {
      "recv": {
        "bytes_control_header": {
          "0": 0,
          "1": 0,
          "2": 0,
          ...
          "599": 0
        },
        "bytes_control_header_retrans": { ... },
        "bytes_data_header": { ... },
        "bytes_data_header_retrans": { ... },
        "bytes_data_payload": { ... },
        "bytes_data_payload_retrans": { ... },
        "bytes_total": { ... }
      },
      "send": { ... }
    },
    "server:11.0.0.2": { ... }
  },
  "ticks": {
    "2": {
      "maxrss_gib": 0.162216,
      "time_seconds": 0.070114
    },
    "3": { ... },
    ...
    "599": { ... }
  }
}

The plot-*.py scripts generate graphs. Open the PDF file that was created to see the graphed results.

Comparing Data from Multiple Simulations

Consider a set of experiments where we would like to analyze the effect of changing our hosts' socket receive buffer sizes. We run the following 2 experiments:

# delete any existing simulation data and post-processing
rm -rf shadow.{data,log} 10KiB.{data,results,log} 100KiB.{data,results,log} *.results.pdf
shadow --socket-recv-buffer  10KiB --socket-recv-autotune false \
       --data-directory  10KiB.data shadow.yaml >  10KiB.log
shadow --socket-recv-buffer 100KiB --socket-recv-autotune false \
       --data-directory 100KiB.data shadow.yaml > 100KiB.log

To parse these log files, we use the following scripts:

src/tools/parse-shadow.py --prefix=10KiB.results   10KiB.log
src/tools/parse-shadow.py --prefix=100KiB.results 100KiB.log

Each of the directories 10KiB.results/ and 100KiB.results/ now contain data statistics files extracted from the log files. We can now combine and visualize these results with the plot-shadow.py script:

src/tools/plot-shadow.py --prefix "recv-buffer" --data 10KiB.results/ "10 KiB" --data 100KiB.results/ "100 KiB"

Open the PDF file that was created to compare results from the experiments.

National Science Foundation Sponsorship

Project Title: Expanding Research Frontiers with a Next-Generation Anonymous Communication Experimentation (ACE) Framework

Project Period: October 1, 2019 - September 30, 2023 (extended from September 30, 2022)

Abstract: NSF Award Abstract #1925497

The goal of this project is to develop a scalable and mature deterministic network simulator, capable of quickly and accurately simulating large networks such as Tor. This project builds on the Shadow Simulator.

NSF Project Overview

ACE will be developed with the following features:

  • Application Emulation. Learning from the community’s experience, ACE will directly execute software and run applications as normal operating system processes. By supporting the general execution of applications (i.e., anything that can be executed as a process: network servers, web browsers, scripts, etc.), ACE will support software independent of the programming language chosen by developers, and ACE will maximize its applicability to a large range of evaluation approaches that CISE researchers choose to utilize. As a result, ACE will be well-suited to website fingerprinting and censorship circumvention research focus areas, which typically require running a variety of tools written in a variety of languages.
  • Network Simulation. ACE will feature a light-weight network simulation component that will allow applications to communicate with each other through the ACE framework rather than over the Internet. ACE will simulate common transport protocols, such as TCP and UDP. ACE will also simulate virtual network routers and other network path components between end-hosts, and support evaluation under dynamic changes to timing, congestion, latency, bandwidth, network location, and network path elements. Therefore, ACE will support both network-aware and location-aware anonymous communication research and allow researchers to continue to advance this research agenda in current and future Internet architectures.
  • Function Interposition. ACE will utilize function interposition in order to connect the processes being run by the operating system to the core network simulation component. ACE will support an API of common system calls that are used to, e.g., send and receive data to and from the network. Therefore, all processes executed in ACE will be isolated from the Internet and connected through ACE’s simulated network, and the simulation component will drive process execution.
  • Controlled, Deterministic Execution. ACE features a deterministic discrete-event engine, and will therefore control time and operate in simulated timescales. As a result, ACE will be disconnected from the computational abilities of the host machine: ACE will run as-fast-as-possible, which could be faster or slower than real time depending on experiment load. ACE is deterministic so that research results can be independently and identically reproduced and verified across research labs.
  • Parallel and Distributed Execution. ACE will rely on the operating system kernel to run and manage processes. Operating system kernels have been optimized for this task, and ACE will benefit in terms of better performance and a smaller code base. Moreover, ACE will be embarrassingly parallel: the Linux kernel generally scales to millions of processes that can be run in parallel, and we will design ACE such that any number of processes can be executed across multiple distinct machines. Therefore, ACE will scale to realistically-sized anonymous communication networks containing millions of virtual hosts, and can be deployed on whatever existing infrastructure is available at community members' institutions.

As part of the ACE framework, we will also develop a user interface to control and monitor the experimental process, a toolkit to help users set up and configure experiments (including network, mobility, and traffic characteristics and models) and to visualize results, and a data repository where researchers can share and archive experimental results.

Project Goals/Activities

Here we outline some high-level tasks that we are completing or plan to complete under this project. We are using GitHub for project development, including for tracking progress on major milestones and development tasks. We provide an outline of our agenda here and link to the appropriate GitHub page where applicable. Tasks without corresponding GitHub links are those for which we don't yet have progress to share at this time.

  • Task 0: Investigate Architectural Improvements

    • Build prototype of a process-based simulation architecture - milestone
    • Evaluate and compare against a plugin-based simulation architecture
    • Decide which architecture is right for ACE
  • Task 1: Develop Core ACE System

    • Improve test coverage and infrastructure - shadow milestone, shadow-plugin-tor milestone
    • Enable new code to be written in Rust - milestone
    • Improve consistency of simulation options and configuration
    • Improve maintainability and accuracy of TCP implementation - milestone
    • Simplify event scheduler, implement continuous event execution model
    • Build a distributed core simulation engine
    • Develop CPU usage model to ensure virtual process CPU utilization consumes simulation time
  • Task 2: Develop User Interface and Visualizations

    • Design control protocol and API for interacting with Shadow
    • Specify/document protocol
    • Develop user interface that uses the control API
    • Improve tools for analyzing and understanding simulation results
  • Task 3: Develop Simulation Models for ACE

    • Improve tools for generating and configuring private Tor networks
    • Improve tools for generating and configuring background traffic models
    • Improve tools for modeling Internet paths and latency
    • Develop support for mobile hosts
    • Create realistic host mobility models
  • Task 4: Engage Community

    • Create data repository where users can share configs and results
    • Create user outreach material and surveys to collect feedback
    • Improve user documentation and usage instructions

Over all tasks, we plan to significantly improve documentation, test coverage, and code maintainability.

People

  • Rob Jansen - Project Leader, Principal Investigator, U.S. Naval Research Laboratory
  • Roger Dingledine - Principal Investigator, The Tor Project
  • Micah Sherr - Principal Investigator, Georgetown University
  • Jim Newsome - Developer, The Tor Project
  • Steven Engler - Developer, Georgetown University / The Tor Project