Limitations and workarounds

Shadow can typically run applications without modification, but there are a few limitations to be aware of.

If you are severely affected by one of these limitations (or by another not listed here), please let us know, as this helps us prioritize improvements to Shadow. You can reach out in our discussions or issues.

Unimplemented system calls and options

When Shadow encounters a syscall or a syscall option that it hasn't implemented, it will generally return ENOSYS and log at warn level or higher. In many such cases the application is able to recover, and this has little or no effect on the ultimate results of the simulation.

There are also some syscalls that Shadow doesn't emulate quite faithfully, but for which it has a "best effort" implementation. As with unimplemented syscalls, Shadow logs at warn level when it encounters such a syscall.

vfork

A notable example of a not-quite-faithfully implemented syscall is vfork, which Shadow effectively implements as a synonym for fork. Usage of vfork that complies with the POSIX.1 specification, which states that "behavior is undefined if the process created by vfork() either modifies any data other than a variable of type pid_t used to store the return value...", should work correctly. However, usage that relies on specific Linux implementation details of vfork (e.g. that a write to a global variable from the child will be observed by the parent) won't work correctly.

As with other such syscalls, Shadow logs a warning when it encounters vfork, so that users can identify it as a potential source of problems if a simulation doesn't work as expected.

IPv6

Shadow does not yet implement IPv6. Most applications can be configured to use IPv4 instead. Tracking issue: #2216.

Statically linked executables

Shadow relies on LD_PRELOAD to inject code into the managed processes. This doesn't work for statically linked executables. Tracking issue: #1839.

Most applications can be dynamically linked, though occasionally you may need to edit build scripts and/or recompile.

golang

golang typically defaults to producing statically linked executables, unless the application uses cgo. Using the networking functionality of golang's standard library usually pulls in cgo by default and thus results in a dynamically linked executable.

You can also explicitly force go to produce a dynamically linked executable, e.g.:

# Install a dynamically linked `std`
go install -buildmode=shared std
# Build your application with dynamic linking
go build -linkshared myapp.go

Busy loops

By default, Shadow runs each thread of a managed process until it blocks in a syscall such as nanosleep, read, select, futex, or epoll. Likewise, simulated time only moves forward when threads are blocked on such calls; Shadow effectively models the CPU as being infinitely fast. This model is generally sufficient for modeling network applications that are not CPU-bound.

Unfortunately, this model can lead to deadlock in the case of "busy loops", where a thread repeatedly checks for something to happen, either indefinitely or until some amount of wall-clock time has passed. For example, a worker thread might repeatedly check whether work is available for some amount of time before going to sleep on a futex, to avoid the latency of sleeping and waking back up when work arrives quickly. However, Shadow normally neither advances time on non-blocking syscalls nor lets other threads run during them, so such a loop can run indefinitely, deadlocking the whole simulation.

When feasible, it's usually good practice to modify such loops to have a bound on the number of iterations instead of, or in addition to, a bound on wall-clock time.

For cases where modifying the loop is infeasible, Shadow provides the option --model-unblocked-syscall-latency. When this option is enabled, Shadow moves time forward a small amount on every syscall (and VDSO function call), and switches to another thread if one becomes runnable in the meantime (e.g. because network data arrived when the clock moved forward, unblocking it).

This feature should only be used when it's needed to get around such loops. Some limitations:

  • It may cause the simulation to run slower.

    • Enabling this feature forces Shadow to switch between threads more frequently, which is costly and hurts cache performance. We have minimized this effect to the extent that we can, but it can especially hurt performance when there are multiple unblocked threads on a single simulated Host, forcing Shadow to keep switching between them to keep the simulated time synchronized.

    • Busy loops intrinsically waste some CPU cycles. Outside of Shadow this can be a worthwhile tradeoff for improved latency, since it avoids a thread switch. In a Shadow simulation, however, this latency isn't modeled, so busy-looping instead of blocking immediately brings no benefit to simulated performance, only a cost to simulation performance. If feasible, changing the busy loop to block immediately instead of spinning should improve simulation performance without substantially affecting simulation results.

  • It's not meant as an accurate model of syscall latency. It generally models syscalls as being somewhat faster than they would be on a real system to minimize the impact on simulation results.

  • Nonetheless it does affect simulation results. Arguably this model is more accurate, since syscalls on real systems do take non-zero time, but it makes the time model more complex to understand and reason about.

  • It still doesn't account for time spent by the CPU executing code, which also means that a busy-loop that makes no syscalls at all can still lead to deadlock. Fortunately such busy loops are rare and are generally agreed upon to be bugs, since they'd also potentially monopolize a CPU indefinitely when run natively.

For more about this topic, see #1792.