# Parallel simulations
Some care must be taken when running multiple Shadow simulations on the same hardware at the same time. By default, Shadow pins threads to specific CPUs to avoid CPU migrations. The CPU selection logic isn't aware of other processes that may be using substantial CPU time and/or pinning, including other Shadow simulations. That is, without some care, multiple Shadow simulations running on the same machine at the same time will generally end up contending for the same set of CPUs, even if other CPUs on the machine are idle.
## Disabling pinning
The simplest solution is to disable CPU pinning entirely. This has a substantial
performance penalty (with some reports as high as 3x), but can be a reasonable
solution for small simulations. Pinning can be disabled by passing
`--use-cpu-pinning=false` to Shadow.
## Setting an initial CPU affinity
Shadow checks the initial CPU affinity assigned to it, and only pins to CPUs
within that set. The easiest way to run Shadow on a subset of CPUs is with the
`taskset` utility. For example, to start one Shadow simulation using CPUs 0-9 and
another using CPUs 10-19, you could use:

```
$ (cd sim1 && taskset --cpu-list 0-9 shadow sim1config.yml) &
$ (cd sim2 && taskset --cpu-list 10-19 shadow sim2config.yml) &
```
Shadow similarly avoids trying to pin to CPUs outside of its cgroup cpuset (see
cpuset(7)). This allows Shadow to work correctly in scenarios such as running in
a container on a shared machine that only has access to some CPUs, but
configuring a cpuset is generally more complex and requires higher privileges
than setting the CPU affinity with `taskset`.
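Shadow reads this initial affinity through the standard Linux scheduling APIs. On Linux you can inspect the affinity a process inherits with Python's `os.sched_getaffinity` (a quick sketch for checking your setup, not part of Shadow itself):

```python
import os

# The set of CPUs the current process is allowed to run on.
# When started under `taskset --cpu-list 0-9 ...`, this would be {0, ..., 9};
# Shadow restricts its pinning to this set when it starts.
allowed = os.sched_getaffinity(0)  # 0 means "the calling process"
print(sorted(allowed))
```

Running this under the `taskset` invocations above is a quick way to confirm each simulation sees only its intended CPUs.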
## Choosing a CPU set
When assigning Shadow a subset of CPUs, some care must be taken to get optimal
performance. You can use the `lscpu` utility to see the layout of the CPUs on
your machine.
- Avoid using multiple CPUs on the same physical core (i.e., hyperthread siblings). Such CPUs compete with each other for the core's resources.
- Prefer CPUs on the same socket and NUMA node. Such CPUs share cache, which is typically beneficial in Shadow simulations.
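As a sketch of how such a selection could be automated, the following hypothetical helper parses `lscpu --parse=cpu,core,socket,node` output and picks one CPU per physical core on a single socket (the function name and the truncated sample table are illustrative only):

```python
def pick_cpus(lscpu_output, socket):
    """Return one CPU per physical core on the given socket,
    given `lscpu --parse=cpu,core,socket,node` output."""
    chosen = {}  # core id -> first CPU seen on that core
    for line in lscpu_output.splitlines():
        # Skip comment lines and blanks in the parsable output.
        if line.startswith("#") or not line.strip():
            continue
        cpu, core, sock, node = map(int, line.split(","))
        # Keep only the first CPU on each core: its hyperthread
        # siblings share the same core id and are skipped.
        if sock == socket and core not in chosen:
            chosen[core] = cpu
    return sorted(chosen.values())

sample = """# CPU,Core,Socket,Node
0,0,0,0
1,1,1,1
40,0,0,0
41,1,1,1"""
print(pick_cpus(sample, 0))  # [0] -- CPU 40 shares core 0, so it is skipped
```

The resulting list can be joined with commas and passed to `taskset --cpu-list`.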
For example, given the `lscpu` output:
```
$ lscpu --parse=cpu,core,socket,node
# The following is the parsable format, which can be fed to other
# programs. Each different item in every column has an unique ID
# starting from zero.
# CPU,Core,Socket,Node
0,0,0,0
1,1,1,1
2,2,0,0
3,3,1,1
4,4,0,0
5,5,1,1
6,6,0,0
7,7,1,1
8,8,0,0
9,9,1,1
10,10,0,0
11,11,1,1
12,12,0,0
13,13,1,1
14,14,0,0
15,15,1,1
16,16,0,0
17,17,1,1
18,18,0,0
19,19,1,1
20,20,0,0
21,21,1,1
22,22,0,0
23,23,1,1
24,24,0,0
25,25,1,1
26,26,0,0
27,27,1,1
28,28,0,0
29,29,1,1
30,30,0,0
31,31,1,1
32,32,0,0
33,33,1,1
34,34,0,0
35,35,1,1
36,36,0,0
37,37,1,1
38,38,0,0
39,39,1,1
40,0,0,0
41,1,1,1
42,2,0,0
43,3,1,1
44,4,0,0
45,5,1,1
46,6,0,0
47,7,1,1
48,8,0,0
49,9,1,1
50,10,0,0
51,11,1,1
52,12,0,0
53,13,1,1
54,14,0,0
55,15,1,1
56,16,0,0
57,17,1,1
58,18,0,0
59,19,1,1
60,20,0,0
61,21,1,1
62,22,0,0
63,23,1,1
64,24,0,0
65,25,1,1
66,26,0,0
67,27,1,1
68,28,0,0
69,29,1,1
70,30,0,0
71,31,1,1
72,32,0,0
73,33,1,1
74,34,0,0
75,35,1,1
76,36,0,0
77,37,1,1
78,38,0,0
79,39,1,1
```
A reasonable configuration for two simulations might be `taskset --cpu-list 0-39:2`
(CPUs 0,2,...,38) for the first and `taskset --cpu-list 1-39:2` (CPUs 1,3,...,39)
for the second. This assignment leaves CPUs 40-79 idle, since those share
physical cores with CPUs 0-39; it puts the first simulation on socket 0 and
NUMA node 0, and the second simulation on socket 1 and NUMA node 1.
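The `0-39:2` syntax is taskset's stride notation (`start-end:step`). As a quick sanity check, here is what the two lists expand to and a confirmation that they are disjoint (plain Python, for illustration only):

```python
# taskset's "0-39:2" means CPUs 0..39 taken with stride 2.
sim1 = list(range(0, 40, 2))  # 0,2,...,38 -> socket 0 in the lscpu table above
sim2 = list(range(1, 40, 2))  # 1,3,...,39 -> socket 1

# The two simulations must not share any CPUs.
assert not set(sim1) & set(sim2)
print(sim1[:5], sim2[:5])  # [0, 2, 4, 6, 8] [1, 3, 5, 7, 9]
```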