Profiling
Profiling can be useful for improving the performance of experiments, either by improving the implementation of Shadow itself or by altering the configuration of the experiments you are running.
Profiling with top/htop
Tools like top and htop will give good first-order approximations for what
Shadow is doing. While they can only give system-wide to thread-level
granularity, this can often still tell you important details such as whether
Shadow, the simulated processes, or the kernel are consuming memory and
processor cycles. E.g., if you're running into memory constraints, the RES or
MEM column of these tools can tell you where to start looking for ways to
address that. If execution time is too long, sorting by CPU or TIME can
provide insight into where that time is being spent.
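For a non-interactive snapshot of the same information, the sorting described above can be sketched with ps. This is a minimal example assuming the procps-ng version of ps found on most Linux distributions; the function names are hypothetical and not part of Shadow.

```shell
# Hypothetical helpers (not part of Shadow): list the processes using the
# most resident memory or CPU. Assumes procps-ng `ps`, which supports --sort.

# Show the N processes with the largest resident set size (RSS, in KiB).
top_by_memory() {
    count="${1:-10}"
    # +1 keeps the header row in addition to N process rows.
    ps -eo pid,rss,pcpu,time,comm --sort=-rss | head -n "$((count + 1))"
}

# Show the N processes with the highest CPU usage.
top_by_cpu() {
    count="${1:-10}"
    ps -eo pid,rss,pcpu,time,comm --sort=-pcpu | head -n "$((count + 1))"
}
```

Calling top_by_memory 5 while an experiment is running prints a header plus the five largest processes, which is usually enough to tell whether Shadow itself or one of the simulated processes is responsible.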
One limitation to note is that Shadow relies on spinlocks in barriers for some of its operation. Especially when running with many threads, these spinlocks will show as consuming most of the CPU whenever the simulation is bottlenecked on a few simulated processes. Telling when this is happening can be difficult in these tools, because no symbol information is available.
Profiling with perf
The perf tool is a powerful interface to the Linux kernel's performance
counter subsystem. See man perf or the perf wiki for full details on how
to use it, but some highlights most relevant to Shadow execution time are given
here.
Regardless of how you are using perf, the aforementioned complication of
spinlocks in Shadow applies. Namely, when there is any bottleneck on the
barrier, the symbols associated with the spinlocks will dominate the sample
counts. Improving the performance of the spinlocks will not improve the
performance of the experiment, but improving the performance of whatever is
causing the bottleneck (likely something towards the top of the non-spinlock
symbols) can.
perf top
The perf top command will likely be the most practical mode of perf for
profiling all parts of a Shadow experiment. It requires one of: root access,
appropriately set up Linux capabilities, or a system configured to allow
performance monitoring (similar to attaching to processes with gdb), so it
isn't always available, but it is very simple to use when it is. The interface
is similar to that of top, but provides information at the granularity of
symbols, across the entire system. This means you will be able to tell which
specific functions in Shadow, the simulated processes, and the kernel are
consuming CPU time.
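As a rough sketch of checking whether a system is configured to allow performance monitoring: many Linux kernels expose their perf restriction level in /proc/sys/kernel/perf_event_paranoid, where lower values are more permissive. The helper name below is hypothetical, and the exact meaning of each level varies between kernels and distributions, so treat this as a starting point rather than a definitive check.

```shell
# Hypothetical helper (not part of Shadow or perf): print the kernel's perf
# restriction level, or "unknown" if the file cannot be read.
# The optional argument overrides the path, which is mainly useful for testing.
perf_paranoid_level() {
    file="${1:-/proc/sys/kernel/perf_event_paranoid}"
    if [ -r "$file" ]; then
        cat "$file"
    else
        echo "unknown"
    fi
}

echo "perf_event_paranoid: $(perf_paranoid_level)"

# If the level is too restrictive for your user, running as root is the
# simplest of the options listed above:
#   sudo perf top
```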
When perf top can't find symbol information for a process, it will display
the offset of the instruction as hex instead. (Note this means it will be
ranked by instruction, rather than the entire function.) If you know where the
respective executable or shared object file is, you can look up the name of the
symbol for that instruction's function by opening the file with gdb and
running info symbol [ADDRESS]. If gdb can't find the symbols either, you
can look it up manually using readelf -s and finding the symbol with the
largest address smaller than the offset you are looking for (note that
readelf does not output the symbols in order of address; you can pipe the
output to awk '{$1=""; print $0}' | sort to get a sorted list).
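The manual lookup described above can be automated. The following is a minimal sketch, assuming the usual readelf -s column layout (Num, Value, Size, Type, Bind, Vis, Ndx, Name); the function name lookup_symbol is hypothetical. It considers only defined FUNC symbols and prints the one with the largest address not exceeding the given offset.

```shell
# Hypothetical helper: read `readelf -sW` output on stdin and print the name
# of the defined function symbol with the largest address that does not
# exceed the given hex offset (with or without a leading 0x).
lookup_symbol() {
    awk -v target="$1" '
        # Portable hex-to-number conversion (strtonum is gawk-only).
        function hex2num(s,    n, i, d) {
            n = 0
            s = tolower(s)
            sub(/^0x/, "", s)
            for (i = 1; i <= length(s); i++) {
                d = index("0123456789abcdef", substr(s, i, 1)) - 1
                n = n * 16 + d
            }
            return n
        }
        BEGIN { t = hex2num(target); best = -1 }
        # Symbol rows start with "<number>:"; skip undefined (UND) symbols.
        $1 ~ /^[0-9]+:$/ && $4 == "FUNC" && $7 != "UND" {
            a = hex2num($2)
            if (a <= t && a > best) { best = a; name = $8 }
        }
        END { if (best >= 0) print name }
    '
}

# Usage (path and offset are placeholders):
#   readelf -sW /path/to/binary | lookup_symbol 0x1140
```

Because the helper tracks the best match while scanning, the input does not need to be sorted by address first. Addresses are held as awk numbers, so this sketch is only exact for values below 2^53.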
Details on more options (e.g., for filtering the sampled CPUs or processes) can
be found in man perf top.
perf record
If you know which particular process you wish to profile, perf record can
give far greater detail than other options. To use it for Shadow, either run it
when starting Shadow:
perf record shadow shadow.config.yaml > shadow.log
Or, attach to a running Shadow process:
perf record -p <PID>
Attaching to a process requires permissions similar to those for perf top, but
can be used to profile any process, including the simulated processes launched
by Shadow.
The perf record process will write a perf.data file when you press Ctrl-c or
when Shadow ends. You can then analyze the report:
perf report
More details are available in man perf record and man perf report.
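Because spinlock symbols can dominate the sample counts, it can help to filter them out of a text report and look at what remains. The sketch below is one way to do that with perf report --stdio. The function name and the pattern spin are assumptions for illustration, so check your own report for the actual spinlock symbol names before relying on the filter.

```shell
# Hypothetical helper: read `perf report --stdio` output on stdin and print
# the first COUNT data rows that do not match PATTERN (case-insensitive).
top_non_matching() {
    pattern="$1"
    count="${2:-10}"
    # Drop comment/header lines and blank lines, then rows matching PATTERN.
    grep -v '^#' | grep -v '^[[:space:]]*$' | grep -v -i -e "$pattern" | head -n "$count"
}

# Usage, run in the directory containing perf.data ("spin" is an assumed
# pattern, not a confirmed Shadow symbol name):
#   perf report --stdio | top_non_matching spin 10
```

The rows that survive the filter are the non-spinlock symbols, ordered by overhead, which are the likeliest candidates for the real bottleneck.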