Debugging
Debugging is an art. Everyone has their own favorite method. Here we offer a few tips we have found to be useful.
To help with debugging, AMReX handles various signals in the C standard
library raised during runs. This gives us a chance to print out more
information using the Linux/Unix backtrace capability. The signals
include segmentation fault (or “segfault”), interruption by the user (control-C), assertion
errors, and floating point exceptions (NaNs, division by zero and
overflow). The handling of segfaults, assertion errors and
interruption by control-C is enabled by default. Note that
AMREX_ASSERT() is only on when compiled with DEBUG=TRUE or
USE_ASSERTION=TRUE in GNU make, or with -DCMAKE_BUILD_TYPE=Debug or
-DAMReX_ASSERTIONS=YES in CMake. The trapping of floating point exceptions is not
enabled by default unless the code is compiled with DEBUG=TRUE in GNU make, or with
-DCMAKE_BUILD_TYPE=Debug or -DAMReX_FPE=YES in CMake to turn on compiler flags
if supported. Alternatively, one can always use runtime parameters to control the
handling of floating point exceptions: amrex.fpe_trap_invalid for
NaNs, amrex.fpe_trap_zero for division by zero and
amrex.fpe_trap_overflow for overflow. To more effectively trap the
use of uninitialized values, AMReX also initializes FArrayBoxs in
MultiFabs and arrays allocated by bl_allocate to signaling NaNs when it is compiled
with TEST=TRUE or DEBUG=TRUE in GNU make, or with -DCMAKE_BUILD_TYPE=Debug in CMake.
One can also control the setting for FArrayBox using the runtime parameter, fab.init_snan.
Note that on Macs, the M1 and M2 chips, which use the Arm64 architecture, cannot trap division by zero.
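The effect of initializing data to signaling NaNs can be illustrated with a small standalone sketch that does not use AMReX. It assumes an IEEE 754 platform on which std::numeric_limits<double>::has_signaling_NaN is true and on which copying a double preserves the signaling bit (typical for x86-64 and Arm64). The helper names make_snan_buffer and use_raises_invalid are hypothetical. Instead of trapping, the sketch only inspects the FE_INVALID status flag, which is the condition a trap on invalid operations would act on.

```cpp
#include <cassert>
#include <cfenv>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: a buffer filled with signaling NaNs, standing in for
// freshly allocated data that has not been given real values yet.
std::vector<double> make_snan_buffer(std::size_t n)
{
    return std::vector<double>(n, std::numeric_limits<double>::signaling_NaN());
}

// Hypothetical helper: returns true if arithmetic on buf[i] raises FE_INVALID.
bool use_raises_invalid(const std::vector<double>& buf, std::size_t i)
{
    assert(i < buf.size());
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile double x = buf[i];   // volatile keeps the compiler from folding the arithmetic
    volatile double y = x + 1.0;  // arithmetic on a signaling NaN raises FE_INVALID
    (void) y;
    return std::fetestexcept(FE_INVALID) != 0;
}
```

With the flag raised on first use, enabling a trap on invalid operations would stop the program at the statement that consumes the uninitialized value, rather than much later when a NaN shows up in the output.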
One can get more information than the backtrace of the call stack by
instrumenting the code. Here is an example.
Suppose you know that the line Real rho = state(cell,0); is causing a segfault. You
could add a print statement before that. But it might print out
thousands (or even millions) of lines before it hits the segfault. What
you could do instead is the following,
#include <sstream> // for std::ostringstream
#include <AMReX_BLBackTrace.H>
std::ostringstream ss;
ss << "state.box() = " << state.box() << " cell = " << cell;
BL_BACKTRACE_PUSH(ss.str()); // PUSH takes std::string
Real rho = state(cell,0); // state is a Fab, and cell is an IntVect.
BL_BACKTRACE_POP(); // One can omit this line. In that case,
// there is an implicit POP when "PUSH" is
// out of scope.
When it hits the segfault, you will see only the last message that was pushed.
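The push/pop behavior, including the implicit pop at scope exit, can be pictured with a small standalone sketch. This is not AMReX's implementation: context_stack, ScopedContext and innermost_context are hypothetical names, and the sketch uses only the C++ standard library. It shows one plausible way a scope-bound object can keep "the last message pushed" available to a crash handler.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a stack of context messages.
std::vector<std::string>& context_stack()
{
    static std::vector<std::string> stack;
    return stack;
}

// Pushes a message on construction and pops it on destruction, so leaving
// the enclosing scope removes the message without an explicit pop.
class ScopedContext
{
public:
    explicit ScopedContext(std::string msg) { context_stack().push_back(std::move(msg)); }
    ~ScopedContext()
    {
        assert(!context_stack().empty());
        context_stack().pop_back();
    }
    ScopedContext(const ScopedContext&) = delete;
    ScopedContext& operator=(const ScopedContext&) = delete;
};

// What a crash handler could report: the innermost active message, if any.
std::string innermost_context()
{
    return context_stack().empty() ? std::string("<none>") : context_stack().back();
}
```

Because each message lives only as long as its scope, a loop that pushes a fresh message per cell keeps just one entry alive at a time, which is why only the message for the offending cell would be reported.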
Writing a MultiFab to disk with
VisMF::Write(const FabArray<FArrayBox>& mf, const std::string& name)
in AMReX_VisMF.H and examining it with Amrvis (see the Amrvis section)
can be helpful as well. In
AMReX_MultiFabUtil.H, the function
void print_state(const MultiFab& mf, const IntVect& cell, const int n=-1,
const IntVect& ng = IntVect::TheZeroVector());
can output the data for a single cell. n is the component, with the default being
to print all components. ng is the number of ghost cells to include.
Valgrind is one of our favorite debugging tools. For MPI runs, one can tell Valgrind to output to different files for different processes. For example,
mpiexec -n 4 valgrind --leak-check=yes --track-origins=yes --log-file=vallog.%p ./foo.exe ...
Breaking into Debuggers
In order to break into debuggers and use modern IDEs, the backtrace signal handling described above needs to be disabled.
The following runtime options need to be set in order to prevent AMReX from catching the break signals before a debugger can attach to a crashing process:
amrex.throw_exception = 1
amrex.signal_handling = 0
Applications can also modify this default behavior; see, for example, this custom application initializer.
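As a rough picture of what it means for a program to catch a signal itself, the following standalone sketch installs and then removes a handler using only standard C++. It is not AMReX's handler: record_signal, install_handler, restore_default and g_last_signal are hypothetical names, and SIGINT is used simply because it is safe to raise in a test.

```cpp
#include <cassert>
#include <csignal>

// Written by the handler; volatile std::sig_atomic_t is safe to set there.
volatile std::sig_atomic_t g_last_signal = 0;

// Hypothetical handler: records the signal number instead of terminating.
extern "C" void record_signal(int sig) { g_last_signal = sig; }

// Install the handler for SIGINT (control-C).
void install_handler()
{
    auto prev = std::signal(SIGINT, record_signal);
    assert(prev != SIG_ERR);
    (void) prev;
}

// Give SIGINT back its default action, so the program no longer intercepts it.
void restore_default()
{
    auto prev = std::signal(SIGINT, SIG_DFL);
    assert(prev != SIG_ERR);
    (void) prev;
}
```

While the handler is installed, raising SIGINT runs record_signal instead of the default action; after restore_default the program no longer intercepts the signal, which is the kind of state the runtime options above are meant to produce.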
Basic GPU Debugging
The asynchronous nature of GPU execution can make tracking down bugs complex.
The relative timing of improperly coded functions can cause variations in output and the timing of error messages
may not linearly relate to a place in the code.
One strategy to isolate specific kernel failures is to add amrex::Gpu::synchronize() or amrex::Gpu::streamSynchronize() after every ParallelFor or similar amrex::launch type call.
These synchronization commands will halt execution of the code until the GPU or GPU stream, respectively, has finished processing all previously requested tasks, thereby making it easier to locate and identify sources of error.
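The idea of synchronizing after every launch can be sketched with a CPU-only toy model. This is an analogy, not GPU code and not an AMReX API: ToyStream and first_failing_launch are hypothetical, and the "kernels" are plain callables that return false on failure. As with a real stream, launching only queues work, and a failure becomes visible at the next synchronization point without saying which launch caused it.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Toy model of an asynchronous stream: launch() only queues work, and any
// failure is reported by the next synchronize() as a single yes/no answer.
class ToyStream
{
public:
    void launch(std::function<bool()> kernel)
    {
        assert(static_cast<bool>(kernel));
        pending_.push_back(std::move(kernel));
    }

    // Runs all queued work; returns false if any queued kernel failed.
    bool synchronize()
    {
        bool ok = true;
        for (auto& k : pending_) { ok = k() && ok; }
        pending_.clear();
        return ok;
    }

private:
    std::vector<std::function<bool()>> pending_;
};

// Synchronize after every launch, so a failure can be attributed to the
// launch that immediately precedes it. Returns that index, or -1 if none fail.
int first_failing_launch(const std::vector<std::function<bool()>>& kernels)
{
    ToyStream stream;
    for (std::size_t i = 0; i < kernels.size(); ++i) {
        stream.launch(kernels[i]);
        if (!stream.synchronize()) { return static_cast<int>(i); }
    }
    return -1;
}
```

If the same kernels were all launched before a single synchronize(), the toy stream would only report that something failed. Synchronizing after each launch trades speed for that attribution, so it is best treated as a temporary debugging aid.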
CUDA-Specific Tests
To test if your kernels have launched, run:
nvprof ./main3d.xxx
If using NVIDIA Nsight Systems instead, access nvprof functionality with:
nsys nvprof ./main3d.xxx
Run
nvprof -o profile%p.nvvp ./main3d.xxx
or
nsys profile -o nsys_out.%q{SLURM_PROCID}.%q{SLURM_JOBID} ./main3d.xxx
for a small problem and examine page faults using nvvp or nsight-sys $(pwd)/nsys_out.#.######.qdrep.
Run under cuda-memcheck or the newer version compute-sanitizer to identify memory errors.
Run under cuda-gdb to identify kernel errors.
To help identify race conditions, globally disable asynchronicity of kernel launches for all CUDA applications by setting CUDA_LAUNCH_BLOCKING=1 in your environment variables. This will ensure that only one CUDA kernel will run at a time.
AMD ROCm-Specific Tests
To test if your kernels have launched, run:
rocprof ./main3d.xxx
Run
rocprof --hsa-trace --stats --timestamp on --roctx-trace ./main3d.xxx
for a small problem and examine tracing using chrome://tracing.
Run under rocgdb for source-level debugging.
To help identify if there are race conditions, globally disable asynchronicity of kernel launches by setting CUDA_LAUNCH_BLOCKING=1 or HIP_LAUNCH_BLOCKING=1 in your environment variables. This will ensure only one kernel will run at a time. See the AMD ROCm docs’ chicken bits section for more debugging environment variables.
Intel GPU-Specific Tests
To test if your kernels have launched, run:
./ze_tracer ./main3d.xxx
Run Intel Advisor,
advisor --collect=survey ./main3d.xxx
for a small problem with 1 MPI process and examine metrics.
Run under gdb with the Intel Distribution for GDB.
To report back-end information, set ZE_DEBUG=1 in your environment variables.