FFT::R2C Class
Class template FFT::R2C supports discrete Fourier transforms between real and complex data across MPI processes. The name R2C indicates that the forward transform converts real data to complex data, while the backward transform converts complex data to real data. Despite the name, both directions of the transformation are supported, not just real to complex.
The implementation utilizes cuFFT, rocFFT, oneMKL and FFTW for CUDA, HIP,
SYCL and CPU builds, respectively. Because the parallel communication is
handled by AMReX, it does not need the parallel version of
FFTW. Furthermore, there is no constraint on the domain decomposition such
as one Box per process. This class performs parallel FFTs on AMReX's parallel
data containers (e.g., MultiFab and
FabArray<BaseFab<GpuComplex<Real>>>).
Other than using column-major order, AMReX follows the convention of FFTW. Applying the forward transform followed by the backward transform scales the original data by the size of the input array. The layout of the complex data also follows the FFTW convention: the complex Hermitian output array has (nx/2+1, ny, nz) elements, where nx, ny and nz are the sizes of the real array and the division is rounded down.
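For example, the following minimal sketch (using a hypothetical 128 x 64 x 32 domain) computes the extent of the complex Hermitian output from the extent of the real input.

    Box domain_box(IntVect(0), IntVect(127,63,31)); // 128 x 64 x 32 real cells
    IntVect spectral_len = domain_box.length();     // (nx, ny, nz)
    spectral_len[0] = spectral_len[0]/2 + 1;        // (nx/2+1, ny, nz) = (65, 64, 32)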
Below are examples of using FFT::R2C.
Geometry geom(...);
MultiFab mfin(...);
MultiFab mfout(...);

auto scaling = 1. / geom.Domain().d_numPts();

FFT::R2C r2c(geom.Domain());
r2c.forwardThenBackward(mfin, mfout,
    [=] AMREX_GPU_DEVICE (int, int, int, auto& sp)
    {
        sp *= scaling; // normalize the spectral data in place
    });
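Here the callable passed to forwardThenBackward is applied to the spectral data between the forward and backward transforms. Multiplying each spectral point by 1/N, where N is the number of points in the domain, undoes the scaling by N described above, so mfout receives the original data of mfin.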
// Use the spectral data layout provided by R2C.
auto const& [cba, cdm] = r2c.getSpectralDataLayout();
cMultiFab cmf(cba, cdm, 1, 0);

FFT::R2C<Real,FFT::Direction::forward> r2c_forward(geom.Domain());
r2c_forward.forward(mfin, cmf);

FFT::R2C<Real,FFT::Direction::backward> r2c_backward(geom.Domain());
r2c_backward.backward(cmf, mfout);
Note that using forwardThenBackward is expected to be more efficient
than separate calls to forward and backward, because some
parallel communication can be avoided. For the spectral data, the example
above builds a cMultiFab using the layout provided by FFT::R2C. You can
also use your own BoxArray and DistributionMapping, but that
might result in extra communication. It should also be noted that a
significant amount of preparation work is done in the construction of an
FFT::R2C object. Therefore, one should cache it for reuse if possible.
Although FFT::R2C does not have a default constructor, one can always use
std::unique_ptr<FFT::R2C<Real>> to store an object in one's class.
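For example, a minimal sketch of this caching pattern (the MySolver class and its members are hypothetical names) could look like the following.

    struct MySolver {
        std::unique_ptr<FFT::R2C<Real>> m_r2c; // cached across solves

        void solve (Geometry const& geom, MultiFab& mfin, MultiFab& mfout) {
            if (!m_r2c) { // construct once; reuse on subsequent calls
                m_r2c = std::make_unique<FFT::R2C<Real>>(geom.Domain());
            }
            auto scaling = 1. / geom.Domain().d_numPts();
            m_r2c->forwardThenBackward(mfin, mfout,
                [=] AMREX_GPU_DEVICE (int, int, int, auto& sp) { sp *= scaling; });
        }
    };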
Class template FFT::R2C also supports batched FFTs. The batch size is set
in an FFT::Info object passed to the constructor of
FFT::R2C. Below is an example.
int batch_size = 10;
Geometry geom(...);
MultiFab mf(ba, dm, batch_size, 0); // ba and dm are the BoxArray and DistributionMapping

FFT::Info info{};
info.setBatchSize(batch_size);

FFT::R2C<Real,FFT::Direction::both> r2c(geom.Domain(), info);
auto const& [cba, cdm] = r2c.getSpectralDataLayout();
cMultiFab cmf(cba, cdm, batch_size, 0);

r2c.forward(mf, cmf);

// Do work on cmf.
// Function forwardThenBackward is not yet supported for a batched FFT.

r2c.backward(cmf, mf);
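Note that in this example the members of the batch are stored as the components of mf and cmf, which is why both containers are built with batch_size components to match info.setBatchSize(batch_size).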
FFT::C2C Class
FFT::C2C is a class template that supports complex-to-complex Fourier
transforms. It has an interface similar to that of FFT::R2C, as shown in
the sketch below.
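The following is a minimal sketch; it assumes the same forward/backward calling convention as the FFT::R2C examples above, with cMultiFab used for both input and output since the data are complex on both sides.

    Geometry geom(...);
    cMultiFab cmf_in(...);  // complex input
    cMultiFab cmf_out(...); // complex output

    FFT::C2C<Real,FFT::Direction::both> c2c(geom.Domain());
    c2c.forward(cmf_in, cmf_out);  // forward: complex to complex
    c2c.backward(cmf_out, cmf_in); // backward: complex to complex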
FFT::LocalR2C Class
Class template FFT::LocalR2C supports local discrete Fourier transforms between real and complex data. The name R2C indicates that the forward transform converts real data to complex data, while the backward transform converts complex data to real data. Despite the name, both directions of the transformation are supported, not just real to complex.
Below is an example of using FFT::LocalR2C.
MultiFab mf(...);
BaseFab<GpuComplex<Real>> cfab;
for (MFIter mfi(mf); mfi.isValid(); ++mfi) {
    FFT::LocalR2C fft(mfi.fabbox().length());
    cfab.resize(Box(IntVect(0), fft.spectralSize()-1));
    fft.forward(mf[mfi].dataPtr(), cfab.dataPtr());
}
Poisson Solver
AMReX provides FFT based Poisson solvers. Here, Poisson's equation is

    \nabla^2 \phi = \rho
FFT::Poisson supports periodic (FFT::Boundary::periodic),
homogeneous Neumann (FFT::Boundary::even), and homogeneous Dirichlet
(FFT::Boundary::odd) boundaries using FFT. Below is an example of
using the solver.
Geometry geom(...);
MultiFab soln(...);
MultiFab rhs(...);
Array<std::pair<FFT::Boundary,FFT::Boundary>,AMREX_SPACEDIM>
    fft_bc{...};

bool has_dirichlet = false;
for (int idim = 0; idim < AMREX_SPACEDIM; ++idim) {
    has_dirichlet = has_dirichlet ||
        fft_bc[idim].first == FFT::Boundary::odd ||
        fft_bc[idim].second == FFT::Boundary::odd;
}
if (! has_dirichlet) {
    // Without any Dirichlet boundary, the solution is only defined up to
    // a constant, and a solution exists only if the rhs sums to zero.
    // Shift rhs so that its sum is zero.
    auto rhosum = rhs.sum(0);
    rhs.plus(-rhosum/geom.Domain().d_numPts(), 0, 1);
}
FFT::Poisson fft_poisson(geom, fft_bc);
fft_poisson.solve(soln, rhs);
FFT::PoissonOpenBC is a 3D only solver that supports open
boundaries. Its implementation utilizes FFT::OpenBCSolver, which can
be used for implementing convolution-based solvers with a user-provided
Green's function. If users want to extend the open BC solver to 2D or to
other Green's functions, they can use FFT::PoissonOpenBC as an
example. Below is an example of solving Poisson's equation with open
boundaries.
Geometry geom(...);
MultiFab soln(...); // soln can be either nodal or cell-centered.
MultiFab rhs(...);  // rhs must have the same index type as soln.

int ng = ...; // ng can be non-zero, if we want to compute the potential
              // outside the domain.

FFT::PoissonOpenBC openbc_solver(geom, soln.ixType(), IntVect(ng));
openbc_solver.solve(soln, rhs);
FFT::PoissonHybrid is a 3D only solver that supports Dirichlet and
Neumann boundaries in the last dimension. The last dimension is solved with
a tridiagonal solver that can support a non-uniform cell size in the
z-direction. For most applications, FFT::Poisson should be used.
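Since its exact constructor signature is not shown here, the following is only a minimal sketch; it assumes FFT::PoissonHybrid takes a Geometry and provides a solve(soln, rhs) interface like FFT::Poisson's.

    FFT::PoissonHybrid<MultiFab> hybrid_solver(geom); // assumed constructor
    hybrid_solver.solve(soln, rhs);                   // assumed interface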
Similar to FFT::R2C, the Poisson solvers should be cached for reuse,
and one might need to use std::unique_ptr<FFT::Poisson<MultiFab>>
because there is no default constructor.
Raw Pointer Interface
If you only want to use AMReX as a parallel FFT library, without its other functionalities and data containers, you can use the raw pointer interface. Below is an example.
#include <AMReX_FFT.H> // header providing amrex::FFT and amrex::Init_FFT

#include <mpi.h>

#include <algorithm>
#include <array>
#include <complex>
#include <cstdlib>

MPI_Init(&argc, &argv);
// We don't need to call the full-blown amrex::Initialize
amrex::Init_FFT(MPI_COMM_WORLD);
int nprocs, myproc;
MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
MPI_Comm_rank(MPI_COMM_WORLD, &myproc);
using RT = double;
using CT = std::complex<RT>; // or cufftDoubleComplex, etc.
std::array<int,3> domain_size{128,128,128};
// FFT between real and complex.
// Domain decomposition is flexible. The only constraint for the raw
// pointer interface is that there can be only zero or one local box
// per process, whereas the MultiFab interface can take any number of
// boxes. In this case, we choose to do manual domain decomposition for
// the real (i.e., forward) domain, and use the domain decomposition
// provided by amrex for the complex (i.e., backward) domain.
{
    amrex::FFT::R2C<RT,amrex::FFT::Direction::both> r2c(domain_size);

    // amrex supports 1d, 2d and 3d domain decompositions. For simplicity,
    // let's do a 1d decomposition in the z-direction.
    int nz = (domain_size[2] + nprocs - 1) / nprocs;
    int zlo = nz * myproc;
    nz = std::max(std::min(nz, domain_size[2]-zlo), 0);
    std::array<int,3> local_start{0,0,zlo};
    std::array<int,3> local_size{domain_size[0],domain_size[1],nz};

    // Let amrex know the domain decomposition in the forward domain.
    r2c.setLocalDomain(local_start, local_size);

    // Use amrex's domain decomposition in the backward domain.
    auto const& [local_start_sp, local_size_sp] = r2c.getLocalSpectralDomain();

    auto nr = std::size_t(local_size[0])
        * std::size_t(local_size[1])
        * std::size_t(local_size[2]);
    auto* pr = (RT*)std::malloc(sizeof(RT)*nr); // or use cudaMalloc
    // Initialize data ...

    auto nc = std::size_t(local_size_sp[0])
        * std::size_t(local_size_sp[1])
        * std::size_t(local_size_sp[2]);
    auto* pc = (CT*)std::malloc(sizeof(CT)*nc); // or use cudaMalloc

    r2c.forward(pr, pc); // forward transform from real to complex
    // Work on the complex data pointed to by pc ...
    r2c.backward(pc, pr); // backward transform from complex to real

    std::free(pr);
    std::free(pc);
}
// Batched FFT between complex and complex.
// In this case, we choose to use the domain decomposition provided
// by amrex for the forward domain, and do manual domain decomposition
// for the backward domain.
int nbatch = 3; // batch size
{
    amrex::FFT::Info info{};
    info.setBatchSize(nbatch);
    amrex::FFT::C2C<RT,amrex::FFT::Direction::both> c2c(domain_size, info);

    // Use amrex's domain decomposition in the forward domain.
    auto const& [local_start, local_size] = c2c.getLocalDomain();

    // 1d decomposition in the x-direction for the backward domain.
    int nx = (domain_size[0] + nprocs - 1) / nprocs;
    int xlo = nx * myproc;
    nx = std::max(std::min(nx, domain_size[0]-xlo), 0);
    std::array<int,3> local_start_sp{xlo,0,0};
    std::array<int,3> local_size_sp{nx,domain_size[1],domain_size[2]};

    // Let amrex know the domain decomposition in the backward domain.
    c2c.setLocalSpectralDomain(local_start_sp, local_size_sp);

    auto nf = std::size_t(local_size[0])
        * std::size_t(local_size[1])
        * std::size_t(local_size[2]);
    auto* pf = (CT*)std::malloc(sizeof(CT)*nf*nbatch); // or use cudaMalloc
    // Initialize data ...

    auto nb = std::size_t(local_size_sp[0])
        * std::size_t(local_size_sp[1])
        * std::size_t(local_size_sp[2]);
    auto* pb = (CT*)std::malloc(sizeof(CT)*nb*nbatch);

    c2c.forward(pf, pb); // forward transform
    // Work on the data pointed to by pb ...
    c2c.backward(pb, pf); // backward transform

    std::free(pf);
    std::free(pb);
}
amrex::Finalize_FFT();
MPI_Finalize();