DFT++ memory estimation: (for mpi_serial version)

(all object sizes are counted in complex numbers; one complex number is 2 doubles = 16 bytes)

Permanent objects:

FFT boxes: each of size Nx * Ny * Nz (printed as fftbox size)
  • n -- charge density
  • d -- electrostatic potential
  • Vlocps -- local pseudopotential
  • Vscloc -- local part of self-consistent potential
(each processor has a copy of these)
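
As a rough illustration of the FFT-box cost, the sketch below converts the four permanent boxes into bytes. The function name and the 64 x 64 x 64 grid are hypothetical, chosen only for the example; the 16 bytes per complex number comes from the note at the top.

```python
BYTES_PER_COMPLEX = 16  # 2 doubles per complex number


def fft_box_bytes(nx, ny, nz, n_boxes=4):
    """Bytes used by n_boxes FFT boxes, each holding nx * ny * nz complex numbers."""
    return n_boxes * nx * ny * nz * BYTES_PER_COMPLEX


# Hypothetical grid; the four permanent boxes are n, d, Vlocps and Vscloc.
mib = fft_box_bytes(64, 64, 64) / 2**20
print(f"{mib:.1f} MiB per processor")
```

Because every processor holds its own copy, this part of the footprint does not shrink as processors are added.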
Wavefunction objects: nbasis * nbands * nkpoints
  • C -- wavefunction
  • Y -- unconstrained wavefunction (used for minimization)
(these objects are spread over processors)
Other, smaller objects:
  • Matrices -- each of size nbands * nbands

Temporary objects: (these could persist over long periods of time)

FFT boxes:
Up to 3 temporaries may be created (and later deallocated) in subroutines such as solve_poisson(), diagouterI(), etc.
Wavefunction temporaries: (depend on the algorithm)
  • CG: (each the same size as C or Y)
    • Ygrad_now
    • Ygrad_old
    • Ydir
  • PCG:
    • Ygrad_now
    • Ygrad_old
    • pYgrad_now
    • Ydir
  • CG_nocos:
    • Ygrad
    • Ydir
  • PCG_nocos:
    • Ygrad
    • pYgrad
    • Ydir
  • EOM:
    • Ygrad
  • PSD:
    • Ygrad
    • pYgrad
There is also one temporary allocated in Y^Y-type calculations.
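
The per-algorithm lists above can be tabulated to see how many wavefunction-sized temporaries each choice costs. This is only a sketch: the dictionary and function names are hypothetical, and it assumes every listed temporary is the same size as C or Y, which the notes state explicitly only for CG.

```python
# Temporaries per algorithm, copied from the lists above.
# Assumption: each entry is one wavefunction-sized object.
ALGORITHM_TEMPORARIES = {
    "CG": ["Ygrad_now", "Ygrad_old", "Ydir"],
    "PCG": ["Ygrad_now", "Ygrad_old", "pYgrad_now", "Ydir"],
    "CG_nocos": ["Ygrad", "Ydir"],
    "PCG_nocos": ["Ygrad", "pYgrad", "Ydir"],
    "EOM": ["Ygrad"],
    "PSD": ["Ygrad", "pYgrad"],
}


def n_wavefunction_temporaries(algorithm):
    """Number of wavefunction-sized temporaries for the given algorithm."""
    return len(ALGORITHM_TEMPORARIES[algorithm])


for name in ALGORITHM_TEMPORARIES:
    print(name, n_wavefunction_temporaries(name))
```

The counts range from 1 (EOM) to 4 (PCG).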
Nonlocal pseudopotential calculation:
temporary wavefunction-type objects of size natoms * nbasis (Vnl)
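So the total memory requirement per processor, counted in complex numbers, can be as large as

(4 + 3) * FFT_boxes + (2 + 1 + [1 to 4]) * Wave_function + 1 * Vnl
= 7 * Nx * Ny * Nz + ([4 to 7] * nbasis * nbands * nkpoints + natoms * nbasis) / Nproc

Here 4 + 3 counts the permanent and temporary FFT boxes; 2 + 1 counts C, Y, and the Y^Y temporary; and [1 to 4] is the number of algorithm-dependent wavefunction temporaries. Multiply by 16 to convert to bytes.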
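
The estimate can be evaluated directly, as in the minimal sketch below. The function name, argument names, and the example system are hypothetical; the count of 1 to 4 algorithm temporaries and the division of the wavefunction and Vnl terms by Nproc follow the formula in these notes.

```python
BYTES_PER_COMPLEX = 16  # 2 doubles per complex number


def estimate_bytes_per_processor(nx, ny, nz, nbasis, nbands, nkpoints,
                                 natoms, nproc, n_algorithm_temps):
    """Upper-bound memory estimate per processor, in bytes."""
    if not 1 <= n_algorithm_temps <= 4:
        raise ValueError("n_algorithm_temps must be between 1 and 4")
    # 4 permanent + up to 3 temporary FFT boxes, replicated on every processor.
    fft = (4 + 3) * nx * ny * nz
    # C and Y, one Y^Y temporary, plus the algorithm-dependent temporaries.
    wavefunction = (2 + 1 + n_algorithm_temps) * nbasis * nbands * nkpoints
    # Nonlocal pseudopotential temporary.
    vnl = natoms * nbasis
    # Wavefunction-type objects are spread over processors.
    total_complex = fft + (wavefunction + vnl) / nproc
    return total_complex * BYTES_PER_COMPLEX


# Hypothetical system: 64^3 FFT box, CG algorithm (3 temporaries), 4 processors.
estimate = estimate_bytes_per_processor(
    nx=64, ny=64, nz=64, nbasis=10000, nbands=100, nkpoints=4,
    natoms=50, nproc=4, n_algorithm_temps=3)
print(f"{estimate / 2**20:.1f} MiB per processor")
```

Since the FFT-box term is not divided by Nproc, it sets a floor on the per-processor footprint no matter how many processors are used.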