DFT++ memory estimation (for the mpi_serial version)
(all object sizes are counted in complex numbers; one complex number is 2 doubles, i.e. 16 bytes)
Permanent objects:
-
- FFT boxes: Nx * Ny * Nz (printed as fftbox size)
-
- n -- charge density
- d -- electrostatic potential
- Vlocps -- local pseudopotential
- Vscloc -- local part of self-consistent potential
(each processor has a copy of these)
- Wavefunction objects: nbasis * nbands * nkpoints
-
- C -- wavefunction
- Y -- unconstrained wavefunction (used for minimization)
(these objects are spread over processors)
- other smaller ones.
-
Matrices -- of size (nbands * nbands)
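The permanent storage listed above can be turned into a byte count. The following is only a rough sketch: the function name and its parameters are hypothetical (not part of DFT++), and the "other smaller ones" and the nbands * nbands matrices are ignored.

```python
BYTES_PER_COMPLEX = 16  # one complex number = 2 doubles

def permanent_bytes(nx, ny, nz, nbasis, nbands, nkpoints, nproc=1):
    """Rough per-processor size of the permanent objects, in bytes.

    Hypothetical helper for illustration; it ignores the smaller objects
    and the nbands * nbands matrices.
    """
    fft_box = nx * ny * nz                      # one FFT box, in complex numbers
    wavefunction = nbasis * nbands * nkpoints   # one wavefunction object
    n_fft = 4    # n, d, Vlocps, Vscloc: every processor holds a copy
    n_wave = 2   # C and Y: spread over processors
    total_complex = n_fft * fft_box + n_wave * wavefunction / nproc
    return total_complex * BYTES_PER_COMPLEX
```

For example, permanent_bytes(10, 10, 10, 100, 4, 1) counts 4 FFT boxes of 1000 complex numbers plus 2 wavefunctions of 400 complex numbers.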
Temporary objects: (could persist over a long period of time)
-
- FFT boxes:
-
Up to 3 temporaries may be created (and later deallocated) in
subroutines such as solve_poisson(), diagouterI(), etc.
- Wavefunction temporaries: (depend on algorithms)
-
- CG:
- Ygrad_now
- Ygrad_old
- Ydir (each of these three is the same size as C or Y)
- PCG:
- Ygrad_now
- Ygrad_old
- pYgrad_now
- Ydir
- CG_nocos:
- PCG_nocos:
- EOM:
- PSD:
There is also one temporary allocated in Y^Y-type calculations.
- Nonlocal pseudopotential calculation:
-
temporary wavefunction-type objects of size
natoms * nbasis
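So the total memory requirement per processor can be estimated as
(4 + 3) * FFT_boxes + (2 + 1 + (1 to 4)) * Wave_function + 1 * Vnl
= 7 * Nx*Ny*Nz + ((4 to 7) * nbasis * nbands * nkpoints + natoms * nbasis) / Nproc
(the range 1 to 4 is the number of algorithm-dependent wavefunction temporaries, so the wavefunction count runs from 4 to 7)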
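As a rough check of the estimate above, the formula can be evaluated for the low and high ends of the wavefunction-temporary range. This is a sketch only: the function name and arguments are hypothetical, and the result is converted to bytes using 16 bytes per complex number.

```python
BYTES_PER_COMPLEX = 16  # one complex number = 2 doubles

def total_bytes_range(nx, ny, nz, nbasis, nbands, nkpoints, natoms, nproc=1):
    """Return (low, high) per-processor memory estimates in bytes.

    Hypothetical helper: low assumes 1 algorithm temporary,
    high assumes 4, matching the 4-to-7 wavefunction count.
    """
    fft_box = nx * ny * nz                     # size of one FFT box
    wave = nbasis * nbands * nkpoints          # size of one wavefunction
    vnl = natoms * nbasis                      # nonlocal pseudopotential temporary
    fft_part = (4 + 3) * fft_box               # 4 permanent + up to 3 temporaries
    low = fft_part + ((2 + 1 + 1) * wave + vnl) / nproc
    high = fft_part + ((2 + 1 + 4) * wave + vnl) / nproc
    return low * BYTES_PER_COMPLEX, high * BYTES_PER_COMPLEX
```

Calling total_bytes_range with the actual fftbox size, nbasis, nbands, nkpoints, natoms and Nproc of a run gives the bounds; dividing by 2**20 converts them to MiB.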