[Debichem-devel] ACESIII-3.0.7 Availability

Michael Banck mbanck at debian.org
Thu Aug 2 22:57:53 UTC 2012


Hi Ajith,

On Thu, Aug 02, 2012 at 08:00:01AM -0400, Ajith perera wrote:
> First thing that I am very puzzled by is  why you can not use lb in
> ACES II or dup in ACESIII.
 
Regarding ACESIII, I believe the issue is an infinite loop in the
DLAMC1-DLAMC3 auxiliary routines.  At least, this is the backtrace I get
when attaching gdb to a spinning process (32bit, test case 1.1.1.1)
after it has printed the geometry:

#0  dlamc3_ (a=0, b=1) at dlamc3.f:52
#1  0x08355084 in dlamc1_ (beta=0, t=0, rnd=.FALSE., ieee1=.FALSE.) at dlamc1.f:185
#2  0x0834e93f in dlamc2_ (beta=1146246233, t=0, rnd=3217099528, eps=0, emin=0, rmin=0, emax=0, rmax=0) at dlamc2.f:124
#3  0x08346b66 in dlamch_ (cmach=..., _cmach=12) at dlamch.f:90
#4  0x08345a2c in dsyev_ (jobz=..., uplo=..., narg=3, a=..., ldaarg=3, w=..., work=..., lworkarg=9, info=0, _jobz=1, _uplo=1) at dsyev.F:177
#5  0x082b82a9 in eig_ (a=..., b=..., junk=3, n=3, sort=0) at eig.F:79
#6  0x082cad79 in symmetry_ (scratch=..., qtmp=..., newq=..., nosilent=.TRUE.) at symmetry.F:120
#7  0x082be8b0 in geopt_ (z=..., memreq=15034000) at geopt.F:513
#8  0x082ba69c in aces2_joda_main_ () at aces2_joda_main.F:105
#9  0x080671bd in scf_init_ () at scf_init.F:43
#10 0x08061fe4 in aces3 () at beta.F:152
#11 0x0805d964 in main (argc=1, argv=0xbfc1b8e1 '../bin/xaces3\000') at beta.F:1014
#12 0xb68dce16 in ?? ()
#13 0x0805d985 in _start ()

So it is entirely possible that dup works fine once this issue has been
sorted out.
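
For reference, the kind of loop that frames #0 and #1 sit in can be
sketched as below.  This is a simplified Python transcription of the
idea behind DLAMC1 (probing the floating-point base and mantissa length
by repeated addition), not the LAPACK source; the function names are
made up for illustration.  The probe only terminates if every
intermediate sum is rounded to storage precision, which is what DLAMC3
is meant to force.  That optimization (we build with -O2) or extended
precision registers on 32bit defeat this in our build is an assumption,
not something the backtrace proves.

```python
def add_rounded(a, b):
    # Stand-in for DLAMC3: return a + b rounded to a double.
    # Python floats are IEEE doubles, so the sum is already rounded.
    return a + b


def probe_base_and_digits():
    one = 1.0

    # Find a = 2**m large enough that fl(a + 1) - a is no longer 1.
    a, c = 1.0, 1.0
    while c == one:
        a = 2.0 * a
        c = add_rounded(add_rounded(a, one), -a)

    # Find the smallest b = 2**k with fl(a + b) > a;
    # fl(a + b) - a is then the base.
    b = 1.0
    c = add_rounded(a, b)
    while c == a:
        b = 2.0 * b
        c = add_rounded(a, b)
    beta = int(add_rounded(c, -a) + 0.25)

    # Count the base-beta digits in the mantissa.
    t = 0
    a, c = 1.0, 1.0
    while c == one:
        t += 1
        a = a * beta
        c = add_rounded(add_rounded(a, one), -a)

    return beta, t


print(probe_base_and_digits())
```

With IEEE doubles this yields base 2 and 53 mantissa digits.  If the
sums were carried at a higher precision than the comparison assumes,
the "c == one" tests could stay true far longer than intended.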

It has been a while, but I have now re-checked ACESII, and it compiles
and runs fine (modulo some test suite failures which do not appear to be
related to lb).  So I retract my former comments; maybe the need to skip
lb was superseded by some Makefile changes we made later, and we never
checked back.

However, this is not true for gfortran-4.7 (the default for the current
development branch of the Debian distribution).  There I get problems
with memory allocation (test case zmat.001a):

 One- and two-electron integrals over symmetry-adapted AOs are
calculated.\n
 @READIN: Spherical harmonics are used.
  @READIN-I, Nuclear repulsion energy :    8.2225577902 a.u.
  required memory for a1 array               4451690  words 
  required memory for a2 array                  9286  words 
 @LARM: NOT ENOUGH MEMORY!!!!!
 I1(1:50) =                     1                  109
217                  225                  353                  434
515                  596                  704                  704
920                    0                    0                    0
0                    0                    0                    0
0                    0                    0                    0
0                    0                    0                    0
0                    0                    0                    0
0                    0                    0                    0
0                    0                    0                    0
0                    0                    0                    0
0                    0                    0                    0
0                    0                    0              4451642
 I2(1:20) =                     1                   82
83                   99                  180                  261
585                    0                    0                    0
0                    0                    0                    0
0                    0                    0                    0
0                    0
 @ACES_EXIT: An ACES error has occurred.
             return status =                     1

The relevant check (at conlox.f, line 72) is:

      IF(I1(11).GE.I1(50).OR.I2(7).GT.I2(20)) CALL LARM(I1,I2)

I2(20) is zero and thus smaller than I2(7) (585), so the second
condition holds and LARM is called.
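
The check can be replayed with the values from the dump above.  The
following Python sketch only mirrors the IF statement; the function name
is made up, and for the gfortran-4.4 case only I2(20) = 46472 is known
from my runs, so the other entries are assumed to be unchanged.

```python
def larm_triggered(i1, i2):
    # i1 and i2 are Python lists, so Fortran index k maps to k - 1.
    return i1[11 - 1] >= i1[50 - 1] or i2[7 - 1] > i2[20 - 1]


# Values copied from the gfortran-4.7 dump.
I1 = [1, 109, 217, 225, 353, 434, 515, 596, 704, 704, 920] + [0] * 38 + [4451642]
I2 = [1, 82, 83, 99, 180, 261, 585] + [0] * 13

# gfortran-4.4 case: only I2(20) is known; the rest is assumed identical.
I2_GFORTRAN_44 = list(I2)
I2_GFORTRAN_44[20 - 1] = 46472

print(larm_triggered(I1, I2))
print(larm_triggered(I1, I2_GFORTRAN_44))
```

The first call reports that LARM is reached, the second that it is not,
so the zero in I2(20) alone is enough to explain the abort.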

If I revert to gfortran-4.4.5 (using the current stable release, or the
gfortran-4.4 package in unstable), I2(20) is 46472 and the ACESII test
suite runs mostly fine.

If you have any suggestions as to what could be at fault here, I would
be interested to hear them; otherwise, I will file a bug report and see
what happens.

> The dup and lb is provided as a safety measure for the machines that
> do not have matching  (64BIT or 32 BIT integer) vendor provided
> mathematical libraries.  ACES II  is built with 64BIT integer flags
> (-i8 on intel) and should be linked against the correct 64BIT
> mathematical libraries. If the vendor does not provide them then lb
> must be used. ACES III is built with 32BIT integers, so it should be
> linked against 32BIT libraries. It is very likely all the vendors have
> 32bit mathematical libraries, so you can skip dup.

First of all, skipping dup entirely was not possible, as the following
routines are not provided by our system blas/lapack:

dsum.F, xscal.F, xdnrm2.f, xdscal.f and xdswap.f (for ACESII, this is
only dsum.f I believe)

Do you know which standard libraries implement those?  Some of them
appear to be from blas-xtra, according to google.
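
One way to check a given shared library for these routines is to probe
it for the symbols.  The sketch below uses Python's ctypes; the symbol
names are an assumption (routine name equal to the file name, lowercase,
with the trailing underscore gfortran appends by default), and the
helper names are made up.

```python
import ctypes
import ctypes.util

# Assumed symbol names, derived from the file names listed above.
CANDIDATES = ["dsum_", "xscal_", "xdnrm2_", "xdscal_", "xdswap_"]


def missing_symbols(lib, names):
    # ctypes raises AttributeError for unknown symbols,
    # so hasattr() works as a probe.
    return [name for name in names if not hasattr(lib, name)]


def check_system_library(short_name):
    # short_name is e.g. "blas" or "lapack"; returns None if no such
    # shared library is found on this system.
    soname = ctypes.util.find_library(short_name)
    if soname is None:
        return None
    return missing_symbols(ctypes.CDLL(soname), CANDIDATES)


if __name__ == "__main__":
    for short_name in ("blas", "lapack"):
        print(short_name, check_system_library(short_name))
```

This only tells us whether a symbol is exported, not whether its
argument conventions match what ACESIII expects.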

Now, to the 32/64-bit question.  Thanks for explaining the respective
requirements of ACESII and ACESIII.  I think it would be beneficial to
spell this out more explicitly in doc/README.install and/or as comments
in GNUMakefile.chssi.

Debian ships (to my knowledge) only one set of blas/lapack libraries;
I have to admit that documentation about how they got compiled is also
lacking, though.
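
To illustrate why the integer width has to match, here is a small
Python sketch of what a library built with 32-bit integers would see if
it were handed an integer array from code built with 64-bit integers.
It assumes a little-endian machine (as on x86) and does not claim
anything about how the Debian libraries are actually compiled; the
function name is made up.

```python
import struct


def reinterpret_int64_as_int32(values):
    # Pack as little-endian 64-bit integers, as code built with
    # 64-bit default integers would store them ...
    raw = struct.pack("<%dq" % len(values), *values)
    # ... and read the same bytes back as 32-bit integers, as a
    # library built with 32-bit default integers would.
    return list(struct.unpack("<%di" % (len(raw) // 4), raw))


print(reinterpret_int64_as_int32([1, 2, 3]))
```

The array 1, 2, 3 comes back as 1, 0, 2, 0, 3, 0.  A single small scalar
passed by reference can survive this by accident on little-endian
hardware, which makes such mismatches easy to miss.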

For ACESIII, we are currently using these definitions:

SIP_DIR=../sip
INCLUDE_DIRS=-I../include  -I$(SIP_DIR) -I../aces2/include
LIB_DIRS=-L../lib
-Mnodefaultunit
FFLAGS=-DMPIF2C -DMPI2 -DC_SUFFIX -DCB_SUFFIX -D__fortran -D__fortran77
-g -O2 -Wall
CFLAGS=-DMPIF2C -DMPI2 -DC_SUFFIX -DCB_SUFFIX -g -O2 -Wall
CPPFLAGS=-DMPIF2C -DMPI2 -DC_SUFFIX -DCB_SUFFIX -g -O2 -Wall
FC=mpif77
CC=mpicc
CPP=mpicxx
SERIAL_CPP=g++
ARFLAGS=-rv
LIBS=-lsip -lerd -loed -lsip_shared -lframelib -lmpi -laces2 -lgeopt
-lsymcor -laces2 -ldup -lsip -lmpi -lblas -llapack
SIAL_COMPILER_LIBS=-lsial -lsip_shared -laces2 -lgfortran

for both 32bit and 64bit.  Does this look correct?  As the shipped
GNUMakefile.chssi is very specific to various existing systems, it was
not very easy to figure out which variables are appropriate.

For ACESII, we run make with "CMPLR=gnu PARALLEL=0", plus "64BIT=1" for
64bit or "64BIT=0" for 32bit.  The most important definitions in
Makefiles/GNUMakefile are currently:

   DEFINES = -D_GNU -DGFORTRAN
   C_SUFFIX = 1
   CXX = g++ -c
   CC  = gcc -c
   FC  = gfortran-4.4 -c
   F9XC       = gfortran-4.4 -c
   LD  = gfortran-4.4
   DIR_MPIINC = /usr/include/mpi
   LDFLAGS_MPILIBS = -lmpi
   LDFLAGS_NUMLIBS = -llinpack -leispack -lblas -llapack -llb
   MPCC = mpicc -c
   MPFC = mpif90 -c
   MPLD = mpif90
   CPP  = cpp -traditional
   CPPFLAGS       = -P -C
   CPPFLAG_DEPEND = -MM # do not check system includes
   MODDIRS_PREFIX = -I
   CFLAGS  = -g -O2 -Wall
   FFLAGS  = -g -O2 -Wall -fno-second-underscore -finit-local-zero
   LDFLAGS =
   ifeq (${64BIT},1)
      FFLAGS += -fdefault-integer-8
   endif

I think in the long run it would be beneficial to use a
source-configuration system like autoconf or cmake in order to
automatically configure and build ACESII/ACESIII correctly on some host
without requiring prior knowledge of it and having to edit files first.

> From this discussion it must be clear that the dup and lb should always
> work because you are building them along with the rest of the source.
> It may hurt the performance but they should always work. If the vendor
> provides the matching libs, then it is certainly beneficial to use
> them but that is not necessary.

OK.  See above for our current problem.

To summarize, ACESII currently builds and runs fine using gfortran-4.4
on 32/64bit with internal and external blas/lapack.  ACESIII builds and
runs fine using gfortran-4.4 on 32/64bit with external blas/lapack, but
jobs which (I guess) require distributed memory, like CCSD calculations,
fail if only one core is used.

I hope I got that right, I compiled quite a lot today :-/


Cheers,

Michael


