[Pkg-openmpi-maintainers] Bug#851918: minimum example program + hack in fftw2 that hides the bug on s390x
Boud Roukema
boud-debian at cosmo.torun.pl
Sat Jan 28 02:51:50 UTC 2017
DESCRIPTION:
The openmpi bug #851918 / mpgrafic bug #851923 on the s390x
architecture appears to be related to Fortran/C interfacing,
specifically to how pointers are referenced and dereferenced. The
minimal test program, compilation commands and run below illustrate
the bug.
It looks like fftw-2.1.5/mpi/fftw_f77_mpi.h is designed to let the
MPI implementation handle this interface whenever possible,
i.e. the FFTW_MPI_COMM_F2C() preprocessor macro is preferably
defined as MPI_Comm_f2c(). This seems to work on most of the
official architectures, but on s390x it yields an invalid
communicator (for openmpi the result should be a valid pointer
of type struct ompi_communicator_t *).
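One generic way that a Fortran/C pointer or integer interfacing mistake can be harmless on amd64 yet fatal on s390x is an integer-width mismatch: if a 4-byte Fortran INTEGER handle is read back through an 8-byte C type, a little-endian machine (amd64) still recovers the right value as long as the following bytes happen to be zero, whereas a big-endian machine (s390x) does not. This is only an assumed mechanism for illustration, not a diagnosis of this bug. The helper names below are hypothetical, and the sketch simulates both byte orders explicitly so that its result does not depend on the host.

```c
#include <stdint.h>

/* Hypothetical illustration, not fftw2 or openmpi code: write a 4-byte
   integer (like a default Fortran INTEGER handle) into an 8-byte slot whose
   remaining bytes are zero, using the requested byte order. */
static void store_int32(unsigned char slot[8], uint32_t value, int big_endian)
{
    for (int i = 0; i < 8; i++)
        slot[i] = 0;
    for (int i = 0; i < 4; i++) {
        int shift = big_endian ? 8 * (3 - i) : 8 * i;
        slot[i] = (unsigned char)(value >> shift);
    }
}

/* Read the same 8 bytes back as one 64-bit integer in the same byte order,
   i.e. what C code would see if it dereferenced the Fortran argument
   through a 64-bit type. */
static uint64_t load_int64(const unsigned char slot[8], int big_endian)
{
    uint64_t value = 0;
    for (int i = 0; i < 8; i++) {
        int shift = big_endian ? 8 * (7 - i) : 8 * i;
        value |= (uint64_t)slot[i] << shift;
    }
    return value;
}

/* Store a handle as 32 bits, then misread it as 64 bits. */
uint64_t misread_handle(uint32_t handle, int big_endian)
{
    unsigned char slot[8];
    store_int32(slot, handle, big_endian);
    return load_int64(slot, big_endian);
}
```

For example, misread_handle(5, 0) returns 5 (the little-endian case), while misread_handle(5, 1) returns 5 << 32 (the big-endian case), which would not be a valid handle.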
A hack that works on s390x is to artificially turn off
HAVE_MPI_COMM_F2C in fftw/config.h. This is not a serious,
sustainable solution, because config.h is normally regenerated
by configure. The open question seems to be whether this should
be handled in openmpi or in fftw2. The fact that it can be worked
around by a hack in fftw2 does not imply that the source of the bug
is in fftw2 rather than in openmpi.
VERSIONS:
fftw version: fftw_2.1.5-4.1
openmpi version: 2.0.2~git.20161225-9
architecture: s390x
distribution: sid
MINIMAL TEST PROGRAM minimal.f90:
program minimal
  implicit none
  integer, parameter :: fftw_estimate=0
  integer, parameter :: fftw_real_to_complex=-1
  integer, parameter :: i8b=selected_int_kind(18)
  integer :: ierr, nx = 32
  integer(i8b) :: plan
#include "mpif.h"
  call mpi_init(ierr)
  call mpi_barrier(mpi_comm_world,ierr)
  call rfftw3d_f77_mpi_create_plan(plan,mpi_comm_world,nx,nx,nx, &
       fftw_real_to_complex, fftw_estimate)
  call mpi_finalize(ierr)
  stop
end program minimal
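In minimal.f90, plan is declared integer(i8b), a 64-bit integer. Presumably this is because the fftw2 Fortran wrappers hand the plan back to Fortran as a C pointer stored in an integer variable, and on 64-bit platforms such as amd64 and s390x a pointer needs 8 bytes. A minimal sketch of that storage convention, assuming it holds; struct plan_obj, pointer_to_i8b and i8b_to_pointer are hypothetical names, not fftw2's code:

```c
#include <stdint.h>

/* Hypothetical stand-in for an fftw2 plan object (not fftw2's real type). */
struct plan_obj {
    int nx, ny, nz;
};

/* Assumption: a pointer fits in 64 bits, as on amd64 and s390x. */
_Static_assert(sizeof(intptr_t) <= sizeof(int64_t),
               "a pointer must fit in a 64-bit integer");

/* Pack a C pointer into a 64-bit integer, the way a Fortran
   integer(i8b) variable would carry it between calls. */
int64_t pointer_to_i8b(struct plan_obj *p)
{
    return (int64_t)(intptr_t)p;
}

/* Recover the original C pointer from the 64-bit integer. */
struct plan_obj *i8b_to_pointer(int64_t handle)
{
    return (struct plan_obj *)(intptr_t)handle;
}
```

A default 4-byte INTEGER could not hold such a value once the address needs more than 32 bits, which is presumably why plan is integer(i8b) rather than a plain integer.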
Build-Depends: fftw-dev, gfortran, mpi-default-dev, mpi-default-bin
COMPILATION:
mpifort -cpp minimal.f90 -o ./minimal -lrfftw_mpi -lfftw_mpi -lrfftw -lfftw
RUN:
mpirun -n 1 --mca plm_rsh_agent sh ./minimal
RESULT:
- amd64/jessie: no errors; only an (apparently incorrect) warning
"Deprecated parameter: plm_rsh_agent", issued twice.
- s390x/sid:
[zelenka:21289] *** An error occurred in MPI_Comm_dup
[zelenka:21289] *** reported by process [3591634945,0]
[zelenka:21289] *** on communicator MPI_COMM_WORLD
[zelenka:21289] *** MPI_ERR_COMM: invalid communicator
[zelenka:21289] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[zelenka:21289] *** and potentially your MPI job)
SOURCE:
In the fftw-2.1.5 source directory:
# --disable-float means choose double precision
autoreconf -f -i && ./configure --disable-float --enable-mpi --enable-shared && make clean && make
mkdir -p ../usr/lib ../usr/include # temporary
cp -pv */.libs/lib*[^i] ../usr/lib/ && cp -pv */*.h ../usr/include/ # install
In the minimal.f90 source directory:
mpifort -L../usr/lib -I../usr/include -cpp minimal.f90 \
-o ./minimal -lrfftw_mpi -lfftw_mpi -lrfftw -lfftw
LD_LIBRARY_PATH=../usr/lib mpirun -n 1 --mca plm_rsh_agent sh ./minimal
Result:
[zelenka:21289] *** An error occurred in MPI_Comm_dup
[zelenka:21289] *** reported by process [3591634945,0]
[zelenka:21289] *** on communicator MPI_COMM_WORLD
[zelenka:21289] *** MPI_ERR_COMM: invalid communicator
[zelenka:21289] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[zelenka:21289] *** and potentially your MPI job)
HACK:
--- fftw/config.h.orig 2017-01-28 01:28:14.747221802 +0000
+++ fftw/config.h 2017-01-28 02:05:13.630034126 +0000
@@ -126,8 +126,8 @@
 /* Define if you have the MPI library. */
 #define HAVE_MPI 1
 
-/* desc */
-#define HAVE_MPI_COMM_F2C /**/
+/* See mpi/fftw_f77_mpi.h; undefine this for s390x */
+/* #define HAVE_MPI_COMM_F2C */
 
 /* Define if you have POSIX threads libraries and header files. */
 /* #undef HAVE_PTHREAD */
RECOMPILE:
make && cp -pv */.libs/lib*[^i] ../usr/lib/
In the minimal.f90 source directory, compile using this hacked version of fftw-2.1.5-4.1:
mpifort -L../usr/lib -I../usr/include -cpp minimal.f90 \
-o ./minimal -lrfftw_mpi -lfftw_mpi -lrfftw -lfftw
LD_LIBRARY_PATH=../usr/lib mpirun -n 1 --mca plm_rsh_agent sh ./minimal
This runs correctly, with no errors (and no output).
This hack is not a sustainable long-term fix, but
it may help in finding one.