[Debichem-devel] GROMACS 4.6-beta1 release

Fri Dec 21 01:07:58 UTC 2012

On Fri, Dec 21, 2012 at 1:22 AM, Susi Lehtola <jussilehtola at fedoraproject.org> wrote:

> On Fri, 21 Dec 2012 00:40:34 +0100
> "Dominik 'Rathann' Mierzejewski" <dominik at greysector.net> wrote:
> > > The default build produces an executable that targets the best
> > > instruction set supported by the build host, and the executables
> > > will not run if that set is not supported. From the point of view
> > > of packaging, that's ugly. Either there needs to be multiple
> > > packages, or you will need to choose an instruction set that is
> > > conservative enough to always work. -DGMX_CPU_ACCELERATION=None
> > > will always work, but will be horribly slow.
> > > -DGMX_CPU_ACCELERATION=SSE2 probably runs at least as fast as the
> > > 4.x series, and is probably faster through our better use of
> > > force-only kernels. -DGMX_CPU_ACCELERATION=SSE4.1 is only slightly
> > > faster than SSE2. And then there is AVX, which is faster still, but
> > > is still having teething issues.
> > >
> > > I'd suggest having an SSE2 and a None, but it really depends on
> > > what the downstream people want.
> >
> > For x86_64, SSE2 is supported by all processors, so it shouldn't be an
> > issue, but for x86_32, the binary should run even on an ancient
> > Pentium Pro (Fedora x86_32 minimum CPU requirement). Have you
> > considered implementing runtime CPU detection?
>
> IIRC the old assembly kernels had runtime CPU detection.
>

We do do runtime detection - we fail gracefully when run on hardware not
capable of the configure-time instruction set, and issue a warning if
better performance would be available for a different configuration.
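
As a rough illustration of that check, here is a minimal Python sketch. It is
not the GROMACS implementation: the flag table, the function names and the
use of Linux /proc/cpuinfo flag names are all assumptions made for this
example, and it covers only a subset of the acceleration paths (AVX_128_FMA
is left out).

```python
import warnings

# Illustrative table: acceleration path -> CPU feature flags it needs,
# using Linux /proc/cpuinfo flag names, ordered slowest to fastest.
# This table is an assumption for the sketch, not GROMACS's own data.
REQUIRED_FLAGS = [
    ("None", set()),
    ("SSE2", {"sse2"}),
    ("SSE4.1", {"sse2", "sse4_1"}),
    ("AVX_256", {"sse2", "sse4_1", "avx"}),
]


def best_supported(cpu_flags):
    """Return the fastest path in REQUIRED_FLAGS that cpu_flags can run."""
    best = "None"
    for name, needed in REQUIRED_FLAGS:
        if needed <= cpu_flags:  # every required flag is present
            best = name
    return best


def check_acceleration(configured, cpu_flags):
    """Refuse to run an unsupported build; warn about a suboptimal one."""
    needed = dict(REQUIRED_FLAGS)[configured]
    missing = needed - cpu_flags
    if missing:
        # Fail with a clear message instead of hitting an illegal instruction.
        raise RuntimeError(
            f"binary was built for {configured}, but this CPU lacks: "
            + ", ".join(sorted(missing))
        )
    best = best_supported(cpu_flags)
    if best != configured:
        warnings.warn(
            f"binary was built for {configured}; this CPU also supports "
            f"{best}, which would likely be faster"
        )
    return best


def read_cpu_flags(path="/proc/cpuinfo"):
    """Read the feature flags of the first CPU listed (Linux only)."""
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()
```

On a Linux host, check_acceleration("SSE2", read_cpu_flags()) would raise on
a CPU without SSE2 and warn on one that could also run SSE4.1 or AVX_256.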

We have no plans to build "universal" binaries capable of supporting all
the available x86-family acceleration paths. High performance of GROMACS on
these architectures is a key feature, but with ~200 kernel functions per
acceleration path, and currently generic, SSE2, SSE4.1, AVX_128_FMA and
AVX_256 acceleration paths supported, things would get ugly fast if we were
to try to build universal binaries now and in the future.

> I don't think the current situation is that bad, if there is no
> significant speed difference between SSE2 and the most heavily
> optimized version.
>

The differences in performance are quite significant (or we wouldn't bother
with it!), but will vary according to hardware details of the actual
execution environment. As always, GROMACS pushes the silicon right to the
limits, currently via extensive use of compiler SIMD intrinsics on x86.

Frankly, the configuration of the fftw dependency will be more significant
than the SSE2->SSE4.1 effect, though. (And we strongly encourage having the
fftw package dependency - we do provide an internal fallback for FFTs, but
you may as well run GROMACS on your phone as actually use the fallback
code!)

> I'm perfectly happy to ship an SSE2 version on x86_64 and a version
> without CPU acceleration on other architectures, since everyone who
> actually does calculations surely is on x86_64 architecture.
>

... and anybody doing serious calculations will see our warning that their
use of an SSE2-enabled binary is losing performance and then find out that
they need to roll their own. That's the usual price of high performance, of
course. I expect they'll have to do that for GPU support, anyway.

Mark