jessie: help debugging NFS shares not mounted at boot, double mounts with mount -a, and @reboot cronjobs

Mon Jan 25 12:04:26 GMT 2016

Hello,
we are converting our installations to jessie and are facing some
issues, which we believe are related to how systemd boots the
system. They can be summarized as follows (more details below):

1. some NFS shares are not mounted at boot
2. using 'mount -a' as @reboot cronjob (to work around 1.) duplicates mounts
3. @reboot jobs for programs on NFS mountpoints are started too early
(when the share is not yet mounted)

(1.) We believe that the NFS shares not being mounted at boot are also
causing the other problems, so let's start with this one.

We define our NFS mounts in /etc/fstab like this:

NFS_SERVER:VOLUME    MOUNTPOINT      nfs    rw,intr,tcp,bg,rdirplus,noatime,_netdev

The mount options are always the same, and we have several of these
mounts (from different servers), up to 20 on some hosts.

Some of those (not always the same ones, and not on all machines) are
not mounted during boot.
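
To record which shares are missed on a given boot, a small check along
these lines could be run once the system is up. It is only a sketch: it
assumes the whitespace-separated field layout of /etc/fstab and
/proc/mounts, compares mountpoints as plain strings (so a trailing slash
in fstab would not match), and the function names are made up for
illustration.

```python
#!/usr/bin/env python3
"""List NFS mountpoints from fstab that are not currently mounted (sketch)."""


def nfs_mountpoints(fstab_text):
    """Return the mountpoints of nfs/nfs4 entries in fstab-formatted text."""
    points = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        # fstab fields: source, mountpoint, type, options, ...
        if len(fields) >= 3 and fields[2] in ("nfs", "nfs4"):
            points.append(fields[1])
    return points


def mounted_points(mounts_text):
    """Return the set of mountpoints in /proc/mounts-formatted text."""
    points = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            points.add(fields[1])
    return points


def missing_nfs(fstab_text, mounts_text):
    """Return fstab NFS mountpoints that do not appear in the mount table."""
    mounted = mounted_points(mounts_text)
    return [p for p in nfs_mountpoints(fstab_text) if p not in mounted]


def read(path):
    with open(path) as handle:
        return handle.read()


if __name__ == "__main__":
    try:
        for point in missing_nfs(read("/etc/fstab"), read("/proc/mounts")):
            print("not mounted:", point)
    except OSError as exc:
        print("cannot read mount tables:", exc)
```

Logging its output at the end of every boot would show whether the
missing shares follow a pattern (same server, same position in fstab)
or are random.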

(2.) As a workaround, we added an "ugly" @reboot cronjob that runs
'mount -a'. It does mount all the NFS shares, but sometimes (not always
on the same mount) it duplicates a mount:

FQDN:VOLUME on MOUNTPOINT type nfs (ro,noatime,vers=3,rsize=131072,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=IPADDR1,mountvers=3,mountport=300,mountproto=tcp,local_lock=none,addr=IPADDR1)
FQDN:VOLUME on MOUNTPOINT type nfs (ro,noatime,vers=3,rsize=131072,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=IPADDR2,mountvers=3,mountport=300,mountproto=tcp,local_lock=none,addr=IPADDR2,_netdev)

These are the same share, mounted on the same mountpoint; the two
entries differ only in the server IP address and in the _netdev option
on the second one. (We do indeed expose FQDN from multiple IP
addresses; could this be the issue?) It's not causing any trouble
(after all, the share is mounted where we want it), but it looks like
a sign of a latent problem.
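
To see how often this happens across hosts, the doubled entries could be
detected with something like the sketch below. It reads /proc/mounts,
i.e. the kernel's view, which may list options differently from the
'mount' output quoted above; the function names are hypothetical.

```python
#!/usr/bin/env python3
"""Report shares that are mounted more than once on one mountpoint (sketch)."""
from collections import defaultdict


def duplicate_mounts(mounts_text):
    """Map (source, mountpoint) to its option strings, for pairs seen more than once."""
    seen = defaultdict(list)
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: source, mountpoint, type, options, ...
        if len(fields) >= 4:
            seen[(fields[0], fields[1])].append(fields[3])
    return {key: opts for key, opts in seen.items() if len(opts) > 1}


def differing_options(opts_a, opts_b):
    """Return the options present in only one of two comma-separated lists."""
    return set(opts_a.split(",")) ^ set(opts_b.split(","))


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as handle:
            duplicates = duplicate_mounts(handle.read())
    except OSError as exc:
        print("cannot read /proc/mounts:", exc)
    else:
        for (source, point), opts in sorted(duplicates.items()):
            print("%s on %s is mounted %d times" % (source, point, len(opts)))
            # Compare the first two copies to show what distinguishes them.
            print("  differing options:", sorted(differing_options(opts[0], opts[1])))
```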

(3.) We have some @reboot cronjobs that start programs stored on NFS
shares. Because the cronjobs are started too early, when not all the
NFS shares are present yet (the mounts are backgrounded, I guess to
speed up boot), the jobs just do nothing.
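
One stopgap for this would be to wrap each @reboot job in a helper that
waits for the mountpoint before starting the program. The sketch below
is an illustration only: the script name, the timeout and the polling
interval are arbitrary choices, and it relies on os.path.ismount()
reporting the directory as a mountpoint once the share is there.

```python
#!/usr/bin/env python3
"""Wait until a directory is a mountpoint, then run a command (sketch)."""
import os
import sys
import time


def wait_for_mount(path, timeout=300.0, interval=5.0, is_mounted=os.path.ismount):
    """Poll until `path` is a mountpoint; return False if `timeout` seconds pass first."""
    deadline = time.monotonic() + timeout
    while not is_mounted(path):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True


if __name__ == "__main__":
    if len(sys.argv) >= 3:
        mountpoint, command = sys.argv[1], sys.argv[2:]
        if not wait_for_mount(mountpoint):
            sys.exit("timed out waiting for " + mountpoint)
        # Replace this process with the real program once the share is there.
        os.execvp(command[0], command)
    else:
        print("usage: wait_for_mount.py MOUNTPOINT COMMAND [ARGS...]")
```

A crontab entry would then look like
'@reboot wait_for_mount.py MOUNTPOINT COMMAND', with the helper itself
stored on a local filesystem. This only hides the ordering problem; it
does not explain why the shares are late.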

We haven't found much information online about these problems, and
while #739721 seems related, it is now closed, and all the solutions
proposed there are either already applied or not applicable; we still
see this issue.

Could you help us understand where the problem is and how to fix it?
We are more than willing to experiment with multiple solutions and, if
needed, we can provide (up to a certain point) more details on our
infrastructure.

thanks a ton in advance!

-- 
Sandro "morph" Tosi
My website: http://matrixhasu.altervista.org/
Me at Debian: http://wiki.debian.org/SandroTosi
G+: https://plus.google.com/u/0/+SandroTosi