[Debian-ha-maintainers] Bug#617451: heartbeat: LSB init script handling is suboptimal

Faidon Liambotis paravoid at debian.org
Wed Mar 9 01:24:47 UTC 2011


Package: heartbeat
Version: 1:3.0.3-2
Severity: important

During the upgrade of a couple of systems to squeeze, I encountered a
problem that took me a while to debug (although it's nothing specific to
heartbeat changes in squeeze).

More specifically, I was using heartbeat in a simple setup with a couple
of NFS mounts and LSB resources. One of them was postgresql, which was
getting skipped altogether as a resource.

After some head-scratching, debugging and code-reading, the problem was
pinpointed to this:

Before starting a resource, ResourceManager tries to find if that
service is already running (for LSB scripts, that's “/etc/init.d/foo
status”) and if so, skips it¹ altogether.

So, while ResourceManager tries to behave as per LSB wrt exit codes, it
fails to do so when running status on init scripts. Instead of looking
at the exit code, it performs the following horrendous heuristic
(ResourceManager:209):

    case `$spath $arg status` in
      *[Nn][Oo][Tt]\ *[Rr]unning*)        return 3;;
      *[Rr]unning*|*OK*)                  return 0;;
      *)                                  return 3;;
    esac

That bit me during the upgrade, because PostgreSQL's init script in
squeeze produces the following output:

  ## when running
  $ /etc/init.d/postgresql status
  Running clusters: 8.4/main 
  
  ## when not running
  # /etc/init.d/postgresql status
  Running clusters: 

The second one is meant to say that /nothing/ is running but is
mistakenly considered by heartbeat as running because of the string
match above.
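
The misclassification can be reproduced in isolation. The sketch below
copies the glob patterns from the snippet quoted above into a
hypothetical helper, classify_status_output, which is not part of
heartbeat. It takes the status text as an argument and prints the code
the heuristic would return. The third sample string is made up for
contrast.

```shell
#!/bin/sh
# Hypothetical helper (not heartbeat code): applies the same patterns as
# the quoted ResourceManager heuristic to the status text passed as $1
# and prints the code the heuristic would return.
classify_status_output() {
    case "$1" in
      *[Nn][Oo][Tt]\ *[Rr]unning*)        echo 3;;
      *[Rr]unning*|*OK*)                  echo 0;;
      *)                                  echo 3;;
    esac
}

classify_status_output "Running clusters: 8.4/main "   # cluster up
classify_status_output "Running clusters: "            # nothing up
classify_status_output "postgresql is not running"     # made-up sample
```

Both "Running clusters" lines print 0, since each contains "Running"
and neither contains "not " before it. Only the made-up third string
prints 3.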

There's no way to work around this without writing your own resource
(which I did). But I think this would be better solved in heartbeat
itself, e.g. by adhering to LSB and checking the exit code of the init
script, instead of pattern matching on its output.
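
A minimal sketch of what an exit-code-based check could look like
follows. It is not a patch against ResourceManager:
is_running_by_exit_code and fake_init are hypothetical names, and
fake_init merely stands in for an init script whose output says
"Running" while its exit code says otherwise. LSB defines exit code 0
for "status" as running and 3 as not running. Collapsing every non-zero
code to 3 is a simplification made here, and the approach only helps
with init scripts whose status action returns LSB exit codes.

```shell
#!/bin/sh
# Sketch only: is_running_by_exit_code is a hypothetical name, not a
# function from heartbeat's ResourceManager.
# Usage: is_running_by_exit_code /etc/init.d/foo [extra-arg]
is_running_by_exit_code() {
    # Discard the human-readable output; only the exit code matters.
    "$@" status >/dev/null 2>&1
    case $? in
      0) return 0;;   # LSB: service is running
      *) return 3;;   # any non-zero code: treated as not running (assumption)
    esac
}

# Stand-in for an init script whose output contains "Running" but whose
# exit code reports "not running" (assumed behaviour, for illustration).
fake_init() {
    echo "Running clusters: "
    return 3
}

if is_running_by_exit_code fake_init; then
    echo "running, resource would be skipped"
else
    echo "not running, resource would be started"
fi
```

With the stand-in above, the check takes the "not running" branch even
though the output text contains "Running".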

[ I'm setting severity to important on this one, as my feeling is that
postgresql+heartbeat installations are common ]

Regards,
Faidon

¹: Without logging /anything/, which probably is a bug on its own.




