[ClusterLabs Developers] --verbose breaks stonithd + some fencing agents

Wed Oct 14 21:10:13 EDT 2015

Hi all,

I'm certainly no expert on stonithd or the way it interfaces with
fence-agents, but I think I found a bug in either stonithd or
fencing.py.

If a fencing agent is invoked with the --verbose CLI argument (or
something like 'verbose=1' via STDIN), then every call to
logging.debug() writes its output to STDERR:

  https://github.com/ClusterLabs/fence-agents/blob/master/fence/agents/lib/fencing.py.py#L640
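
To illustrate the effect, here is a minimal sketch rather than the
actual fencing.py code. It assumes verbose mode amounts to lowering the
logger level to DEBUG while a logging.StreamHandler is attached; the
make_logger() helper and the "fence_demo" logger name are hypothetical.
In a real agent the stream would be sys.stderr (the StreamHandler
default); an in-memory buffer is used here so the result can be
inspected.

```python
import io
import logging


def make_logger(verbose, stream):
    """Hypothetical stand-in for an agent's logging setup."""
    logger = logging.getLogger("fence_demo")
    logger.handlers.clear()      # start from a clean state on every call
    logger.propagate = False     # keep the demo isolated from the root logger
    logger.setLevel(logging.DEBUG if verbose else logging.WARNING)
    # In an agent this handler would write to sys.stderr.
    logger.addHandler(logging.StreamHandler(stream))
    return logger


# Without verbose, the debug message is filtered out by the logger level.
quiet = io.StringIO()
make_logger(False, quiet).debug("connecting to device")

# With verbose, the same call reaches the stream handler.
loud = io.StringIO()
make_logger(True, loud).debug("connecting to device")
```

After this runs, quiet is empty and loud holds the debug line, which is
the output that ends up on STDERR when the handler points there.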

This confuses stonithd, because it dup(2)s STDOUT and STDERR to the
same fd which is the writeable end of a pipe used by stonithd to read
output from the forked child which runs the fencing agent:

  https://github.com/ClusterLabs/pacemaker/blob/master/lib/fencing/st_client.c#L782
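
The same arrangement can be reproduced outside stonithd. The sketch
below is in Python rather than C and is not the st_client.c code: it
hands one pipe's write end to a child as both STDOUT and STDERR, so the
parent sees a single combined stream. The child script and its
"<status>on</status>" output are made up for the demonstration.

```python
import os
import subprocess
import sys

# Hypothetical child: one debug line on stderr, one "real" line on stdout.
CHILD = (
    "import sys\n"
    "sys.stderr.write('DEBUG: login ok\\n'); sys.stderr.flush()\n"
    "sys.stdout.write('<status>on</status>\\n'); sys.stdout.flush()\n"
)


def run_merged():
    """Run the child with stdout and stderr sharing one pipe."""
    read_fd, write_fd = os.pipe()
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdout=write_fd,
        stderr=write_fd,   # same fd: both streams land in one pipe
    )
    os.close(write_fd)     # the parent only reads; closing lets EOF arrive
    with os.fdopen(read_fd) as reader:
        data = reader.read()
    proc.wait()
    return data


merged = run_merged()
```

Here merged contains both the debug line and the status line, and the
reader has no way to tell which stream each came from.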

Therefore, from the point of view of stonithd, debug output on STDERR
gets intermingled with "real" output on STDOUT, and when stonithd
tries to parse the combined stream, the result is warnings in the
logs beginning:

  stonith-ng[5399]:  warning: Could not parse ...

Since we already log to syslog, I wonder whether we need to log to
STDERR as well, so my first instinct was this fix:

  https://github.com/aspiers/fence-agents/commit/4c7148b8046eb9cef950811c26fab73672f403bc

However, subsequently I realised that the root cause is the way
stonithd mixes STDOUT and STDERR from the child fencing agent process
together, so now I'm wondering if it would be better to change
lib/fencing/st_client.c to create a third pipe for handling STDERR
independently.
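
As a rough illustration of the separate-pipe idea (again in Python, not
a patch for st_client.c), the sketch below gives the child its own pipe
for each stream. The child script and its output are hypothetical; in C
the equivalent would presumably be an extra pipe(2) whose write end is
duplicated onto fd 2 in the child.

```python
import subprocess
import sys

# Hypothetical child: one debug line on stderr, one "real" line on stdout.
CHILD = (
    "import sys\n"
    "sys.stderr.write('DEBUG: login ok\\n'); sys.stderr.flush()\n"
    "sys.stdout.write('<status>on</status>\\n'); sys.stdout.flush()\n"
)


def run_split():
    """Run the child with independent pipes for stdout and stderr."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,   # parsed as the agent's real output
        stderr=subprocess.PIPE,   # kept apart, e.g. for logging
        text=True,
    )
    return proc.stdout, proc.stderr


out, err = run_split()
```

With this split, out holds only the line meant for parsing, and err can
be logged or discarded without affecting the parser.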

Thoughts?