[ClusterLabs] strange behaviour from pacemaker_remote
Ken Gaillot
kgaillot at redhat.com
Sun Oct 1 12:32:01 EDT 2017
On Thu, 2017-09-28 at 01:39 +0200, Adam Spiers wrote:
> Hi all,
>
> When I do a
>
> pkill -9 -f pacemaker_remote
>
> to simulate failure of a remote node, sometimes I see things like:
>
> 08:29:32 d52-54-00-da-4e-05 pacemaker_remoted[5806]: error: No ipc providers available for uid 0 gid 0
> 08:29:32 d52-54-00-da-4e-05 pacemaker_remoted[5806]: error: Error in connection setup (5806-5805-15): Remote I/O error (121)
>
> ... and the node doesn't get fenced as expected. Other times it does.
> Is this my fault for using an invalid way of simulating failure, or
> some kind of bug?
>
> Sadly I don't have the exact version of pacemaker_remoted to hand, but
> I can provide it tomorrow if necessary. It's not the latest release,
> maybe not even the one immediately preceding it.
>
> Thanks!
> Adam
Before fencing, the cluster will try to re-establish the connection. If
you've got pacemaker_remote enabled via systemd, systemd may be
respawning it quickly enough that the cluster reconnect succeeds.
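
To see whether systemd respawning is in play, something like the following rough sketch may help. It assumes the unit is named pacemaker_remote on your distribution, and respawns_after_sigkill is just a helper name made up for illustration. The list of Restart= values is the set that systemd.service(5) documents as restarting a service after an unclean signal such as SIGKILL.

```shell
#!/bin/sh
# Sketch: would systemd respawn a service that was killed with SIGKILL?
# respawns_after_sigkill is a hypothetical helper, not part of Pacemaker.
# SIGKILL counts as an "unclean signal" in systemd.service(5); the Restart=
# values below are the ones that restart the service in that case.
respawns_after_sigkill() {
    case "$1" in
        always|on-failure|on-abnormal|on-abort) return 0 ;;
        *) return 1 ;;
    esac
}

# Assumption: the unit is called pacemaker_remote on your distribution.
unit=pacemaker_remote

if command -v systemctl >/dev/null 2>&1; then
    # Prints e.g. "Restart=on-failure"; strip the "Restart=" prefix.
    policy=$(systemctl show -p Restart "$unit" 2>/dev/null)
    policy=${policy#Restart=}
    if respawns_after_sigkill "$policy"; then
        echo "$unit: Restart=$policy, systemd will respawn it after kill -9"
    else
        echo "$unit: Restart=${policy:-unknown}, no respawn after kill -9"
    fi
fi
```

If this reports a respawn, the remote node's log should show pacemaker_remoted coming back with a new PID shortly after the kill, which would explain the runs where no fencing happens.
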
Also, until a recent master branch commit, remote nodes would not get
fenced if they were not running any resources.
And of course, a fencing resource has to be configured for the remote
node.
If none of those things was the reason, there may be a bug -- the PE
(policy engine) input file from the DC for that transition would be helpful.
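
As a rough sketch of how to locate such a file, the following assumes the default directory /var/lib/pacemaker/pengine (your build may use a different path) and is meant to be run on the DC. newest_pe_input is a hypothetical helper name, not a Pacemaker command.

```shell
#!/bin/sh
# Sketch: print the most recently written policy engine input in a directory.
# newest_pe_input is a hypothetical helper, not a Pacemaker command.
newest_pe_input() {
    # Unmatched patterns make ls complain on stderr, which is discarded.
    ls -t "$1"/pe-input-*.bz2 "$1"/pe-warn-*.bz2 "$1"/pe-error-*.bz2 \
        2>/dev/null | head -n 1
}

# Assumption: default location; run this on the DC.
pe_dir=/var/lib/pacemaker/pengine

latest=$(newest_pe_input "$pe_dir")
if [ -n "$latest" ]; then
    echo "Newest PE input: $latest"
    echo "Replay with: crm_simulate --simulate --xml-file $latest"
fi
```

The newest file is not necessarily the one for the transition in question. The DC's log normally names the file saved for each transition, so matching on the timestamp of the failure there is the more reliable route.
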