[ClusterLabs] Pacemaker shutting down peer node

Jaz Khan jazmphone at gmail.com
Fri Jun 16 12:21:26 EDT 2017


Hi,

I have checked node ha-apex2.
The log on that machine from /var/log/messages says "systemd: Power button
pressed" and "Shutting down....", but these messages appeared just as the
ha-apex1 node scheduled the shutdown, only seconds apart.
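
If systemd really did react to a power-button event (whether a real press
or a spurious ACPI signal), one temporary safeguard while I investigate
would be to tell logind to ignore the power key. A minimal sketch, assuming
a stock systemd-logind setup:

    # /etc/systemd/logind.conf
    [Login]
    HandlePowerKey=ignore

    # apply the change
    systemctl restart systemd-logind

Of course this only masks the OS-side reaction; it does not explain where
the button event came from.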

It seems as though the peer node (ha-apex1) sent some kind of power-off
request and ha-apex2 obeyed it.
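
To check whether a (virtual) power-button press was actually recorded by
the hardware, the management controller's event log can be inspected. A
sketch, assuming the servers expose IPMI and have ipmitool installed:

    # list the BMC System Event Log; look for button/power-off
    # entries around Jun 14 15:52
    ipmitool sel list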

On node ha-apex1 the log clearly says "Scheduling Node ha-apex2 for
shutdown", which looks as if it scheduled this task to be executed on the
peer node.
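
As Ken suggested below, I am searching the logs on both nodes for the first
indication of the shutdown. A sketch of what I am running (paths and unit
names assume a systemd-based RHEL/CentOS setup):

    # earliest shutdown/power-related messages around the incident
    grep -iE 'shutdown|power' /var/log/messages | head -n 20

    # the cluster daemons' view of the same window
    journalctl -u pacemaker -u corosync \
        --since "2017-06-14 15:40" --until "2017-06-14 15:55"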

My servers are running in production, so please help me out. I really do
not want anything to happen to any of the nodes. I hope you understand the
seriousness of this issue.

NOTE: This did not happen only on this cluster group of nodes; it has also
happened a few times on another cluster group of machines.

Look at these two messages from the ha-apex1 node.

Jun 14 15:52:23 apex1 pengine[18732]:  notice: Scheduling Node ha-apex2 for
shutdown

Jun 14 15:52:27 apex1 crmd[18733]:  notice: do_shutdown of peer ha-apex2 is
complete
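
The policy engine also saved the transition that scheduled the shutdown
(pe-input-123.bz2, per the fuller log below), so it should be possible to
replay it and see what cluster state led to the decision. A sketch,
assuming the file is still present on ha-apex1:

    # replay the saved transition; shows why each action was scheduled
    crm_simulate -x /var/lib/pacemaker/pengine/pe-input-123.bz2 -S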


Best regards,
Jaz




>
> Message: 1
> Date: Thu, 15 Jun 2017 13:53:00 -0500
> From: Ken Gaillot <kgaillot at redhat.com>
> To: users at clusterlabs.org
> Subject: Re: [ClusterLabs] Pacemaker shutting down peer node
> Message-ID: <5d122183-2030-050d-3a8e-9c158fa5fb5d at redhat.com>
> Content-Type: text/plain; charset=utf-8
>
> On 06/15/2017 12:38 AM, Jaz Khan wrote:
> > Hi,
> >
> > I have been encountering this serious issue for the past couple of
> > months. I really have no idea why Pacemaker sends a shutdown signal to
> > the peer node and it goes down. This is very strange and I am very worried.
> >
> > This is not happening daily, but this behavior surely recurs every few
> > days.
> >
> > Version:
> > Pacemaker 1.1.16
> > Corosync 2.4.2
> >
> > Please help me out with this bug! Below is the log message.
> >
> >
> >
> > Jun 14 15:52:23 apex1 crmd[18733]:  notice: State transition S_IDLE ->
> > S_POLICY_ENGINE
> > Jun 14 15:52:23 apex1 pengine[18732]:  notice: On loss of CCM Quorum:
> Ignore
> >
> > Jun 14 15:52:23 apex1 pengine[18732]:  notice: Scheduling Node ha-apex2
> > for shutdown
>
> This is not fencing, but a clean shutdown. Normally this only happens
> in response to a user request.
>
> Check the logs on both nodes before this point to find the first
> indication that it would shut down.
>
> >
> > Jun 14 15:52:23 apex1 pengine[18732]:  notice: Move    vip#011(Started
> > ha-apex2 -> ha-apex1)
> > Jun 14 15:52:23 apex1 pengine[18732]:  notice: Move
> >  filesystem#011(Started ha-apex2 -> ha-apex1)
> > Jun 14 15:52:23 apex1 pengine[18732]:  notice: Move    samba#011(Started
> > ha-apex2 -> ha-apex1)
> > Jun 14 15:52:23 apex1 pengine[18732]:  notice: Move
> >  database#011(Started ha-apex2 -> ha-apex1)
> > Jun 14 15:52:23 apex1 pengine[18732]:  notice: Calculated transition
> > 1744, saving inputs in /var/lib/pacemaker/pengine/pe-input-123.bz2
> > Jun 14 15:52:23 apex1 crmd[18733]:  notice: Initiating stop operation
> > vip_stop_0 on ha-apex2
> > Jun 14 15:52:23 apex1 crmd[18733]:  notice: Initiating stop operation
> > samba_stop_0 on ha-apex2
> > Jun 14 15:52:23 apex1 crmd[18733]:  notice: Initiating stop operation
> > database_stop_0 on ha-apex2
> > Jun 14 15:52:26 apex1 crmd[18733]:  notice: Initiating stop operation
> > filesystem_stop_0 on ha-apex2
> > Jun 14 15:52:27 apex1 kernel: drbd apexdata apex2.br: peer( Primary ->
> > Secondary )
> > Jun 14 15:52:27 apex1 crmd[18733]:  notice: Initiating start operation
> > filesystem_start_0 locally on ha-apex1
> >
> > Jun 14 15:52:27 apex1 crmd[18733]:  notice: do_shutdown of peer ha-apex2
> > is complete
> >
> > Jun 14 15:52:27 apex1 attrd[18731]:  notice: Node ha-apex2 state is now
> lost
> > Jun 14 15:52:27 apex1 attrd[18731]:  notice: Removing all ha-apex2
> > attributes for peer loss
> > Jun 14 15:52:27 apex1 attrd[18731]:  notice: Lost attribute writer
> ha-apex2
> > Jun 14 15:52:27 apex1 attrd[18731]:  notice: Purged 1 peers with id=2
> > and/or uname=ha-apex2 from the membership cache
> > Jun 14 15:52:27 apex1 stonith-ng[18729]:  notice: Node ha-apex2 state is
> > now lost
> > Jun 14 15:52:27 apex1 stonith-ng[18729]:  notice: Purged 1 peers with
> > id=2 and/or uname=ha-apex2 from the membership cache
> > Jun 14 15:52:27 apex1 cib[18728]:  notice: Node ha-apex2 state is now
> lost
> > Jun 14 15:52:27 apex1 cib[18728]:  notice: Purged 1 peers with id=2
> > and/or uname=ha-apex2 from the membership cache
> >
> >
> >
> > Best regards,
> > Jaz. K
>