[Pacemaker] Periodically appear non-existent nodes

Vladislav Bogdanov bubble at hoster-ok.com
Thu Apr 19 05:06:33 EDT 2012


19.04.2012 11:24, Andreas Kurz wrote:
> On 04/18/2012 11:46 PM, ruslan usifov wrote:
>>
>>
>> 2012/4/18 Andreas Kurz <andreas at hastexo.com>
>>
>>     On 04/17/2012 09:31 PM, ruslan usifov wrote:
>>     >
>>     >
>>     > 2012/4/17 Proskurin Kirill <k.proskurin at corp.mail.ru>
>>     >
>>     >     On 04/17/2012 03:46 PM, ruslan usifov wrote:
>>     >
>>     >         2012/4/17 Andreas Kurz <andreas at hastexo.com>
>>     >
>>     >
>>     >            On 04/14/2012 11:14 PM, ruslan usifov wrote:
>>     >             > Hello
>>     >             >
>>     >             > I removed 2 nodes from the cluster with the following sequence:
>>     >             >
>>     >             > crm_node --force -R <id of node1>
>>     >             > crm_node --force -R <id of node2>
>>     >             > cibadmin --delete --obj_type nodes --crm_xml '<node uname="node1"/>'
>>     >             > cibadmin --delete --obj_type status --crm_xml '<node_state uname="node1"/>'
>>     >             > cibadmin --delete --obj_type nodes --crm_xml '<node uname="node2"/>'
>>     >             > cibadmin --delete --obj_type status --crm_xml '<node_state uname="node2"/>'
>>     >             >
>>     >             >
>>     >             > The nodes are deleted after this, but if I restart (reboot),
>>     >             > for example, one of the existing nodes in the working cluster,
>>     >             > the deleted nodes appear again in the OFFLINE state
>>     >
>>     >
>>     >     I had this problem some time ago.
>>     >     I "solved" it with something like this:
>>     >
>>     >     crm node delete NODENAME
>>     >     crm_node --force --remove NODENAME
>>     >     cibadmin --delete --obj_type nodes --crm_xml '<node uname="NODENAME"/>'
>>     >     cibadmin --delete --obj_type status --crm_xml '<node_state uname="NODENAME"/>'
>>     >
>>     >     --
>>     >
>>     >
>>     > I do the same, but sometimes after a cluster reconfiguration (a node
>>     > failed due to a power supply failure) the removed nodes appear again;
>>     > this has happened 3-4 times
>>
>>     And the same behavior if you switch your cluster into maintenance-mode
>>     (to avoid service downtime) and stop/start pacemaker and corosync
>>     completely?
>>
>>
>> We will have a maintenance window this Friday (20.04.2012), so after
>> that I can report more info.
> 
> Of course, that is the safest option ... though you won't have a service
> downtime if you enable maintenance-mode prior to cluster restart.

Unless you are using DLM (CLVM, GFS2, OCFS2). Then you should not stop
corosync - dlm_controld uses CPG.

And, DLM may use pacemaker parts for fencing (cib, attrd, stonith,
depending on version).
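
A minimal sketch of that check, assuming a running DLM shows up in the
process list under a name containing dlm_controld. The function names and
the "service ... stop" lines are illustrative only (init script names
depend on the distribution), and nothing is executed; the plan is just
printed:

```shell
#!/bin/sh
# Sketch: decide whether corosync may be stopped for a maintenance-mode restart.
# Assumption: a running DLM is visible as a process matching "dlm_controld".

dlm_running() {
    pgrep dlm_controld >/dev/null 2>&1
}

# Prints the intended steps instead of running them.
# Returns 1 when DLM is running, because corosync must then stay up.
restart_plan() {
    if dlm_running; then
        echo "# dlm_controld is running: leave corosync up (DLM uses CPG)"
        return 1
    fi
    echo "crm configure property maintenance-mode=true"
    echo "service pacemaker stop"    # assumed init script name
    echo "service corosync stop"     # assumed init script name
}
```

Calling restart_plan on a node prints either the warning or the three
steps, so the output can be reviewed before anything is stopped.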

> 
>>
>> PS: I had a similar situation on another cluster some time ago; there I
>> fully restarted the cluster and the problem reproduced. But after some
>> time (about 1-2 weeks) the non-existent nodes ceased to appear
> 
> Now that is really strange ... if that happens again, the
> corosync/pacemaker log files would be really interesting to have a look at.

I recall that this has been a known issue for a rather long time.
One needs to do a full (not rolling) restart to make a node fully disappear.
I checked this again not so long ago, and yes, node deletion does not
work with the current master branch (or something very close to it): the
node appears again after a pacemaker restart on any other node.
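
For reference, the removal sequence quoted above can be collected into a
small helper. This is only a sketch: the function name is made up, it
prints the commands from this thread for review instead of running them,
and by itself it does not avoid the need for the full restart:

```shell
#!/bin/sh
# Sketch: print the node-removal sequence quoted earlier in this thread.
# Nothing is executed; the function name is illustrative.

removal_commands() {
    node=$1
    if [ -z "$node" ]; then
        echo "usage: removal_commands NODENAME" >&2
        return 1
    fi
    # Unquoted here-document: $node is expanded, the quotes stay literal.
    cat <<EOF
crm node delete $node
crm_node --force --remove $node
cibadmin --delete --obj_type nodes --crm_xml '<node uname="$node"/>'
cibadmin --delete --obj_type status --crm_xml '<node_state uname="$node"/>'
EOF
}
```

For example, "removal_commands node1" prints the four commands with
node1 filled in.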

Maybe it is because of the lrmd cache, as with failed actions? It looks
very similar to that.

Andrew, David?

Best,
Vladislav



