[ClusterLabs] Problem with stonith and starting services

Tue Jul 4 10:16:06 EDT 2017

On 07/04/2017 03:28 PM, Cesar Hernandez wrote:
>> Agreed, I don't think it's multicast vs unicast.
>>
>> I can't see from this what's going wrong. Possibly node1 is trying to
>> re-fence node2 when it comes back. Check that the fencing resources are
>> configured correctly, and check whether node1 sees the first fencing
>> succeed.
>
> Thanks. I checked the fencing resource and it always returns. It's a custom script I have used on other installations, and it always worked.
> I think the clue is in the two messages that appear when it fails:
>
> Jul  3 09:07:04 node2 pacemakerd[597]:  warning: The crmd process (608) can no longer be respawned, shutting the cluster down.
> Jul  3 09:07:04 node2 crmd[608]:     crit: We were allegedly just fenced by node1 for node1!
>
> Does anyone know what they are related to? There doesn't seem to be much information on the Internet.

The first line is a consequence of the second.
The second says that node2 has just seen a fencing resource
positively report that it fenced node2 itself, which
is why crmd exits in a way that keeps pacemakerd from
respawning it.
As for the reason, I can only guess ...
Maybe your script does not check which node it is actually
supposed to fence and assumes something?
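
To illustrate what "checking whom to fence" could look like, here is a
minimal Python sketch, not your actual script. It assumes the agent
receives its options as key=value lines on stdin and that the target
arrives as "port" or "nodename"; parse_options, pick_target, run and the
power_off callback are hypothetical names made up for this example.

```python
def parse_options(lines):
    """Parse key=value lines, as an agent might read them from stdin."""
    opts = {}
    for line in lines:
        line = line.strip()
        # Skip blanks, comments and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        opts[key.strip()] = value.strip()
    return opts


def pick_target(opts):
    """Return the explicitly requested target, or None if none was given."""
    # Assumption: the target is passed as "port" or "nodename".
    return opts.get("port") or opts.get("nodename")


def run(opts, power_off):
    """Fence only the node that was asked for; never guess a target."""
    action = opts.get("action", "reboot")
    if action not in ("off", "reboot"):
        return 0  # other actions are outside the scope of this sketch
    target = pick_target(opts)
    if target is None:
        return 1  # refuse instead of assuming "the other node"
    power_off(target)  # hypothetical callback that does the real work
    return 0
```

A real agent would call something like
sys.exit(run(parse_options(sys.stdin), power_off)). The point is only
that the script fails when no target was passed instead of falling back
to a hard-coded node.
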
You can configure which targets a given fencing resource
may be used for with pcmk_host_map, pcmk_host_check and pcmk_host_list.
So it is possible that the fencing resource was
configured differently in the other setups where your
script worked.
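
As a rough mental model of how these options restrict a device to
certain targets, here is a simplified Python sketch. It is not
Pacemaker's implementation: it assumes a space-separated pcmk_host_list,
a pcmk_host_map of the form "node1:1;node2:2", and it only models the
"static-list" and "none" values of pcmk_host_check. The function names
are made up for this example.

```python
def parse_host_map(value):
    """Turn 'node1:1;node2:2' into {'node1': '1', 'node2': '2'}."""
    mapping = {}
    # Assumption: entries are separated by ';' or whitespace.
    for entry in value.replace(";", " ").split():
        node, sep, port = entry.partition(":")
        if sep and node and port:
            mapping[node] = port
    return mapping


def may_fence(target, host_check, host_list="", host_map=""):
    """Simplified check of whether a device is eligible for a target."""
    if host_check == "none":
        # The device is assumed able to fence any node.
        return True
    if host_check == "static-list":
        # Only nodes named in the list or the map are eligible.
        return target in host_list.split() or target in parse_host_map(host_map)
    raise ValueError("host_check value not modelled in this sketch")
```

With host_check "static-list" and host_list "node2", the device is
eligible for node2 only; with "none" it is eligible for every node,
which is where a script that assumes its target can go wrong.
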

Regards,
Klaus

>
> Thanks
> Cesar