[ClusterLabs] dlm_controld 4.0.4 exits when crmd is fencing another node

Vladislav Bogdanov bubble at hoster-ok.com
Fri Jan 22 11:57:52 EST 2016


22.01.2016 19:28, David Teigland wrote:
> On Fri, Jan 22, 2016 at 06:59:25PM +0300, Vladislav Bogdanov wrote:
>> Hi David, list,
>>
>> I recently tried to upgrade dlm from 4.0.2 to 4.0.4 and found that it
>> no longer handles fencing of a remote node when the fencing is initiated
>> by other cluster components. I first noticed this during a legitimate
>> fencing caused by a resource stop failure, but it is easily reproduced
>> with 'crm node fence XXX'.
>>
>> I took logs from both 4.0.2 and 4.0.4 and "normalized" (replaced the
>> timestamps in) the part that follows the fencing initiated by pacemaker.
>
> There are very few commits there, and only two I could imagine being
> related.  Could you try reverting them and see if that helps?
>
> 79e87eb5913f Make systemd stop dlm on corosync restart

There is no systemd on EL6, so this one is not a suspect.

> fb61984c9388 dlm_stonith: use kick_helper result

I tried reverting this one and a51b2bb ("If an error occurs unlink the
lock file and exit with status 1"), first one at a time and then both
together, with the same result each time.

So the problem seems to lie somewhere deeper.

Best,
Vladislav




