[ClusterLabs] Behavior after stop action failure with the failure-timeout set and STONITH disabled

Jan Wrona wrona at cesnet.cz
Thu May 4 10:41:49 EDT 2017


I hope I'll be able to explain the problem clearly and correctly.

My setup (simplified): I have two cloned resources, a filesystem mount
and a process which writes to that filesystem. The filesystem is Gluster,
so it's OK to clone it. I also have a mandatory ordering constraint
"start gluster-mount-clone then start writer-process-clone". I don't
have a STONITH device, so I've disabled STONITH by setting
stonith-enabled=false.
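
Roughly, the relevant part of the configuration looks like this (a pcs
sketch; the writer's agent, device paths, intervals and timeouts below
are placeholders rather than my exact values, and the syntax may differ
slightly between pcs versions):

  # STONITH disabled, as mentioned above
  pcs property set stonith-enabled=false

  # cloned Gluster mount with a write-test monitor (OCF_CHECK_LEVEL=20)
  pcs resource create gluster-mount ocf:heartbeat:Filesystem \
      device="gluster-host:/volume" directory="/mnt/data" fstype="glusterfs" \
      op monitor interval=30s timeout=40s OCF_CHECK_LEVEL=20 --clone

  # cloned writer process (placeholder agent name)
  pcs resource create writer-process ocf:custom:writer --clone

  # mandatory ordering: mount the filesystem before starting the writer
  pcs constraint order start gluster-mount-clone then start writer-process-clone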

The problem: Sometimes the Gluster filesystem freezes for a while, which
causes the gluster-mount resource's monitor with OCF_CHECK_LEVEL=20 to
time out (it is unable to write the status file). When this happens, the
cluster tries to recover by restarting the writer-process resource. But
the writer-process is writing to the frozen filesystem, which makes it
uninterruptible; not even SIGKILL works. Then the stop operation times
out, and with STONITH disabled on-fail defaults to block (don't perform
any further operations on the resource):
warning: Forcing writer-process-clone away from node1.example.org after
1000000 failures (max=1000000)
After that, the cluster continues the recovery by restarting the
gluster-mount resource on that node, which usually succeeds. As a
consequence of the remount, the uninterruptible system call in the
writer process fails, signals are finally delivered and the
writer-process is terminated. But the cluster doesn't know about that!
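
If I read the defaults correctly, with stonith-enabled=false this is
equivalent to configuring the stop operation with on-fail=block
explicitly, roughly like this (the timeout value here is made up):

  # the behaviour the cluster applies implicitly when STONITH is disabled:
  # a failed stop blocks the resource instead of fencing the node
  pcs resource update writer-process op stop interval=0s timeout=60s on-fail=block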

I thought I could solve this by setting the failure-timeout meta
attribute on the writer-process resource, but it only made things worse.
The documentation states: "Stop failures are slightly different and
crucial. ... If a resource fails to stop and STONITH is not enabled,
then the cluster has no way to continue and will not try to start the
resource elsewhere, but will try to stop it again after the failure
timeout."
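
For completeness, the meta attribute was set along these lines (the
actual timeout value in my configuration may differ):

  # let the stop failure expire so the cluster will try the stop again
  pcs resource meta writer-process failure-timeout=60s
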
However, I'm seeing something different. When the policy engine runs
after the next cluster-recheck-interval, the following lines are written
to syslog:
crmd[11852]: notice: State transition S_IDLE -> S_POLICY_ENGINE
pengine[11851]:  notice: Clearing expired failcount for writer-process:1 
on node1.example.org
pengine[11851]:  notice: Clearing expired failcount for writer-process:1 
on node1.example.org
pengine[11851]:  notice: Ignoring expired calculated failure 
writer-process_stop_0 (rc=1, 
magic=2:1;64:557:0:2169780b-ca1f-483e-ad42-118b7c7c1a7d) on 
node1.example.org
pengine[11851]:  notice: Clearing expired failcount for writer-process:1 
on node1.example.org
pengine[11851]:  notice: Ignoring expired calculated failure 
writer-process_stop_0 (rc=1, 
magic=2:1;64:557:0:2169780b-ca1f-483e-ad42-118b7c7c1a7d) on 
node1.example.org
pengine[11851]: warning: Processing failed op monitor for 
gluster-mount:1 on node1.example.org: unknown error (1)
pengine[11851]:  notice: Calculated transition 564, saving inputs in 
/var/lib/pacemaker/pengine/pe-input-362.bz2
crmd[11852]:  notice: Transition 564 (Complete=2, Pending=0, Fired=0, 
Skipped=0, Incomplete=0, 
Source=/var/lib/pacemaker/pengine/pe-input-362.bz2): Complete
crmd[11852]:  notice: State transition S_TRANSITION_ENGINE -> S_IDLE
crmd[11852]:  notice: State transition S_IDLE -> S_POLICY_ENGINE
crmd[11852]: warning: No reason to expect node 3 to be down
crmd[11852]: warning: No reason to expect node 1 to be down
crmd[11852]: warning: No reason to expect node 1 to be down
crmd[11852]: warning: No reason to expect node 3 to be down
pengine[11851]: warning: Processing failed op stop for writer-process:1 
on node1.example.org: unknown error (1)
pengine[11851]: warning: Processing failed op monitor for 
gluster-mount:1 on node1.example.org: unknown error (1)
pengine[11851]: warning: Forcing writer-process-clone away from 
node1.example.org after 1000000 failures (max=1000000)
pengine[11851]: warning: Forcing writer-process-clone away from 
node1.example.org after 1000000 failures (max=1000000)
pengine[11851]: warning: Forcing writer-process-clone away from 
node1.example.org after 1000000 failures (max=1000000)
pengine[11851]:  notice: Calculated transition 565, saving inputs in 
/var/lib/pacemaker/pengine/pe-input-363.bz2
pengine[11851]:  notice: Ignoring expired calculated failure 
writer-process_stop_0 (rc=1, 
magic=2:1;64:557:0:2169780b-ca1f-483e-ad42-118b7c7c1a7d) on 
node1.example.org
pengine[11851]: warning: Processing failed op monitor for 
gluster-mount:1 on node1.example.org: unknown error (1)
crmd[11852]:  notice: Transition 566 (Complete=0, Pending=0, Fired=0, 
Skipped=0, Incomplete=0, 
Source=/var/lib/pacemaker/pengine/pe-input-364.bz2): Complete
crmd[11852]:  notice: State transition S_TRANSITION_ENGINE -> S_IDLE
pengine[11851]:  notice: Calculated transition 566, saving inputs in 
/var/lib/pacemaker/pengine/pe-input-364.bz2

Then after each cluster-recheck-interval:
crmd[11852]:  notice: State transition S_IDLE -> S_POLICY_ENGINE
pengine[11851]:  notice: Ignoring expired calculated failure 
writer-process_stop_0 (rc=1, 
magic=2:1;64:557:0:2169780b-ca1f-483e-ad42-118b7c7c1a7d) on 
node1.example.org
pengine[11851]: warning: Processing failed op monitor for 
gluster-mount:1 on node1.example.org: unknown error (1)
crmd[11852]:  notice: Transition 567 (Complete=0, Pending=0, Fired=0, 
Skipped=0, Incomplete=0, 
Source=/var/lib/pacemaker/pengine/pe-input-364.bz2): Complete
crmd[11852]:  notice: State transition S_TRANSITION_ENGINE -> S_IDLE

And crm_mon happily shows the writer-process as Started, although it is
not actually running. This is very confusing. Could anyone please
explain what is going on here?
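
One way to confirm the real state, independent of what crm_mon reports
(a sketch, assuming crm_resource's --force-check option is available in
this Pacemaker version):

  # run the agent's monitor directly, bypassing the status recorded in the CIB
  crm_resource --resource writer-process --force-check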
