[ClusterLabs] Antw: Behavior after stop action failure with the failure-timeout set and STONITH disabled

Jan Wrona wrona at cesnet.cz
Fri May 5 12:49:26 UTC 2017


On 5.5.2017 08:15, Ulrich Windl wrote:
>>>> Jan Wrona <wrona at cesnet.cz> wrote on 04.05.2017 at 16:41 in message
> <cac9591a-7efc-d524-5d2c-248415c5c37e at cesnet.cz>:
>> I hope I'll be able to explain the problem clearly and correctly.
>>
>> My setup (simplified): I have two cloned resources, a filesystem mount
>> and a process which writes to that filesystem. The filesystem is Gluster,
>> so it's OK to clone it. I also have a mandatory ordering constraint
>> "start gluster-mount-clone then start writer-process-clone". I don't
>> have a STONITH device, so I've disabled STONITH by setting
>> stonith-enabled=false.
>>
>> The problem: Sometimes the Gluster filesystem freezes for a while, which
>> causes the gluster-mount resource's monitor with OCF_CHECK_LEVEL=20 to
>> time out (it is unable to write the status file). When this happens, the cluster
> Actually I would do two things:
>
> 1) Find out why Gluster freezes, and what to do to avoid that

It freezes when one of the underlying MD RAIDs starts its regular check. 
I've decreased its speed limit (from the default 200 MB/s to 50 MB/s, 
I cannot go any lower), but it helped only a little; the mount 
still tends to freeze for a few seconds during the check.
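(For reference, this throttling is done through the kernel's dev.raid sysctls, whose values are in KiB/s; the exact numbers below are my assumption of the change described above, not a quote from the original setup:)

```shell
# Cap the MD resync/check rate at ~50 MB/s (values are in KiB/s;
# the kernel default for speed_limit_max is 200000, i.e. ~200 MB/s).
echo 51200 > /proc/sys/dev/raid/speed_limit_max

# The same setting via sysctl; add the key to /etc/sysctl.conf to persist it.
sysctl -w dev.raid.speed_limit_max=51200
```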

>
> 2) Implement stonith

Currently I can't. But AFAIK Pacemaker should work properly even with 
STONITH disabled, and the state I've run into doesn't seem right to me at 
all. I was asking for clarification of what the cluster is trying to do 
in such a situation; I don't understand the "Ignoring expired calculated 
failure" log messages, and I don't understand why crm_mon was showing 
that the writer-process is started even though it was not.

>
> Regards,
> Ulrich
>
>
>> tries to recover by restarting the writer-process resource. But the
>> writer-process is writing to the frozen filesystem, which makes it
>> uninterruptible; not even SIGKILL works. Then the stop operation times
>> out, and with STONITH disabled, on-fail defaults to block (don’t perform
>> any further operations on the resource):
>> warning: Forcing writer-process-clone away from node1.example.org after
>> 1000000 failures (max=1000000)
>> After that, the cluster continues the recovery process by
>> restarting the gluster-mount resource on that node, which usually
>> succeeds. As a consequence of that remount, the uninterruptible system
>> call in the writer process fails, signals are finally delivered, and the
>> writer-process is terminated. But the cluster doesn't know about that!
>>
>> I thought I could solve this by setting the failure-timeout meta attribute
>> on the writer-process resource, but it only made things worse. The
>> documentation states: "Stop failures are slightly different and crucial.
>> ... If a resource fails to stop and STONITH is not enabled, then the
>> cluster has no way to continue and will not try to start the resource
>> elsewhere, but will try to stop it again after the failure timeout.",
>> but I'm seeing something different. When the policy engine is launched
>> at the next cluster-recheck-interval, the following lines are written
>> to the syslog:
>> crmd[11852]: notice: State transition S_IDLE -> S_POLICY_ENGINE
>> pengine[11851]:  notice: Clearing expired failcount for writer-process:1
>> on node1.example.org
>> pengine[11851]:  notice: Clearing expired failcount for writer-process:1
>> on node1.example.org
>> pengine[11851]:  notice: Ignoring expired calculated failure
>> writer-process_stop_0 (rc=1,
>> magic=2:1;64:557:0:2169780b-ca1f-483e-ad42-118b7c7c1a7d) on
>> node1.example.org
>> pengine[11851]:  notice: Clearing expired failcount for writer-process:1
>> on node1.example.org
>> pengine[11851]:  notice: Ignoring expired calculated failure
>> writer-process_stop_0 (rc=1,
>> magic=2:1;64:557:0:2169780b-ca1f-483e-ad42-118b7c7c1a7d) on
>> node1.example.org
>> pengine[11851]: warning: Processing failed op monitor for
>> gluster-mount:1 on node1.example.org: unknown error (1)
>> pengine[11851]:  notice: Calculated transition 564, saving inputs in
>> /var/lib/pacemaker/pengine/pe-input-362.bz2
>> crmd[11852]:  notice: Transition 564 (Complete=2, Pending=0, Fired=0,
>> Skipped=0, Incomplete=0,
>> Source=/var/lib/pacemaker/pengine/pe-input-362.bz2): Complete
>> crmd[11852]:  notice: State transition S_TRANSITION_ENGINE -> S_IDLE
>> crmd[11852]:  notice: State transition S_IDLE -> S_POLICY_ENGINE
>> crmd[11852]: warning: No reason to expect node 3 to be down
>> crmd[11852]: warning: No reason to expect node 1 to be down
>> crmd[11852]: warning: No reason to expect node 1 to be down
>> crmd[11852]: warning: No reason to expect node 3 to be down
>> pengine[11851]: warning: Processing failed op stop for writer-process:1
>> on node1.example.org: unknown error (1)
>> pengine[11851]: warning: Processing failed op monitor for
>> gluster-mount:1 on node1.example.org: unknown error (1)
>> pengine[11851]: warning: Forcing writer-process-clone away from
>> node1.example.org after 1000000 failures (max=1000000)
>> pengine[11851]: warning: Forcing writer-process-clone away from
>> node1.example.org after 1000000 failures (max=1000000)
>> pengine[11851]: warning: Forcing writer-process-clone away from
>> node1.example.org after 1000000 failures (max=1000000)
>> pengine[11851]:  notice: Calculated transition 565, saving inputs in
>> /var/lib/pacemaker/pengine/pe-input-363.bz2
>> pengine[11851]:  notice: Ignoring expired calculated failure
>> writer-process_stop_0 (rc=1,
>> magic=2:1;64:557:0:2169780b-ca1f-483e-ad42-118b7c7c1a7d) on
>> node1.example.org
>> pengine[11851]: warning: Processing failed op monitor for
>> gluster-mount:1 on node1.example.org: unknown error (1)
>> crmd[11852]:  notice: Transition 566 (Complete=0, Pending=0, Fired=0,
>> Skipped=0, Incomplete=0,
>> Source=/var/lib/pacemaker/pengine/pe-input-364.bz2): Complete
>> crmd[11852]:  notice: State transition S_TRANSITION_ENGINE -> S_IDLE
>> pengine[11851]:  notice: Calculated transition 566, saving inputs in
>> /var/lib/pacemaker/pengine/pe-input-364.bz2
>>
>> Then after each cluster-recheck-interval:
>> crmd[11852]:  notice: State transition S_IDLE -> S_POLICY_ENGINE
>> pengine[11851]:  notice: Ignoring expired calculated failure
>> writer-process_stop_0 (rc=1,
>> magic=2:1;64:557:0:2169780b-ca1f-483e-ad42-118b7c7c1a7d) on
>> node1.example.org
>> pengine[11851]: warning: Processing failed op monitor for
>> gluster-mount:1 on node1.example.org: unknown error (1)
>> crmd[11852]:  notice: Transition 567 (Complete=0, Pending=0, Fired=0,
>> Skipped=0, Incomplete=0,
>> Source=/var/lib/pacemaker/pengine/pe-input-364.bz2): Complete
>> crmd[11852]:  notice: State transition S_TRANSITION_ENGINE -> S_IDLE
>>
>> And crm_mon is happily showing the writer-process as Started,
>> although it is not running. This is very confusing. Could anyone please
>> explain what is going on here?
>>
>
>
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

