[ClusterLabs] Re: Behavior after stop action failure with the failure-timeout set and STONITH disabled
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Fri May 5 08:15:45 CEST 2017
>>> Jan Wrona <wrona at cesnet.cz> wrote on 04.05.2017 at 16:41 in message
<cac9591a-7efc-d524-5d2c-248415c5c37e at cesnet.cz>:
> I hope I'll be able to explain the problem clearly and correctly.
>
> My setup (simplified): I have two cloned resources, a filesystem mount
> and a process which writes to that filesystem. The filesystem is Gluster,
> so it's OK to clone it. I also have a mandatory ordering constraint
> "start gluster-mount-clone then start writer-process-clone". I don't
> have a STONITH device, so I've disabled STONITH by setting
> stonith-enabled=false.
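>
> (Roughly, the relevant part of the configuration in crm shell syntax;
> the resource agents, names and parameter values below are simplified
> placeholders, not my exact configuration:)
>
>   # Gluster mount, monitored with a write test (OCF_CHECK_LEVEL=20)
>   primitive gluster-mount ocf:heartbeat:Filesystem \
>       params device="glusterserver:/vol0" directory="/mnt/gluster" fstype="glusterfs" \
>       op monitor interval=30s timeout=40s OCF_CHECK_LEVEL=20
>   # process that writes into the mounted filesystem
>   primitive writer-process ocf:heartbeat:anything \
>       params binfile="/usr/local/bin/writer" \
>       op monitor interval=30s timeout=30s
>   clone gluster-mount-clone gluster-mount
>   clone writer-process-clone writer-process
>   # mandatory ordering: mount first, then the writer
>   order writer-after-mount Mandatory: gluster-mount-clone writer-process-clone
>   property stonith-enabled=false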
>
> The problem: Sometimes Gluster freezes for a while, which causes the
> gluster-mount resource's monitor with OCF_CHECK_LEVEL=20 to time out
> (it is unable to write the status file). When this happens, the cluster
Actually I would do two things:
1) Find out why Gluster freezes, and what to do to avoid that
2) Implement stonith
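
For 2), a minimal sketch in crm shell syntax using fence_ipmilan (the choice
of fence agent, addresses and credentials here are assumptions, and parameter
names vary with the fence-agents version):

  primitive fence-node1 stonith:fence_ipmilan \
      params pcmk_host_list="node1.example.org" ip="192.0.2.11" \
             username="admin" password="secret" lanplus=1 \
      op monitor interval=60s
  # don't run a node's own fence device on that node
  location l-fence-node1 fence-node1 -inf: node1.example.org
  property stonith-enabled=true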
Regards,
Ulrich
> tries to recover by restarting the writer-process resource. But the
> writer-process is writing to the frozen filesystem, which makes it
> uninterruptible; not even SIGKILL works. Then the stop operation times
> out, and with STONITH disabled on-fail defaults to block (don’t perform
> any further operations on the resource):
> warning: Forcing writer-process-clone away from node1.example.org after
> 1000000 failures (max=1000000)
> After that, the cluster continues the recovery by restarting the
> gluster-mount resource on that node, which usually succeeds. As a
> consequence of that remount, the uninterruptible system call in the
> writer process fails, signals are finally delivered, and the
> writer-process is terminated. But the cluster doesn't know about that!
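>
> (If it helps, this is where that default lives; only a sketch, the
> timeout value is arbitrary, and with working fencing on-fail=fence on
> the stop operation would be the usual choice instead:)
>
>   primitive writer-process ocf:heartbeat:anything \
>       params binfile="/usr/local/bin/writer" \
>       op stop interval=0 timeout=90s on-fail=block \
>       op monitor interval=30s timeout=30s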
>
> I thought I could solve this by setting the failure-timeout meta
> attribute on the writer-process resource, but it only made things worse.
> The documentation states: "Stop failures are slightly different and
> crucial. ... If a resource fails to stop and STONITH is not enabled,
> then the cluster has no way to continue and will not try to start the
> resource elsewhere, but will try to stop it again after the failure
> timeout.", but I'm seeing something different. When the policy engine
> runs after the next cluster-recheck-interval, the following lines are
> written to syslog:
> crmd[11852]: notice: State transition S_IDLE -> S_POLICY_ENGINE
> pengine[11851]: notice: Clearing expired failcount for writer-process:1
> on node1.example.org
> pengine[11851]: notice: Clearing expired failcount for writer-process:1
> on node1.example.org
> pengine[11851]: notice: Ignoring expired calculated failure
> writer-process_stop_0 (rc=1,
> magic=2:1;64:557:0:2169780b-ca1f-483e-ad42-118b7c7c1a7d) on
> node1.example.org
> pengine[11851]: notice: Clearing expired failcount for writer-process:1
> on node1.example.org
> pengine[11851]: notice: Ignoring expired calculated failure
> writer-process_stop_0 (rc=1,
> magic=2:1;64:557:0:2169780b-ca1f-483e-ad42-118b7c7c1a7d) on
> node1.example.org
> pengine[11851]: warning: Processing failed op monitor for
> gluster-mount:1 on node1.example.org: unknown error (1)
> pengine[11851]: notice: Calculated transition 564, saving inputs in
> /var/lib/pacemaker/pengine/pe-input-362.bz2
> crmd[11852]: notice: Transition 564 (Complete=2, Pending=0, Fired=0,
> Skipped=0, Incomplete=0,
> Source=/var/lib/pacemaker/pengine/pe-input-362.bz2): Complete
> crmd[11852]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
> crmd[11852]: notice: State transition S_IDLE -> S_POLICY_ENGINE
> crmd[11852]: warning: No reason to expect node 3 to be down
> crmd[11852]: warning: No reason to expect node 1 to be down
> crmd[11852]: warning: No reason to expect node 1 to be down
> crmd[11852]: warning: No reason to expect node 3 to be down
> pengine[11851]: warning: Processing failed op stop for writer-process:1
> on node1.example.org: unknown error (1)
> pengine[11851]: warning: Processing failed op monitor for
> gluster-mount:1 on node1.example.org: unknown error (1)
> pengine[11851]: warning: Forcing writer-process-clone away from
> node1.example.org after 1000000 failures (max=1000000)
> pengine[11851]: warning: Forcing writer-process-clone away from
> node1.example.org after 1000000 failures (max=1000000)
> pengine[11851]: warning: Forcing writer-process-clone away from
> node1.example.org after 1000000 failures (max=1000000)
> pengine[11851]: notice: Calculated transition 565, saving inputs in
> /var/lib/pacemaker/pengine/pe-input-363.bz2
> pengine[11851]: notice: Ignoring expired calculated failure
> writer-process_stop_0 (rc=1,
> magic=2:1;64:557:0:2169780b-ca1f-483e-ad42-118b7c7c1a7d) on
> node1.example.org
> pengine[11851]: warning: Processing failed op monitor for
> gluster-mount:1 on node1.example.org: unknown error (1)
> crmd[11852]: notice: Transition 566 (Complete=0, Pending=0, Fired=0,
> Skipped=0, Incomplete=0,
> Source=/var/lib/pacemaker/pengine/pe-input-364.bz2): Complete
> crmd[11852]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
> pengine[11851]: notice: Calculated transition 566, saving inputs in
> /var/lib/pacemaker/pengine/pe-input-364.bz2
>
> Then after each cluster-recheck-interval:
> crmd[11852]: notice: State transition S_IDLE -> S_POLICY_ENGINE
> pengine[11851]: notice: Ignoring expired calculated failure
> writer-process_stop_0 (rc=1,
> magic=2:1;64:557:0:2169780b-ca1f-483e-ad42-118b7c7c1a7d) on
> node1.example.org
> pengine[11851]: warning: Processing failed op monitor for
> gluster-mount:1 on node1.example.org: unknown error (1)
> crmd[11852]: notice: Transition 567 (Complete=0, Pending=0, Fired=0,
> Skipped=0, Incomplete=0,
> Source=/var/lib/pacemaker/pengine/pe-input-364.bz2): Complete
> crmd[11852]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
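>
> (For reference, the failure-timeout and cluster-recheck-interval
> mentioned above are set roughly like this; the values here are
> placeholders, not my actual settings:)
>
>   clone writer-process-clone writer-process \
>       meta failure-timeout=300s
>   property cluster-recheck-interval=5min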
>
> And crm_mon happily shows the writer-process as Started, although it is
> not running. This is very confusing. Could anyone please explain what is
> going on here?
>