[ClusterLabs] monitor failed actions not cleared
LE COQUIL Pierre-Yves
pierre-yves.lecoquil at enfrasys.fr
Mon Oct 2 09:29:45 EDT 2017
Hi,
I finally found my mistake:
I had set failure-timeout to PT1M, following the lifetime example in the Red Hat documentation.
If I set failure-timeout to 60, it works as it should.
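A minimal sketch of that change, assuming the cluster is managed with pcs and using the resource name ACTIVATION_KX from my original message below (the second resource would need the same setting):

```shell
# Set failure-timeout as a plain number of seconds instead of PT1M.
pcs resource meta ACTIVATION_KX failure-timeout=60

# Display the resource configuration to confirm the stored value.
pcs resource show ACTIVATION_KX
```

As far as I understand, an expired failure is only noticed when the cluster re-evaluates its state, so with cluster-recheck-interval=120 the fail count may take up to about two more minutes to disappear after the 60 seconds have elapsed.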
One last question:
Couldn't there be a message in the log saying that the value is not in the right format?
Pierre-Yves
From: LE COQUIL Pierre-Yves
Sent: Wednesday, 27 September 2017 19:37
To: 'users at clusterlabs.org' <users at clusterlabs.org>
Subject: RE: monitor failed actions not cleared
From: LE COQUIL Pierre-Yves
Sent: Monday, 25 September 2017 16:58
To: 'users at clusterlabs.org' <users at clusterlabs.org>
Subject: monitor failed actions not cleared
Hi,
I am using Pacemaker 1.1.15-11.el7_3.4 / Corosync 2.4.0-4.el7 under CentOS 7.3.1611.
=> Is this configuration too old? (yum indicates these versions are up to date)
=> Should I install more recent versions of Pacemaker and Corosync?
My question is close to the thread "clearing failed actions" started by Attila Megyeri in May 2017,
but that issue does not quite match my case.
What I want to do is:
- Run 2 systemd resources on 1 of the 2 nodes of my cluster.
- When 1 resource fails (because it is killed or moved), have it restarted on the other node, while the other resource keeps running on its current node.
=> Is this possible with Pacemaker?
What I have done in addition to the default parameters:
- For my resources:
  - migration-threshold=1
  - failure-timeout=PT1M
- For the cluster:
  - cluster-recheck-interval=120
I have also set on-fail=restart on the monitor operation of my resources (this is the default).
I do not use fencing (stonith-enabled=false).
=> Is fencing compatible with my goal?
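For reference, the settings listed above correspond roughly to the following pcs commands. This is only a sketch: it assumes pcs is used for configuration, shows a single resource (ACTIVATION_KX, the name that appears in the log line further down), and keeps the failure-timeout value exactly as it was configured at the time of this message.

```shell
# Resource meta attributes: move away after the first failure,
# and (as originally intended) expire the failure after one minute.
pcs resource meta ACTIVATION_KX migration-threshold=1 failure-timeout=PT1M

# Cluster-wide properties: re-evaluate the cluster every 120 seconds,
# and run without fencing.
pcs property set cluster-recheck-interval=120
pcs property set stonith-enabled=false
```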
What happens:
- When I kill or move 1 resource, it is restarted on the other node => OK
- The failcount is incremented to 1 for this resource => OK
- The failcount is never cleared => NOK
=> I get this warning in the log:
"pengine: warning: unpack_rsc_op_failure: Processing failed op monitor for ACTIVATION_KX on metro.cas-n1: not running (7)"
It appears after my resource ACTIVATION_KX has been killed on node metro.cas-n1,
although pcs status shows that ACTIVATION_KX is started on the other node.
=> Is the monitor operation of my resource badly configured? (I have added "requires=nothing")
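While the fail count is not expiring on its own, it can be inspected and cleared by hand. A short sketch, again assuming pcs and the resource name from the log line above:

```shell
# Show the current fail count for the resource on each node.
pcs resource failcount show ACTIVATION_KX

# Clear the failed-action history and the fail count for the resource.
pcs resource cleanup ACTIVATION_KX
```

This only works around the symptom; it does not explain why the failure-timeout is not being honoured.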
I know that my English and my Pacemaker knowledge are limited, but could you please explain this behavior, which I do not understand?
=> If something is wrong with my post, just tell me (this is my first one).
Thank you
Thanks
Pierre-Yves Le Coquil