[ClusterLabs] Pacemaker multi-state resource stop not running although "pcs status" indicates "Stopped"

Fri Aug 13 15:46:47 EDT 2021

Hello Team,

Hope you are doing well.

We are running into an issue with a multi-state resource: when the corosync process is killed on a node, Pacemaker does not run the resource's stop function on that node, yet it fails over and starts the resource on another node in the cluster.

Note: in the output below, the actual resource names and hostnames have been changed from the originals.

Snippet of pcs status before corosync is killed:

$ hostname
pace_node_a

Snippet of "pcs status":
colocated-resource (ocf::xxx:colocated-resource):  Started pace_node_a
Master/Slave Set: main-multi-state-resource [main-multi]
     Masters: [ pace_node_a ]
     Stopped: [ pace_node_b ]

Next, we killed the corosync process with kill -9 on "pace_node_a".

Resulting snippet of "pcs status":

colocated-resource (ocf::xxx:colocated-resource):  Started pace_node_b
Master/Slave Set: main-multi-state-resource [main-multi]
     Stopped: [ pace_node_a ]
     Masters: [ pace_node_b ]

As you can see, pcs status indicates that "main-multi-state-resource" stopped on "pace_node_a", where corosync was killed, and started on "pace_node_b". Although this is the status we would expect, the underlying resource managed by "main-multi-state-resource" never actually stopped on "pace_node_a". There were also no logs from crmd or other components indicating that a stop was even attempted on "pace_node_a". Interestingly, the crmd logs showed that the colocated resource, "colocated-resource", was being stopped, and there is evidence that the resource it manages actually stopped.
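
To make the before/after comparison concrete, here is a minimal Python sketch that parses the two "pcs status" snippets quoted above and lists the nodes that went from Masters to Stopped. The helper names (parse_roles, demoted_to_stopped) are hypothetical and only for illustration; the sketch reads the text that pcs reports and cannot tell whether the stop action really ran, so the nodes it lists are the ones where the actual state of the managed resource should be checked by hand.

```python
import re

def parse_roles(status_text):
    """Map each role (e.g. 'Masters', 'Stopped') to the nodes listed for it.

    Only lines of the form 'Role: [ node ... ]' are considered; every
    other line of the snippet is ignored.
    """
    roles = {}
    for line in status_text.splitlines():
        m = re.match(r"\s*(\w+):\s*\[\s*(.*?)\s*\]\s*$", line)
        if m:
            roles[m.group(1)] = m.group(2).split()
    return roles

def demoted_to_stopped(before_text, after_text):
    """Return nodes reported as Masters before and as Stopped after."""
    before = parse_roles(before_text)
    after = parse_roles(after_text)
    masters_before = set(before.get("Masters", []))
    stopped_after = set(after.get("Stopped", []))
    return sorted(masters_before & stopped_after)

# Snippets copied from the output quoted in this message.
before_status = """Master/Slave Set: main-multi-state-resource [main-multi]
     Masters: [ pace_node_a ]
     Stopped: [ pace_node_b ]"""

after_status = """Master/Slave Set: main-multi-state-resource [main-multi]
     Stopped: [ pace_node_a ]
     Masters: [ pace_node_b ]"""

# Nodes where pcs claims a stop happened and a manual check is warranted.
suspects = demoted_to_stopped(before_status, after_status)
print(suspects)
```

In our case the only node this flags is "pace_node_a", which is exactly where the managed resource was still running.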

Is this a known issue?

Please let us know if any additional information is needed.

Thanks for your help!

-Raghav