[ClusterLabs] Antw: [EXT] Pacemaker multi-state resource stop not running although "pcs status" indicates "Stopped"

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Mon Aug 16 02:22:26 EDT 2021


>>> "ChittaNagaraj, Raghav" <Raghav.ChittaNagaraj at dell.com> wrote on 13.08.2021 at 21:46 in message
<BN6PR19MB00367C1DEA7D3F43DD029319E2FA9 at BN6PR19MB0036.namprd19.prod.outlook.com>

> Hello Team,
> 
> Hope you are doing well.
> 
> We are running into an issue where a multi-state resource does not run its
> stop function on a node when the corosync process is killed there, yet the
> resource fails over and starts on another node in the cluster.
> 
> Note, in the below, actual resource names/hostnames have been changed from 
> the original.
> 
> Snippet of pcs status before corosync is killed:
> 
> $ hostname
> pace_node_a
> 
> Snippet of "pcs status":
> colocated-resource (ocf::xxx:colocated-resource):  Started pace_node_a
> Master/Slave Set: main-multi-state-resource [main-multi]
>      Masters: [ pace_node_a ]
>      Stopped: [ pace_node_b ]
> 
> The corosync process was then killed using kill -9 on "pace_node_a".
> 
> Resulting snippet of "pcs status"
> 
> colocated-resource (ocf::xxx:colocated-resource):  Started pace_node_b
> Master/Slave Set: main-multi-state-resource [main-multi]
>      Stopped: [ pace_node_a ]
>      Masters: [ pace_node_b ]
> 
> As you can see, pcs status indicates that "main-multi-state-resource" stopped
> on "pace_node_a", where corosync was killed, and started on "pace_node_b".
> Although this indication is right, the underlying resource managed by
> "main-multi-state-resource" never stopped on "pace_node_a". Also, there were
> no logs from crmd or other components showing that a stop was even attempted
> on "pace_node_a". Interestingly, the crmd logs indicated that the colocated
> resource, "colocated-resource", was being stopped, and there is evidence that
> the resource managed by "colocated-resource" actually stopped.
> 
> Is this a known issue?

I think the next monitor operation should fix this. Also, what was the
motivation behind killing corosync?
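
To compare what the cluster reports with what is actually running, one could
pull the reported role per node out of the "pcs status" text and then check
each node out of band. A minimal Python sketch, assuming the output layout
shown in the quoted snippets (without the "> " quote prefix); the helper name
parse_roles and the sample text are illustrative only and not part of pcs or
Pacemaker:

```python
import re

# Sample text copied from the quoted "pcs status" snippet (the names are the
# poster's placeholders, not real hosts or resources).
SAMPLE = """\
Master/Slave Set: main-multi-state-resource [main-multi]
     Stopped: [ pace_node_a ]
     Masters: [ pace_node_b ]
"""

# Matches lines such as "     Masters: [ node1 node2 ]".
ROLE_LINE = re.compile(r"^\s*(Masters|Slaves|Stopped):\s*\[\s*(.*?)\s*\]\s*$")


def parse_roles(status_text):
    """Return {role: [nodes]} for the role lines found in status_text.

    Hypothetical helper: it only understands the layout quoted in this
    thread, not every format pcs can print.
    """
    roles = {}
    for line in status_text.splitlines():
        match = ROLE_LINE.match(line)
        if match:
            role, nodes = match.groups()
            roles.setdefault(role, []).extend(nodes.split())
    return roles


if __name__ == "__main__":
    reported = parse_roles(SAMPLE)
    # Nodes listed as Stopped are the ones worth checking by hand.
    for node in reported.get("Stopped", []):
        print(f"{node}: reported Stopped; verify on the node itself")
```

Any node listed under "Stopped" whose managed service is still alive would
show the mismatch described in the quoted report.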

> 
> Please let us know if any additional information is needed.
> 
> Thanks for your help!
> 
> ‑Raghav




