[ClusterLabs] Pacemaker problems with pingd

Andrei Borzenkov arvidjaar at gmail.com
Thu Aug 5 02:04:31 EDT 2021


On 03.08.2021 22:17, Janusz Jaskiewicz wrote:
> 
> Setup details:
> 
> pcs status:
> Cluster name: dimprodcluster
> Cluster Summary:
>   * Stack: corosync
>   * Current DC: dimprod02 (version 2.0.5-9.el8_4.1-ba59be7122) - partition
> with quorum
>   * Last updated: Tue Aug  3 08:20:32 2021

That is before dimprod01 was lost, and dimprod02 is the DC.

>   * Last change:  Mon Aug  2 18:24:39 2021 by root via cibadmin on dimprod01
>   * 2 nodes configured
>   * 8 resource instances configured
> 
> Node List:
>   * Online: [ dimprod01 dimprod02 ]
> 
> Full List of Resources:
>   * DimProdClusterIP (ocf::heartbeat:IPaddr2): Started dimprod01
>   * WyrDimProdServer (systemd:wyr-dim): Started dimprod01
>   * Clone Set: WyrDimProdServerData-clone [WyrDimProdServerData]
> (promotable):
>     * Masters: [ dimprod01 ]
>     * Slaves: [ dimprod02 ]
>   * WyrDimProdFS (ocf::heartbeat:Filesystem): Started dimprod01
>   * DimTestClusterIP (ocf::heartbeat:IPaddr2): Started dimprod01
>   * Clone Set: ping-clone [ping]:
>     * Started: [ dimprod01 dimprod02 ]
> 
> Daemon Status:
>   corosync: active/enabled
>   pacemaker: active/enabled
>   pcsd: active/enabled
> 
> 
> pcs constraint
> Location Constraints:
>   Resource: DimProdClusterIP
>     Constraint: location-DimProdClusterIP
>       Rule: score=-INFINITY
>         Expression: pingd ne 1000
> Ordering Constraints:
>   start DimProdClusterIP then promote WyrDimProdServerData-clone
> (kind:Mandatory)
>   promote WyrDimProdServerData-clone then start WyrDimProdFS
> (kind:Mandatory)
>   start WyrDimProdFS then start WyrDimProdServer (kind:Mandatory)
>   start WyrDimProdServer then start DimTestClusterIP (kind:Mandatory)
> Colocation Constraints:
>   WyrDimProdServer with DimProdClusterIP (score:INFINITY)
>   DimTestClusterIP with DimProdClusterIP (score:INFINITY)
>   WyrDimProdServerData-clone with DimProdClusterIP (score:INFINITY)
> (with-rsc-role:Master)
>   WyrDimProdFS with DimProdClusterIP (score:INFINITY)
> Ticket Constraints:
> 
> 
> pcs resource config ping
>  Resource: ping (class=ocf provider=pacemaker type=ping)
>   Attributes: dampen=3s host_list=193.30.22.33 multiplier=1000

This means a 3-second delay between the request to set the attribute and
the actual attribute change in the CIB.
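
For illustration only (assuming the stock attrd_updater shipped with
pacemaker 2.0.x; don't actually run this against the live pingd attribute,
the ping agent does it itself), an update with dampening looks roughly like:

  # request the update; pacemaker-attrd accepts it immediately
  attrd_updater -n pingd -N dimprod02 -U 0 -d 3s
  # ...but the new value is written to the CIB status section only after
  # the 3s dampening interval has expired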

>   Operations: monitor interval=5s timeout=4s (ping-monitor-interval-5s)
>               start interval=0s timeout=60s (ping-start-interval-0s)
>               stop interval=0s timeout=5s (ping-stop-interval-0s)
> 
> 
> 
> cat /etc/corosync/corosync.conf
> totem {
>     version: 2
>     cluster_name: dimprodcluster
>     transport: knet
>     crypto_cipher: aes256
>     crypto_hash: sha256
>     token: 10000

So you have 10 seconds before the node will be declared lost.
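
You can double-check the value corosync is actually running with (assuming
corosync-cmapctl is installed, which it normally is on RHEL 8):

  corosync-cmapctl | grep totem.token
  # e.g.:  runtime.config.totem.token (u32) = 10000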

...

> 
> Aug 03 08:23:17 dimprod02.my.clustertest.com pacemaker-attrd     [2827046]
> (attrd_client_update) debug: Broadcasting pingd[dimprod02]=0 (writer)
> Aug 03 08:23:17 dimprod02.my.clustertest.com attrd_updater       [3370109]
> (pcmk__node_attr_request) debug: Asked pacemaker-attrd to update pingd=0
> for dimprod02: OK (0)
> Aug 03 08:23:17 dimprod02.my.clustertest.com attrd_updater       [3370109]
> (crm_xml_cleanup) info: Cleaning up memory from libxml2
> Aug 03 08:23:17 dimprod02.my.clustertest.com attrd_updater       [3370109]
> (crm_exit) info: Exiting attrd_updater | with status 0
> 

That is only a request to set the pingd attribute. Was the attribute
actually changed (was the CIB updated)? Because ...
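
You can compare both sides with the standard tools (attribute and node
names as in your logs):

  # value currently held by pacemaker-attrd for that node
  attrd_updater -Q -n pingd -N dimprod02
  # value actually written to the CIB status section
  crm_attribute -t status -N dimprod02 -n pingd --query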

> Then the node realizes it is not connected to the other node:
> 
> Aug 03 08:23:23 dimprod02.my.clustertest.com pacemaker-attrd     [2827046]
> (pcmk_cpg_membership) info: Group attrd event 8: dimprod01 (node 1 pid
> 2118843) left via cluster exit
...

> Aug 03 08:23:23 dimprod02.my.clustertest.com pacemaker-controld  [2827048]
> (pcmk_quorum_notification) info: Quorum retained | membership=1140 members=1
> Aug 03 08:23:23 dimprod02.my.clustertest.com pacemaker-attrd     [2827046]
> (attrd_peer_update) notice: Setting pingd[dimprod02]: 1000 -> 0 | from
> dimprod02

... at this point the current value in the CIB is still 1000, and the
actual attribute change will happen only 3 seconds after this line due to
dampening.

> Aug 03 08:23:23 dimprod02.my.clustertest.com pacemaker-controld  [2827048]
> (pcmk_quorum_notification) debug: Member[0] 2
> Aug 03 08:23:23 dimprod02.my.clustertest.com pacemaker-controld  [2827048]
> (crm_update_peer_state_iter) notice: Node dimprod01 state is now lost |
> nodeid=1 previous=member source=crm_reap_unseen_nodes
> Aug 03 08:23:23 dimprod02.my.clustertest.com pacemaker-controld  [2827048]
> (peer_update_callback) info: Cluster node dimprod01 is now lost (was member)
> Aug 03 08:23:23 dimprod02.my.clustertest.com pacemaker-controld  [2827048]
> (match_down_event) debug: No reason to expect node 1 to be down
> Aug 03 08:23:23 dimprod02.my.clustertest.com pacemaker-controld  [2827048]
> (peer_update_callback) warning: Stonith/shutdown of node dimprod01 was not
> expected
> 
> And then the node decides to allocate the resources to this node and start
> it, ignoring the location constraint
> 

As can be seen above, there is no violation: the pingd value *at this
moment* is still 1000.
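
If you want to confirm that, you can replay the scheduler input saved for
that transition (the directory is the RHEL default, the file name below is
just an example - use the one logged for this transition):

  crm_simulate -x /var/lib/pacemaker/pengine/pe-input-123.bz2 -s -S
  # -s prints the allocation scores; with pingd still at 1000 the
  # -INFINITY rule in location-DimProdClusterIP does not apply yet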
...
> 
> When the resources are started, the node decides that they can not be
> allocated and closes them:
> 
> Aug 03 08:23:27 dimprod02.my.clustertest.com pacemaker-schedulerd[2827047]
> (determine_online_status) info: Node dimprod02 is online

Yes, this is the 3-second dampening: the attribute value has actually been
set now, and this triggered a new evaluation of resource placement.

Either the attribute was changed between 08:23:17 and 08:23:23, or, after
receiving the request to set the attribute at 08:23:17, pacemaker waited
for the cluster to stabilize. The node loss was reported at 08:23:23, which
means the node had not been answering since 08:23:13. This needs someone
with better knowledge of the internals to answer.

If the latter assumption is correct, this means there is an inherent race
between setting attributes and using them. Even if dampening is 0, I guess
it is possible that the attribute will be set only after policy evaluation,
although in that case the race window is pretty small and probably will not
result in any resource actually being started.

Try removing dampening - does it change anything?
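
Something like this (assuming the resource id "ping" as in your config):

  pcs resource update ping dampen=0s
  pcs resource config ping    # verify the new value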

...

> Aug 03 08:23:27 dimprod02.my.clustertest.com pacemaker-schedulerd[2827047]
> (create_promotable_actions) debug: Creating actions for
> WyrDimProdServerData-clone
> Aug 03 08:23:27 dimprod02.my.clustertest.com pacemaker-schedulerd[2827047]
> (LogAction) notice:  * Stop       DimProdClusterIP           (
>     dimprod02 )   due to node availability
> Aug 03 08:23:27 dimprod02.my.clustertest.com pacemaker-schedulerd[2827047]
> (LogAction) notice:  * Stop       WyrDimProdServer           (
>     dimprod02 )   due to node availability
> Aug 03 08:23:27 dimprod02.my.clustertest.com pacemaker-schedulerd[2827047]
> (LogActions) info: Leave   WyrDimProdServerData:0 (Master dimprod02)

I am much more surprised by this one. According to the constraints,
pacemaker should demote it if it stops DimProdClusterIP.
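
It may be worth double-checking what constraints pacemaker itself derives
for the clone (if your crm_resource supports it):

  crm_resource --resource WyrDimProdServerData-clone --constraints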


