[ClusterLabs] Why and when can a call of crm_attribute be delayed?

Ken Gaillot kgaillot at redhat.com
Wed May 4 14:55:34 UTC 2016


On 04/25/2016 05:02 AM, Jehan-Guillaume de Rorthais wrote:
> Hi all,
> 
> I am facing a strange issue with attrd while doing some testing on a
> three-node cluster with the pgsqlms RA [1].
> 
> pgsqld is my pgsqlms resource in the cluster. pgsql-ha is the master/slave
> setup on top of pgsqld.
> 
> Before triggering a failure, here was the situation:
> 
>   * centos1: pgsql-ha slave
>   * centos2: pgsql-ha slave
>   * centos3: pgsql-ha master
> 
> Then we triggered a failure: the node centos3 was killed using 
> 
>   echo c > /proc/sysrq-trigger
> 
> In this situation, PEngine provides a transition where:
> 
>   * centos3 is fenced 
>   * pgsql-ha on centos2 is promoted
> 
> During the pre-promote notify action in the pgsqlms RA, each remaining slave
> sets a node attribute called lsn_location; see: 
> 
>   https://github.com/dalibo/PAF/blob/master/script/pgsqlms#L1504
> 
>   crm_attribute -l reboot -t status --node "$nodename" \
>                 --name lsn_location --update "$node_lsn"
> 
> During the promote action in the pgsqlms RA, the RA checks the lsn_location
> of all the nodes to make sure the local one is higher than or equal to all
> the others. See:
> 
>   https://github.com/dalibo/PAF/blob/master/script/pgsqlms#L1292
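> 
> For reference, a read of that attribute looks roughly like this (a
> sketch only; the exact invocation is in the pgsqlms source linked
> above):
> 
>   crm_attribute -l reboot -t status --node "$nodename" \
>                 --name lsn_location --query --quiet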
> 
> This is where we face an attrd behavior we don't understand.
> 
> Although we can see in the log that the RA was able to set its local
> "lsn_location", the RA was unable to read that same "lsn_location" back
> during the promote action:
> 
>   pgsqlms(pgsqld)[9003]:  2016/04/22_14:46:16  
>     INFO: pgsql_notify: promoting instance on node "centos2" 
> 
>   pgsqlms(pgsqld)[9003]:  2016/04/22_14:46:16  
>     INFO: pgsql_notify: current node LSN: 0/1EE24000 
> 
>   [...]
> 
>   pgsqlms(pgsqld)[9023]:  2016/04/22_14:46:16
>     CRIT: pgsql_promote: can not get current node LSN location
> 
>   Apr 22 14:46:16 [5864] centos2       lrmd:
>     notice: operation_finished: pgsqld_promote_0:9023:stderr 
>     [ Error performing operation: No such device or address ] 
> 
>   Apr 22 14:46:16 [5864] centos2       lrmd:     
>     info: log_finished:      finished - rsc:pgsqld
>     action:promote call_id:211 pid:9023 exit-code:1 exec-time:107ms
>     queue-time:0ms
> 
> The error comes from:
> 
>   https://github.com/dalibo/PAF/blob/master/script/pgsqlms#L1320
> 
> **After** this error, we can see in the log file that attrd set the
> "lsn_location" of centos2:
> 
>   Apr 22 14:46:16 [5865] centos2
>     attrd:     info: attrd_peer_update:
>     Setting lsn_location[centos2]: (null) -> 0/1EE24000 from centos2 
> 
>   Apr 22 14:46:16 [5865] centos2
>     attrd:     info: write_attribute:   
>     Write out of 'lsn_location' delayed:    update 189 in progress
> 
> 
> As I understand it, the call of crm_attribute during the pre-promote
> notification was taken into account AFTER the "promote" action, leading to
> this error. Am I right?
> 
> Why and how could this happen? Could it come from the dampen parameter? We
> did not set any dampen anywhere; is there a default value in the cluster
> setup? Can we avoid this behavior?

Unfortunately, that is expected. Both the cluster's call of the RA's
notify action and the RA's call of crm_attribute are asynchronous, so
there is no guarantee that anything done by the pre-promote notify will
be complete (or synchronized across other cluster nodes) by the time the
promote action is called.
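
As a minimal illustration (demo_attr is just a placeholder attribute
name), the set call returns once attrd accepts the request, not once
the value is readable, so an immediate read back can fail with the very
"No such device or address" error seen in the log above:

  crm_attribute -l reboot -t status --node "$(uname -n)" \
                --name demo_attr --update some_value

  # may still fail: attrd has not necessarily written the value out yet
  crm_attribute -l reboot -t status --node "$(uname -n)" \
                --name demo_attr --query --quiet \
      || echo "not visible yet"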

There would be no point in the pre-promote notify waiting for the
attribute value to be retrievable, because the cluster isn't going to
wait for the pre-promote notify to finish before calling promote.

Maybe someone else can come up with a better idea, but I'm thinking
maybe the attribute could be set as timestamp:lsn, and the promote
action could poll attrd repeatedly (for a duration shorter than the
typical promote timeout) until it gets LSNs with a recent timestamp
from all nodes. One error condition to handle would be if one of the
other slaves happens to fail or be unresponsive at that time.
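
A minimal, untested sketch of that idea in shell. The "epoch:lsn"
encoding, the variables $nodes, $node_lsn and $transition_start (some
timestamp marking when the failover began), and the 30-second budget
are all hypothetical:

  # pre-promote notify: store "epoch:lsn" instead of the bare LSN
  crm_attribute -l reboot -t status --node "$nodename" \
                --name lsn_location --update "$(date +%s):$node_lsn"

  # promote: poll until every expected node shows a recent value
  deadline=$(( $(date +%s) + 30 ))   # keep below the promote timeout
  for node in $nodes; do
      while :; do
          value=$(crm_attribute -l reboot -t status --node "$node" \
                                --name lsn_location --query --quiet \
                                2>/dev/null)
          ts=${value%%:*}
          # accept only a value written during this transition
          [ -n "$value" ] && [ "$ts" -ge "$transition_start" ] 2>/dev/null &&
              break
          if [ "$(date +%s)" -ge "$deadline" ]; then
              # covers a peer that failed or is unresponsive meanwhile
              echo "timed out waiting for lsn_location on $node" >&2
              exit 1   # OCF_ERR_GENERIC
          fi
          sleep 1
      done
      # ... compare ${value#*:} against the local LSN as before ...
  done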

> Please find attached a tarball with:
>   * all cluster logfiles from the three nodes
>   * the content of /var/lib/pacemaker from the three nodes:
>     * CIBs
>     * PEngine transitions
> 
> 
> Regards,
> 
> [1] https://github.com/dalibo/PAF
> 




