[ClusterLabs] make promoted follow promoted resource ?

lejeczek peljasz at yahoo.co.uk
Sun Nov 26 13:11:29 EST 2023



On 26/11/2023 17:44, Andrei Borzenkov wrote:
> On 26.11.2023 12:32, lejeczek via Users wrote:
>> Hi guys.
>>
>> With these:
>>
>> -> $ pcs resource status REDIS-6381-clone
>>     * Clone Set: REDIS-6381-clone [REDIS-6381] (promotable):
>>       * Promoted: [ ubusrv2 ]
>>       * Unpromoted: [ ubusrv1 ubusrv3 ]
>>
>> -> $ pcs resource status PGSQL-PAF-5433-clone
>>     * Clone Set: PGSQL-PAF-5433-clone [PGSQL-PAF-5433]
>> (promotable):
>>       * Promoted: [ ubusrv1 ]
>>       * Unpromoted: [ ubusrv2 ubusrv3 ]
>>
>> -> $ pcs constraint ref REDIS-6381-clone
>> Resource: REDIS-6381-clone
>>     
>> colocation-REDIS-6381-clone-PGSQL-PAF-5433-clone-INFINITY
>>
>> Basically, promoted Redis should follow promoted pgSQL, but
>> that is not happening, although usually it does.
>> I presume pcs/the cluster does something internally which
>> results in that _colocation_ constraint being ignored
>> for these resources.
>> I presume scoring might play a role:
>>     REDIS-6385-clone with PGSQL-PAF-5435-clone (score:1001)
>> (rsc-role:Master) (with-rsc-role:Master)
>> but usually that scoring works; only "now" it does not.
>> I would much appreciate any comments.
>> thanks, L.
>>
>> I looked at the pacemaker log (snippet below, taken after
>> REDIS-6381-clone was re-enabled) but cannot see an explanation
>> for this.
>> ...
>>    notice: Calculated transition 110, saving inputs in
>> /var/lib/pacemaker/pengine/pe-input-3729.bz2
>>    notice: Transition 110 (Complete=0, Pending=0, Fired=0,
>> Skipped=0, Incomplete=0,
>> Source=/var/lib/pacemaker/pengine/pe-input-3729.bz2): 
>> Complete
>>    notice: State transition S_TRANSITION_ENGINE -> S_IDLE
>>    notice: State transition S_IDLE -> S_POLICY_ENGINE
>>    notice: Actions: Start      REDIS-6381:0
>> (                        ubusrv2 )
>>    notice: Actions: Start      REDIS-6381:1
>> (                        ubusrv3 )
>>    notice: Actions: Start      REDIS-6381:2
>> (                        ubusrv1 )
>>    notice: Calculated transition 111, saving inputs in
>> /var/lib/pacemaker/pengine/pe-input-3730.bz2
>>    notice: Initiating start operation REDIS-6381_start_0
>> locally on ubusrv2
>>    notice: Requesting local execution of start operation for
>> REDIS-6381 on ubusrv2
>> (to redis) root on none
>> pam_unix(su:session): session opened for user redis(uid=127)
>> by (uid=0)
>> pam_sss(su:session): Request to sssd failed. Connection 
>> refused
>> pam_unix(su:session): session closed for user redis
>> pam_sss(su:session): Request to sssd failed. Connection 
>> refused
>>    notice: Setting master-REDIS-6381[ubusrv2]: (unset) -> 
>> 1000
>
> This is the only line that sets master score, so 
> apparently ubusrv2 is the only node where your clone *can* 
> be promoted. Whether pacemaker is expected to fail this 
> operation because it violates constraint I do not know.
>
>>    notice: Transition 111 aborted by
>> status-2-master-REDIS-6381 doing create
>> master-REDIS-6381=1000: Transient attribute change
>> INFO: demote: Setting master to 'no-such-master'
>>    notice: Result of start operation for REDIS-6381 on
>> ubusrv2: ok
>>    notice: Transition 111 (Complete=4, Pending=0, Fired=0,
>> Skipped=1, Incomplete=14,
>> Source=/var/lib/pacemaker/pengine/pe-input-3730.bz2): 
>> Stopped
>>    notice: Actions: Promote    REDIS-6381:0         (
>> Unpromoted -> Promoted ubusrv2 )
>>    notice: Actions: Start      REDIS-6381:1
>> (                        ubusrv1 )
>>    notice: Actions: Start      REDIS-6381:2
>> (                        ubusrv3 )
>>    notice: Calculated transition 112, saving inputs in
>> /var/lib/pacemaker/pengine/pe-input-3731.bz2
>>    notice: Initiating notify operation
>> REDIS-6381_pre_notify_start_0 locally on ubusrv2
>>    notice: Requesting local execution of notify operation 
>> for
>> REDIS-6381 on ubusrv2
>>    notice: Result of notify operation for REDIS-6381 on
>> ubusrv2: ok
>>    notice: Initiating start operation REDIS-6381_start_0 on
>> ubusrv1
>>    notice: Initiating start operation 
>> REDIS-6381:2_start_0 on
>> ubusrv3
>>    notice: Initiating notify operation
>> REDIS-6381_post_notify_start_0 locally on ubusrv2
>>    notice: Requesting local execution of notify operation 
>> for
>> REDIS-6381 on ubusrv2
>>    notice: Initiating notify operation
>> REDIS-6381_post_notify_start_0 on ubusrv1
>>    notice: Initiating notify operation
>> REDIS-6381:2_post_notify_start_0 on ubusrv3
>>    notice: Result of notify operation for REDIS-6381 on
>> ubusrv2: ok
>>    notice: Initiating notify operation
>> REDIS-6381_pre_notify_promote_0 locally on ubusrv2
>>    notice: Requesting local execution of notify operation 
>> for
>> REDIS-6381 on ubusrv2
>>    notice: Initiating notify operation
>> REDIS-6381_pre_notify_promote_0 on ubusrv1
>>    notice: Initiating notify operation
>> REDIS-6381:2_pre_notify_promote_0 on ubusrv3
>>    notice: Result of notify operation for REDIS-6381 on
>> ubusrv2: ok
>>    notice: Initiating promote operation REDIS-6381_promote_0
>> locally on ubusrv2
>>    notice: Requesting local execution of promote operation
>> for REDIS-6381 on ubusrv2
>>    notice: Result of promote operation for REDIS-6381 on
>> ubusrv2: ok
>>    notice: Initiating notify operation
>> REDIS-6381_post_notify_promote_0 locally on ubusrv2
>>    notice: Requesting local execution of notify operation 
>> for
>> REDIS-6381 on ubusrv2
>>    notice: Initiating notify operation
>> REDIS-6381_post_notify_promote_0 on ubusrv1
>>    notice: Initiating notify operation
>> REDIS-6381:2_post_notify_promote_0 on ubusrv3
>>    notice: Result of notify operation for REDIS-6381 on
>> ubusrv2: ok
>>    notice: Setting master-REDIS-6381[ubusrv3]: (unset) -> 1
>>    notice: Transition 112 aborted by
>> status-3-master-REDIS-6381 doing create master-REDIS-6381=1:
>> Transient attribute change
>>    notice: Setting master-REDIS-6381[ubusrv1]: (unset) -> 1
>>    notice: Transition 112 (Complete=25, Pending=0, Fired=0,
>> Skipped=5, Incomplete=5,
>> Source=/var/lib/pacemaker/pengine/pe-input-3731.bz2): 
>> Stopped
>>    notice: Calculated transition 113, saving inputs in
>> /var/lib/pacemaker/pengine/pe-input-3732.bz2
>>    notice: Initiating monitor operation
>> REDIS-6381_monitor_20000 locally on ubusrv2
>>    notice: Requesting local execution of monitor operation
>> for REDIS-6381 on ubusrv2
>>    notice: Initiating monitor operation
>> REDIS-6381_monitor_60000 on ubusrv3
>>    notice: Initiating monitor operation
>> REDIS-6381_monitor_45000 on ubusrv3
>>    notice: Initiating monitor operation
>> REDIS-6381_monitor_60000 on ubusrv1
>>    notice: Initiating monitor operation
>> REDIS-6381_monitor_45000 on ubusrv1
>>    notice: Result of monitor operation for REDIS-6381 on
>> ubusrv2: promoted
>>    notice: Transition 113 (Complete=5, Pending=0, Fired=0,
>> Skipped=0, Incomplete=0,
>> Source=/var/lib/pacemaker/pengine/pe-input-3732.bz2): 
>> Complete
>>    notice: State transition S_TRANSITION_ENGINE -> S_IDLE
>>
>>
>>
This is "weird": this time it took an orderly node reboot 
for those node attributes to go back to "normal" and for the 
constraint to be honoured. I see this happen usually after 
the cluster has been partially "nuked" by the underlying 
hardware (this HA cluster runs in a VM environment).
Then the cluster, as here, cannot "figure" it out.
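
For anyone wanting to check Andrei's observation about the master score, here is a minimal sketch that pulls the promotion-score updates for REDIS-6381 out of log text and reports which node got the highest one. The three sample lines are copied from the log quoted above, with the mail client's line wrap undone. The variable names (log, scores, best) are only illustrative, and on a real node you would feed in the actual pacemaker log, whose location depends on the distribution.

```shell
#!/bin/sh
# Sample lines taken from the quoted log (unwrapped). This is an assumption
# about the on-disk format: real log lines also carry timestamps and daemon
# names in front, which the leading ".*" in the sed pattern skips over.
log=$(cat <<'EOF'
notice: Setting master-REDIS-6381[ubusrv2]: (unset) -> 1000
notice: Setting master-REDIS-6381[ubusrv3]: (unset) -> 1
notice: Setting master-REDIS-6381[ubusrv1]: (unset) -> 1
EOF
)

# Reduce each matching line to "node score"; non-matching lines are dropped.
scores=$(printf '%s\n' "$log" |
  sed -n 's/.*Setting master-REDIS-6381\[\([^]]*\)]: .* -> \(.*\)$/\1 \2/p')

# Sort numerically by score, highest first, and keep the node name.
best=$(printf '%s\n' "$scores" | sort -k2,2nr | head -n 1 | cut -d' ' -f1)

printf '%s\n' "$scores"
echo "highest promotion score: $best"
```

With the lines from this thread, ubusrv2 is the only node that was given a high score (1000 against 1 on the other two), which fits the promote action landing there rather than on the node where pgSQL is promoted. On a live cluster, `crm_simulate -sL` should show the scores the scheduler is actually working with.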

