[ClusterLabs] trigger something at ?

lejeczek peljasz at yahoo.co.uk
Wed Jan 31 12:23:40 EST 2024



On 31/01/2024 17:13, Jehan-Guillaume de Rorthais wrote:
> On Wed, 31 Jan 2024 16:37:21 +0100
> lejeczek via Users <users at clusterlabs.org> wrote:
>
>>
>> On 31/01/2024 16:06, Jehan-Guillaume de Rorthais wrote:
>>> On Wed, 31 Jan 2024 16:02:12 +0100
>>> lejeczek via Users <users at clusterlabs.org> wrote:
>>>
>>>> On 29/01/2024 17:22, Ken Gaillot wrote:
>>>>> On Fri, 2024-01-26 at 13:55 +0100, lejeczek via Users wrote:
>>>>>> Hi guys.
>>>>>>
>>>>>> Is it possible to trigger some... action - I'm thinking specifically
>>>>>> of shutdown/start.
>>>>>> If not from within the cluster itself, then perhaps outside of it.
>>>>>> I would like to create/remove constraints when the cluster starts &
>>>>>> stops, respectively.
>>>>>>
>>>>>> many thanks, L.
>>>>>>
>>>>> You could use node status alerts for that, but it's risky for alert
>>>>> agents to change the configuration (since that may result in more
>>>>> alerts and potentially some sort of infinite loop).
>>>>>
>>>>> Pacemaker has no concept of a full cluster start/stop, only node
>>>>> start/stop. You could approximate that by checking whether the node
>>>>> receiving the alert is the only active node.
>>>>>
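(a rough sketch of the alerts route, just to illustrate - the script
path, alert id and the check itself below are made up, not tested:)

-> $ cat /usr/local/bin/node_alert.sh
#!/bin/sh
# pacemaker hands alert agents CRM_alert_* environment variables
[ "$CRM_alert_kind" = "node" ] || exit 0
# e.g. count partition members with `crm_node -p` and, if this node is
# the only one up (or the first one up), add/remove constraints with
# pcs - keeping in mind the alert-loop risk mentioned above
exit 0

-> $ pcs alert create path=/usr/local/bin/node_alert.sh id=node-updown
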
>>>>> Another possibility would be to write a resource agent that does what
>>>>> you want and order everything else after it. However, it's even more
>>>>> risky for a resource agent to modify the configuration.
>>>>>
>>>>> Finally you could write a systemd unit to do what you want and order it
>>>>> after pacemaker.
>>>>>
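(and a rough sketch of the systemd route - the unit name and the two
helper scripts are made up, they would hold the actual pcs constraint
create/remove calls:)

-> $ cat > /etc/systemd/system/cluster-constraints.service <<'EOF'
[Unit]
Description=add/remove pcs constraints around pacemaker
# started after pacemaker comes up, stopped before pacemaker goes down
After=pacemaker.service
Requires=pacemaker.service

[Service]
Type=oneshot
RemainAfterExit=yes
# placeholder scripts - put the pcs constraint create/remove calls here
ExecStart=/usr/local/sbin/constraints-add.sh
ExecStop=/usr/local/sbin/constraints-remove.sh

[Install]
WantedBy=multi-user.target
EOF
-> $ systemctl daemon-reload && systemctl enable cluster-constraints.service
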
>>>>> What's wrong with leaving the constraints permanently configured?
>>>> yes, that would be for a node start/stop
>>>> I struggle with using constraints to move the pgsql (PAF) master
>>>> onto a given node - it seems that co/locating paf's master
>>>> results in troubles (replication breaks) at/after node
>>>> shutdown/reboot (not always, but way too often)
>>> What? What's wrong with colocating PAF's masters exactly? How does it break
>>> any replication? What are these constraints you are dealing with?
>>>
>>> Could you share your configuration?
>> Constraints beyond/above what is required by the PAF agent
>> itself, say...
>> you have multiple pgSQL clusters with PAF - thus multiple
>> (separate, one for each pgSQL cluster) masters - and you want to
>> spread/balance those across the HA cluster
>> (or in other words - avoid having more than 1 pgsql master
>> per HA node)
> ok
>
>> These below I've tried - they move the master onto the chosen
>> node, but... then the issues I mentioned.
> You just mentioned it breaks the replication, but there is so little
> information about your architecture and configuration that it's impossible
> to imagine how this could break the replication.
>
> Could you add details about the issues?
>
>> -> $ pcs constraint location PGSQL-PAF-5438-clone prefers ubusrv1=1002
>> or
>> -> $ pcs constraint colocation set PGSQL-PAF-5435-clone PGSQL-PAF-5434-clone PGSQL-PAF-5433-clone role=Master require-all=false setoptions score=-1000
> I suppose "collocation" constraint is the way to go, not the "location" one.
This should be easy to replicate - 3 x VMs, Ubuntu 22.04 in my case.

-> $ pcs resource config PGSQL-PAF-5438-clone
 Clone: PGSQL-PAF-5438-clone
  Meta Attrs: failure-timeout=60s master-max=1 notify=true promotable=true
  Resource: PGSQL-PAF-5438 (class=ocf provider=heartbeat type=pgsqlms)
   Attributes: bindir=/usr/lib/postgresql/16/bin datadir=/var/lib/postgresql/16/paf-5438 maxlag=1000 pgdata=/etc/postgresql/16/paf-5438 pgport=5438
   Operations: demote interval=0s timeout=120s (PGSQL-PAF-5438-demote-interval-0s)
               methods interval=0s timeout=5 (PGSQL-PAF-5438-methods-interval-0s)
               monitor interval=15s role=Master timeout=10s (PGSQL-PAF-5438-monitor-interval-15s)
               monitor interval=16s role=Slave timeout=10s (PGSQL-PAF-5438-monitor-interval-16s)
               notify interval=0s timeout=60s (PGSQL-PAF-5438-notify-interval-0s)
               promote interval=0s timeout=30s (PGSQL-PAF-5438-promote-interval-0s)
               reload interval=0s timeout=20 (PGSQL-PAF-5438-reload-interval-0s)
               start interval=0s timeout=60s (PGSQL-PAF-5438-start-interval-0s)
               stop interval=0s timeout=60s (PGSQL-PAF-5438-stop-interval-0s)

so, regarding PAF - 1 master + 2 slaves; have a healthy
pgSQL/PAF cluster to begin with, then
make the resource prefer a specific node (with the simplest
variant of constraints I tried):
-> $ pcs constraint location PGSQL-PAF-5438-clone prefers ubusrv1=1002

and play with it, rebooting node(s) with the OS' _reboot_.
At some point I get HA/the resource unable to start pgSQL,
unable to elect a master (logs saying the replication is
broken), and I have to "fix" the pgSQL cluster outside of
PAF, using _pg_basebackup_
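
(that "fix", for the record, is basically a standby rebuild on the
broken node - roughly along these lines, with <broken-node> and
<current-primary> as placeholders and paths/port as in the config above:)

-> $ pcs resource ban PGSQL-PAF-5438-clone <broken-node>
-> $ sudo -u postgres rm -rf /var/lib/postgresql/16/paf-5438
-> $ sudo -u postgres pg_basebackup -h <current-primary> -p 5438 \
        -D /var/lib/postgresql/16/paf-5438 -X stream -R -P
-> $ pcs resource cleanup PGSQL-PAF-5438
-> $ pcs resource clear PGSQL-PAF-5438-clone <broken-node>

(pg_basebackup's -R writes a generic primary_conninfo; PAF expects
application_name there to match the node name, so that bit usually
needs adjusting before clearing the ban.)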



