[ClusterLabs] Postgres clone resource does not get "notice" events

Reid Wahl nwahl at redhat.com
Mon Jul 4 05:39:20 EDT 2022


On Mon, Jul 4, 2022 at 1:06 AM Reid Wahl <nwahl at redhat.com> wrote:
>
> On Sat, Jul 2, 2022 at 1:12 PM vitaly <vitaly at unitc.com> wrote:
> >
> > Sorry, I noticed that I was missing the meta attribute "notify=true". After adding it to the postgres-ms configuration, "notify" events started to come through.
> > Item 1 still needs an explanation, as pacemaker-controld keeps complaining.
>
> What happens when you run `OCF_ROOT=/usr/lib/ocf
> /usr/lib/ocf/resource.d/heartbeat/pgsql-rhino meta-data`?
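
If it helps to narrow down item 1, here is a rough sketch of that check as a small shell function. It runs the agent's meta-data action by hand and reports whether it exits 0 and prints a <resource-agent> XML element, which is what an OCF agent is expected to emit. check_metadata is a hypothetical helper name, not part of Pacemaker or resource-agents, and the substring match is only a crude stand-in for real XML validation.

```shell
#!/bin/sh
# Hypothetical helper (not part of Pacemaker or resource-agents):
# run an agent's meta-data action by hand and sanity-check the result.
check_metadata() {
    agent="$1"
    # OCF_ROOT defaults to the usual location if it is not already set.
    out=$(OCF_ROOT="${OCF_ROOT:-/usr/lib/ocf}" "$agent" meta-data 2>/dev/null)
    rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "meta-data exited with status $rc"
        return 1
    fi
    # Crude check: the output should contain a <resource-agent> element.
    case "$out" in
        *"<resource-agent"*"</resource-agent>"*)
            echo "ok"
            ;;
        *)
            echo "no resource-agent XML in output"
            return 1
            ;;
    esac
}
```

For example: check_metadata /usr/lib/ocf/resource.d/heartbeat/pgsql-rhino. A non-zero exit or missing XML would be a place to start looking, though it may not be the only possible cause of the pacemaker-controld messages.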

This may also be relevant:
https://lists.clusterlabs.org/pipermail/users/2022-June/030391.html

>
> > Thanks!
> > _Vitaly
> >
> > > On 07/02/2022 2:04 PM vitaly <vitaly at unitc.com> wrote:
> > >
> > >
> > > Hello Everybody.
> > > I have a 2-node cluster with a clone resource “postgres-ms”. We are running the following versions of pacemaker/corosync:
> > > d19-25-left.lab.archivas.com ~ # rpm -qa | grep "pacemaker\|corosync"
> > > pacemaker-cluster-libs-2.0.5-9.el8.x86_64
> > > pacemaker-libs-2.0.5-9.el8.x86_64
> > > pacemaker-cli-2.0.5-9.el8.x86_64
> > > corosynclib-3.1.0-5.el8.x86_64
> > > pacemaker-schemas-2.0.5-9.el8.noarch
> > > corosync-3.1.0-5.el8.x86_64
> > > pacemaker-2.0.5-9.el8.x86_64
> > >
> > > There are a couple of issues that could be related.
> > > 1. The following messages from pacemaker-controld appear in the logs:
> > > Jul  2 14:59:27 d19-25-right pacemaker-controld[1489734]: error: Failed to receive meta-data for ocf:heartbeat:pgsql-rhino
> > > Jul  2 14:59:27 d19-25-right pacemaker-controld[1489734]: warning: Failed to get metadata for postgres (ocf:heartbeat:pgsql-rhino)
> > >
> > > 2. ocf:heartbeat:pgsql-rhino does not get any "notify" operations, which causes multiple issues with postgres synchronization during availability events.
> > >
> > > 3. Item 2 raises another question. Who is setting these values:
> > > ${OCF_RESKEY_CRM_meta_notify_type}
> > > ${OCF_RESKEY_CRM_meta_notify_operation}
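
On item 3: Pacemaker itself sets these. When the clone has notify=true, it calls the agent's notify action and passes OCF_RESKEY_CRM_meta_notify_type (pre or post) and OCF_RESKEY_CRM_meta_notify_operation (start, stop, promote, or demote) in the environment. A minimal sketch of how an agent might read them is below. handle_notify and the messages it prints are hypothetical and are not taken from pgsql-rhino.

```shell
#!/bin/sh
# Hypothetical notify handler; the function name and messages are
# illustrative only and do not come from any real agent.
handle_notify() {
    # Pacemaker exports both variables when it invokes the "notify" action.
    type_op="${OCF_RESKEY_CRM_meta_notify_type:-}-${OCF_RESKEY_CRM_meta_notify_operation:-}"
    case "$type_op" in
        pre-promote)  echo "a promotion is about to happen" ;;
        post-promote) echo "a promotion has completed" ;;
        pre-demote)   echo "a demotion is about to happen" ;;
        post-demote)  echo "a demotion has completed" ;;
        post-start)   echo "an instance has started" ;;
        post-stop)    echo "an instance has stopped" ;;
        *)            echo "ignoring notification: $type_op" ;;
    esac
    return 0  # 0 is OCF_SUCCESS
}
```

Without notify=true on the clone, the notify action is never called, so an agent would not see these variables populated.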
> > >
> > > Here is an excerpt from the cluster config:
> > >
> > > d19-25-left.lab.archivas.com ~ # pcs config
> > >
> > > Cluster Name:
> > > Corosync Nodes:
> > >  d19-25-right.lab.archivas.com d19-25-left.lab.archivas.com
> > > Pacemaker Nodes:
> > >  d19-25-left.lab.archivas.com d19-25-right.lab.archivas.com
> > >
> > > Resources:
> > >  Clone: postgres-ms
> > >   Meta Attrs: promotable=true target-role=started
> > >   Resource: postgres (class=ocf provider=heartbeat type=pgsql-rhino)
> > >    Attributes: master_ip=172.16.1.6 node_list="d19-25-left.lab.archivas.com d19-25-right.lab.archivas.com" pgdata=/pg_data remote_wals_dir=/remote/walarchive rep_mode=sync reppassword=XXXXXX repuser=XXXXXXX restore_command="/opt/rhino/sil/bin/script_wrapper.sh wal_restore.py  %f %p" tmpdir=/pg_data/tmp wals_dir=/pg_data/pg_wal xlogs_dir=/pg_data/pg_xlog
> > >    Meta Attrs: is-managed=true
> > >    Operations: demote interval=0 on-fail=restart timeout=120s (postgres-demote-interval-0)
> > >                methods interval=0s timeout=5 (postgres-methods-interval-0s)
> > >                monitor interval=10s on-fail=restart timeout=300s (postgres-monitor-interval-10s)
> > >                monitor interval=5s on-fail=restart role=Master timeout=300s (postgres-monitor-interval-5s)
> > >                notify interval=0 on-fail=restart timeout=90s (postgres-notify-interval-0)
> > >                promote interval=0 on-fail=restart timeout=120s (postgres-promote-interval-0)
> > >                start interval=0 on-fail=restart timeout=1800s (postgres-start-interval-0)
> > >                stop interval=0 on-fail=fence timeout=120s (postgres-stop-interval-0)
> > > Thank you very much!
> > > _Vitaly
> > > _______________________________________________
> > > Manage your subscription:
> > > https://lists.clusterlabs.org/mailman/listinfo/users
> > >
> > > ClusterLabs home: https://www.clusterlabs.org/



-- 
Regards,

Reid Wahl (He/Him), RHCA
Senior Software Maintenance Engineer, Red Hat
CEE - Platform Support Delivery - ClusterHA


