[ClusterLabs] Fwd: Postgres pacemaker cluster failure

Sat Apr 27 02:15:29 EDT 2019

27.04.2019 1:04, Danka Ivanović wrote:
> Hi, here is a complete cluster configuration:
> 
> node 1: master
> node 2: secondary
> primitive AWSVIP awsvip \
>         params secondary_private_ip=10.x.x.x api_delay=5
> primitive PGSQL pgsqlms \
>         params pgdata="/var/lib/postgresql/9.5/main" \
>                bindir="/usr/lib/postgresql/9.5/bin" pghost="/var/run/postgresql/" \
>                recovery_template="/etc/postgresql/9.5/main/recovery.conf.pcmk" \
>                start_opts="-c config_file=/etc/postgresql/9.5/main/postgresql.conf" \
>         op start timeout=60s interval=0 \
>         op stop timeout=60s interval=0 \
>         op promote timeout=15s interval=0 \
>         op demote timeout=120s interval=0 \
>         op monitor interval=15s timeout=10s role=Master \
>         op monitor interval=16s timeout=10s role=Slave \
>         op notify timeout=60 interval=0
> primitive fencing-postgres-ha-2 stonith:external/ec2 \
>         params port=master \
>         op start interval=0s timeout=60s \
>         op monitor interval=360s timeout=60s \
>         op stop interval=0s timeout=60s
> primitive fencing-test-rsyslog stonith:external/ec2 \
>         params port=secondary \
>         op start interval=0s timeout=60s \
>         op monitor interval=360s timeout=60s \
>         op stop interval=0s timeout=60s
> ms PGSQL-HA PGSQL \
>         meta notify=true
> colocation IPAWSIP-WITH-MASTER inf: AWSVIP PGSQL-HA:Master
> order demote-then-stop-ip Mandatory: _rsc_set_ PGSQL-HA:demote AWSVIP:stop \
>         symmetrical=false
> location loc-fence-master fencing-postgres-ha-2 -inf: master
> location loc-fence-secondary fencing-test-rsyslog -inf: secondary
> order promote-then-ip Mandatory: _rsc_set_ PGSQL-HA:promote AWSVIP:start \
>         symmetrical=false
> property cib-bootstrap-options: \
>         have-watchdog=false \
>         dc-version=1.1.14-70404b0 \
>         cluster-infrastructure=corosync \
>         cluster-name=psql-ha \
>         stonith-enabled=true \
>         no-quorum-policy=ignore \
>         last-lrm-refresh=1556315444 \
>         maintenance-mode=false
> rsc_defaults rsc-options: \
>         resource-stickiness=10 \
>         migration-threshold=2
> 
> I tried starting postgres manually to be sure it is OK. There are no errors
> in the postgres log. I also tried different meta parameters, but always
> with notify=true.
> I also tried this:
> ms PGSQL-HA PGSQL \
>         meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 \
>         notify=true interleave=true
> I have followed this link:
> https://clusterlabs.github.io/PAF/Quick_Start-Debian-9-crm.html
> Once stonith was enabled and working, I imported all the other resources and
> constraints together at the same time.
> 
> On Fri, 26 Apr 2019 at 13:46, Jehan-Guillaume de Rorthais <jgdr at dalibo.com>
> wrote:
> 
>> Hi,
>>
>> On Thu, 25 Apr 2019 18:57:55 +0200
>> Danka Ivanović <danka.ivanovic at gmail.com> wrote:
>>
>>> Apr 25 16:39:50 [4213] master       lrmd:   notice:
>>> operation_finished:   PGSQL_monitor_0:5849:stderr [ ocf-exit-reason:You
>>> must set meta parameter notify=true for your master resource ]
>>
>> The pgsqlms resource agent refuses to start PgSQL because your configuration
>> lacks the "notify=true" attribute in your master definition.
>>

PAF pgsqlms contains:

    # check notify=true
    $ans = qx{ $CRM_RESOURCE --resource "$OCF_RESOURCE_INSTANCE" \\
                 --meta --get-parameter notify 2>/dev/null };
    chomp $ans;
    unless ( lc($ans) =~ /^true$|^on$|^yes$|^y$|^1$/ ) {
        ocf_exit_reason(
            'You must set meta parameter notify=true for your master resource'
        );
        exit $OCF_ERR_INSTALLED;
    }

but that is wrong: "notify" is set on the ms definition, while
$OCF_RESOURCE_INSTANCE refers to an individual clone member. There is no
notify option on the PGSQL primitive. Why doesn't it check
OCF_RESKEY_CRM_meta_notify?
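
Something along these lines is what I have in mind. This is only a rough
sketch, not the actual pgsqlms code: notify_enabled is a hypothetical helper
name, and it assumes Pacemaker exports the clone's meta attributes to the
agent as OCF_RESKEY_CRM_meta_* environment variables. The accepted values
mirror the regex in the excerpt above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of pgsqlms: returns 1 when the notify
# meta attribute found in the given environment hash is a true-like
# value, 0 otherwise (including when the variable is not set at all).
sub notify_enabled {
    my ($env) = @_;
    my $ans = $env->{'OCF_RESKEY_CRM_meta_notify'};
    $ans = '' unless defined $ans;
    return ( lc($ans) =~ /^true$|^on$|^yes$|^y$|^1$/ ) ? 1 : 0;
}

# Inside an agent the check would read the real environment.
print notify_enabled( \%ENV )
    ? "notify is enabled\n"
    : "You must set meta parameter notify=true for your master resource\n";
```

Reading the environment would also avoid spawning crm_resource on every
probe, but whether that is acceptable for PAF is for its maintainers to say.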