[ClusterLabs] STOP cluster after update resource
Nikolay Popov
n.popov at postgrespro.ru
Wed Oct 7 08:48:44 UTC 2015
Hello.
We have been looking at ways to use the Corosync/Pacemaker stack to build a
high-availability cluster of PostgreSQL servers with automatic failover.
We are using Corosync (2.3.4) as the messaging layer and a stateful
master/slave Resource Agent (pgsql) with Pacemaker (1.1.12) on CentOS 7.1.
Things work pretty well for a static cluster, where membership is
defined up front. However, we need to be able to seamlessly add new
nodes to the cluster and remove existing ones without service
interruption, and we have run into a problem.
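As far as we understand, "pcs cluster node add" rewrites the nodelist
section of /etc/corosync/corosync.conf on every member, roughly like
this (a sketch; the nodeid values are assumed, not taken from our
actual config):

nodelist {
    node {
        ring0_addr: pi01
        nodeid: 1
    }
    node {
        ring0_addr: pi02
        nodeid: 2
    }
    node {
        ring0_addr: pi03
        nodeid: 3
    }
    node {
        ring0_addr: pi05
        nodeid: 4
    }
}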
Is it possible to add a new node dynamically, without this disruption?
Is there a command or some other way to do it?
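The steps and output from my earlier message are quoted below. One
thing we plan to try is staging both resource updates against a saved
copy of the CIB and pushing them back in one step, so Pacemaker sees a
single configuration change instead of two (a sketch; the file name is
arbitrary, and we are not yet sure whether this avoids the restart):

# pcs cluster cib /tmp/cib.xml
# pcs -f /tmp/cib.xml resource meta msPostgresql clone-max=4
# pcs -f /tmp/cib.xml resource update pgsql node_list="pi01 pi02 pi03 pi05"
# pcs cluster cib-push /tmp/cib.xml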
On 05.10.2015 13:19, Nikolay Popov wrote:
> Hello.
>
> The cluster resources go to STOP status when I add/delete the new
> cluster node <pi05> and then run the <update pgsql> command:
>
> How can I add a node without stopping the cluster?
>
> These are the commands I run, step by step:
>
> # pcs cluster auth pi01 pi02 pi03 pi05 -u hacluster -p hacluster
>
> pi01: Authorized
> pi02: Authorized
> pi03: Authorized
> pi05: Authorized
>
> # pcs cluster node add pi05 --start
>
> pi01: Corosync updated
> pi02: Corosync updated
> pi03: Corosync updated
> pi05: Succeeded
> pi05: Starting Cluster...
>
> # pcs resource show --full
>
> Group: master-group
> Resource: vip-master (class=ocf provider=heartbeat type=IPaddr2)
> Attributes: ip=192.168.242.100 nic=eth0 cidr_netmask=24
> Operations: start interval=0s timeout=60s on-fail=restart
> (vip-master-start-interval-0s)
> monitor interval=10s timeout=60s on-fail=restart
> (vip-master-monitor-interval-10s)
> stop interval=0s timeout=60s on-fail=block
> (vip-master-stop-interval-0s)
> Resource: vip-rep (class=ocf provider=heartbeat type=IPaddr2)
> Attributes: ip=192.168.242.101 nic=eth0 cidr_netmask=24
> Meta Attrs: migration-threshold=0
> Operations: start interval=0s timeout=60s on-fail=stop
> (vip-rep-start-interval-0s)
> monitor interval=10s timeout=60s on-fail=restart
> (vip-rep-monitor-interval-10s)
> stop interval=0s timeout=60s on-fail=ignore
> (vip-rep-stop-interval-0s)
> Master: msPostgresql
> Meta Attrs: master-max=1 master-node-max=1 clone-max=3
> clone-node-max=1 notify=true
> Resource: pgsql (class=ocf provider=heartbeat type=pgsql)
> Attributes: pgctl=/usr/pgsql-9.5/bin/pg_ctl
> psql=/usr/pgsql-9.5/bin/psql pgdata=/var/lib/pgsql/9.5/data/
> rep_mode=sync node_list="pi01 pi02 pi03" restore_command="cp
> /var/lib/pgsql/9.5/data/wal_archive/%f %p"
> primary_conninfo_opt="user=repl password=super-pass-for-repl
> keepalives_idle=60 keepalives_interval=5 keepalives_count=5"
> master_ip=192.168.242.100 restart_on_promote=true check_wal_receiver=true
> Operations: start interval=0s timeout=60s on-fail=restart
> (pgsql-start-interval-0s)
> monitor interval=4s timeout=60s on-fail=restart
> (pgsql-monitor-interval-4s)
> monitor role=Master timeout=60s on-fail=restart
> interval=3s (pgsql-monitor-interval-3s-role-Master)
> promote interval=0s timeout=60s on-fail=restart
> (pgsql-promote-interval-0s)
> demote interval=0s timeout=60s on-fail=stop
> (pgsql-demote-interval-0s)
> stop interval=0s timeout=60s on-fail=block
> (pgsql-stop-interval-0s)
> notify interval=0s timeout=60s (pgsql-notify-interval-0s)
>
>
> # pcs resource update msPostgresql pgsql master-max=1
> master-node-max=1 clone-max=4 clone-node-max=1 notify=true
>
> # pcs resource update pgsql pgsql node_list="pi01 pi02 pi03 pi05"
>
> # crm_mon -Afr1
>
> Last updated: Fri Oct 2 17:07:05 2015
> Last change: Fri Oct 2 17:06:37 2015 by root via cibadmin on pi01
> Stack: corosync
> Current DC: pi02 (version 1.1.13-a14efad) - partition with quorum
> 4 nodes and 9 resources configured
>
> Online: [ pi01 pi02 pi03 pi05 ]
>
> Full list of resources:
>
> Resource Group: master-group
> vip-master (ocf::heartbeat:IPaddr2): Stopped
> vip-rep (ocf::heartbeat:IPaddr2): Stopped
> Master/Slave Set: msPostgresql [pgsql]
> Slaves: [ pi02 ]
> Stopped: [ pi01 pi03 pi05 ]
> fence-pi01 (stonith:fence_ssh): Started pi02
> fence-pi02 (stonith:fence_ssh): Started pi01
> fence-pi03 (stonith:fence_ssh): Started pi01
>
> Node Attributes:
> * Node pi01:
> + master-pgsql : -INFINITY
> + pgsql-data-status : STREAMING|SYNC
> + pgsql-status : STOP
> * Node pi02:
> + master-pgsql : -INFINITY
> + pgsql-data-status : LATEST
> + pgsql-status : STOP
> * Node pi03:
> + master-pgsql : -INFINITY
> + pgsql-data-status : STREAMING|POTENTIAL
> + pgsql-status : STOP
> * Node pi05:
> + master-pgsql : -INFINITY
> + pgsql-status : STOP
>
> Migration Summary:
> * Node pi01:
> * Node pi03:
> * Node pi02:
> * Node pi05:
>
> After some time it worked:
>
> Every 2.0s: crm_mon -Afr1                       Fri Oct  2 17:04:36 2015
>
> Last updated: Fri Oct 2 17:04:36 2015
> Last change: Fri Oct 2 17:04:07 2015 by root via cibadmin on pi01
> Stack: corosync
> Current DC: pi02 (version 1.1.13-a14efad) - partition with quorum
> 4 nodes and 9 resources configured
>
> Online: [ pi01 pi02 pi03 pi05 ]
>
> Full list of resources:
>
> Resource Group: master-group
> vip-master (ocf::heartbeat:IPaddr2): Started pi02
> vip-rep (ocf::heartbeat:IPaddr2): Started pi02
> Master/Slave Set: msPostgresql [pgsql]
> Masters: [ pi02 ]
> Slaves: [ pi01 pi03 pi05 ]
>
> fence-pi01 (stonith:fence_ssh): Started pi02
> fence-pi02 (stonith:fence_ssh): Started pi01
> fence-pi03 (stonith:fence_ssh): Started pi01
>
> Node Attributes:
> * Node pi01:
> + master-pgsql : 100
> + pgsql-data-status : STREAMING|SYNC
> + pgsql-receiver-status : normal
> + pgsql-status : HS:sync
> * Node pi02:
> + master-pgsql : 1000
> + pgsql-data-status : LATEST
> + pgsql-master-baseline : 0000000008000098
> + pgsql-receiver-status : ERROR
> + pgsql-status : PRI
> * Node pi03:
> + master-pgsql : -INFINITY
> + pgsql-data-status : STREAMING|POTENTIAL
> + pgsql-receiver-status : normal
> + pgsql-status : HS:potential
> * Node pi05:
> + master-pgsql : -INFINITY
> + pgsql-data-status : STREAMING|POTENTIAL
> + pgsql-receiver-status : normal
> + pgsql-status : HS:potential
>
> Migration Summary:
> * Node pi01:
> * Node pi03:
> * Node pi02:
> * Node pi05:
>
>
> --
> Nikolay Popov
>
>
--
Nikolay Popov
n.popov at postgrespro.ru
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company