[Pacemaker] Why "order o inf: VIP A B" starts VIP, A and B simultaneously ?

Tue Jul 22 05:27:14 EDT 2014

Thank you Andrew, perfect !
The more I use Pacemaker, the more tricky it gets when working with
third-party tools (although crmsh is great). Completing  PCMK configuration
obviously involves some manual XML file tuning. It's often more crmsh
issues than PCMK actually...

2014-06-27 7:43 GMT+02:00 Andrew Beekhof <andrew at beekhof.net>:

>
> On 25 Jun 2014, at 7:36 pm, Sékine Coulibaly <scoulibaly at gmail.com> wrote:
>
> > Hi all,
> >
> > My setup is as follows : RedHat 6.3 (yes, I know,this is quite old) ,
> Pacemaker 1.1.7, Corosync 1.4.1.
> >
> > I noticed something that is strange because since it doesn't complies
> with what I read (and understood) from the following ressources :
> > 1. http://crmsh.nongnu.org/crm.8.html#cmdhelp_configure_order
> > 2.
> http://doc.opensuse.org/products/draft/SLE-HA/SLE-ha-guide_sd_draft/cha.ha.manual_config.html
> >
> >
> > The strange behaviour I experience is related to ordering.
> > Let's have this very basic case :
> >
> > One zookeeper process, and one PostgreSQL master. I want them to run
> alltogether on the master node. Since they bind on the VIP IP, I want the
> VIP to be started before ZK and PostgreSQL.
> > So I setup :
> >
> > primitive VIP73 ocf:heartbeat:IPaddr2 \
> >         params ip="192.168.73.222" broadcast="192.168.73.255" nic="eth1"
> cidr_netmask="24" iflabel="VIP73" \
> >         op monitor interval="10s" timeout="20s"
> >
> > primitive POSTGRESQL ocf:custom:postgresql \
> >         params repmgr_conf="/var/lib/pgsql/repmgr/repmgr.conf"
> pgctl="/usr/pgsql-9.2/bin/pg_ctl" pgdata="/opt/custom/pgdata" \
> >         op start interval="0" timeout="90s" \
> >         op stop interval="0" timeout="60s" \
> >         op promote interval="0" timeout="120s" \
> >         op monitor interval="53s" role="Master" \
> >         op monitor interval="60s" role="Slave"
> >
> > primitive ZK ocf:custom:zookeeper \
> >         op monitor interval="5s" timeout="10s" \
> >         op start interval="0" timeout="10s" \
> >         op stop interval="0" timeout="10s"
> >
> > ms MS_POSTGRESQL POSTGRESQL \
> >         meta clone-max="2" target-role="Started"
> resource-stickiness="100" notify="true"
> >
> > I then add an ordering such as this one :
> > order VIP_last inf: VIP73 ZK MS_POSTGRESQL:promote
> >
> > I expect the VIP to be mounted, and then ZK started, and then PostgreSQL
> master to be promoted. Instead, all the resources seem to be started in
> parallel.
>
> Yep, that'll do that.
> I think you need brackets around the group or something, my memory is a
> little hazy.
> Thats the trigger for crmsh to set sequential=true which gives the
> behaviour you're looking for.
>
> The default is quite silly.
>
> Possibly its easier to just edit the xml.
>
> >
> > Here comes a /var/log/messages extract, taken just after a corosync
> restart.
> > It can be seen at Jun 25 05:19:52 that VIP73 and POSTGRESQL are all
> started simultaneously.
> > Do I have something wrong here, or something I didn't understand ?
> >
> >
> >
> > primitive VIP73 ocf:heartbeat:IPaddr2 \
> >         params ip="192.168.73.222" broadcast="192.168.73.255" nic="eth1"
> cidr_netmask="24" iflabel="VIP73" \
> >         op monitor interval="10s" timeout="20s"
> >
> > primitive POSTGRESQL ocf:custom:postgresql \
> >         params repmgr_conf="/var/lib/pgsql/repmgr/repmgr.conf"
> pgctl="/usr/pgsql-9.2/bin/pg_ctl" pgdata="/opt/custom/pgdata" \
> >         op start interval="0" timeout="90s" \
> >         op stop interval="0" timeout="60s" \
> >         op promote interval="0" timeout="120s" \
> >         op monitor interval="53s" role="Master" \
> >         op monitor interval="60s" role="Slave"
> >
> > primitive ZK ocf:custom:zookeeper \
> >         op monitor interval="5s" timeout="10s" \
> >         op start interval="0" timeout="10s" \
> >         op stop interval="0" timeout="10s"
> > ms MS_POSTGRESQL POSTGRESQL \
> >         meta clone-max="2" target-role="Started"
> resource-stickiness="100" notify="true"
> >
> > I then add an ordering such as this one :
> > order VIP_last inf: VIP73 ZK MS_POSTGRESQL:promote
> >
> > I expect the VIP to be mounted, and then ZK started, and then PostgreSQL
> master to be promoted. Instead, all the resources seem to be started in
> parallel.
> >
> > Here comes a /var/log/messages extract, taken just after a corosync
> restart.
> > It can be seen at Jun 25 05:19:52 that VIP73 and POSTGRESQL are all
> started simultaneously.
> > Do I have something wrong here, or something I didn't understand ?
> >
> > Thank you !
> >
> > Jun 25 05:19:52 clustera pengine[33832]:     info:
> determine_online_status: Node clustera is online
> > Jun 25 05:19:52 clustera pengine[33832]:     info: native_print:
> VIP73#011(ocf::heartbeat:IPaddr2):#011Stopped
> > Jun 25 05:19:52 clustera pengine[33832]:     info: clone_print:
>  Master/Slave Set: MS_POSTGRESQL [POSTGRESQL]
> > Jun 25 05:19:52 clustera pengine[33832]:     info: short_print:
>  Stopped: [ POSTGRESQL:0 POSTGRESQL:1 ]
> > Jun 25 05:19:52 clustera pengine[33832]:     info: native_print:
> ZK#011(ocf::custom:zookeeper):#011Stopped
> > Jun 25 05:19:52 clustera pengine[33832]:     info: native_color:
> Resource POSTGRESQL:1 cannot run anywhere
> > Jun 25 05:19:52 clustera pengine[33832]:     info: master_color:
> MS_POSTGRESQL: Promoted 0 instances of a possible 1 to master
> > Jun 25 05:19:52 clustera pengine[33832]:     info: RecurringOp:  Start
> recurring monitor (10s) for VIP73 on clustera
> > Jun 25 05:19:52 clustera pengine[33832]:     info: RecurringOp:  Start
> recurring monitor (60s) for POSTGRESQL:0 on clustera
> > Jun 25 05:19:52 clustera pengine[33832]:     info: RecurringOp:  Start
> recurring monitor (60s) for POSTGRESQL:0 on clustera
> > Jun 25 05:19:52 clustera pengine[33832]:     info: RecurringOp:  Start
> recurring monitor (5s) for ZK on clustera
> > Jun 25 05:19:52 clustera pengine[33832]:   notice: LogActions: Start
> VIP73#011(clustera)
> > Jun 25 05:19:52 clustera pengine[33832]:   notice: LogActions: Start
> POSTGRESQL:0#011(clustera)
> > Jun 25 05:19:52 clustera pengine[33832]:     info: LogActions: Leave
> POSTGRESQL:1#011(Stopped)
> > Jun 25 05:19:52 clustera pengine[33832]:   notice: LogActions: Start
> ZK#011(clustera)
> >
> > ...
> > Jun 25 05:19:52 clustera crmd[33833]:     info: te_rsc_command:
> Initiating action 3: probe_complete probe_complete on clustera (local) - no
> waiting
> > Jun 25 05:19:52 clustera crmd[33833]:     info: te_rsc_command:
> Initiating action 7: start VIP73_start_0 on clustera (local)
> > Jun 25 05:19:52 clustera attrd[33831]:     info: find_hash_entry:
> Creating hash entry for probe_complete
> > Jun 25 05:19:52 clustera attrd[33831]:   notice: attrd_trigger_update:
> Sending flush op to all hosts for: probe_complete (true)
> > Jun 25 05:19:52 clustera lrmd: [33830]: info: rsc:VIP73:5: start
> > Jun 25 05:19:52 clustera attrd[33831]:   notice: attrd_perform_update:
> Sent update 4: probe_complete=true
> > Jun 25 05:19:52 clustera crmd[33833]:     info: te_rsc_command:
> Initiating action 9: start POSTGRESQL:0_start_0 on clustera (local)
> > Jun 25 05:19:52 clustera lrmd: [33830]: info: rsc:POSTGRESQL:0:6: start
> > Jun 25 05:19:52 clustera IPaddr2(VIP73)[33953]: INFO: ip -f inet addr
> add 192.168.73.222/24 brd 192.168.73.255 dev eth1 label eth1:VIP73
> > Jun 25 05:19:52 clustera avahi-daemon[2065]: Registering new address
> record for 192.168.73.222 on eth1.IPv4.
> > Jun 25 05:19:52 clustera IPaddr2(VIP73)[33953]: INFO: ip link set eth1 up
> > Jun 25 05:19:52 clustera IPaddr2(VIP73)[33953]: INFO:
> /usr/lib64/heartbeat/send_arp -i 200 -r 5 -p
> /var/run/heartbeat/rsctmp/send_arp-192.168.73.222 eth1 192.168.73.222 auto
> not_used not_used
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO:
> POSTGRESQL:0: Starting
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO:
> ACTION=start
> > Jun 25 05:19:52 clustera lrmd: [33830]: info: Managed VIP73:start
> process 33953 exited with return code 0.
> > Jun 25 05:19:52 clustera crmd[33833]:     info: process_lrm_event: LRM
> operation VIP73_start_0 (call=5, rc=0, cib-update=29, confirmed=true) ok
> > Jun 25 05:19:52 clustera crmd[33833]:     info: te_rsc_command:
> Initiating action 8: monitor VIP73_monitor_10000 on clustera (local)
> > Jun 25 05:19:52 clustera lrmd: [33830]: info: rsc:VIP73:7: monitor
> > Jun 25 05:19:52 clustera crmd[33833]:     info: te_rsc_command:
> Initiating action 35: start ZK_start_0 on clustera (local)
> > Jun 25 05:19:52 clustera lrmd: [33830]: info: rsc:ZK:8: start
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO:
> OCF_RA_VERSION_MAJOR=1
> > Jun 25 05:19:52 clustera zookeeper(ZK)[34058]: INFO: [ZK] No pid file
> found
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO:
> OCF_RA_VERSION_MINOR=0
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO:
> OCF_RESKEY_CRM_meta_clone=0
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO:
> OCF_RESKEY_CRM_meta_clone_max=2
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO:
> OCF_RESKEY_CRM_meta_clone_node_max=1
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO:
> OCF_RESKEY_CRM_meta_globally_unique=false
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO:
> OCF_RESKEY_CRM_meta_master_max=1
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO:
> OCF_RESKEY_CRM_meta_master_node_max=1
> > Jun 25 05:19:52 clustera lrmd: [33830]: info: Managed VIP73:monitor
> process 34057 exited with return code 0.
> > Jun 25 05:19:52 clustera crmd[33833]:     info: process_lrm_event: LRM
> operation VIP73_monitor_10000 (call=7, rc=0, cib-update=30,
> confirmed=false) ok
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO:
> OCF_RESKEY_CRM_meta_name=start
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO:
> OCF_RESKEY_CRM_meta_notify=true
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO:
> OCF_RESKEY_CRM_meta_notify_active_resource=
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO:
> OCF_RESKEY_CRM_meta_notify_active_uname=
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO:
> OCF_RESKEY_CRM_meta_notify_demote_resource=
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO:
> OCF_RESKEY_CRM_meta_notify_demote_uname=
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO:
> OCF_RESKEY_CRM_meta_notify_inactive_resource=POSTGRESQL:0
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO:
> POSTGRESQL:1
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO:
> OCF_RESKEY_CRM_meta_notify_master_resource=
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO:
> OCF_RESKEY_CRM_meta_notify_master_uname=
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO:
> OCF_RESKEY_CRM_meta_notify_promote_resource=
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO:
> OCF_RESKEY_CRM_meta_notify_promote_uname=
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO:
> OCF_RESKEY_CRM_meta_notify_slave_resource=
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO:
> OCF_RESKEY_CRM_meta_notify_slave_uname=
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO:
> OCF_RESKEY_CRM_meta_notify_start_resource=POSTGRESQL:0
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO:
> OCF_RESKEY_CRM_meta_notify_start_uname=clustera
> > Jun 25 05:19:52 clustera lrmd: [33830]: info: RA output:
> (ZK:start:stdout) Starting zookeeper ...
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO:
> OCF_RESKEY_CRM_meta_notify_stop_resource=
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO:
> OCF_RESKEY_CRM_meta_notify_stop_uname=
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO:
> OCF_RESKEY_CRM_meta_timeout=90000
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO:
> OCF_RESKEY_crm_feature_set=3.0.6
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO:
> OCF_RESKEY_pgctl=/usr/pgsql-9.2/bin/pg_ctl
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO:
> OCF_RESKEY_pgdata=/opt/custom/pgdata
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO:
> OCF_RESKEY_repmgr_conf=/var/lib/pgsql/repmgr/repmgr.conf
> > Jun 25 05:19:52 clustera avahi-daemon[2065]: Invalid legacy unicast
> query packet.
> > Jun 25 05:19:52 clustera avahi-daemon[2065]: Received response from host
> 192.168.72.1 with invalid source port 50070 on interface 'eth0.0'
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO:
> OCF_RESOURCE_INSTANCE=POSTGRESQL:0
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO:
> OCF_RESOURCE_PROVIDER=custom
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO:
> OCF_RESOURCE_TYPE=postgresql
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO:
> OCF_ROOT=/usr/lib/ocf
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: Run as
> postgres: /usr/pgsql-9.2/bin/pg_ctl start -w -D /opt/custom/pgdata -l
> /var/log/custom/custom/ha/postgresql/postgres_ha.log -o '-c
> config_file=/opt/custom/pgdata/postgresql.conf' -o '-p 5432'
> > Jun 25 05:19:53 clustera avahi-daemon[2065]: Invalid legacy unicast
> query packet.
> > Jun 25 05:19:53 clustera avahi-daemon[2065]: Invalid legacy unicast
> query packet.
> > Jun 25 05:19:53 clustera avahi-daemon[2065]: Received response from host
> 192.168.72.1 with invalid source port 50070 on interface 'eth0.0'
> > Jun 25 05:19:53 clustera avahi-daemon[2065]: Received response from host
> 192.168.72.1 with invalid source port 50070 on interface 'eth0.0'
> > Jun 25 05:19:53 clustera lrmd: [33830]: info: RA output:
> (ZK:start:stdout) STARTED
> > Jun 25 05:19:53 clustera lrmd: [33830]: info: RA output:
> (ZK:start:stdout)
> ---------------------------------------------------------#012Starting ZK
> restore#012---------------------------------------------------------
> > Jun 25 05:19:53 clustera lrmd: [33830]: info: RA output:
> (ZK:start:stderr) ls: cannot access
> /var/custom/custom/ha/snapshots/zk_dump_*.xml: No such file or directory
> > Jun 25 05:19:53 clustera lrmd: [33830]: info: RA output:
> (ZK:start:stdout) [ZK]   Last snapshot :           <NONE>#012[ZK]
> WARNING: Nothing to restore
> > Jun 25 05:19:53 clustera zookeeper(ZK)[34058]: INFO: [ZK] Nothing to
> restore
> > Jun 25 05:19:53 clustera lrmd: [33830]: info: RA output:
> (ZK:start:stdout) ---------------------------------------------------------
> > Jun 25 05:19:53 clustera lrmd: [33830]: info: Managed ZK:start process
> 34058 exited with return code 0.
> > Jun 25 05:19:53 clustera crmd[33833]:     info: process_lrm_event: LRM
> operation ZK_start_0 (call=8, rc=0, cib-update=31, confirmed=true) ok
> > Jun 25 05:19:53 clustera crmd[33833]:     info: te_rsc_command:
> Initiating action 36: monitor ZK_monitor_5000 on clustera (local)
> > Jun 25 05:19:53 clustera lrmd: [33830]: info: rsc:ZK:9: monitor
> > Jun 25 05:19:53 clustera lrmd: [33830]: info: Managed ZK:monitor process
> 34390 exited with return code 0.
> > Jun 25 05:19:53 clustera crmd[33833]:     info: process_lrm_event: LRM
> operation ZK_monitor_5000 (call=9, rc=0, cib-update=32, confirmed=false) ok
> > Jun 25 05:19:53 clustera postgresql(POSTGRESQL:0)[33954]: INFO: waiting
> for server to start.... done server started
> > Jun 25 05:19:54 clustera avahi-daemon[2065]: Received response from host
> 192.168.72.1 with invalid source port 50070 on interface 'eth0.0'
> > Jun 25 05:19:56 clustera avahi-daemon[2065]: Received response from host
> 192.168.72.1 with invalid source port 50070 on interface 'eth0.0'
> > Jun 25 05:19:56 clustera lrmd: [33830]: info: RA output:
> (VIP73:start:stderr) ARPING 192.168.73.222 from 192.168.73.222 eth1#012Sent
> 5 probes (5 broadcast(s))#
> > _______________________________________________
> > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.clusterlabs.org/pipermail/pacemaker/attachments/20140722/03c563bd/attachment-0002.html>