[Pacemaker] Why "order o inf: VIP A B" starts VIP, A and B simultaneously ?

Fri Jun 27 01:43:56 EDT 2014

On 25 Jun 2014, at 7:36 pm, Sékine Coulibaly <scoulibaly at gmail.com> wrote:

> Hi all,
> 
> My setup is as follows : RedHat 6.3 (yes, I know,this is quite old) , Pacemaker 1.1.7, Corosync 1.4.1.
> 
> I noticed something that is strange because since it doesn't complies with what I read (and understood) from the following ressources :
> 1. http://crmsh.nongnu.org/crm.8.html#cmdhelp_configure_order
> 2. http://doc.opensuse.org/products/draft/SLE-HA/SLE-ha-guide_sd_draft/cha.ha.manual_config.html
> 
> 
> The strange behaviour I experience is related to ordering.
> Let's have this very basic case :
> 
> One zookeeper process, and one PostgreSQL master. I want them to run alltogether on the master node. Since they bind on the VIP IP, I want the VIP to be started before ZK and PostgreSQL.
> So I setup :
> 
> primitive VIP73 ocf:heartbeat:IPaddr2 \
>         params ip="192.168.73.222" broadcast="192.168.73.255" nic="eth1" cidr_netmask="24" iflabel="VIP73" \
>         op monitor interval="10s" timeout="20s"
> 
> primitive POSTGRESQL ocf:custom:postgresql \
>         params repmgr_conf="/var/lib/pgsql/repmgr/repmgr.conf" pgctl="/usr/pgsql-9.2/bin/pg_ctl" pgdata="/opt/custom/pgdata" \
>         op start interval="0" timeout="90s" \
>         op stop interval="0" timeout="60s" \
>         op promote interval="0" timeout="120s" \
>         op monitor interval="53s" role="Master" \
>         op monitor interval="60s" role="Slave"
> 
> primitive ZK ocf:custom:zookeeper \
>         op monitor interval="5s" timeout="10s" \
>         op start interval="0" timeout="10s" \
>         op stop interval="0" timeout="10s"
> 
> ms MS_POSTGRESQL POSTGRESQL \
>         meta clone-max="2" target-role="Started" resource-stickiness="100" notify="true"
> 
> I then add an ordering such as this one :
> order VIP_last inf: VIP73 ZK MS_POSTGRESQL:promote
> 
> I expect the VIP to be mounted, and then ZK started, and then PostgreSQL master to be promoted. Instead, all the resources seem to be started in parallel.

Yep, that'll do that.
I think you need brackets around the group or something, my memory is a little hazy.
Thats the trigger for crmsh to set sequential=true which gives the behaviour you're looking for.

The default is quite silly.

Possibly its easier to just edit the xml.

> 
> Here comes a /var/log/messages extract, taken just after a corosync restart.
> It can be seen at Jun 25 05:19:52 that VIP73 and POSTGRESQL are all started simultaneously.
> Do I have something wrong here, or something I didn't understand ?
> 
> 
> 
> primitive VIP73 ocf:heartbeat:IPaddr2 \
>         params ip="192.168.73.222" broadcast="192.168.73.255" nic="eth1" cidr_netmask="24" iflabel="VIP73" \
>         op monitor interval="10s" timeout="20s"
> 
> primitive POSTGRESQL ocf:custom:postgresql \
>         params repmgr_conf="/var/lib/pgsql/repmgr/repmgr.conf" pgctl="/usr/pgsql-9.2/bin/pg_ctl" pgdata="/opt/custom/pgdata" \
>         op start interval="0" timeout="90s" \
>         op stop interval="0" timeout="60s" \
>         op promote interval="0" timeout="120s" \
>         op monitor interval="53s" role="Master" \
>         op monitor interval="60s" role="Slave"
> 
> primitive ZK ocf:custom:zookeeper \
>         op monitor interval="5s" timeout="10s" \
>         op start interval="0" timeout="10s" \
>         op stop interval="0" timeout="10s"
> ms MS_POSTGRESQL POSTGRESQL \
>         meta clone-max="2" target-role="Started" resource-stickiness="100" notify="true"
> 
> I then add an ordering such as this one :
> order VIP_last inf: VIP73 ZK MS_POSTGRESQL:promote
> 
> I expect the VIP to be mounted, and then ZK started, and then PostgreSQL master to be promoted. Instead, all the resources seem to be started in parallel.
> 
> Here comes a /var/log/messages extract, taken just after a corosync restart.
> It can be seen at Jun 25 05:19:52 that VIP73 and POSTGRESQL are all started simultaneously.
> Do I have something wrong here, or something I didn't understand ?
> 
> Thank you !
> 
> Jun 25 05:19:52 clustera pengine[33832]:     info: determine_online_status: Node clustera is online
> Jun 25 05:19:52 clustera pengine[33832]:     info: native_print: VIP73#011(ocf::heartbeat:IPaddr2):#011Stopped 
> Jun 25 05:19:52 clustera pengine[33832]:     info: clone_print:  Master/Slave Set: MS_POSTGRESQL [POSTGRESQL]
> Jun 25 05:19:52 clustera pengine[33832]:     info: short_print:      Stopped: [ POSTGRESQL:0 POSTGRESQL:1 ]
> Jun 25 05:19:52 clustera pengine[33832]:     info: native_print: ZK#011(ocf::custom:zookeeper):#011Stopped 
> Jun 25 05:19:52 clustera pengine[33832]:     info: native_color: Resource POSTGRESQL:1 cannot run anywhere
> Jun 25 05:19:52 clustera pengine[33832]:     info: master_color: MS_POSTGRESQL: Promoted 0 instances of a possible 1 to master
> Jun 25 05:19:52 clustera pengine[33832]:     info: RecurringOp:  Start recurring monitor (10s) for VIP73 on clustera
> Jun 25 05:19:52 clustera pengine[33832]:     info: RecurringOp:  Start recurring monitor (60s) for POSTGRESQL:0 on clustera
> Jun 25 05:19:52 clustera pengine[33832]:     info: RecurringOp:  Start recurring monitor (60s) for POSTGRESQL:0 on clustera
> Jun 25 05:19:52 clustera pengine[33832]:     info: RecurringOp:  Start recurring monitor (5s) for ZK on clustera
> Jun 25 05:19:52 clustera pengine[33832]:   notice: LogActions: Start   VIP73#011(clustera)
> Jun 25 05:19:52 clustera pengine[33832]:   notice: LogActions: Start   POSTGRESQL:0#011(clustera)
> Jun 25 05:19:52 clustera pengine[33832]:     info: LogActions: Leave   POSTGRESQL:1#011(Stopped)
> Jun 25 05:19:52 clustera pengine[33832]:   notice: LogActions: Start   ZK#011(clustera)
> 
> ...
> Jun 25 05:19:52 clustera crmd[33833]:     info: te_rsc_command: Initiating action 3: probe_complete probe_complete on clustera (local) - no waiting
> Jun 25 05:19:52 clustera crmd[33833]:     info: te_rsc_command: Initiating action 7: start VIP73_start_0 on clustera (local)
> Jun 25 05:19:52 clustera attrd[33831]:     info: find_hash_entry: Creating hash entry for probe_complete
> Jun 25 05:19:52 clustera attrd[33831]:   notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
> Jun 25 05:19:52 clustera lrmd: [33830]: info: rsc:VIP73:5: start
> Jun 25 05:19:52 clustera attrd[33831]:   notice: attrd_perform_update: Sent update 4: probe_complete=true
> Jun 25 05:19:52 clustera crmd[33833]:     info: te_rsc_command: Initiating action 9: start POSTGRESQL:0_start_0 on clustera (local)
> Jun 25 05:19:52 clustera lrmd: [33830]: info: rsc:POSTGRESQL:0:6: start
> Jun 25 05:19:52 clustera IPaddr2(VIP73)[33953]: INFO: ip -f inet addr add 192.168.73.222/24 brd 192.168.73.255 dev eth1 label eth1:VIP73
> Jun 25 05:19:52 clustera avahi-daemon[2065]: Registering new address record for 192.168.73.222 on eth1.IPv4.
> Jun 25 05:19:52 clustera IPaddr2(VIP73)[33953]: INFO: ip link set eth1 up
> Jun 25 05:19:52 clustera IPaddr2(VIP73)[33953]: INFO: /usr/lib64/heartbeat/send_arp -i 200 -r 5 -p /var/run/heartbeat/rsctmp/send_arp-192.168.73.222 eth1 192.168.73.222 auto not_used not_used
> Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: POSTGRESQL:0: Starting
> Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: ACTION=start
> Jun 25 05:19:52 clustera lrmd: [33830]: info: Managed VIP73:start process 33953 exited with return code 0.
> Jun 25 05:19:52 clustera crmd[33833]:     info: process_lrm_event: LRM operation VIP73_start_0 (call=5, rc=0, cib-update=29, confirmed=true) ok
> Jun 25 05:19:52 clustera crmd[33833]:     info: te_rsc_command: Initiating action 8: monitor VIP73_monitor_10000 on clustera (local)
> Jun 25 05:19:52 clustera lrmd: [33830]: info: rsc:VIP73:7: monitor
> Jun 25 05:19:52 clustera crmd[33833]:     info: te_rsc_command: Initiating action 35: start ZK_start_0 on clustera (local)
> Jun 25 05:19:52 clustera lrmd: [33830]: info: rsc:ZK:8: start
> Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RA_VERSION_MAJOR=1
> Jun 25 05:19:52 clustera zookeeper(ZK)[34058]: INFO: [ZK] No pid file found
> Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RA_VERSION_MINOR=0
> Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_CRM_meta_clone=0
> Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_CRM_meta_clone_max=2
> Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_CRM_meta_clone_node_max=1
> Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_CRM_meta_globally_unique=false
> Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_CRM_meta_master_max=1
> Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_CRM_meta_master_node_max=1
> Jun 25 05:19:52 clustera lrmd: [33830]: info: Managed VIP73:monitor process 34057 exited with return code 0.
> Jun 25 05:19:52 clustera crmd[33833]:     info: process_lrm_event: LRM operation VIP73_monitor_10000 (call=7, rc=0, cib-update=30, confirmed=false) ok
> Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_CRM_meta_name=start
> Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_CRM_meta_notify=true
> Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_CRM_meta_notify_active_resource=
> Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_CRM_meta_notify_active_uname=
> Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_CRM_meta_notify_demote_resource=
> Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_CRM_meta_notify_demote_uname=
> Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_CRM_meta_notify_inactive_resource=POSTGRESQL:0
> Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: POSTGRESQL:1
> Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_CRM_meta_notify_master_resource=
> Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_CRM_meta_notify_master_uname=
> Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_CRM_meta_notify_promote_resource=
> Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_CRM_meta_notify_promote_uname=
> Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_CRM_meta_notify_slave_resource=
> Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_CRM_meta_notify_slave_uname=
> Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_CRM_meta_notify_start_resource=POSTGRESQL:0
> Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_CRM_meta_notify_start_uname=clustera
> Jun 25 05:19:52 clustera lrmd: [33830]: info: RA output: (ZK:start:stdout) Starting zookeeper ... 
> Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_CRM_meta_notify_stop_resource=
> Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_CRM_meta_notify_stop_uname=
> Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_CRM_meta_timeout=90000
> Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_crm_feature_set=3.0.6
> Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_pgctl=/usr/pgsql-9.2/bin/pg_ctl
> Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_pgdata=/opt/custom/pgdata
> Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_repmgr_conf=/var/lib/pgsql/repmgr/repmgr.conf
> Jun 25 05:19:52 clustera avahi-daemon[2065]: Invalid legacy unicast query packet.
> Jun 25 05:19:52 clustera avahi-daemon[2065]: Received response from host 192.168.72.1 with invalid source port 50070 on interface 'eth0.0'
> Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESOURCE_INSTANCE=POSTGRESQL:0
> Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESOURCE_PROVIDER=custom
> Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESOURCE_TYPE=postgresql
> Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_ROOT=/usr/lib/ocf
> Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: Run as postgres: /usr/pgsql-9.2/bin/pg_ctl start -w -D /opt/custom/pgdata -l /var/log/custom/custom/ha/postgresql/postgres_ha.log -o '-c config_file=/opt/custom/pgdata/postgresql.conf' -o '-p 5432'
> Jun 25 05:19:53 clustera avahi-daemon[2065]: Invalid legacy unicast query packet.
> Jun 25 05:19:53 clustera avahi-daemon[2065]: Invalid legacy unicast query packet.
> Jun 25 05:19:53 clustera avahi-daemon[2065]: Received response from host 192.168.72.1 with invalid source port 50070 on interface 'eth0.0'
> Jun 25 05:19:53 clustera avahi-daemon[2065]: Received response from host 192.168.72.1 with invalid source port 50070 on interface 'eth0.0'
> Jun 25 05:19:53 clustera lrmd: [33830]: info: RA output: (ZK:start:stdout) STARTED
> Jun 25 05:19:53 clustera lrmd: [33830]: info: RA output: (ZK:start:stdout) ---------------------------------------------------------#012Starting ZK restore#012---------------------------------------------------------
> Jun 25 05:19:53 clustera lrmd: [33830]: info: RA output: (ZK:start:stderr) ls: cannot access /var/custom/custom/ha/snapshots/zk_dump_*.xml: No such file or directory
> Jun 25 05:19:53 clustera lrmd: [33830]: info: RA output: (ZK:start:stdout) [ZK]   Last snapshot :           <NONE>#012[ZK]   WARNING: Nothing to restore
> Jun 25 05:19:53 clustera zookeeper(ZK)[34058]: INFO: [ZK] Nothing to restore
> Jun 25 05:19:53 clustera lrmd: [33830]: info: RA output: (ZK:start:stdout) ---------------------------------------------------------
> Jun 25 05:19:53 clustera lrmd: [33830]: info: Managed ZK:start process 34058 exited with return code 0.
> Jun 25 05:19:53 clustera crmd[33833]:     info: process_lrm_event: LRM operation ZK_start_0 (call=8, rc=0, cib-update=31, confirmed=true) ok
> Jun 25 05:19:53 clustera crmd[33833]:     info: te_rsc_command: Initiating action 36: monitor ZK_monitor_5000 on clustera (local)
> Jun 25 05:19:53 clustera lrmd: [33830]: info: rsc:ZK:9: monitor
> Jun 25 05:19:53 clustera lrmd: [33830]: info: Managed ZK:monitor process 34390 exited with return code 0.
> Jun 25 05:19:53 clustera crmd[33833]:     info: process_lrm_event: LRM operation ZK_monitor_5000 (call=9, rc=0, cib-update=32, confirmed=false) ok
> Jun 25 05:19:53 clustera postgresql(POSTGRESQL:0)[33954]: INFO: waiting for server to start.... done server started
> Jun 25 05:19:54 clustera avahi-daemon[2065]: Received response from host 192.168.72.1 with invalid source port 50070 on interface 'eth0.0'
> Jun 25 05:19:56 clustera avahi-daemon[2065]: Received response from host 192.168.72.1 with invalid source port 50070 on interface 'eth0.0'
> Jun 25 05:19:56 clustera lrmd: [33830]: info: RA output: (VIP73:start:stderr) ARPING 192.168.73.222 from 192.168.73.222 eth1#012Sent 5 probes (5 broadcast(s))#
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 841 bytes
Desc: Message signed with OpenPGP using GPGMail
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20140627/0c36ef3d/attachment-0003.sig>