<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Aug 4, 2021 at 5:30 PM Janusz Jaskiewicz &lt;<a href="mailto:janusz.jaskiewicz@gmail.com">janusz.jaskiewicz@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Hello.<br><br>Please forgive the length of this email, but I wanted to provide as much detail as possible.<br><br>I'm trying to set up a cluster of two nodes for my service.<br>I have a problem with a scenario where the network between the two nodes breaks and they can no longer see each other.<br>This causes split-brain.<br>I know that the proper way of handling this would be to employ STONITH, but it is not feasible for me now (I don't have the necessary hardware support, and I don't want to introduce another point of failure by adding shared-storage-based STONITH).<br><br>In order to work around the split-brain scenario I introduced pingd to my cluster, which in theory should do what I expect.<br>pingd pings a network device, so when the NIC is broken on one of my nodes, this node should not run the resources because pingd would fail for it.<br></div></blockquote><div>As we've already discussed on this list in multiple previous threads, there are lots of failure scenarios</div><div>where cluster nodes don't see each other but both can still ping something else on the network.</div><div>Other important cases where your approach wouldn't work are those where a node is only</div><div>partially alive: corosync membership is lost and the node is no longer able to stop resources</div><div>properly.</div><div>Thus it is highly recommended that all setups relying on some kind of self-fencing, or on</div><div>bringing resources down within some timeout, be guarded by a (hardware) watchdog.</div><div>You were probably referring earlier to SBD, which implements such 
a</div><div>watchdog-guarded approach. As you've probably figured out, you can't directly use SBD</div><div>in a 2-node setup without a shared disk. Pure watchdog fencing needs a quorum decision</div><div>made by at least 3 instances. If you don't want a full-blown 3rd node you can consider</div><div>qdevice, which can be used by multiple 2-node clusters for quorum evaluation.</div><div>Otherwise you can use SBD with a shared disk.</div><div>You are right that both a shared disk and any kind of 3rd node are an additional point of</div><div>failure. What matters is that in both cases we are talking about a point of failure, but not a</div><div>single point of failure: its failing would not necessarily force services to be</div><div>shut down.</div><div><br></div><div>Klaus</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><br>The pingd resource is configured to update the value of the variable 'pingd' (interval: 5s, dampen: 3s, multiplier: 1000).<br>Based on the value of pingd I have a location constraint which sets the score to -INFINITY for resource DimProdClusterIP when 'pingd' is not 1000.<br>All other resources are colocated with DimProdClusterIP, and DimProdClusterIP should start before all other resources.<br><br>Based on that setup I would expect that when the resources run on dimprod01 and I disconnect dimprod02 from the network, the resources will not start on dimprod02.<br>Unfortunately I see that after a token interval + consensus interval my resources are brought up for a moment and then go down again.<br>This is undesirable, as it causes DRBD split-brain inconsistency, and the cluster IP may also be taken over by the node which is down.<br><br>I tried to debug it, but I can't figure out why it doesn't work.<br>I would appreciate any help/pointers.<br><br><br>Following are some details of my setup and a snippet of the pacemaker logs with comments:<br><br>Setup details:<br><br>pcs 
status:<br>Cluster name: dimprodcluster<br>Cluster Summary:<br>  * Stack: corosync<br>  * Current DC: dimprod02 (version 2.0.5-9.el8_4.1-ba59be7122) - partition with quorum<br>  * Last updated: Tue Aug  3 08:20:32 2021<br>  * Last change:  Mon Aug  2 18:24:39 2021 by root via cibadmin on dimprod01<br>  * 2 nodes configured<br>  * 8 resource instances configured<br><br>Node List:<br>  * Online: [ dimprod01 dimprod02 ]<br><br>Full List of Resources:<br>  * DimProdClusterIP   (ocf::heartbeat:IPaddr2):        Started dimprod01<br>  * WyrDimProdServer       (systemd:wyr-dim):       Started dimprod01<br>  * Clone Set: WyrDimProdServerData-clone [WyrDimProdServerData] (promotable):<br>    * Masters: [ dimprod01 ]<br>    * Slaves: [ dimprod02 ]<br>  * WyrDimProdFS   (ocf::heartbeat:Filesystem):     Started dimprod01<br>  * DimTestClusterIP       (ocf::heartbeat:IPaddr2):        Started dimprod01<br>  * Clone Set: ping-clone [ping]:<br>    * Started: [ dimprod01 dimprod02 ]<br><br>Daemon Status:<br>  corosync: active/enabled<br>  pacemaker: active/enabled<br>  pcsd: active/enabled<br>  <br><br>pcs constraint<br>Location Constraints:<br>  Resource: DimProdClusterIP<br>    Constraint: location-DimProdClusterIP<br>      Rule: score=-INFINITY<br>        Expression: pingd ne 1000<br>Ordering Constraints:<br>  start DimProdClusterIP then promote WyrDimProdServerData-clone (kind:Mandatory)<br>  promote WyrDimProdServerData-clone then start WyrDimProdFS (kind:Mandatory)<br>  start WyrDimProdFS then start WyrDimProdServer (kind:Mandatory)<br>  start WyrDimProdServer then start DimTestClusterIP (kind:Mandatory)<br>Colocation Constraints:<br>  WyrDimProdServer with DimProdClusterIP (score:INFINITY)<br>  DimTestClusterIP with DimProdClusterIP (score:INFINITY)<br>  WyrDimProdServerData-clone with DimProdClusterIP (score:INFINITY) (with-rsc-role:Master)<br>  WyrDimProdFS with DimProdClusterIP (score:INFINITY)<br>Ticket Constraints:<br><br><br>pcs resource config ping<br> Resource: ping 
(class=ocf provider=pacemaker type=ping)<br>  Attributes: dampen=3s host_list=193.30.22.33 multiplier=1000<br>  Operations: monitor interval=5s timeout=4s (ping-monitor-interval-5s)<br>              start interval=0s timeout=60s (ping-start-interval-0s)<br>              stop interval=0s timeout=5s (ping-stop-interval-0s)<br>              <br>              <br>              <br>cat /etc/corosync/corosync.conf<br>totem {<br>    version: 2<br>    cluster_name: dimprodcluster<br>    transport: knet<br>    crypto_cipher: aes256<br>    crypto_hash: sha256<br>    token: 10000<br>    interface {<br>        knet_ping_interval: 1000<br>        knet_ping_timeout: 1000<br>    }<br>}<br><br>nodelist {<br>    node {<br>        ring0_addr: dimprod01<br>        name: dimprod01<br>        nodeid: 1<br>    }<br><br>    node {<br>        ring0_addr: dimprod02<br>        name: dimprod02<br>        nodeid: 2<br>    }<br>}<br><br>quorum {<br>    provider: corosync_votequorum<br>    two_node: 1<br>}<br><br>logging {<br>    to_logfile: yes<br>    logfile: /var/log/cluster/corosync.log<br>    to_syslog: yes<br>    timestamp: on<br>    debug:on<br>}<br><br><br><br>Logs:<br>When the network is connected 'pingd' takes value of 1000:<br><br>Aug 03 08:23:01 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-attrd     [2827046] (attrd_client_update)   debug: Broadcasting pingd[dimprod02]=1000 (writer)<br>Aug 03 08:23:01 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> attrd_updater       [3369856] (pcmk__node_attr_request)   debug: Asked pacemaker-attrd to update pingd=1000 for dimprod02: OK (0)<br>Aug 03 08:23:01 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> attrd_updater       [3369856] (crm_xml_cleanup)      info: Cleaning up memory from libxml2<br>Aug 03 08:23:01 <a href="http://dimprod02.my.clustertest.com" 
target="_blank">dimprod02.my.clustertest.com</a> attrd_updater       [3369856] (crm_exit)       info: Exiting attrd_updater | with status 0<br><br>When the network is down we update 'pingd' to 0:<br><br>Aug 03 08:23:09 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-attrd     [2827046] (attrd_client_update)         debug: Broadcasting pingd[dimprod02]=0 (writer)<br>Aug 03 08:23:09 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> attrd_updater       [3370053] (pcmk__node_attr_request)      debug: Asked pacemaker-attrd to update pingd=0 for dimprod02: OK (0)<br>Aug 03 08:23:09 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> attrd_updater       [3370053] (crm_xml_cleanup)         info: Cleaning up memory from libxml2<br>Aug 03 08:23:09 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> attrd_updater       [3370053] (crm_exit)       info: Exiting attrd_updater | with status 0<br><br>And again:<br><br>Aug 03 08:23:17 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-attrd     [2827046] (attrd_client_update)       debug: Broadcasting pingd[dimprod02]=0 (writer)<br>Aug 03 08:23:17 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> attrd_updater       [3370109] (pcmk__node_attr_request)      debug: Asked pacemaker-attrd to update pingd=0 for dimprod02: OK (0)<br>Aug 03 08:23:17 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> attrd_updater       [3370109] (crm_xml_cleanup)         info: Cleaning up memory from libxml2<br>Aug 03 08:23:17 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> attrd_updater       [3370109] (crm_exit)       info: Exiting attrd_updater | with status 0<br><br>Then the node 
realizes it is not connected to the other node:<br><br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-attrd     [2827046] (pcmk_cpg_membership)    info: Group attrd event 8: dimprod01 (node 1 pid 2118843) left via cluster exit<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-based     [2827043] (pcmk_cpg_membership)   info: Group cib event 8: dimprod01 (node 1 pid 2118840) left via cluster exit<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-attrd     [2827046] (crm_update_peer_proc)    info: pcmk_cpg_membership: Node dimprod01[1] - corosync-cpg is now offline<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-based     [2827043] (crm_update_peer_proc)       info: pcmk_cpg_membership: Node dimprod01[1] - corosync-cpg is now offline<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-controld  [2827048] (pcmk_cpg_membership)         info: Group crmd event 8: dimprod01 (node 1 pid 2118845) left via cluster exit<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-fenced    [2827044] (pcmk_cpg_membership)    info: Group stonith-ng event 8: dimprod01 (node 1 pid 2118841) left via cluster exit<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-controld  [2827048] (crm_update_peer_proc)      info: pcmk_cpg_membership: Node dimprod01[1] - corosync-cpg is now offline<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-attrd     [2827046] (crm_update_peer_state_iter)         notice: Node dimprod01 
state is now lost | nodeid=1 previous=member source=crm_update_peer_proc<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-fenced    [2827044] (crm_update_peer_proc)  info: pcmk_cpg_membership: Node dimprod01[1] - corosync-cpg is now offline<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-controld  [2827048] (peer_update_callback)        info: Node dimprod01 is no longer a peer | DC=true old=0x4000000 new=0x0000000<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-attrd     [2827046] (attrd_peer_remove)      notice: Removing all dimprod01 attributes for peer loss<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-based     [2827043] (crm_update_peer_state_iter)    notice: Node dimprod01 state is now lost | nodeid=1 previous=member source=crm_update_peer_proc<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-attrd     [2827046] (attrd_peer_remove)     debug: Removed #attrd-protocol[dimprod01] for peer loss<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-based     [2827043] (crm_reap_dead_member)  info: Removing node with name dimprod01 and id 1 from membership cache<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-attrd     [2827046] (attrd_peer_remove)      debug: Removed master-WyrDimProdServerData[dimprod01] for peer loss<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-fenced    [2827044] (crm_update_peer_state_iter)        notice: Node dimprod01 state is now lost | nodeid=1 
previous=member source=crm_update_peer_proc<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-based     [2827043] (reap_crm_member)       notice: Purged 1 peer with id=1 and/or uname=dimprod01 from the membership cache<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-attrd     [2827046] (attrd_peer_remove)    debug: Removed last-failure-WyrDimProdFS#start_0[dimprod01] for peer loss<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-controld  [2827048] (controld_delete_node_state)   info: Deleting transient attributes for node dimprod01 (via CIB call 466) | xpath=//node_state[@uname='dimprod01']/transient_attributes<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-based     [2827043] (pcmk_cpg_membership)   info: Group cib event 8: dimprod02 (node 2 pid 2827043) is member<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-attrd     [2827046] (attrd_peer_remove)   debug: Removed fail-count-WyrDimProdFS#start_0[dimprod01] for peer loss<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-fenced    [2827044] (st_peer_update_callback)       debug: Broadcasting our uname because of node 1<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-attrd     [2827046] (attrd_peer_remove)     debug: Removed pingd[dimprod01] for peer loss<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-attrd     [2827046] (crm_reap_dead_member)    info: Removing node with name dimprod01 and id 1 from membership 
cache<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-fenced    [2827044] (crm_reap_dead_member)   info: Removing node with name dimprod01 and id 1 from membership cache<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-attrd     [2827046] (reap_crm_member)        notice: Purged 1 peer with id=1 and/or uname=dimprod01 from the membership cache<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-attrd     [2827046] (pcmk_cpg_membership)  info: Group attrd event 8: dimprod02 (node 2 pid 2827046) is member<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-controld  [2827048] (match_down_event)   debug: No reason to expect node 1 to be down<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-fenced    [2827044] (reap_crm_member)  notice: Purged 1 peer with id=1 and/or uname=dimprod01 from the membership cache<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-fenced    [2827044] (pcmk_cpg_membership)  info: Group stonith-ng event 8: dimprod02 (node 2 pid 2827044) is member<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-controld  [2827048] (peer_update_callback)  warning: Stonith/shutdown of node dimprod01 was not expected<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-controld  [2827048] (abort_transition_graph)    info: Transition 99 aborted: Node failure | source=peer_update_callback:280 complete=true<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" 
target="_blank">dimprod02.my.clustertest.com</a> pacemaker-based     [2827043] (cib_process_request)         info: Forwarding cib_delete operation for section //node_state[@uname='dimprod01']/transient_attributes to all (origin=local/crmd/466)<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-controld  [2827048] (pcmk_cpg_membership)     info: Group crmd event 8: dimprod02 (node 2 pid 2827048) is member<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-controld  [2827048] (pcmk__set_flags_as)  debug: FSA action flags 0x2000000000000 (new_actions) for controller set by s_crmd_fsa:198<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-controld  [2827048] (s_crmd_fsa)  debug: Processing I_PE_CALC: [ state=S_IDLE cause=C_FSA_INTERNAL origin=abort_transition_graph ]<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-based     [2827043] (cib_process_request)  info: Forwarding cib_modify operation for section status to all (origin=local/crmd/467)<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-controld  [2827048] (do_state_transition)    notice: State transition S_IDLE -> S_POLICY_ENGINE | input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-controld  [2827048] (pcmk__set_flags_as)  debug: FSA action flags 0x00000020 (A_INTEGRATE_TIMER_STOP) for controller set by do_state_transition:559<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-controld  [2827048] (pcmk__set_flags_as)   debug: FSA action flags 
0x00000080 (A_FINALIZE_TIMER_STOP) for controller set by do_state_transition:565<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-controld  [2827048] (pcmk__set_flags_as)    debug: FSA action flags 0x00000200 (A_DC_TIMER_STOP) for controller set by do_state_transition:569<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-controld  [2827048] (check_join_counts)   debug: Sole active cluster node is fully joined<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-controld  [2827048] (pcmk__clear_flags_as)   debug: FSA action flags 0x00000200 (an_action) for controller cleared by do_fsa_action:108<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-controld  [2827048] (pcmk__clear_flags_as)        debug: FSA action flags 0x00000020 (an_action) for controller cleared by do_fsa_action:108<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-controld  [2827048] (pcmk__clear_flags_as)        debug: FSA action flags 0x00000080 (an_action) for controller cleared by do_fsa_action:108<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-controld  [2827048] (pcmk__clear_flags_as)        debug: FSA action flags 0x2000000000000 (an_action) for controller cleared by do_fsa_action:108<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-controld  [2827048] (do_pe_invoke)   debug: Query 468: Requesting the current CIB: S_POLICY_ENGINE<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-controld  [2827048] 
(pcmk_quorum_notification)         info: Quorum retained | membership=1140 members=1<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-attrd     [2827046] (attrd_peer_update)   notice: Setting pingd[dimprod02]: 1000 -> 0 | from dimprod02<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-controld  [2827048] (pcmk_quorum_notification)       debug: Member[0] 2 <br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-controld  [2827048] (crm_update_peer_state_iter)         notice: Node dimprod01 state is now lost | nodeid=1 previous=member source=crm_reap_unseen_nodes<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-controld  [2827048] (peer_update_callback)  info: Cluster node dimprod01 is now lost (was member)<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-controld  [2827048] (match_down_event)         debug: No reason to expect node 1 to be down<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-controld  [2827048] (peer_update_callback)      warning: Stonith/shutdown of node dimprod01 was not expected<br><br>And then the node decides to allocate the resources to this node and start it, ignoring the location constraint<br><br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (unpack_config)         debug: STONITH timeout: 60000<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (unpack_config)     debug: STONITH of failed nodes is disabled<br>Aug 03 
08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (unpack_config)        debug: Concurrent fencing is enabled<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (unpack_config)      debug: Stop all active resources: false<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (unpack_config)   debug: Cluster is symmetric - resources can run anywhere by default<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (unpack_config)       debug: On loss of quorum: Stop ALL resources<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (unpack_config)      debug: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (determine_online_status)        info: Node dimprod02 is online<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (unpack_find_resource)     debug: Internally renamed WyrDimProdServerData on dimprod02 to WyrDimProdServerData:0<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (unpack_find_resource)      debug: Internally renamed ping on dimprod02 to ping:0<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (log_list_item)     info: DimProdClusterIP  (ocf::heartbeat:IPaddr2):        Stopped<br>Aug 03 
08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (log_list_item)  info: WyrDimProdServer  (systemd:wyr-dim):       Stopped<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (log_list_item)  info: Clone Set: WyrDimProdServerData-clone [WyrDimProdServerData] (promotable): Slaves: [ dimprod02 ]<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (log_list_item)    info: Clone Set: WyrDimProdServerData-clone [WyrDimProdServerData] (promotable): Stopped: [ dimprod01 ]<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (log_list_item)   info: WyrDimProdFS      (ocf::heartbeat:Filesystem):     Stopped<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (log_list_item)  info: DimTestClusterIP  (ocf::heartbeat:IPaddr2):        Stopped<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (log_list_item)  info: Clone Set: ping-clone [ping]: Started: [ dimprod02 ]<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (log_list_item)        info: Clone Set: ping-clone [ping]: Stopped: [ dimprod01 ]<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (common_apply_stickiness)      debug: Resource WyrDimProdServerData:0: preferring current location (node=dimprod02, weight=100)<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" 
target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (common_apply_stickiness)        debug: Resource ping:0: preferring current location (node=dimprod02, weight=100)<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (native_assign_node)     debug: Assigning dimprod02 to DimProdClusterIP<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (native_assign_node)       debug: Assigning dimprod02 to WyrDimProdServer<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (distribute_children)      debug: Allocating up to 2 WyrDimProdServerData-clone instances to a possible 1 nodes (at most 1 per host, 2 optimal)<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (native_assign_node)         debug: Assigning dimprod02 to WyrDimProdServerData:0<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (native_assign_node)         debug: All nodes for resource WyrDimProdServerData:1 are unavailable, unclean or shutting down (dimprod01: 0, -1000000)<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (native_assign_node)      debug: Could not allocate a node for WyrDimProdServerData:1<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (pcmk__native_allocate)       info: Resource WyrDimProdServerData:1 cannot run anywhere<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" 
target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (distribute_children)   debug: Allocated 1 WyrDimProdServerData-clone instances of a possible 2<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (pcmk__set_instance_roles)        debug: WyrDimProdServerData:0 promotion score: 1000<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (pcmk__set_instance_roles)    info: Promoting WyrDimProdServerData:0 (Slave dimprod02)<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (pcmk__set_instance_roles)       debug: WyrDimProdServerData:1 promotion score: 1000<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (pcmk__set_instance_roles)    info: WyrDimProdServerData-clone: Promoted 1 instances of a possible 1 to master<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (native_assign_node)     debug: Assigning dimprod02 to WyrDimProdFS<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (native_assign_node)   debug: Assigning dimprod02 to DimTestClusterIP<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (distribute_children)      debug: Allocating up to 2 ping-clone instances to a possible 1 nodes (at most 1 per host, 2 optimal)<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (native_assign_node)         debug: 
Assigning dimprod02 to ping:0<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (native_assign_node)         debug: All nodes for resource ping:1 are unavailable, unclean or shutting down (dimprod01: 0, -1000000)<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (native_assign_node)      debug: Could not allocate a node for ping:1<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (pcmk__native_allocate)       info: Resource ping:1 cannot run anywhere<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (distribute_children)   debug: Allocated 1 ping-clone instances of a possible 2<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (RecurringOp)     info:  Start recurring monitor (30s) for DimProdClusterIP on dimprod02<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (RecurringOp)     info:  Start recurring monitor (60s) for WyrDimProdServer on dimprod02<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (RecurringOp)     info: Cancelling action WyrDimProdServerData:0_monitor_60000 (Slave vs. 
Master)<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (create_promotable_actions)       debug: Creating actions for WyrDimProdServerData-clone<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (RecurringOp)      info: Cancelling action WyrDimProdServerData:0_monitor_60000 (Slave vs. Master)<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (RecurringOp)     info:  Start recurring monitor (20s) for WyrDimProdFS on dimprod02<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (RecurringOp)         info:  Start recurring monitor (30s) for DimTestClusterIP on dimprod02<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (LogAction)       notice:  * Start      DimProdClusterIP           (                 dimprod02 )  <br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (LogAction)    notice:  * Start      WyrDimProdServer           (                 dimprod02 )  <br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (LogAction)    notice:  * Promote    WyrDimProdServerData:0     ( Slave -> Master dimprod02 )  <br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (LogActions)    info: Leave   WyrDimProdServerData:1   (Stopped)<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> 
pacemaker-schedulerd[2827047] (LogAction)     notice:  * Start      WyrDimProdFS               (                 dimprod02 )  <br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (LogAction)  notice:  * Start      DimTestClusterIP           (                 dimprod02 )  <br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (LogActions)   info: Leave   ping:0   (Started dimprod02)<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (LogActions)  info: Leave   ping:1   (Stopped)<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (action2xml)    debug: Using anonymous clone name WyrDimProdServerData for WyrDimProdServerData:0 (aka. WyrDimProdServerData)<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (action2xml)        debug: Using anonymous clone name WyrDimProdServerData for WyrDimProdServerData:0 (aka. WyrDimProdServerData)<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (action2xml)        debug: Using anonymous clone name WyrDimProdServerData for WyrDimProdServerData:0 (aka. WyrDimProdServerData)<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (action2xml)        debug: Using anonymous clone name WyrDimProdServerData for WyrDimProdServerData:0 (aka. 
WyrDimProdServerData)<br><br>Once the resources have been started, the scheduler decides that they cannot be allocated and stops them:<br><br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (determine_online_status)     info: Node dimprod02 is online<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (unpack_find_resource)     debug: Internally renamed WyrDimProdServerData on dimprod02 to WyrDimProdServerData:0<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (unpack_find_resource)      debug: Internally renamed ping on dimprod02 to ping:0<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (log_list_item)     info: DimProdClusterIP  (ocf::heartbeat:IPaddr2):        Started dimprod02<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (log_list_item)        info: WyrDimProdServer  (systemd:wyr-dim):       Started dimprod02<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (log_list_item)        info: Clone Set: WyrDimProdServerData-clone [WyrDimProdServerData] (promotable): Masters: [ dimprod02 ]<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (log_list_item)   info: Clone Set: WyrDimProdServerData-clone [WyrDimProdServerData] (promotable): Stopped: [ dimprod01 ]<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (log_list_item)   info: 
WyrDimProdFS      (ocf::heartbeat:Filesystem):     Started dimprod02<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (log_list_item)        info: DimTestClusterIP  (ocf::heartbeat:IPaddr2):        Stopped<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (log_list_item)  info: Clone Set: ping-clone [ping]: Started: [ dimprod02 ]<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (log_list_item)        info: Clone Set: ping-clone [ping]: Stopped: [ dimprod01 ]<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (common_apply_stickiness)      debug: Resource DimProdClusterIP: preferring current location (node=dimprod02, weight=100)<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (common_apply_stickiness)      debug: Resource WyrDimProdServer: preferring current location (node=dimprod02, weight=100)<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (common_apply_stickiness)      debug: Resource WyrDimProdServerData:0: preferring current location (node=dimprod02, weight=100)<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (common_apply_stickiness)        debug: Resource WyrDimProdFS: preferring current location (node=dimprod02, weight=100)<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (common_apply_stickiness)  debug: 
Resource ping:0: preferring current location (node=dimprod02, weight=100)<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (pcmk__native_merge_weights)     info: DimProdClusterIP: Rolling back optional scores from WyrDimProdServerData-clone<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (pcmk__native_merge_weights)         info: DimProdClusterIP: Rolling back optional scores from WyrDimProdFS<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (pcmk__native_merge_weights)       info: DimProdClusterIP: Rolling back optional scores from WyrDimProdServer<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (pcmk__native_merge_weights)   info: DimProdClusterIP: Rolling back optional scores from DimTestClusterIP<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (native_assign_node)   debug: All nodes for resource DimProdClusterIP are unavailable, unclean or shutting down (dimprod02: 1, -1000000)<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (native_assign_node)    debug: Could not allocate a node for DimProdClusterIP<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (native_assign_node)        debug: Processing DimProdClusterIP_monitor_30000<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (pcmk__native_allocate)  info: Resource 
DimProdClusterIP cannot run anywhere<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (native_assign_node)  debug: All nodes for resource WyrDimProdServer are unavailable, unclean or shutting down (dimprod02: 1, -1000000)<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (native_assign_node)    debug: Could not allocate a node for WyrDimProdServer<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (pcmk__native_allocate)     info: Resource WyrDimProdServer cannot run anywhere<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (distribute_children)         debug: Allocating up to 2 WyrDimProdServerData-clone instances to a possible 1 nodes (at most 1 per host, 2 optimal)<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (native_assign_node)         debug: Assigning dimprod02 to WyrDimProdServerData:0<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (native_assign_node)         debug: All nodes for resource WyrDimProdServerData:1 are unavailable, unclean or shutting down (dimprod01: 0, -1000000)<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (native_assign_node)      debug: Could not allocate a node for WyrDimProdServerData:1<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (pcmk__native_allocate)       info: Resource 
WyrDimProdServerData:1 cannot run anywhere<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (distribute_children)   debug: Allocated 1 WyrDimProdServerData-clone instances of a possible 2<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (filter_colocation_constraint)    error: WyrDimProdServerData:0 must be colocated with DimProdClusterIP but is not (dimprod02 vs. unallocated)<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (pcmk__set_instance_roles)   debug: WyrDimProdServerData:0 promotion score: 10000<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (pcmk__set_instance_roles)   info: Promoting WyrDimProdServerData:0 (Master dimprod02)<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (pcmk__set_instance_roles)      debug: WyrDimProdServerData:1 promotion score: 10000<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (pcmk__set_instance_roles)   info: WyrDimProdServerData-clone: Promoted 1 instances of a possible 1 to master<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (native_assign_node)     debug: All nodes for resource WyrDimProdFS are unavailable, unclean or shutting down (dimprod01: 0, -1000000)<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (native_assign_node)        debug: Could not allocate a node for 
WyrDimProdFS<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (native_assign_node)    debug: Processing WyrDimProdFS_monitor_20000<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (pcmk__native_allocate)      info: Resource WyrDimProdFS cannot run anywhere<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (native_assign_node)      debug: All nodes for resource DimTestClusterIP are unavailable, unclean or shutting down (dimprod01: 0, -1000000)<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (native_assign_node)    debug: Could not allocate a node for DimTestClusterIP<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (pcmk__native_allocate)     info: Resource DimTestClusterIP cannot run anywhere<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (distribute_children)         debug: Allocating up to 2 ping-clone instances to a possible 1 nodes (at most 1 per host, 2 optimal)<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (native_assign_node)         debug: Assigning dimprod02 to ping:0<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (native_assign_node)         debug: All nodes for resource ping:1 are unavailable, unclean or shutting down (dimprod01: 0, -1000000)<br>Aug 03 08:23:27 <a 
href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (native_assign_node)      debug: Could not allocate a node for ping:1<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (pcmk__native_allocate)       info: Resource ping:1 cannot run anywhere<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (distribute_children)   debug: Allocated 1 ping-clone instances of a possible 2<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (create_promotable_actions)       debug: Creating actions for WyrDimProdServerData-clone<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (LogAction)        notice:  * Stop       DimProdClusterIP           (                 dimprod02 )   due to node availability<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (LogAction)   notice:  * Stop       WyrDimProdServer           (                 dimprod02 )   due to node availability<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (LogActions)  info: Leave   WyrDimProdServerData:0   (Master dimprod02)<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (LogActions)   info: Leave   WyrDimProdServerData:1   (Stopped)<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (LogAction)     notice:  * Stop   
    WyrDimProdFS               (                 dimprod02 )   due to node availability<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (LogActions)        info: Leave   DimTestClusterIP (Stopped)<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (LogActions)    info: Leave   ping:0   (Started dimprod02)<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (LogActions)  info: Leave   ping:1   (Stopped)<br><br>So the final result is OK, but I would like to prevent the resources from starting on the disconnected node at all.<br>I have no idea how to debug this further.<br>I would appreciate any help.<br>If it is helpful, I can provide the full debug log and more configuration details.<br><br>Regards,<br>Janusz.<br></div>


</blockquote></div></div>