[ClusterLabs] controlling cluster behavior on startup

Ken Gaillot kgaillot at redhat.com
Mon Jan 29 18:49:20 EST 2024


On Mon, 2024-01-29 at 22:48 +0000, Faaland, Olaf P. wrote:
> Thank you, Ken.
> 
> I changed my configuration management system to put an initial
> cib.xml into /var/lib/pacemaker/cib/, which sets all the property
> values I was setting via pcs commands, including dc-deadtime.  I
> removed those "pcs property set" commands from the ones that are run
> at startup time.
> 
> That worked in the sense that after Pacemaker starts, the node waits
> for my newly specified dc-deadtime of 300s before giving up on the
> partner node and fencing it, if the partner never appears as a
> member.
> 
> However, now it seems to wait that amount of time before it elects a
> DC, even when quorum is acquired earlier.  In my log snippet below,
> with dc-deadtime 300s,

The dc-deadtime is not waiting for quorum, but for another DC to show
up. If all nodes show up, it can proceed immediately, but otherwise it
has to wait out the full timeout.
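The timestamps in your log bear that out: the Election Trigger fires
dc-deadtime after startup, independent of when quorum arrived. A quick
check of the arithmetic (times taken from the log below):

```python
from datetime import datetime, timedelta

start = datetime(2024, 1, 29, 14, 14, 24)   # pacemakerd starts
quorum = datetime(2024, 1, 29, 14, 17, 4)   # quorum acquired
dc_deadtime = timedelta(seconds=300)

trigger = start + dc_deadtime               # when the Election Trigger pops
print(trigger.time())                       # 14:19:24 -- matches the 14:19:26
                                            # election, start-based, not
                                            # quorum-based
print(trigger - quorum)                     # 0:02:20 spent idle after quorum
```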

> 
> 14:14:24 Pacemaker starts on gopher12
> 14:17:04 quorum is acquired
> 14:19:26 Election Trigger just popped (start time + dc-deadtime
> seconds)
> 14:19:26 gopher12 wins the election
> 
> Is there other configuration that needs to be present in the cib at
> startup time?
> 
> thanks,
> Olaf
> 
> === log extract using new system of installing partial cib.xml before
> startup
> Jan 29 14:14:24 gopher12 pacemakerd          [123690]
> (main)    notice: Starting Pacemaker 2.1.7-1.t4 | build=2.1.7
> features:agent-manpages ascii-docs compat-2.0 corosync-ge-2 default-
> concurrent-fencing generated-manpages monotonic nagios ncurses remote
> systemd
> Jan 29 14:14:25 gopher12 pacemaker-attrd     [123695]
> (attrd_start_election_if_needed)  info: Starting an election to
> determine the writer
> Jan 29 14:14:25 gopher12 pacemaker-attrd     [123695]
> (election_check)  info: election-attrd won by local node
> Jan 29 14:14:25 gopher12 pacemaker-controld  [123697]
> (peer_update_callback)    info: Cluster node gopher12 is now member
> (was in unknown state)
> Jan 29 14:17:04 gopher12 pacemaker-controld  [123697]
> (quorum_notification_cb)  notice: Quorum acquired | membership=54
> members=2
> Jan 29 14:19:26 gopher12 pacemaker-controld  [123697]
> (crm_timer_popped)        info: Election Trigger just popped |
> input=I_DC_TIMEOUT time=300000ms
> Jan 29 14:19:26 gopher12 pacemaker-controld  [123697]
> (do_log)  warning: Input I_DC_TIMEOUT received in state S_PENDING
> from crm_timer_popped
> Jan 29 14:19:26 gopher12 pacemaker-controld  [123697]
> (do_state_transition)     info: State transition S_PENDING ->
> S_ELECTION | input=I_DC_TIMEOUT cause=C_TIMER_POPPED
> origin=crm_timer_popped
> Jan 29 14:19:26 gopher12 pacemaker-controld  [123697]
> (election_check)  info: election-DC won by local node
> Jan 29 14:19:26 gopher12 pacemaker-controld  [123697] (do_log)  info:
> Input I_ELECTION_DC received in state S_ELECTION from election_win_cb
> Jan 29 14:19:26 gopher12 pacemaker-controld  [123697]
> (do_state_transition)     notice: State transition S_ELECTION ->
> S_INTEGRATION | input=I_ELECTION_DC cause=C_FSA_INTERNAL
> origin=election_win_cb
> Jan 29 14:19:26 gopher12 pacemaker-schedulerd[123696]
> (recurring_op_for_active)         info: Start 10s-interval monitor
> for gopher11_zpool on gopher11
> Jan 29 14:19:26 gopher12 pacemaker-schedulerd[123696]
> (recurring_op_for_active)         info: Start 10s-interval monitor
> for gopher12_zpool on gopher12
> 
> 
> === initial cib.xml contents
> <cib crm_feature_set="3.19.0" validate-with="pacemaker-3.9" epoch="9"
> num_updates="0" admin_epoch="0" cib-last-written="Mon Jan 29 11:07:06
> 2024" update-origin="gopher12" update-client="root" update-
> user="root" have-quorum="0" dc-uuid="2">
>   <configuration>
>     <crm_config>
>       <cluster_property_set id="cib-bootstrap-options">
>         <nvpair id="cib-bootstrap-options-stonith-action"
> name="stonith-action" value="off"/>
>         <nvpair id="cib-bootstrap-options-have-watchdog" name="have-
> watchdog" value="false"/>
>         <nvpair id="cib-bootstrap-options-dc-version" name="dc-
> version" value="2.1.7-1.t4-2.1.7"/>
>         <nvpair id="cib-bootstrap-options-cluster-infrastructure"
> name="cluster-infrastructure" value="corosync"/>
>         <nvpair id="cib-bootstrap-options-cluster-name"
> name="cluster-name" value="gopher11"/>
>         <nvpair id="cib-bootstrap-options-cluster-recheck-inte"
> name="cluster-recheck-interval" value="60"/>
>         <nvpair id="cib-bootstrap-options-start-failure-is-fat"
> name="start-failure-is-fatal" value="false"/>
>         <nvpair id="cib-bootstrap-options-dc-deadtime" name="dc-
> deadtime" value="300"/>
>       </cluster_property_set>
>     </crm_config>
>     <nodes>
>       <node id="1" uname="gopher11"/>
>       <node id="2" uname="gopher12"/>
>     </nodes>
>     <resources/>
>     <constraints/>
>   </configuration>
> </cib>
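Since that file is now hand-built rather than written out by
pacemaker-based, it may be worth sanity-checking it before installing
it. A minimal sketch in Python (the snippet embeds a trimmed copy of
the crm_config above purely for illustration; in practice you would
parse the real /var/lib/pacemaker/cib/cib.xml):

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the crm_config section above, for illustration only.
cib_xml = """
<cib validate-with="pacemaker-3.9" epoch="9" num_updates="0" admin_epoch="0">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="cib-bootstrap-options-dc-deadtime"
                name="dc-deadtime" value="300"/>
      </cluster_property_set>
    </crm_config>
    <nodes/>
    <resources/>
    <constraints/>
  </configuration>
</cib>
"""

root = ET.fromstring(cib_xml)
# Collect all cluster properties into a dict for easy checking.
props = {nv.get("name"): nv.get("value") for nv in root.iter("nvpair")}
assert props.get("dc-deadtime") == "300", "dc-deadtime missing from seeded CIB"
print(props)
```

For a fuller check against the schema, crm_verify --xml-file
/var/lib/pacemaker/cib/cib.xml will validate the file before Pacemaker
ever reads it.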
> 
> ________________________________________
> From: Ken Gaillot <kgaillot at redhat.com>
> Sent: Monday, January 29, 2024 10:51 AM
> To: Cluster Labs - All topics related to open-source clustering
> welcomed
> Cc: Faaland, Olaf P.
> Subject: Re: [ClusterLabs] controlling cluster behavior on startup
> 
> On Mon, 2024-01-29 at 18:05 +0000, Faaland, Olaf P. via Users wrote:
> > Hi,
> > 
> > I have configured clusters of node pairs, so each cluster has 2
> > nodes.  The cluster members are statically defined in corosync.conf
> > before corosync or pacemaker is started, and quorum {two_node: 1}
> > is
> > set.
> > 
> > When both nodes are powered off and I power them on, they do not
> > start pacemaker at exactly the same time.  The time difference may
> > be
> > a few minutes depending on other factors outside the nodes.
> > 
> > My goals are (I call the first node to start pacemaker "node1"):
> > 1) I want to control how long pacemaker on node1 waits before
> > fencing
> > node2 if node2 does not start pacemaker.
> > 2) If node1 is part-way through that waiting period, and node2
> > starts
> > pacemaker so they detect each other, I would like them to proceed
> > immediately to probing resource state and starting resources which
> > are down, not wait until the end of that "grace period".
> > 
> > It looks from the documentation like dc-deadtime is how #1 is
> > controlled, and #2 is expected normal behavior.  However, I'm
> > seeing
> > fence actions before dc-deadtime has passed.
> > 
> > Am I misunderstanding Pacemaker's expected behavior and/or how dc-
> > deadtime should be used?
> 
> You have everything right. The problem is that you're starting with
> an
> empty configuration every time, so the default dc-deadtime is being
> used for the first election (before you can set the desired value).
> 
> I can't think of anything you can do to get around that, since the
> controller starts the timer as soon as it starts up. Would it be
> possible to bake an initial configuration into the PXE image?
> 
> When the timer value changes, we could stop the existing timer and
> restart it. There's a risk that some external automation could make
> repeated changes to the timeout, thus never letting it expire, but
> that
> seems preferable to your problem. I've created an issue for that:
> 
>   
> https://projects.clusterlabs.org/T764
> 
> BTW there's also election-timeout. I'm not sure offhand how that
> interacts; it might be necessary to raise that one as well.
> 
> > One possibly unusual aspect of this cluster is that these two nodes
> > are stateless - they PXE boot from an image on another server - and
> > I
> > build the cluster configuration at boot time with a series of pcs
> > commands, because the nodes have no local storage for this
> > purpose.  The commands are:
> > 
> > ['pcs', 'cluster', 'start']
> > ['pcs', 'property', 'set', 'stonith-action=off']
> > ['pcs', 'property', 'set', 'cluster-recheck-interval=60']
> > ['pcs', 'property', 'set', 'start-failure-is-fatal=false']
> > ['pcs', 'property', 'set', 'dc-deadtime=300']
> > ['pcs', 'stonith', 'create', 'fence_gopher11', 'fence_powerman',
> > 'ip=192.168.64.65', 'pcmk_host_check=static-list',
> > 'pcmk_host_list=gopher11,gopher12']
> > ['pcs', 'stonith', 'create', 'fence_gopher12', 'fence_powerman',
> > 'ip=192.168.64.65', 'pcmk_host_check=static-list',
> > 'pcmk_host_list=gopher11,gopher12']
> > ['pcs', 'resource', 'create', 'gopher11_zpool', 'ocf:llnl:zpool',
> > 'import_options="-f -N -d /dev/disk/by-vdev"', 'pool=gopher11',
> > 'op',
> > 'start', 'timeout=805']
> > ...
> > ['pcs', 'property', 'set', 'no-quorum-policy=ignore']
> 
> BTW you don't need to change no-quorum-policy when you're using
> two_node with Corosync.
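A toy model of why (this is just the votequorum rule as I understand
it, not Pacemaker code): with two_node, a lone surviving node keeps
quorum, so no-quorum-policy never needs loosening.

```python
def has_quorum(active_votes, expected_votes=2, two_node=True):
    """Simplified votequorum rule for a 2-node cluster.

    two_node: 1 lets a single surviving node stay quorate; corosync also
    implies wait_for_all, so both nodes must have been seen once after
    startup before this applies (not modeled here).
    """
    if two_node and expected_votes == 2:
        return active_votes >= 1
    # ordinary strict-majority rule
    return active_votes > expected_votes // 2

print(has_quorum(1))                  # True: the survivor keeps quorum
print(has_quorum(1, two_node=False))  # False: 1 of 2 is not a majority
```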
> 
> > I could, instead, generate a CIB so that when Pacemaker is started,
> > it has a full config.  Is that better?
> > 
> > thanks,
> > Olaf
> > 
> > === corosync.conf:
> > totem {
> >     version: 2
> >     cluster_name: gopher11
> >     secauth: off
> >     transport: udpu
> > }
> > nodelist {
> >     node {
> >         ring0_addr: gopher11
> >         name: gopher11
> >         nodeid: 1
> >     }
> >     node {
> >         ring0_addr: gopher12
> >         name: gopher12
> >         nodeid: 2
> >     }
> > }
> > quorum {
> >     provider: corosync_votequorum
> >     two_node: 1
> > }
> > 
> > === Log excerpt
> > 
> > Here's an except from Pacemaker logs that reflect what I'm
> > seeing.  These are from gopher12, the node that came up first.  The
> > other node, which is not yet up, is gopher11.
> > 
> > Jan 25 17:55:38 gopher12 pacemakerd          [116033]
> > (main)    notice: Starting Pacemaker 2.1.7-1.t4 | build=2.1.7
> > features:agent-manpages ascii-docs compat-2.0 corosync-ge-2
> > default-
> > concurrent-fencing generated-manpages monotonic nagios ncurses
> > remote
> > systemd
> > Jan 25 17:55:39 gopher12 pacemaker-controld  [116040]
> > (peer_update_callback)    info: Cluster node gopher12 is now member
> > (was in unknown state)
> > Jan 25 17:55:43 gopher12 pacemaker-based     [116035]
> > (cib_perform_op)  info: ++
> > /cib/configuration/crm_config/cluster_property_set[@id='cib-
> > bootstrap-options']:  <nvpair id="cib-bootstrap-options-dc-
> > deadtime"
> > name="dc-deadtime" value="300"/>
> > Jan 25 17:56:00 gopher12 pacemaker-controld  [116040]
> > (crm_timer_popped)        info: Election Trigger just popped |
> > input=I_DC_TIMEOUT time=300000ms
> > Jan 25 17:56:01 gopher12 pacemaker-based     [116035]
> > (cib_perform_op)  info: ++
> > /cib/configuration/crm_config/cluster_property_set[@id='cib-
> > bootstrap-options']:  <nvpair id="cib-bootstrap-options-no-quorum-
> > policy" name="no-quorum-policy" value="ignore"/>
> > Jan 25 17:56:01 gopher12 pacemaker-controld  [116040]
> > (abort_transition_graph)  info: Transition 0 aborted by cib-
> > bootstrap-options-no-quorum-policy doing create no-quorum-
> > policy=ignore: Configuration change | cib=0.26.0
> > source=te_update_diff_v2:464
> > path=/cib/configuration/crm_config/cluster_property_set[@id='cib-
> > bootstrap-options'] complete=true
> > Jan 25 17:56:01 gopher12 pacemaker-controld  [116040]
> > (controld_execute_fence_action)   notice: Requesting fencing (off)
> > targeting node gopher11 | action=11 timeout=60
> > 
> > 
> > _______________________________________________
> > Manage your subscription:
> > https://lists.clusterlabs.org/mailman/listinfo/users
> > 
> > ClusterLabs home: 
> > https://www.clusterlabs.org/
> > 
> --
> Ken Gaillot <kgaillot at redhat.com>
> 
-- 
Ken Gaillot <kgaillot at redhat.com>


