[ClusterLabs] controlling cluster behavior on startup
Klaus Wenninger
kwenning at redhat.com
Tue Jan 30 10:21:01 EST 2024
On Tue, Jan 30, 2024 at 2:21 PM Walker, Chris <christopher.walker at hpe.com>
wrote:
> >>> However, now it seems to wait that amount of time before it elects a
> >>> DC, even when quorum is acquired earlier. In my log snippet below,
> >>> with dc-deadtime 300s,
> >>
> >> The dc-deadtime is not waiting for quorum, but for another DC to show
> >> up. If all nodes show up, it can proceed, but otherwise it has to wait.
>
> > I believe all the nodes showed up by 14:17:04, but it still waited until
> 14:19:26 to elect a DC:
>
> > Jan 29 14:14:25 gopher12 pacemaker-controld [123697]
> (peer_update_callback) info: Cluster node gopher12 is now member (was in
> unknown state)
> > Jan 29 14:17:04 gopher12 pacemaker-controld [123697]
> (peer_update_callback) info: Cluster node gopher11 is now member (was in
> unknown state)
> > Jan 29 14:17:04 gopher12 pacemaker-controld [123697]
> (quorum_notification_cb) notice: Quorum acquired | membership=54 members=2
> > Jan 29 14:19:26 gopher12 pacemaker-controld [123697] (do_log) info:
> Input I_ELECTION_DC received in state S_ELECTION from election_win_cb
>
> > This is a cluster with 2 nodes, gopher11 and gopher12.
>
> This is our experience with dc-deadtime too: even if both nodes in the
> cluster show up, dc-deadtime must elapse before the cluster starts. This
> was discussed on this list a while back (
> https://www.mail-archive.com/users@clusterlabs.org/msg03897.html) and an
> RFE came out of it (https://bugs.clusterlabs.org/show_bug.cgi?id=5310).
>
>
>
> I’ve worked around this by having an ExecStartPre directive for Corosync
> that does essentially:
>
>
>
> while ! systemctl -H ${peer} is-active corosync; do sleep 5; done
>
>
>
> With this in place, the nodes wait for each other before starting Corosync
> and Pacemaker. We can then use the default 20s dc-deadtime so that the DC
> election happens quickly once both nodes are up.
>
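(For the archives: one way to wire that up is a drop-in for corosync.service,
roughly like the untested sketch below. The drop-in path and file name, the
PEER value, the 10min timeout and the assumption that "systemctl -H" has
working root SSH to the peer are all placeholders, not taken from Chris's
actual setup.)

# /etc/systemd/system/corosync.service.d/wait-for-peer.conf  (sketch only)
[Service]
# peer hostname -- on the other node this would point back the other way
Environment=PEER=gopher11
# TimeoutStartSec also covers ExecStartPre, so give the wait loop enough room
TimeoutStartSec=10min
# block corosync startup until the peer's corosync is active
# (systemctl -H reaches the peer over SSH, so root SSH access is assumed)
ExecStartPre=/bin/sh -c 'while ! systemctl -H ${PEER} is-active corosync; do sleep 5; done'
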
Actually, wait_for_all, which is enabled by default with two_node, should
delay quorum until both nodes have shown up.
And if the cluster is configured not to ignore quorum, it shouldn't start
fencing before it sees the peer - right?
Running a 2-node cluster that ignores quorum, or one without wait_for_all,
is a delicate thing anyway, I would say, and shouldn't work in the generic
case. Not saying that's the issue here - there just isn't enough information
about the cluster to tell.
So you shouldn't need the raised dc-deadtime and thus wouldn't experience
large startup delays.
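For reference, the quorum section that gives you that behavior is just the
one already shown further down in the thread (minimal sketch):

quorum {
    provider: corosync_votequorum
    two_node: 1
    # two_node implies wait_for_all: 1 unless it is explicitly set to 0,
    # so the first node to boot doesn't get quorum until it has seen
    # its peer at least once
}
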
Regards,
Klaus
> Thanks,
>
> Chris
>
>
>
> From: Users <users-bounces at clusterlabs.org> on behalf of Faaland, Olaf
> P. via Users <users at clusterlabs.org>
> Date: Monday, January 29, 2024 at 7:46 PM
> To: Ken Gaillot <kgaillot at redhat.com>, Cluster Labs - All topics
> related to open-source clustering welcomed <users at clusterlabs.org>
> Cc: Faaland, Olaf P. <faaland1 at llnl.gov>
> Subject: Re: [ClusterLabs] controlling cluster behavior on startup
>
> >> However, now it seems to wait that amount of time before it elects a
> >> DC, even when quorum is acquired earlier. In my log snippet below,
> >> with dc-deadtime 300s,
> >
> > The dc-deadtime is not waiting for quorum, but for another DC to show
> > up. If all nodes show up, it can proceed, but otherwise it has to wait.
>
> I believe all the nodes showed up by 14:17:04, but it still waited until
> 14:19:26 to elect a DC:
>
> Jan 29 14:14:25 gopher12 pacemaker-controld [123697]
> (peer_update_callback) info: Cluster node gopher12 is now member (was in
> unknown state)
> Jan 29 14:17:04 gopher12 pacemaker-controld [123697]
> (peer_update_callback) info: Cluster node gopher11 is now member (was in
> unknown state)
> Jan 29 14:17:04 gopher12 pacemaker-controld [123697]
> (quorum_notification_cb) notice: Quorum acquired | membership=54 members=2
> Jan 29 14:19:26 gopher12 pacemaker-controld [123697] (do_log) info:
> Input I_ELECTION_DC received in state S_ELECTION from election_win_cb
>
> This is a cluster with 2 nodes, gopher11 and gopher12.
>
> Am I misreading that?
>
> thanks,
> Olaf
>
> ________________________________________
> From: Ken Gaillot <kgaillot at redhat.com>
> Sent: Monday, January 29, 2024 3:49 PM
> To: Faaland, Olaf P.; Cluster Labs - All topics related to open-source
> clustering welcomed
> Subject: Re: [ClusterLabs] controlling cluster behavior on startup
>
> On Mon, 2024-01-29 at 22:48 +0000, Faaland, Olaf P. wrote:
> > Thank you, Ken.
> >
> > I changed my configuration management system to put an initial
> > cib.xml into /var/lib/pacemaker/cib/, which sets all the property
> > values I was setting via pcs commands, including dc-deadtime. I
> > removed those "pcs property set" commands from the ones that are run
> > at startup time.
> >
> > That worked in the sense that after Pacemaker start, the node waits
> > my newly specified dc-deadtime of 300s before giving up on the
> > partner node and fencing it, if the partner never appears as a
> > member.
> >
> > However, now it seems to wait that amount of time before it elects a
> > DC, even when quorum is acquired earlier. In my log snippet below,
> > with dc-deadtime 300s,
>
> The dc-deadtime is not waiting for quorum, but for another DC to show
> up. If all nodes show up, it can proceed, but otherwise it has to wait.
>
> >
> > 14:14:24 Pacemaker starts on gopher12
> > 14:17:04 quorum is acquired
> > 14:19:26 Election Trigger just popped (start time + dc-deadtime
> > seconds)
> > 14:19:26 gopher12 wins the election
> >
> > Is there other configuration that needs to be present in the cib at
> > startup time?
> >
> > thanks,
> > Olaf
> >
> > === log extract using new system of installing partial cib.xml before
> > startup
> > Jan 29 14:14:24 gopher12 pacemakerd [123690]
> > (main) notice: Starting Pacemaker 2.1.7-1.t4 | build=2.1.7
> > features:agent-manpages ascii-docs compat-2.0 corosync-ge-2 default-
> > concurrent-fencing generated-manpages monotonic nagios ncurses remote
> > systemd
> > Jan 29 14:14:25 gopher12 pacemaker-attrd [123695]
> > (attrd_start_election_if_needed) info: Starting an election to
> > determine the writer
> > Jan 29 14:14:25 gopher12 pacemaker-attrd [123695]
> > (election_check) info: election-attrd won by local node
> > Jan 29 14:14:25 gopher12 pacemaker-controld [123697]
> > (peer_update_callback) info: Cluster node gopher12 is now member
> > (was in unknown state)
> > Jan 29 14:17:04 gopher12 pacemaker-controld [123697]
> > (quorum_notification_cb) notice: Quorum acquired | membership=54
> > members=2
> > Jan 29 14:19:26 gopher12 pacemaker-controld [123697]
> > (crm_timer_popped) info: Election Trigger just popped |
> > input=I_DC_TIMEOUT time=300000ms
> > Jan 29 14:19:26 gopher12 pacemaker-controld [123697]
> > (do_log) warning: Input I_DC_TIMEOUT received in state S_PENDING
> > from crm_timer_popped
> > Jan 29 14:19:26 gopher12 pacemaker-controld [123697]
> > (do_state_transition) info: State transition S_PENDING ->
> > S_ELECTION | input=I_DC_TIMEOUT cause=C_TIMER_POPPED
> > origin=crm_timer_popped
> > Jan 29 14:19:26 gopher12 pacemaker-controld [123697]
> > (election_check) info: election-DC won by local node
> > Jan 29 14:19:26 gopher12 pacemaker-controld [123697] (do_log) info:
> > Input I_ELECTION_DC received in state S_ELECTION from election_win_cb
> > Jan 29 14:19:26 gopher12 pacemaker-controld [123697]
> > (do_state_transition) notice: State transition S_ELECTION ->
> > S_INTEGRATION | input=I_ELECTION_DC cause=C_FSA_INTERNAL
> > origin=election_win_cb
> > Jan 29 14:19:26 gopher12 pacemaker-schedulerd[123696]
> > (recurring_op_for_active) info: Start 10s-interval monitor
> > for gopher11_zpool on gopher11
> > Jan 29 14:19:26 gopher12 pacemaker-schedulerd[123696]
> > (recurring_op_for_active) info: Start 10s-interval monitor
> > for gopher12_zpool on gopher12
> >
> >
> > === initial cib.xml contents
> > <cib crm_feature_set="3.19.0" validate-with="pacemaker-3.9" epoch="9"
> > num_updates="0" admin_epoch="0" cib-last-written="Mon Jan 29 11:07:06
> > 2024" update-origin="gopher12" update-client="root" update-
> > user="root" have-quorum="0" dc-uuid="2">
> > <configuration>
> > <crm_config>
> > <cluster_property_set id="cib-bootstrap-options">
> > <nvpair id="cib-bootstrap-options-stonith-action"
> > name="stonith-action" value="off"/>
> > <nvpair id="cib-bootstrap-options-have-watchdog" name="have-
> > watchdog" value="false"/>
> > <nvpair id="cib-bootstrap-options-dc-version" name="dc-
> > version" value="2.1.7-1.t4-2.1.7"/>
> > <nvpair id="cib-bootstrap-options-cluster-infrastructure"
> > name="cluster-infrastructure" value="corosync"/>
> > <nvpair id="cib-bootstrap-options-cluster-name"
> > name="cluster-name" value="gopher11"/>
> > <nvpair id="cib-bootstrap-options-cluster-recheck-inte"
> > name="cluster-recheck-interval" value="60"/>
> > <nvpair id="cib-bootstrap-options-start-failure-is-fat"
> > name="start-failure-is-fatal" value="false"/>
> > <nvpair id="cib-bootstrap-options-dc-deadtime" name="dc-
> > deadtime" value="300"/>
> > </cluster_property_set>
> > </crm_config>
> > <nodes>
> > <node id="1" uname="gopher11"/>
> > <node id="2" uname="gopher12"/>
> > </nodes>
> > <resources/>
> > <constraints/>
> > </configuration>
> > </cib>
> >
> > ________________________________________
> > From: Ken Gaillot <kgaillot at redhat.com>
> > Sent: Monday, January 29, 2024 10:51 AM
> > To: Cluster Labs - All topics related to open-source clustering
> > welcomed
> > Cc: Faaland, Olaf P.
> > Subject: Re: [ClusterLabs] controlling cluster behavior on startup
> >
> > On Mon, 2024-01-29 at 18:05 +0000, Faaland, Olaf P. via Users wrote:
> > > Hi,
> > >
> > > I have configured clusters of node pairs, so each cluster has 2
> > > nodes. The cluster members are statically defined in corosync.conf
> > > before corosync or pacemaker is started, and quorum {two_node: 1}
> > > is
> > > set.
> > >
> > > When both nodes are powered off and I power them on, they do not
> > > start pacemaker at exactly the same time. The time difference may
> > > be
> > > a few minutes depending on other factors outside the nodes.
> > >
> > > My goals are (I call the first node to start pacemaker "node1"):
> > > 1) I want to control how long pacemaker on node1 waits before
> > > fencing
> > > node2 if node2 does not start pacemaker.
> > > 2) If node1 is part-way through that waiting period, and node2
> > > starts
> > > pacemaker so they detect each other, I would like them to proceed
> > > immediately to probing resource state and starting resources which
> > > are down, not wait until the end of that "grace period".
> > >
> > > It looks from the documentation like dc-deadtime is how #1 is
> > > controlled, and #2 is expected normal behavior. However, I'm
> > > seeing
> > > fence actions before dc-deadtime has passed.
> > >
> > > Am I misunderstanding Pacemaker's expected behavior and/or how dc-
> > > deadtime should be used?
> >
> > You have everything right. The problem is that you're starting with
> > an
> > empty configuration every time, so the default dc-deadtime is being
> > used for the first election (before you can set the desired value).
> >
> > I can't think of anything you can do to get around that, since the
> > controller starts the timer as soon as it starts up. Would it be
> > possible to bake an initial configuration into the PXE image?
> >
> > When the timer value changes, we could stop the existing timer and
> > restart it. There's a risk that some external automation could make
> > repeated changes to the timeout, thus never letting it expire, but
> > that
> > seems preferable to your problem. I've created an issue for that:
> >
> >
> > https://projects.clusterlabs.org/T764
> >
> > BTW there's also election-timeout. I'm not sure offhand how that
> > interacts; it might be necessary to raise that one as well.
> >
> > > One possibly unusual aspect of this cluster is that these two nodes
> > > are stateless - they PXE boot from an image on another server - and
> > > I
> > > build the cluster configuration at boot time with a series of pcs
> > > commands, because the nodes have no local storage for this
> > > purpose. The commands are:
> > >
> > > ['pcs', 'cluster', 'start']
> > > ['pcs', 'property', 'set', 'stonith-action=off']
> > > ['pcs', 'property', 'set', 'cluster-recheck-interval=60']
> > > ['pcs', 'property', 'set', 'start-failure-is-fatal=false']
> > > ['pcs', 'property', 'set', 'dc-deadtime=300']
> > > ['pcs', 'stonith', 'create', 'fence_gopher11', 'fence_powerman',
> > > 'ip=192.168.64.65', 'pcmk_host_check=static-list',
> > > 'pcmk_host_list=gopher11,gopher12']
> > > ['pcs', 'stonith', 'create', 'fence_gopher12', 'fence_powerman',
> > > 'ip=192.168.64.65', 'pcmk_host_check=static-list',
> > > 'pcmk_host_list=gopher11,gopher12']
> > > ['pcs', 'resource', 'create', 'gopher11_zpool', 'ocf:llnl:zpool',
> > > 'import_options="-f -N -d /dev/disk/by-vdev"', 'pool=gopher11',
> > > 'op',
> > > 'start', 'timeout=805']
> > > ...
> > > ['pcs', 'property', 'set', 'no-quorum-policy=ignore']
> >
> > BTW you don't need to change no-quorum-policy when you're using
> > two_node with Corosync.
> >
> > > I could, instead, generate a CIB so that when Pacemaker is started,
> > > it has a full config. Is that better?
> > >
> > > thanks,
> > > Olaf
> > >
> > > === corosync.conf:
> > > totem {
> > > version: 2
> > > cluster_name: gopher11
> > > secauth: off
> > > transport: udpu
> > > }
> > > nodelist {
> > > node {
> > > ring0_addr: gopher11
> > > name: gopher11
> > > nodeid: 1
> > > }
> > > node {
> > > ring0_addr: gopher12
> > > name: gopher12
> > > nodeid: 2
> > > }
> > > }
> > > quorum {
> > > provider: corosync_votequorum
> > > two_node: 1
> > > }
> > >
> > > === Log excerpt
> > >
> > > Here's an except from Pacemaker logs that reflect what I'm
> > > seeing. These are from gopher12, the node that came up first. The
> > > other node, which is not yet up, is gopher11.
> > >
> > > Jan 25 17:55:38 gopher12 pacemakerd [116033]
> > > (main) notice: Starting Pacemaker 2.1.7-1.t4 | build=2.1.7
> > > features:agent-manpages ascii-docs compat-2.0 corosync-ge-2
> > > default-
> > > concurrent-fencing generated-manpages monotonic nagios ncurses
> > > remote
> > > systemd
> > > Jan 25 17:55:39 gopher12 pacemaker-controld [116040]
> > > (peer_update_callback) info: Cluster node gopher12 is now member
> > > (was in unknown state)
> > > Jan 25 17:55:43 gopher12 pacemaker-based [116035]
> > > (cib_perform_op) info: ++
> > > /cib/configuration/crm_config/cluster_property_set[@id='cib-
> > > bootstrap-options']: <nvpair id="cib-bootstrap-options-dc-
> > > deadtime"
> > > name="dc-deadtime" value="300"/>
> > > Jan 25 17:56:00 gopher12 pacemaker-controld [116040]
> > > (crm_timer_popped) info: Election Trigger just popped |
> > > input=I_DC_TIMEOUT time=300000ms
> > > Jan 25 17:56:01 gopher12 pacemaker-based [116035]
> > > (cib_perform_op) info: ++
> > > /cib/configuration/crm_config/cluster_property_set[@id='cib-
> > > bootstrap-options']: <nvpair id="cib-bootstrap-options-no-quorum-
> > > policy" name="no-quorum-policy" value="ignore"/>
> > > Jan 25 17:56:01 gopher12 pacemaker-controld [116040]
> > > (abort_transition_graph) info: Transition 0 aborted by cib-
> > > bootstrap-options-no-quorum-policy doing create no-quorum-
> > > policy=ignore: Configuration change | cib=0.26.0
> > > source=te_update_diff_v2:464
> > > path=/cib/configuration/crm_config/cluster_property_set[@id='cib-
> > > bootstrap-options'] complete=true
> > > Jan 25 17:56:01 gopher12 pacemaker-controld [116040]
> > > (controld_execute_fence_action) notice: Requesting fencing (off)
> > > targeting node gopher11 | action=11 timeout=60
> > >
> > >
> > --
> > Ken Gaillot <kgaillot at redhat.com>
> >
> --
> Ken Gaillot <kgaillot at redhat.com>
>