[ClusterLabs] controlling cluster behavior on startup
Walker, Chris
christopher.walker at hpe.com
Tue Jan 30 08:20:53 EST 2024
>>> However, now it seems to wait that amount of time before it elects a
>>> DC, even when quorum is acquired earlier. In my log snippet below,
>>> with dc-deadtime 300s,
>>
>> The dc-deadtime is not waiting for quorum, but for another DC to show
>> up. If all nodes show up, it can proceed, but otherwise it has to wait.
> I believe all the nodes showed up by 14:17:04, but it still waited until 14:19:26 to elect a DC:
> Jan 29 14:14:25 gopher12 pacemaker-controld [123697] (peer_update_callback) info: Cluster node gopher12 is now member (was in unknown state)
> Jan 29 14:17:04 gopher12 pacemaker-controld [123697] (peer_update_callback) info: Cluster node gopher11 is now member (was in unknown state)
> Jan 29 14:17:04 gopher12 pacemaker-controld [123697] (quorum_notification_cb) notice: Quorum acquired | membership=54 members=2
> Jan 29 14:19:26 gopher12 pacemaker-controld [123697] (do_log) info: Input I_ELECTION_DC received in state S_ELECTION from election_win_cb
> This is a cluster with 2 nodes, gopher11 and gopher12.
This is our experience with dc-deadtime too: even if both nodes in the cluster show up, dc-deadtime must elapse before the cluster starts. This was discussed on this list a while back (https://www.mail-archive.com/users@clusterlabs.org/msg03897.html) and an RFE came out of it (https://bugs.clusterlabs.org/show_bug.cgi?id=5310).
I’ve worked around this by having an ExecStartPre directive for Corosync that does essentially:
while ! systemctl -H ${peer} is-active corosync; do sleep 5; done
With this in place, the nodes wait for each other before starting Corosync and Pacemaker. We can then use the default 20s dc-deadtime so that the DC election happens quickly once both nodes are up.
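In case it helps anyone adapting this, here is a sketch of how that ExecStartPre hook could be packaged. The script path, the drop-in file name, and the timeout/interval variables are my own illustrative additions, not part of our actual setup; a bounded retry avoids blocking corosync startup forever if the peer never comes up:

```shell
#!/bin/sh
# Sketch of the ExecStartPre peer-wait described above.
# Hypothetical drop-in that would invoke it:
#   # /etc/systemd/system/corosync.service.d/wait-peer.conf
#   [Service]
#   ExecStartPre=/usr/local/sbin/wait-for-peer.sh gopher11

# wait_for CMD...: retry CMD until it succeeds or WAIT_TIMEOUT seconds pass.
wait_for() {
    deadline=$(( $(date +%s) + ${WAIT_TIMEOUT:-300} ))
    until "$@"; do
        if [ "$(date +%s)" -ge "$deadline" ]; then
            return 1    # give up; fall back to letting dc-deadtime handle it
        fi
        sleep "${WAIT_INTERVAL:-5}"
    done
}

if [ -n "$1" ]; then
    # Block until the peer's corosync reports active, or until we time out.
    wait_for systemctl -H "$1" is-active --quiet corosync
fi
```

The timeout cap means a node that boots alone still comes up eventually and the normal dc-deadtime/fencing path takes over.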
Thanks,
Chris
From: Users <users-bounces at clusterlabs.org> on behalf of Faaland, Olaf P. via Users <users at clusterlabs.org>
Date: Monday, January 29, 2024 at 7:46 PM
To: Ken Gaillot <kgaillot at redhat.com>, Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
Cc: Faaland, Olaf P. <faaland1 at llnl.gov>
Subject: Re: [ClusterLabs] controlling cluster behavior on startup
>> However, now it seems to wait that amount of time before it elects a
>> DC, even when quorum is acquired earlier. In my log snippet below,
>> with dc-deadtime 300s,
>
> The dc-deadtime is not waiting for quorum, but for another DC to show
> up. If all nodes show up, it can proceed, but otherwise it has to wait.
I believe all the nodes showed up by 14:17:04, but it still waited until 14:19:26 to elect a DC:
Jan 29 14:14:25 gopher12 pacemaker-controld [123697] (peer_update_callback) info: Cluster node gopher12 is now member (was in unknown state)
Jan 29 14:17:04 gopher12 pacemaker-controld [123697] (peer_update_callback) info: Cluster node gopher11 is now member (was in unknown state)
Jan 29 14:17:04 gopher12 pacemaker-controld [123697] (quorum_notification_cb) notice: Quorum acquired | membership=54 members=2
Jan 29 14:19:26 gopher12 pacemaker-controld [123697] (do_log) info: Input I_ELECTION_DC received in state S_ELECTION from election_win_cb
This is a cluster with 2 nodes, gopher11 and gopher12.
Am I misreading that?
thanks,
Olaf
________________________________________
From: Ken Gaillot <kgaillot at redhat.com>
Sent: Monday, January 29, 2024 3:49 PM
To: Faaland, Olaf P.; Cluster Labs - All topics related to open-source clustering welcomed
Subject: Re: [ClusterLabs] controlling cluster behavior on startup
On Mon, 2024-01-29 at 22:48 +0000, Faaland, Olaf P. wrote:
> Thank you, Ken.
>
> I changed my configuration management system to put an initial cib.xml into /var/lib/pacemaker/cib/, which sets all the property values I was setting via pcs commands, including dc-deadtime. I removed those "pcs property set" commands from the ones that are run at startup time.
>
> That worked in the sense that after Pacemaker start, the node waits my newly specified dc-deadtime of 300s before giving up on the partner node and fencing it, if the partner never appears as a member.
>
> However, now it seems to wait that amount of time before it elects a DC, even when quorum is acquired earlier. In my log snippet below, with dc-deadtime 300s,
The dc-deadtime is not waiting for quorum, but for another DC to show
up. If all nodes show up, it can proceed, but otherwise it has to wait.
>
> 14:14:24 Pacemaker starts on gopher12
> 14:17:04 quorum is acquired
> 14:19:26 Election Trigger just popped (start time + dc-deadtime seconds)
> 14:19:26 gopher12 wins the election
>
> Is there other configuration that needs to be present in the cib at startup time?
>
> thanks,
> Olaf
>
> === log extract using new system of installing partial cib.xml before
> startup
> Jan 29 14:14:24 gopher12 pacemakerd [123690] (main) notice: Starting Pacemaker 2.1.7-1.t4 | build=2.1.7 features:agent-manpages ascii-docs compat-2.0 corosync-ge-2 default-concurrent-fencing generated-manpages monotonic nagios ncurses remote systemd
> Jan 29 14:14:25 gopher12 pacemaker-attrd [123695] (attrd_start_election_if_needed) info: Starting an election to determine the writer
> Jan 29 14:14:25 gopher12 pacemaker-attrd [123695] (election_check) info: election-attrd won by local node
> Jan 29 14:14:25 gopher12 pacemaker-controld [123697] (peer_update_callback) info: Cluster node gopher12 is now member (was in unknown state)
> Jan 29 14:17:04 gopher12 pacemaker-controld [123697] (quorum_notification_cb) notice: Quorum acquired | membership=54 members=2
> Jan 29 14:19:26 gopher12 pacemaker-controld [123697] (crm_timer_popped) info: Election Trigger just popped | input=I_DC_TIMEOUT time=300000ms
> Jan 29 14:19:26 gopher12 pacemaker-controld [123697] (do_log) warning: Input I_DC_TIMEOUT received in state S_PENDING from crm_timer_popped
> Jan 29 14:19:26 gopher12 pacemaker-controld [123697] (do_state_transition) info: State transition S_PENDING -> S_ELECTION | input=I_DC_TIMEOUT cause=C_TIMER_POPPED origin=crm_timer_popped
> Jan 29 14:19:26 gopher12 pacemaker-controld [123697] (election_check) info: election-DC won by local node
> Jan 29 14:19:26 gopher12 pacemaker-controld [123697] (do_log) info: Input I_ELECTION_DC received in state S_ELECTION from election_win_cb
> Jan 29 14:19:26 gopher12 pacemaker-controld [123697] (do_state_transition) notice: State transition S_ELECTION -> S_INTEGRATION | input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=election_win_cb
> Jan 29 14:19:26 gopher12 pacemaker-schedulerd[123696] (recurring_op_for_active) info: Start 10s-interval monitor for gopher11_zpool on gopher11
> Jan 29 14:19:26 gopher12 pacemaker-schedulerd[123696] (recurring_op_for_active) info: Start 10s-interval monitor for gopher12_zpool on gopher12
>
>
> === initial cib.xml contents
> <cib crm_feature_set="3.19.0" validate-with="pacemaker-3.9" epoch="9" num_updates="0" admin_epoch="0" cib-last-written="Mon Jan 29 11:07:06 2024" update-origin="gopher12" update-client="root" update-user="root" have-quorum="0" dc-uuid="2">
>   <configuration>
>     <crm_config>
>       <cluster_property_set id="cib-bootstrap-options">
>         <nvpair id="cib-bootstrap-options-stonith-action" name="stonith-action" value="off"/>
>         <nvpair id="cib-bootstrap-options-have-watchdog" name="have-watchdog" value="false"/>
>         <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="2.1.7-1.t4-2.1.7"/>
>         <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="corosync"/>
>         <nvpair id="cib-bootstrap-options-cluster-name" name="cluster-name" value="gopher11"/>
>         <nvpair id="cib-bootstrap-options-cluster-recheck-inte" name="cluster-recheck-interval" value="60"/>
>         <nvpair id="cib-bootstrap-options-start-failure-is-fat" name="start-failure-is-fatal" value="false"/>
>         <nvpair id="cib-bootstrap-options-dc-deadtime" name="dc-deadtime" value="300"/>
>       </cluster_property_set>
>     </crm_config>
>     <nodes>
>       <node id="1" uname="gopher11"/>
>       <node id="2" uname="gopher12"/>
>     </nodes>
>     <resources/>
>     <constraints/>
>   </configuration>
> </cib>
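[As an aside for anyone reproducing this: a hand-built CIB like the one quoted above can be sanity-checked before it is baked into the boot image. The commands below are only a sketch of that idea; the file names and the hacluster/haclient ownership are assumptions about a stock Pacemaker install, not something taken from this thread.]

```shell
# Validate the hand-built CIB with Pacemaker's schema checker.
crm_verify --xml-file cib.xml

# Install it where pacemaker-based expects to find it at startup;
# ownership and mode matter, or the daemon may refuse to read it.
install -o hacluster -g haclient -m 0600 cib.xml /var/lib/pacemaker/cib/cib.xml

# Remove any stale digest so the replaced file isn't flagged as corrupt.
rm -f /var/lib/pacemaker/cib/cib.xml.sig
```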
>
> ________________________________________
> From: Ken Gaillot <kgaillot at redhat.com>
> Sent: Monday, January 29, 2024 10:51 AM
> To: Cluster Labs - All topics related to open-source clustering welcomed
> Cc: Faaland, Olaf P.
> Subject: Re: [ClusterLabs] controlling cluster behavior on startup
>
> On Mon, 2024-01-29 at 18:05 +0000, Faaland, Olaf P. via Users wrote:
> > Hi,
> >
> > I have configured clusters of node pairs, so each cluster has 2 nodes. The cluster members are statically defined in corosync.conf before corosync or pacemaker is started, and quorum {two_node: 1} is set.
> >
> > When both nodes are powered off and I power them on, they do not start pacemaker at exactly the same time. The time difference may be a few minutes depending on other factors outside the nodes.
> >
> > My goals are (I call the first node to start pacemaker "node1"):
> > 1) I want to control how long pacemaker on node1 waits before fencing node2 if node2 does not start pacemaker.
> > 2) If node1 is part-way through that waiting period, and node2 starts pacemaker so they detect each other, I would like them to proceed immediately to probing resource state and starting resources which are down, not wait until the end of that "grace period".
> >
> > It looks from the documentation like dc-deadtime is how #1 is controlled, and #2 is expected normal behavior. However, I'm seeing fence actions before dc-deadtime has passed.
> >
> > Am I misunderstanding Pacemaker's expected behavior and/or how dc-deadtime should be used?
>
> You have everything right. The problem is that you're starting with an empty configuration every time, so the default dc-deadtime is being used for the first election (before you can set the desired value).
>
> I can't think of anything you can do to get around that, since the controller starts the timer as soon as it starts up. Would it be possible to bake an initial configuration into the PXE image?
>
> When the timer value changes, we could stop the existing timer and restart it. There's a risk that some external automation could make repeated changes to the timeout, thus never letting it expire, but that seems preferable to your problem. I've created an issue for that:
>
> https://projects.clusterlabs.org/T764
>
> BTW there's also election-timeout. I'm not sure offhand how that interacts; it might be necessary to raise that one as well.
>
> > One possibly unusual aspect of this cluster is that these two nodes are stateless - they PXE boot from an image on another server - and I build the cluster configuration at boot time with a series of pcs commands, because the nodes have no local storage for this purpose. The commands are:
> > ['pcs', 'cluster', 'start']
> > ['pcs', 'property', 'set', 'stonith-action=off']
> > ['pcs', 'property', 'set', 'cluster-recheck-interval=60']
> > ['pcs', 'property', 'set', 'start-failure-is-fatal=false']
> > ['pcs', 'property', 'set', 'dc-deadtime=300']
> > ['pcs', 'stonith', 'create', 'fence_gopher11', 'fence_powerman', 'ip=192.168.64.65', 'pcmk_host_check=static-list', 'pcmk_host_list=gopher11,gopher12']
> > ['pcs', 'stonith', 'create', 'fence_gopher12', 'fence_powerman', 'ip=192.168.64.65', 'pcmk_host_check=static-list', 'pcmk_host_list=gopher11,gopher12']
> > ['pcs', 'resource', 'create', 'gopher11_zpool', 'ocf:llnl:zpool', 'import_options="-f -N -d /dev/disk/by-vdev"', 'pool=gopher11', 'op', 'start', 'timeout=805']
> > ...
> > ['pcs', 'property', 'set', 'no-quorum-policy=ignore']
>
> BTW you don't need to change no-quorum-policy when you're using
> two_node with Corosync.
>
> > I could, instead, generate a CIB so that when Pacemaker is started,
> > it has a full config. Is that better?
> >
> > thanks,
> > Olaf
> >
> > === corosync.conf:
> > totem {
> >     version: 2
> >     cluster_name: gopher11
> >     secauth: off
> >     transport: udpu
> > }
> > nodelist {
> >     node {
> >         ring0_addr: gopher11
> >         name: gopher11
> >         nodeid: 1
> >     }
> >     node {
> >         ring0_addr: gopher12
> >         name: gopher12
> >         nodeid: 2
> >     }
> > }
> > quorum {
> >     provider: corosync_votequorum
> >     two_node: 1
> > }
> >
> > === Log excerpt
> >
> > Here's an except from Pacemaker logs that reflect what I'm
> > seeing. These are from gopher12, the node that came up first. The
> > other node, which is not yet up, is gopher11.
> >
> > Jan 25 17:55:38 gopher12 pacemakerd [116033] (main) notice: Starting Pacemaker 2.1.7-1.t4 | build=2.1.7 features:agent-manpages ascii-docs compat-2.0 corosync-ge-2 default-concurrent-fencing generated-manpages monotonic nagios ncurses remote systemd
> > Jan 25 17:55:39 gopher12 pacemaker-controld [116040] (peer_update_callback) info: Cluster node gopher12 is now member (was in unknown state)
> > Jan 25 17:55:43 gopher12 pacemaker-based [116035] (cib_perform_op) info: ++ /cib/configuration/crm_config/cluster_property_set[@id='cib-bootstrap-options']: <nvpair id="cib-bootstrap-options-dc-deadtime" name="dc-deadtime" value="300"/>
> > Jan 25 17:56:00 gopher12 pacemaker-controld [116040] (crm_timer_popped) info: Election Trigger just popped | input=I_DC_TIMEOUT time=300000ms
> > Jan 25 17:56:01 gopher12 pacemaker-based [116035] (cib_perform_op) info: ++ /cib/configuration/crm_config/cluster_property_set[@id='cib-bootstrap-options']: <nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="ignore"/>
> > Jan 25 17:56:01 gopher12 pacemaker-controld [116040] (abort_transition_graph) info: Transition 0 aborted by cib-bootstrap-options-no-quorum-policy doing create no-quorum-policy=ignore: Configuration change | cib=0.26.0 source=te_update_diff_v2:464 path=/cib/configuration/crm_config/cluster_property_set[@id='cib-bootstrap-options'] complete=true
> > Jan 25 17:56:01 gopher12 pacemaker-controld [116040] (controld_execute_fence_action) notice: Requesting fencing (off) targeting node gopher11 | action=11 timeout=60
> >
> >
> > _______________________________________________
> > Manage your subscription:
> > https://lists.clusterlabs.org/mailman/listinfo/users
> >
> > ClusterLabs home:
> > https://www.clusterlabs.org/
> >
> --
> Ken Gaillot <kgaillot at redhat.com>
>
--
Ken Gaillot <kgaillot at redhat.com>
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users
ClusterLabs home: https://www.clusterlabs.org/