<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<div class="moz-cite-prefix">On 2/16/23 20:54, Ken Gaillot wrote:<br>
</div>
<blockquote type="cite"
cite="mid:89caf7e367060ea23b842106138718e72186c292.camel@redhat.com">
<pre class="moz-quote-pre" wrap="">On Thu, 2023-02-16 at 11:13 +0100, Adam Cecile wrote:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">
On 2/16/23 07:57, Ulrich Windl wrote:
</pre>
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">Adam Cecile <a class="moz-txt-link-rfc2396E" href="mailto:acecile@le-vert.net">&lt;acecile@le-vert.net&gt;</a> wrote on 15.02.2023 at
10:49 in
</pre>
</blockquote>
</blockquote>
</blockquote>
<pre class="moz-quote-pre" wrap="">
message
<a class="moz-txt-link-rfc2396E" href="mailto:b4f1f2f1-66fe-ca62-ff4f-708d781a507c@le-vert.net">&lt;b4f1f2f1-66fe-ca62-ff4f-708d781a507c@le-vert.net&gt;</a>:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">Hello,
I just had an issue with unexpected server behavior after a reboot. This
node was powered off, so the cluster was running fine with the tomcat9
resource running on a different machine.
After powering this node on again, it briefly started tomcat before
joining the cluster and then decided to stop it again. I'm not sure why.
Here is the systemctl status tomcat9 output on this host:
tomcat9.service - Apache Tomcat 9 Web Application Server
Loaded: loaded (/lib/systemd/system/tomcat9.service; disabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/tomcat9.service.d
└─override.conf
Active: inactive (dead)
Docs: <a class="moz-txt-link-freetext" href="https://tomcat.apache.org/tomcat-9.0-doc/index.html">https://tomcat.apache.org/tomcat-9.0-doc/index.html</a>
Feb 15 09:43:27 server tomcat9[1398]: Starting service [Catalina]
Feb 15 09:43:27 server tomcat9[1398]: Starting Servlet engine: [Apache Tomcat/9.0.43 (Debian)]
Feb 15 09:43:27 server tomcat9[1398]: [...]
Feb 15 09:43:29 server systemd[1]: Stopping Apache Tomcat 9 Web Application Server...
Feb 15 09:43:29 server systemd[1]: tomcat9.service: Succeeded.
Feb 15 09:43:29 server systemd[1]: Stopped Apache Tomcat 9 Web Application Server.
Feb 15 09:43:29 server systemd[1]: tomcat9.service: Consumed 8.017s CPU time.
You can see it is disabled, so it should NOT be started at boot;
start/stop is under Corosync control.
The systemd resource is defined like this:
primitive tomcat9 systemd:tomcat9.service \
op start interval=0 timeout=120 \
op stop interval=0 timeout=120 \
op monitor interval=60 timeout=100
Any idea why this happened?
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
Your journal (syslog) should tell you!
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
Indeed, I overlooked that yesterday... But it says it was Pacemaker that
decided to start it:
Feb 15 09:43:26 server3 corosync[568]: [QUORUM] Sync members[3]: 1 2 3
Feb 15 09:43:26 server3 corosync[568]: [QUORUM] Sync joined[2]: 1 2
Feb 15 09:43:26 server3 corosync[568]: [TOTEM ] A new membership (1.42d) was formed. Members joined: 1 2
Feb 15 09:43:26 server3 pacemaker-attrd[860]: notice: Node server1 state is now member
Feb 15 09:43:26 server3 pacemaker-based[857]: notice: Node server1 state is now member
Feb 15 09:43:26 server3 corosync[568]: [QUORUM] This node is within the primary component and will provide service.
Feb 15 09:43:26 server3 corosync[568]: [QUORUM] Members[3]: 1 2 3
Feb 15 09:43:26 server3 corosync[568]: [MAIN ] Completed service synchronization, ready to provide service.
Feb 15 09:43:26 server3 pacemaker-controld[862]: notice: Quorum acquired
Feb 15 09:43:26 server3 pacemaker-controld[862]: notice: Node server1 state is now member
Feb 15 09:43:26 server3 pacemaker-controld[862]: notice: Node server2 state is now member
Feb 15 09:43:26 server3 pacemaker-based[857]: notice: Node server2 state is now member
Feb 15 09:43:26 server3 pacemaker-controld[862]: notice: Transition 0 aborted: Peer Halt
Feb 15 09:43:26 server3 pacemaker-fenced[858]: notice: Node server1 state is now member
Feb 15 09:43:26 server3 pacemaker-controld[862]: warning: Another DC detected: server2 (op=noop)
Feb 15 09:43:26 server3 pacemaker-fenced[858]: notice: Node server2 state is now member
Feb 15 09:43:26 server3 pacemaker-controld[862]: notice: State transition S_ELECTION -> S_RELEASE_DC
Feb 15 09:43:26 server3 pacemaker-controld[862]: warning: Cancelling timer for action 12 (src=67)
Feb 15 09:43:26 server3 pacemaker-controld[862]: notice: No need to invoke the TE (A_TE_HALT) in state S_RELEASE_DC
Feb 15 09:43:26 server3 pacemaker-attrd[860]: notice: Node server2 state is now member
Feb 15 09:43:26 server3 pacemaker-controld[862]: notice: State transition S_PENDING -> S_NOT_DC
Feb 15 09:43:27 server3 pacemaker-attrd[860]: notice: Setting #attrd-protocol[server1]: (unset) -> 2
Feb 15 09:43:27 server3 pacemaker-attrd[860]: notice: Detected another attribute writer (server2), starting new election
Feb 15 09:43:27 server3 pacemaker-attrd[860]: notice: Setting #attrd-protocol[server2]: (unset) -> 2
Feb 15 09:43:27 server3 IPaddr2(Shared-IPv4)[1258]: INFO:
Feb 15 09:43:27 server3 ntpd[602]: Listen normally on 8 eth0 10.13.68.12:123
Feb 15 09:43:27 server3 ntpd[602]: new interface(s) found: waking up resolver
=> Feb 15 09:43:28 server3 pacemaker-controld[862]: notice: Result of start operation for tomcat9 on server3: ok
Feb 15 09:43:29 server3 corosync[568]: [KNET ] pmtud: PMTUD link change for host: 2 link: 0 from 485 to 1397
Feb 15 09:43:29 server3 corosync[568]: [KNET ] pmtud: PMTUD link change for host: 1 link: 0 from 485 to 1397
Feb 15 09:43:29 server3 corosync[568]: [KNET ] pmtud: Global data MTU changed to: 1397
=> Feb 15 09:43:29 server3 pacemaker-controld[862]: notice: Requesting local execution of stop operation for tomcat9 on server3
Any idea?
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
What do the logs on the other node say over the same time frame?</pre>
</blockquote>
<p>Here are the logs from server2 for the same period:</p>
<p>Feb 15 09:43:26 server2 corosync[524]: [KNET ] rx: host: 3
link: 0 is up<br>
Feb 15 09:43:26 server2 corosync[524]: [KNET ] host: host: 3
(passive) best link: 0 (pri: 1)<br>
Feb 15 09:43:26 server2 corosync[524]: [QUORUM] Sync members[3]:
1 2 3<br>
Feb 15 09:43:26 server2 corosync[524]: [QUORUM] Sync joined[1]:
3<br>
Feb 15 09:43:26 server2 corosync[524]: [TOTEM ] A new membership
(1.42d) was formed. Members joined: 3<br>
Feb 15 09:43:26 server2 pacemaker-based[812]: notice: Node
server3. state is now member <br>
Feb 15 09:43:26 server2 pacemaker-fenced[813]: notice: Node
server3. state is now member <br>
Feb 15 09:43:26 server2 pacemaker-attrd[815]: notice: Node
server3. state is now member <br>
Feb 15 09:43:26 server2 corosync[524]: [QUORUM] Members[3]: 1 2
3<br>
Feb 15 09:43:26 server2 corosync[524]: [MAIN ] Completed
service synchronization, ready to provide service.<br>
Feb 15 09:43:26 server2 pacemaker-controld[817]: notice: Node
server3. state is now member <br>
Feb 15 09:43:26 server2 pacemaker-controld[817]: notice: State
transition S_IDLE -> S_INTEGRATION <br>
Feb 15 09:43:26 server2 pacemaker-controld[817]: warning: Another
DC detected: server3. (op=noop)<br>
Feb 15 09:43:26 server2 pacemaker-controld[817]: notice: State
transition S_ELECTION -> S_INTEGRATION <br>
Feb 15 09:43:27 server2 pacemaker-attrd[815]: notice: Detected
another attribute writer (server3.), starting new election<br>
Feb 15 09:43:27 server2 pacemaker-attrd[815]: notice: Setting
#attrd-protocol[server3.]: (unset) -> 2 <br>
Feb 15 09:43:27 server2 pacemaker-attrd[815]: notice: Recorded
local node as attribute writer (was unset)<br>
Feb 15 09:43:27 server2 pacemaker-schedulerd[816]: notice: On
loss of quorum: Ignore<br>
Feb 15 09:43:27 server2 pacemaker-schedulerd[816]: error:
Resource Shared-IPv4 is active on 2 nodes (attempting recovery)<br>
Feb 15 09:43:27 server2 pacemaker-schedulerd[816]: notice: See
<a class="moz-txt-link-freetext" href="https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active">https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active</a> for
more information<br>
Feb 15 09:43:27 server2 pacemaker-schedulerd[816]: notice: *
Restart Shared-IPv4 (
server1. ) <br>
Feb 15 09:43:27 server2 pacemaker-schedulerd[816]: notice: *
Restart tomcat9 (
server1. ) due to required Shared-IPv4 start<br>
Feb 15 09:43:27 server2 pacemaker-schedulerd[816]: error:
Calculated transition 767 (with errors), saving inputs in
/var/lib/pacemaker/pengine/pe-error-5.bz2<br>
Feb 15 09:43:27 server2 pacemaker-controld[817]: notice:
Initiating stop operation tomcat9_stop_0 on server1. <br>
Feb 15 09:43:28 server2 pacemaker-controld[817]: notice:
Transition 767 aborted by operation tomcat9_start_0 'modify' on
server3.: Foreign event <br>
Feb 15 09:43:29 server2 pacemaker-controld[817]: notice:
Transition 767 (Complete=2, Pending=0, Fired=0, Skipped=2,
Incomplete=9, Source=/var/lib/pacemaker/pengine/pe-error-5.bz2):
Stopped<br>
Feb 15 09:43:29 server2 pacemaker-schedulerd[816]: notice: On
loss of quorum: Ignore<br>
Feb 15 09:43:29 server2 pacemaker-schedulerd[816]: error:
Resource Shared-IPv4 is active on 2 nodes (attempting recovery)<br>
Feb 15 09:43:29 server2 pacemaker-schedulerd[816]: notice: See
<a class="moz-txt-link-freetext" href="https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active">https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active</a> for
more information<br>
Feb 15 09:43:29 server2 pacemaker-schedulerd[816]: notice: *
Restart Shared-IPv4 (
server1. ) <br>
Feb 15 09:43:29 server2 pacemaker-schedulerd[816]: notice: *
Move tomcat9 ( server3. -> server1. ) <br>
Feb 15 09:43:29 server2 pacemaker-schedulerd[816]: error:
Calculated transition 768 (with errors), saving inputs in
/var/lib/pacemaker/pengine/pe-error-6.bz2<br>
Feb 15 09:43:29 server2 pacemaker-controld[817]: notice:
Initiating stop operation tomcat9_stop_0 on server3. <br>
Feb 15 09:43:31 server2 pacemaker-controld[817]: notice:
Initiating stop operation Shared-IPv4_stop_0 on server3. <br>
Feb 15 09:43:31 server2 pacemaker-controld[817]: notice:
Initiating stop operation Shared-IPv4_stop_0 on server1. <br>
Feb 15 09:43:31 server2 pacemaker-controld[817]: notice:
Initiating start operation Shared-IPv4_start_0 on server1. <br>
Feb 15 09:43:31 server2 pacemaker-controld[817]: notice:
Initiating monitor operation Shared-IPv4_monitor_60000 on server1.
<br>
Feb 15 09:43:31 server2 pacemaker-controld[817]: notice:
Initiating start operation tomcat9_start_0 on server1. <br>
Feb 15 09:43:33 server2 pacemaker-controld[817]: notice:
Initiating monitor operation tomcat9_monitor_60000 on server1. <br>
Feb 15 09:43:33 server2 pacemaker-controld[817]: notice:
Transition 768 (Complete=11, Pending=0, Fired=0, Skipped=0,
Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-error-6.bz2):
Complete<br>
Feb 15 09:43:33 server2 pacemaker-controld[817]: notice: State
transition S_TRANSITION_ENGINE -> S_IDLE <br>
Feb 15 09:43:38 server2 pacemaker-based[812]: notice: Local CIB
0.206.53.01293e9ed5a687db873d1ddd384803e2 differs from server3.:
0.206.51.411962ae9850d59bc1bbf4728b9e89b2 0x55f23d164cb0<br>
<br>
</p>
</body>
</html>