[ClusterLabs] Re: [EXT] Systemd resource started on node after reboot before cluster is stable?
Adam Cecile
acecile at le-vert.net
Thu Feb 16 05:13:56 EST 2023
On 2/16/23 07:57, Ulrich Windl wrote:
>>>> Adam Cecile <acecile at le-vert.net> wrote on 15.02.2023 at 10:49 in
> message
> <b4f1f2f1-66fe-ca62-ff4f-708d781a507c at le-vert.net>:
>> Hello,
>>
>> I just ran into some unexpected server behavior after a reboot. This
>> node had been powered off, so the cluster was running fine with the
>> tomcat9 resource on a different machine.
>>
>> After powering this node on again, it briefly started tomcat9 before
>> joining the cluster, then decided to stop it again. I'm not sure why.
>>
>>
>> Here is the systemctl status tomcat9 on this host:
>>
>> tomcat9.service - Apache Tomcat 9 Web Application Server
>> Loaded: loaded (/lib/systemd/system/tomcat9.service; disabled;
>> vendor preset: enabled)
>> Drop-In: /etc/systemd/system/tomcat9.service.d
>> └─override.conf
>> Active: inactive (dead)
>> Docs: https://tomcat.apache.org/tomcat-9.0-doc/index.html
>>
>> Feb 15 09:43:27 server tomcat9[1398]: Starting service [Catalina]
>> Feb 15 09:43:27 server tomcat9[1398]: Starting Servlet engine: [Apache
>> Tomcat/9.0.43 (Debian)]
>> Feb 15 09:43:27 server tomcat9[1398]: [...]
>> Feb 15 09:43:29 server systemd[1]: Stopping Apache Tomcat 9 Web
>> Application Server...
>> Feb 15 09:43:29 server systemd[1]: tomcat9.service: Succeeded.
>> Feb 15 09:43:29 server systemd[1]: Stopped Apache Tomcat 9 Web
>> Application Server.
>> Feb 15 09:43:29 server systemd[1]: tomcat9.service: Consumed 8.017s CPU
>> time.
>>
>> You can see it is disabled, so it should NOT be started by systemd at
>> boot; start/stop is under Corosync control.
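A quick sanity check here (a sketch; note that "disabled" only removes the
[Install] symlinks, so another unit's Wants=/Requires= could in principle
still pull tomcat9 in at boot):

```shell
# Sketch: confirm there are no enablement symlinks and no reverse
# dependencies that could start the unit outside Pacemaker's control.
state=$(systemctl is-enabled tomcat9.service 2>/dev/null || echo "unknown")
echo "enablement: $state"
# Any unit naming tomcat9 in Wants=/Requires= could start it at boot.
systemctl show -p WantedBy -p RequiredBy tomcat9.service 2>/dev/null || true
```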
>>
>>
>> The systemd resource is defined like this:
>>
>> primitive tomcat9 systemd:tomcat9.service \
>> op start interval=0 timeout=120 \
>> op stop interval=0 timeout=120 \
>> op monitor interval=60 timeout=100
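Where the cluster itself believes the resource is running can be checked
with the stock Pacemaker CLI (a sketch; it falls back to a message when the
tool is not installed on the host where it is run):

```shell
# Sketch: ask Pacemaker which node currently runs the tomcat9 resource.
location=$(crm_resource --resource tomcat9 --locate 2>/dev/null \
  || echo "crm_resource not available on this host")
echo "$location"
```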
>>
>>
>> Any idea why this happened?
> Your journal (syslog) should tell you!
Indeed, I overlooked that yesterday... But it says it's Pacemaker that
decided to start it:
Feb 15 09:43:26 server3 corosync[568]: [QUORUM] Sync members[3]: 1 2 3
Feb 15 09:43:26 server3 corosync[568]: [QUORUM] Sync joined[2]: 1 2
Feb 15 09:43:26 server3 corosync[568]: [TOTEM ] A new membership
(1.42d) was formed. Members joined: 1 2
Feb 15 09:43:26 server3 pacemaker-attrd[860]: notice: Node server1
state is now member
Feb 15 09:43:26 server3 pacemaker-based[857]: notice: Node server1
state is now member
Feb 15 09:43:26 server3 corosync[568]: [QUORUM] This node is within
the primary component and will provide service.
Feb 15 09:43:26 server3 corosync[568]: [QUORUM] Members[3]: 1 2 3
Feb 15 09:43:26 server3 corosync[568]: [MAIN ] Completed service
synchronization, ready to provide service.
Feb 15 09:43:26 server3 pacemaker-controld[862]: notice: Quorum acquired
Feb 15 09:43:26 server3 pacemaker-controld[862]: notice: Node server1
state is now member
Feb 15 09:43:26 server3 pacemaker-controld[862]: notice: Node server2
state is now member
Feb 15 09:43:26 server3 pacemaker-based[857]: notice: Node server2
state is now member
Feb 15 09:43:26 server3 pacemaker-controld[862]: notice: Transition 0
aborted: Peer Halt
Feb 15 09:43:26 server3 pacemaker-fenced[858]: notice: Node server1
state is now member
Feb 15 09:43:26 server3 pacemaker-controld[862]: warning: Another DC
detected: server2 (op=noop)
Feb 15 09:43:26 server3 pacemaker-fenced[858]: notice: Node server2
state is now member
Feb 15 09:43:26 server3 pacemaker-controld[862]: notice: State
transition S_ELECTION -> S_RELEASE_DC
Feb 15 09:43:26 server3 pacemaker-controld[862]: warning: Cancelling
timer for action 12 (src=67)
Feb 15 09:43:26 server3 pacemaker-controld[862]: notice: No need to
invoke the TE (A_TE_HALT) in state S_RELEASE_DC
Feb 15 09:43:26 server3 pacemaker-attrd[860]: notice: Node server2
state is now member
Feb 15 09:43:26 server3 pacemaker-controld[862]: notice: State
transition S_PENDING -> S_NOT_DC
Feb 15 09:43:27 server3 pacemaker-attrd[860]: notice: Setting
#attrd-protocol[server1]: (unset) -> 2
Feb 15 09:43:27 server3 pacemaker-attrd[860]: notice: Detected another
attribute writer (server2), starting new election
Feb 15 09:43:27 server3 pacemaker-attrd[860]: notice: Setting
#attrd-protocol[server2]: (unset) -> 2
Feb 15 09:43:27 server3 IPaddr2(Shared-IPv4)[1258]: INFO:
Feb 15 09:43:27 server3 ntpd[602]: Listen normally on 8 eth0 10.13.68.12:123
Feb 15 09:43:27 server3 ntpd[602]: new interface(s) found: waking up
resolver
=> Feb 15 09:43:28 server3 pacemaker-controld[862]: notice: Result of
start operation for tomcat9 on server3: ok
Feb 15 09:43:29 server3 corosync[568]: [KNET ] pmtud: PMTUD link
change for host: 2 link: 0 from 485 to 1397
Feb 15 09:43:29 server3 corosync[568]: [KNET ] pmtud: PMTUD link
change for host: 1 link: 0 from 485 to 1397
Feb 15 09:43:29 server3 corosync[568]: [KNET ] pmtud: Global data MTU
changed to: 1397
=> Feb 15 09:43:29 server3 pacemaker-controld[862]: notice: Requesting
local execution of stop operation for tomcat9 on server3
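The telling pair is the start at 09:43:28 followed by the stop request at
09:43:29. In a longer journal, a small filter over the pacemaker-controld
lines (a minimal sketch, fed the two sample lines from above) makes such
pairs easy to spot:

```shell
# Sketch: keep only pacemaker-controld lines mentioning one resource,
# so start/stop sequences stand out in a long journal (read from stdin).
filter_resource_events() {
  grep "pacemaker-controld" | grep "$1"
}

events=$(filter_resource_events tomcat9 <<'EOF'
Feb 15 09:43:28 server3 pacemaker-controld[862]: notice: Result of start operation for tomcat9 on server3: ok
Feb 15 09:43:29 server3 pacemaker-controld[862]: notice: Requesting local execution of stop operation for tomcat9 on server3
EOF
)
printf '%s\n' "$events"
```

The same function works on a live system via
`journalctl | filter_resource_events tomcat9`.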
Any idea?