[ClusterLabs] Issue with DRBD + a systemd resource

Julien Semaan jsemaan at inverse.ca
Thu Dec 14 10:49:58 EST 2017


Hi Andrei,

Great success!

Adding the following line to /usr/lib/systemd/system/pacemaker.service 
did it:
After=dbus.service
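
For anyone who would rather not edit the packaged unit file directly (a
package update can overwrite it), the same ordering can probably be
expressed as a systemd drop-in. This is only a minimal sketch: the
drop-in file name 10-after-dbus.conf is an illustrative assumption, not
something shipped by the pacemaker package.

```ini
# /etc/systemd/system/pacemaker.service.d/10-after-dbus.conf
# (hypothetical file name; any *.conf file in this directory is read)
[Unit]
# Start pacemaker after D-Bus. systemd stops units in the reverse of
# their start order, so on shutdown pacemaker is stopped before
# dbus.service goes away.
After=dbus.service
```

Running "systemctl daemon-reload" afterwards makes systemd pick up the
change.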

Now, the question is whether the unit file shipped in the RPM should be 
adjusted (I'm currently using CentOS 7). If so, is this list the best 
place to raise it, or should I report it in a specific bug tracker?

Thanks!

-- 
Julien Semaan
jsemaan at inverse.ca   ::  +1 (866) 353-6153 *155  ::www.inverse.ca
Inverse inc. :: Leaders behind SOGo (www.sogo.nu) and PacketFence (www.packetfence.org)



On 2017-12-14 12:12 AM, Andrei Borzenkov wrote:
>
>
> Sent from iPhone
>
> On 13 Dec 2017, at 22:53, Julien Semaan <jsemaan at inverse.ca> wrote:
>
>> Hello,
>>
>> It's my first post on this mailing list, so please excuse any rookie 
>> mistakes I may make in this thread.
>>
>> We currently have clusters deployed using corosync/pacemaker that 
>> manage DRBD + a couple of systemd services.
>>
>> My colleague Derek previously emailed the list about it but has left 
>> the company since then:
>> http://lists.clusterlabs.org/pipermail/users/2017-November/006796.html
>>
>> I'm hoping to continue his work in order to fix it once and for all.
>>
>> I looked into the Q&A that was done in that thread and have managed 
>> to track it down to the following:
>> - If I reboot the server that is running as the primary (DRBD + 
>> systemd resources started), there is a split-brain once it finishes 
>> rebooting
>> - If I stop pacemaker (systemctl stop pacemaker), then reboot that 
>> primary server, then it comes back online without any issues and no 
>> split-brain
>> - If I reboot the server that doesn't have the running resources, all 
>> goes well
>>
>> Following those observations, my guess is that the way the pacemaker 
>> services are stopped during a systemd shutdown is causing the issue.
>> It seems that pacemaker isn't stopping the systemd resources in that 
>> case, and thus is neither unmounting the DRBD partition nor demoting it 
>> to secondary before DRBD stops, which results in the split-brain.
>>
>
>
> According to your log, D-Bus is stopped before pacemaker. Try adding 
> an After= dependency on dbus.service to the pacemaker unit.
>
>
>
>> Here is the interesting bit I found in the logs:
>> Dec 13 14:09:40 act-pass-2 lrmd[1133]:    error: Could not connect to 
>> System DBus: Did not receive a reply. Possible causes include: the 
>> remote application did not send a reply, the message bus security 
>> policy blocked the reply, the reply timeout expired, or the network 
>> connection was broken.
>> Dec 13 14:09:40 act-pass-2 lrmd[1133]:    error: systemd_unit_exec: 
>> Triggered fatal assert at systemd.c:730 : systemd_init()
>> Dec 13 14:09:40 act-pass-2 pacemakerd[1083]:    error: Managed 
>> process 1133 (lrmd) dumped core
>> Dec 13 14:09:40 act-pass-2 pacemakerd[1083]:    error: The lrmd 
>> process (1133) terminated with signal 6 (core=1)
>>
>> And here is a pastebin of the full journald output during the shutdown:
>> https://pastebin.com/CB38BiwC
>>
>> I'm not sure where to go from here. It may be a dependency on another 
>> systemd resource, but it seems more like an issue connecting to 
>> systemd itself to stop the cluster's systemd resources (that's a 
>> wild guess), since systemd isn't accepting commands while it's 
>> stopping. At this point, this goes beyond my knowledge of systemd, so 
>> I'd like some guidance on any required adjustment or further 
>> troubleshooting.
>>
>> Best Regards,
>>
>> -- 
>> Julien Semaan
>> _______________________________________________
>> Users mailing list: Users at clusterlabs.org
>> http://lists.clusterlabs.org/mailman/listinfo/users
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
