Are you using sharding for glusterfs?<div id="yMail_cursorElementTracker_1630233453737"><br></div><div id="yMail_cursorElementTracker_1630233453904">I would add a systemd dependency between the libvirt service and the glusterfs service, since your libvirt relies on gluster being available.</div><div id="yMail_cursorElementTracker_1630233503295"><br></div><div id="yMail_cursorElementTracker_1630233504057">Also, check whether you have the 'backup-volfile-servers' mount option set if you are using FUSE. With libgfapi, I have no clue how to configure that.</div><div id="yMail_cursorElementTracker_1630233550776"><br></div><div id="yMail_cursorElementTracker_1630233550935">Your setup looks fairly close to the oVirt project ... (just mentioning).</div><div id="yMail_cursorElementTracker_1630233574190"><br></div><div id="yMail_cursorElementTracker_1630233574371">Best Regards,</div><div id="yMail_cursorElementTracker_1630233578004">Strahil Nikolov<br><br> <br> <blockquote style="margin: 0 0 20px 0;"> <div style="font-family:Roboto, sans-serif; color:#6D00F6;"> <div>On Sat, Aug 28, 2021 at 13:33, lejeczek via Users</div><div><users@clusterlabs.org> wrote:</div> </div> <div style="padding: 10px 0 0 20px; margin: 10px 0 0 0; border-left: 1px solid #6D00F6;"> <br clear="none"><br clear="none">On 26/08/2021 10:35, Klaus Wenninger wrote:<br clear="none">><br clear="none">><br clear="none">> On Thu, Aug 26, 2021 at 11:13 AM lejeczek via Users <br clear="none">> <<a shape="rect" ymailto="mailto:users@clusterlabs.org" href="mailto:users@clusterlabs.org">users@clusterlabs.org</a> <mailto:<a shape="rect" ymailto="mailto:users@clusterlabs.org" href="mailto:users@clusterlabs.org">users@clusterlabs.org</a>>> wrote:<br clear="none">><br 
clear="none">>     Hi guys.<br clear="none">><br clear="none">>     I sometimes - I think I know when in terms of any<br clear="none">>     pattern -<br clear="none">>     get resources stuck on one node (two-node cluster) with<br clear="none">>     these in libvirtd's logs:<br clear="none">>     ...<br clear="none">>     Cannot start job (query, none, none) for domain<br clear="none">>     c8kubermaster1; current job is (modify, none, none)<br clear="none">>     owned by<br clear="none">>     (192261 qemuProcessReconnect, 0 <null>, 0 <null><br clear="none">>     (flags=0x0)) for (1093s, 0s, 0s)<br clear="none">>     Cannot start job (query, none, none) for domain<br clear="none">>     ubuntu-tor;<br clear="none">>     current job is (modify, none, none) owned by (192263<br clear="none">>     qemuProcessReconnect, 0 <null>, 0 <null> (flags=0x0)) for<br clear="none">>     (1093s, 0s, 0s)<br clear="none">>     Timed out during operation: cannot acquire state<br clear="none">>     change lock<br clear="none">>     (held by monitor=qemuProcessReconnect)<br clear="none">>     Timed out during operation: cannot acquire state<br clear="none">>     change lock<br clear="none">>     (held by monitor=qemuProcessReconnect)<br clear="none">>     ...<br clear="none">><br clear="none">>     When this happens, and if the resource is meant to be on the<br clear="none">>     other node, I have to disable the resource first, then<br clear="none">>     the node on which resources are stuck will shut down<br clear="none">>     the VM<br clear="none">>     and then I have to re-enable that resource so it<br clear="none">>     would, only<br clear="none">>     then, start on that other, the second node.<br clear="none">><br clear="none">>     I think this problem occurs if I restart 'libvirtd'<br clear="none">>     via systemd.<br clear="none">><br clear="none">>     Any thoughts on this guys?<br clear="none">><br clear="none">><br clear="none">> What are the logs on the pacemaker-side 
saying?<br clear="none">> An issue with migration?<br clear="none">><br clear="none">> Klaus<br clear="none"><br clear="none">I'll have to try to tidy up the "protocol" with my stuff so <br clear="none">I could call it all reproducible; at the moment it only <br clear="none">feels reproducible.<br clear="none"><br clear="none">I'm on CentOS Stream and have a 2-node cluster with KVM <br clear="none">resources, and a 2-node glusterfs cluster on the same nodes (physically <br clear="none">it is all just two machines).<br clear="none"><br clear="none">1) I power down one node in an orderly manner and the other <br clear="none">node is last-man-standing.<br clear="none">2) After a while (not sure if the time period is also a key <br clear="none">here) I bring that first node back up.<br clear="none">3) libvirtd on the last-man-standing node becomes unresponsive <br clear="none">(I don't know yet if that happens only after the first node comes <br clear="none">back up) to virt commands and probably to everything else; the <br clear="none">pacemaker log says:<br clear="none">...<br clear="none">pacemaker-controld[2730]:  error: Result of probe operation <br clear="none">for c8kubernode2 on dzien: Timed Out<br clear="none">...<br clear="none">and the libvirtd log does not really say anything (with default <br clear="none">debug levels).<br clear="none"><br clear="none">4) Could glusterfs play any role? 
Healing of the <br clear="none">volume(s) has finished by this time, completed successfully.<br clear="none"><br clear="none">This is the moment where I would manually 'systemctl restart <br clear="none">libvirtd' on that unresponsive node (the one that was last-man-standing) and <br clear="none">get the original error messages.<br clear="none"><br clear="none">There is plenty of room for anybody to make guesses, obviously.<br clear="none">Is it 'libvirtd' going haywire because the glusterfs volume is <br clear="none">in an unhealthy state and needs healing?<br clear="none">Is it pacemaker's last-man-standing state which makes 'libvirtd' go <br clear="none">haywire?<br clear="none">etc...<br clear="none"><br clear="none">I can't add much concrete stuff at this moment but will <br clear="none">appreciate any thoughts you want to share.<br clear="none">thanks, L<div class="yqt6486222430" id="yqtfd94478"><br clear="none"><br clear="none">>     many thanks, L.<br clear="none">>     _______________________________________________<br clear="none">>     Manage your subscription:<br clear="none">>     <a shape="rect" href="https://lists.clusterlabs.org/mailman/listinfo/users" target="_blank">https://lists.clusterlabs.org/mailman/listinfo/users</a><br clear="none">>     <<a shape="rect" href="https://lists.clusterlabs.org/mailman/listinfo/users" target="_blank">https://lists.clusterlabs.org/mailman/listinfo/users</a>><br clear="none">><br clear="none">>     ClusterLabs home: <a shape="rect" href="https://www.clusterlabs.org/" target="_blank">https://www.clusterlabs.org/</a><br clear="none">>     <<a shape="rect" href="https://www.clusterlabs.org/" target="_blank">https://www.clusterlabs.org/</a>><br clear="none">><br clear="none"><br clear="none"></div> </div> </blockquote></div>