[ClusterLabs] Failed 'virsh' call when test RA run by crm_resource (cont'd) - SOLVED!

Thu Jan 12 12:00:04 EST 2023

On Thu, Jan 12, 2023 at 6:24 AM Madison Kelly <mkelly at alteeve.com> wrote:
>
> On 2023-01-11 23:10, Madison Kelly wrote:
> > Hi all,
> >
> >    There were a lot of sub-threads, so I figured it's helpful to start a
> > new thread with a summary so far. For context, I have a super simple
> > Perl script that pretends to be an RA for the sake of debugging.
> >
> > https://pastebin.com/9z314TaB
> >
> >    I've had variations log environment variables and confirmed that all
> > the variables in the direct call that works are also in the
> > crm_resource-triggered call. There are no SELinux issues logged in
> > audit.log and SELinux is permissive. The script logs the real and
> > effective UID and GID, and they're the same in both instances. Calling
> > other shell programs (tested with 'hostname') works fine; the failure is
> > specifically the crm_resource -> test RA -> virsh call.
> >
> >    I ran strace on the virsh call from inside my test script (changing
> > 'virsh.good' to 'virsh.bad' between running directly and via
> > crm_resource). The strace runs made six files each time. Below are
> > pastebin links with the outputs of the six runs in one paste, but each
> > file's output is in its own block (search for file: to see the
> > different file outputs).
> >
> > Good/direct run of the test RA:
> > - https://pastebin.com/xtqe9NSG
> >
> > Bad/crm_resource triggered run of the test RA:
> > - https://pastebin.com/vBiLVejW
> >
> > Still absolutely stumped.
>
> bandini found the problem
>
> https://serverfault.com/questions/1105733/virsh-command-hangs-when-script-runs-in-the-background
>
> /usr/bin/setsid --wait /usr/bin/virsh list --all
>
> That fixed it.
>
> omg. I'm going to sleep. holy crap.

Hooray! I'm really glad someone figured this out.

Based on a link that was shared in another thread, maybe it worked
fine on my machine due to a newer polkit version.
- https://gitlab.com/libvirt/libvirt/-/issues/366#note_1102131966

The RHEL 8 BZ that's linked in the ServerFault thread was auto-closed
by a bot. Not sure whether the fix will make its way in by some other
route.
- https://bugzilla.redhat.com/show_bug.cgi?id=1726714
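
For anyone who lands here later, here is a minimal sketch of how the
setsid workaround quoted above might be wrapped for reuse in a shell-based
agent or test script. The function name run_in_new_session is
hypothetical (it is not part of Pacemaker, libvirt, or the test RA), and
the fallback branch is an assumption about how you might want to behave on
a host without setsid; the only parts taken from the thread are the
setsid --wait invocation and the virsh command in the usage comment.

```shell
#!/bin/sh
# Hypothetical helper: run a command in its own session using setsid(1),
# as in the fix reported in this thread. --wait makes setsid wait for the
# command and return its exit status.
run_in_new_session() {
    if command -v setsid >/dev/null 2>&1; then
        setsid --wait "$@"
    else
        # Assumption: without setsid, fall back to a plain call
        # (which may still hang when run via crm_resource).
        "$@"
    fi
}

# Usage, mirroring the command from the thread:
#   run_in_new_session /usr/bin/virsh list --all
```

The wrapper passes the command's output and exit status through, so a
caller can check the return code the same way it would for a direct
virsh call.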

>
> --
> Madison Kelly
> Alteeve's Niche!
> Chief Technical Officer
> c: +1-647-471-0951
> https://alteeve.com/

-- 
Regards,

Reid Wahl (He/Him)
Senior Software Engineer, Red Hat
RHEL High Availability - Pacemaker