[ClusterLabs] Failed 'virsh' call when test RA run by crm_resource (con't)
Reid Wahl
nwahl at redhat.com
Thu Jan 12 01:26:56 EST 2023
On Wed, Jan 11, 2023 at 10:21 PM Madison Kelly <mkelly at alteeve.com> wrote:
>
> On 2023-01-12 01:12, Reid Wahl wrote:
> > On Wed, Jan 11, 2023 at 8:11 PM Madison Kelly <mkelly at alteeve.com> wrote:
> >>
> >> Hi all,
> >>
> > >> There were a lot of sub-threads, so I figured it would be helpful to
> > >> start a new thread with a summary so far. For context, I have a super
> > >> simple Perl script that pretends to be an RA for the sake of debugging.
> >>
> >> https://pastebin.com/9z314TaB
> >>
> > >> I've had variations log environment variables and confirmed that all
> > >> the variables present in the direct call, which works, are also present
> > >> in the crm_resource-triggered call. There are no SELinux issues logged
> > >> in audit.log and SELinux is permissive. The script logs the real and
> > >> effective UID and GID, and they are the same in both cases. Calling
> > >> other programs (tested with 'hostname') works fine; the problem is
> > >> specific to the crm_resource -> test RA -> virsh call chain.
> >>
> > >> I ran strace on the virsh call from inside my test script (changing
> > >> 'virsh.good' to 'virsh.bad' between running directly and via
> > >> crm_resource). The strace runs made six files each time. Below are
> > >> pastebin links with the outputs of the six runs in one paste, but each
> > >> file's output is in its own block (search for "file:" to see the
> > >> different file outputs).
> >>
> >> Good/direct run of the test RA:
> >> - https://pastebin.com/xtqe9NSG
> >>
> >> Bad/crm_resource triggered run of the test RA:
> >> - https://pastebin.com/vBiLVejW
> >>
> >> Still absolutely stumped.
> >
> > > The strace outputs show that your bad runs are all getting stopped
> > > with SIGTTOU. If you've never heard of that, neither had I.
>
> The hell?! This is new to me also.
>
> > https://www.gnu.org/software/libc/manual/html_node/Job-Control-Signals.html
> >
> > Macro: int SIGTTOU
> >
> > This is similar to SIGTTIN, but is generated when a process in a
> > background job attempts to write to the terminal or set its modes.
> > Again, the default action is to stop the process. SIGTTOU is only
> > generated for an attempt to write to the terminal if the TOSTOP output
> > mode is set; see Output Modes.
> >
> >
> > Maybe this has something to do with the buffer settings in the perl
> > script(?). It might be worth trying a version that doesn't fiddle with
> > the outputs and buffer settings.
>
> I tried removing the $|, and then I changed the script to be entirely a
> bash script, and it still hangs. I tried 'virsh --connect <method> list
> --all' where method was qemu:///system, qemu:///session, and
> ssh+qemu:///root@localhost/system; they all hang, in both bash and Perl.
>
> > I don't know which difference between your environment and mine is
> > relevant here, such that I can't reproduce the issue using your test
> > script. It works perfectly fine for me.
> >
> > Can you run `stty -a | grep tostop`? If there's a minus sign
> > ("-tostop"), it's disabled; if it's present without a minus sign
> > ("tostop"), it's enabled, as best I can tell.
>
> -tostop is there
>
> ====
> [root@mk-a07n02 ~]# stty -a | grep tostop
> isig icanon iexten echo echoe echok -echonl -noflsh -xcase -tostop -echoprt
> [root@mk-a07n02 ~]#
> ====
>
> > I'm just spitballing here. It's disabled by default on my machine...
> > but even when I enable it, crm_resource --validate works fine. It may
> > be set differently when running under crm_resource.
>
> How do you enable it?
With `stty tostop`
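Since the setting may differ under crm_resource, it could help to have the test RA log it itself rather than checking from an interactive shell. A minimal sketch follows; the function name tostop_state is made up for illustration, and it only assumes the `stty -a` output format shown above ("-tostop" means disabled, "tostop" means enabled).

```shell
#!/bin/sh
# tostop_state: read `stty -a` output on stdin and print
# "enabled", "disabled", or "unknown". (Hypothetical helper name.)
tostop_state() {
    settings=$(cat)
    # Turn newlines and semicolons into spaces so every flag is
    # surrounded by spaces and can be matched as a whole word.
    case " $(printf '%s' "$settings" | tr '\n;' '  ') " in
        *" -tostop "*) echo "disabled" ;;
        *" tostop "*)  echo "enabled" ;;
        *)             echo "unknown" ;;
    esac
}

# Only query the terminal when stdin actually is one.
if [ -t 0 ]; then
    stty -a | tostop_state
else
    echo "stdin is not a terminal"
fi
```

If this prints "stdin is not a terminal" under crm_resource but a real state when run directly, that difference is itself worth noting.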
It's 100% possible that this whole thing is a red herring, by the way.
I'm looking for anything that might explain the discrepancy. SIGTTOU
may not be directly tied to the root cause.
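One way to narrow it down: the manual text quoted above ties SIGTTOU to a process in a background job, and it reads as though setting terminal modes can trigger it even with tostop off. So it may be worth logging whether the test RA is in the terminal's foreground process group in each case. Here is a rough sketch, assuming Linux with /proc mounted; the function names are made up for illustration, and the field positions are the ones documented in proc(5).

```shell
#!/bin/sh
# classify_job_state PGRP TPGID  (hypothetical helper)
# PGRP is the caller's process group; TPGID is the foreground process
# group of its controlling terminal, or -1 when it has none.
classify_job_state() {
    if [ "$2" = "-1" ]; then
        echo "no-controlling-tty"
    elif [ "$1" = "$2" ]; then
        echo "foreground"
    else
        echo "background"
    fi
}

# report_job_state [PID]  (hypothetical helper, Linux-only)
report_job_state() {
    pid=${1:-$$}
    if [ ! -r "/proc/$pid/stat" ]; then
        echo "pid=$pid state=unknown (no /proc/$pid/stat)"
        return 0
    fi
    stat=$(cat "/proc/$pid/stat")
    # Drop "pid (comm) " so a command name with spaces cannot shift fields.
    rest=${stat##*') '}
    # Remaining fields: state ppid pgrp session tty_nr tpgid ...
    set -- $rest
    echo "pid=$pid pgrp=$3 tpgid=$6 state=$(classify_job_state "$3" "$6")"
}

report_job_state
```

Calling report_job_state from the test RA once directly and once via crm_resource, then comparing the two log lines, would show whether the job-control state differs between the good and bad runs. It does not prove that is the cause.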
>
> --
> Madison Kelly
> Alteeve's Niche!
> Chief Technical Officer
> c: +1-647-471-0951
> https://alteeve.com/
>
--
Regards,
Reid Wahl (He/Him)
Senior Software Engineer, Red Hat
RHEL High Availability - Pacemaker