[ClusterLabs] Failed 'virsh' call when test RA run by crm_resource (con't)

Andrei Borzenkov arvidjaar at gmail.com
Thu Jan 12 05:07:42 EST 2023


On Thu, Jan 12, 2023 at 12:50 PM Keisuke MORI <keisuke.mori+ha at gmail.com> wrote:
>
> Hi,
>
> Just a guess but could it be the same issue with this?
>
> https://serverfault.com/questions/1105733/virsh-command-hangs-when-script-runs-in-the-background
>

That is exactly the same issue. The reason for SIGTTOU is explained in
https://gitlab.com/libvirt/libvirt/-/issues/366#note_1102131966:

    "most likely due to pkttyagent wanting to change the terminal mode to
    read the password"

and pkttyagent gets SIGTTOU immediately after trying to set the terminal mode.
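
As a rough illustration (not something tested in this thread): assuming the
stop happens because the child still shares the caller's controlling terminal
while sitting in a background process group, one way to sidestep it may be to
launch the command in its own session with stdin detached, so that it has no
controlling terminal to touch. A minimal Python sketch; run_detached is a
hypothetical helper name, and whether this actually cures the virsh hang is an
assumption:

```python
import subprocess
import sys


def run_detached(cmd, timeout=30):
    """Run cmd in a new session (setsid) with stdin from /dev/null.

    A session leader created this way starts without a controlling
    terminal, so a helper such as pkttyagent should have no terminal whose
    modes it could change. That this avoids the virsh hang is an
    assumption, not a confirmed fix.
    """
    return subprocess.run(
        cmd,
        stdin=subprocess.DEVNULL,   # nothing to read from the terminal
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        start_new_session=True,     # child calls setsid() before exec
        timeout=timeout,            # raise instead of hanging forever
        text=True,
    )


if __name__ == "__main__":
    # Stand-in child that reports whether it is a session leader; in the
    # scenario above this would be ["virsh", "list", "--all"].
    probe = "import os; print(os.getsid(0) == os.getpid())"
    result = run_detached([sys.executable, "-c", probe])
    print(result.returncode, result.stdout.strip())
```

From a shell-based RA, the rough equivalent would be something like
`setsid --wait virsh list --all </dev/null`, again only as a sketch under the
same assumption.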



> On Thu, Jan 12, 2023 at 15:36, Madison Kelly <mkelly at alteeve.com> wrote:
> >
> > On 2023-01-12 01:26, Reid Wahl wrote:
> > > On Wed, Jan 11, 2023 at 10:21 PM Madison Kelly <mkelly at alteeve.com> wrote:
> > >>
> > >> On 2023-01-12 01:12, Reid Wahl wrote:
> > >>> On Wed, Jan 11, 2023 at 8:11 PM Madison Kelly <mkelly at alteeve.com> wrote:
> > >>>>
> > >>>> Hi all,
> > >>>>
> > >>>>      There were a lot of sub-threads, so I figured it's helpful to start a
> > >>>> new thread with a summary so far. For context, I have a super simple
> > >>>> Perl script that pretends to be an RA for the sake of debugging.
> > >>>>
> > >>>> https://pastebin.com/9z314TaB
> > >>>>
> > >>>>      I've had variations log environment variables and confirmed that
> > >>>> every variable present in the working direct call is also present in
> > >>>> the crm_resource-triggered call. There are no SELinux issues logged in
> > >>>> audit.log, and SELinux is permissive. The script logs the real and
> > >>>> effective UID and GID, and they are the same in both cases. Calling
> > >>>> other shell programs (tested with 'hostname') runs fine; the hang is
> > >>>> specific to the crm_resource -> test RA -> virsh call chain.
> > >>>>
> > >>>>      I ran strace on the virsh call from inside my test script (changing
> > >>>> 'virsh.good' to 'virsh.bad' between running directly and via
> > >>>> crm_resource). The strace runs made six files each time. Below are
> > >>>> pastebin links with the outputs of the six runs in one paste, but each
> > >>>> file's output is in its own block (search for "file:" to see the
> > >>>> different file outputs).
> > >>>>
> > >>>> Good/direct run of the test RA:
> > >>>> - https://pastebin.com/xtqe9NSG
> > >>>>
> > >>>> Bad/crm_resource triggered run of the test RA:
> > >>>> - https://pastebin.com/vBiLVejW
> > >>>>
> > >>>> Still absolutely stumped.
> > >>>
> > >>> The strace outputs show that your bad runs are all getting stopped
> > >>> with SIGTTOU. If you've never heard of that, me neither.
> > >>
> > >> The hell?! This is new to me also.
> > >>
> > >>> https://www.gnu.org/software/libc/manual/html_node/Job-Control-Signals.html
> > >>>
> > >>> Macro: int SIGTTOU
> > >>>
> > >>>       This is similar to SIGTTIN, but is generated when a process in a
> > >>> background job attempts to write to the terminal or set its modes.
> > >>> Again, the default action is to stop the process. SIGTTOU is only
> > >>> generated for an attempt to write to the terminal if the TOSTOP output
> > >>> mode is set; see Output Modes.
> > >>>
> > >>>
> > >>> Maybe this has something to do with the buffer settings in the perl
> > >>> script(?). It might be worth trying a version that doesn't fiddle with
> > >>> the outputs and buffer settings.
> > >>
> > >> I tried removing the $|, and then I changed the script to be entirely a
> > >> bash script, still hanging. I tried 'virsh --connect <method> list
> > >> --all' where method was qemu:///system, qemu:///session, and
> > >> ssh+qemu:///root@localhost/system, all hang. In bash or perl.
> > >>
> > >>> I don't know which difference between your environment and mine is
> > >>> relevant here, such that I can't reproduce the issue using your test
> > >>> script. It works perfectly fine for me.
> > >>>
> > >>> Can you run `stty -a | grep tostop`? If there's a minus sign
> > >>> ("-tostop"), it's disabled; if it's present without a minus sign
> > >>> ("tostop"), it's enabled, as best I can tell.
> > >>
> > >> -tostop is there
> > >>
> > >> ====
> > >> [root at mk-a07n02 ~]# stty -a | grep tostop
> > >> isig icanon iexten echo echoe echok -echonl -noflsh -xcase -tostop -echoprt
> > >> [root at mk-a07n02 ~]#
> > >> ====
> > >>
> > >>> I'm just spitballing here. It's disabled by default on my machine...
> > >>> but even when I enable it, crm_resource --validate works fine. It may
> > >>> be set differently when running under crm_resource.
> > >>
> > >> How do you enable it?
> > >
> > > With `stty tostop`
> > >
> > > It's 100% possible that this whole thing is a red herring by the way.
> > > I'm looking for anything that might explain the discrepancy. SIGTTOU
> > > may not be directly tied to the root cause.
> >
> > Appreciate the stab, didn't stop the hang though :(
> >
> > --
> > Madison Kelly
> > Alteeve's Niche!
> > Chief Technical Officer
> > c: +1-647-471-0951
> > https://alteeve.com/
> >
> > _______________________________________________
> > Manage your subscription:
> > https://lists.clusterlabs.org/mailman/listinfo/users
> >
> > ClusterLabs home: https://www.clusterlabs.org/
>
>
>
> --
> Keisuke MORI

