[ClusterLabs] Failed 'virsh' call when test RA run by crm_resource (cont'd)

Madison Kelly mkelly at alteeve.com
Thu Jan 12 20:25:28 EST 2023


On 2023-01-12 04:50, Keisuke MORI wrote:
> Hi,
> 
> Just a guess but could it be the same issue with this?
> 
> https://serverfault.com/questions/1105733/virsh-command-hangs-when-script-runs-in-the-background

That was exactly what it was! Bandini linked the same thing last night. 
I fixed it by calling 'setsid --wait virsh <whatever>'.
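For anyone who lands here with the same hang: `setsid --wait virsh ...` starts virsh in a new session, so it has no controlling terminal, and then waits for it to exit. Below is a minimal Python sketch of the same idea. It assumes the caller is Python rather than the Perl/bash RA used in this thread, and it uses `subprocess.run(..., start_new_session=True)`, which calls setsid() in the child before exec. The helper name `run_in_new_session` is hypothetical. The demo runs a Python child instead of virsh so that it works without libvirt.

```python
import os
import subprocess
import sys


def run_in_new_session(argv):
    """Run argv in a new session and wait for it, roughly what
    `setsid --wait <argv...>` does from a shell.

    start_new_session=True makes the child call setsid() before exec, so it
    has no controlling terminal and the kernel's background-job checks that
    raise SIGTTOU/SIGTTIN no longer apply to it. stdin comes from /dev/null
    so the child cannot try to read a terminal either.
    """
    return subprocess.run(
        argv,
        start_new_session=True,
        stdin=subprocess.DEVNULL,
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    # In an RA this might be ["virsh", "list", "--all"]; a Python child is
    # used here so the demo runs without libvirt. It prints its pid and sid.
    probe = [sys.executable, "-c", "import os; print(os.getpid(), os.getsid(0))"]
    result = run_in_new_session(probe)
    pid, sid = map(int, result.stdout.split())
    print("child leads its own session:", pid == sid)
    print("child session differs from ours:", sid != os.getsid(0))
```

A child that is the leader of its own session reports a session id equal to its pid, and that id differs from the caller's.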

Thanks!

> On Thu, Jan 12, 2023 at 15:36, Madison Kelly <mkelly at alteeve.com> wrote:
>>
>> On 2023-01-12 01:26, Reid Wahl wrote:
>>> On Wed, Jan 11, 2023 at 10:21 PM Madison Kelly <mkelly at alteeve.com> wrote:
>>>>
>>>> On 2023-01-12 01:12, Reid Wahl wrote:
>>>>> On Wed, Jan 11, 2023 at 8:11 PM Madison Kelly <mkelly at alteeve.com> wrote:
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>>       There were a lot of sub-threads, so I figured it would help to start a
>>>>>> new thread with a summary so far. For context, I have a super simple
>>>>>> perl script that pretends to be an RA for the sake of debugging.
>>>>>>
>>>>>> https://pastebin.com/9z314TaB
>>>>>>
>>>>>>       I've had variations log environment variables and confirmed that every
>>>>>> variable present in the working direct call is also present in the
>>>>>> crm_resource-triggered call. There are no SELinux issues logged in
>>>>>> audit.log, and SELinux is permissive. The script logs the real and
>>>>>> effective UID and GID, and they are the same in both instances. Other
>>>>>> shell programs (tested with 'hostname') run fine; the failure is
>>>>>> specific to the crm_resource -> test RA -> virsh call.
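UID, GID, and the environment matched between the two runs. The process's job-control state is also worth logging in both runs: its session id, its process group, and the terminal's foreground process group. According to the glibc text quoted below, SIGTTOU is tied to being a background job. Below is a hypothetical Python sketch of such a logger. `describe_process` is an illustrative name and not part of any RA API.

```python
import os
import sys


def describe_process(out=None):
    """Log identity and job-control state of the current process.

    Besides real/effective UID and GID, record the session id, process
    group, and (for each of fd 0/1/2 that is a tty) the terminal's
    foreground process group; these decide whether we are a background job.
    """
    out = out or sys.stderr
    info = {
        "uid": os.getuid(), "euid": os.geteuid(),
        "gid": os.getgid(), "egid": os.getegid(),
        "pid": os.getpid(), "pgid": os.getpgrp(), "sid": os.getsid(0),
    }
    for fd in (0, 1, 2):
        if not os.isatty(fd):
            info[f"fd{fd}_tty"] = None
            continue
        info[f"fd{fd}_tty"] = os.ttyname(fd)
        try:
            info[f"fd{fd}_fg_pgid"] = os.tcgetpgrp(fd)
        except OSError:
            # A tty, but not this process's controlling terminal.
            info[f"fd{fd}_fg_pgid"] = None
    print(" ".join(f"{k}={v}" for k, v in sorted(info.items())), file=out)
    return info
```

If pgid differs from a tty's fg_pgid in one run but not in the other, that run is a background job on that terminal.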
>>>>>>
>>>>>>       I ran strace on the virsh call from inside my test script (changing
>>>>>> 'virsh.good' to 'virsh.bad' between running directly and via
>>>>>> crm_resource). Each strace run produced six files. Below are
>>>>>> pastebin links with the output of all six files in one paste; each
>>>>>> file's output is in its own block (search for 'file:' to see the
>>>>>> different file outputs).
>>>>>>
>>>>>> Good/direct run of the test RA:
>>>>>> - https://pastebin.com/xtqe9NSG
>>>>>>
>>>>>> Bad/crm_resource triggered run of the test RA:
>>>>>> - https://pastebin.com/vBiLVejW
>>>>>>
>>>>>> Still absolutely stumped.
>>>>>
>>>>> The strace outputs show that your bad runs are all getting stopped
>>>>> with SIGTTOU. If you've never heard of that, neither have I.
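This assumes the traces were captured with `strace -ff -o <prefix>`, which writes one file per traced process and would fit the six files per run. Under that assumption, one quick way to spot the stop across all of the files is to search for strace's signal lines. They look like `--- SIGTTOU {...} ---` when the signal is delivered and `--- stopped by SIGTTOU ---` when the process stops. Below is a minimal Python sketch. The function names and the glob pattern are illustrative.

```python
import glob
import re

# strace reports signals on lines such as
#   --- SIGTTOU {si_signo=SIGTTOU, si_code=SI_KERNEL} ---
#   --- stopped by SIGTTOU ---
SIGNAL_RE = re.compile(r"--- (stopped by )?(SIG[A-Z0-9]+)\b")


def find_signals(lines):
    """Yield (line_number, event, signal) for each signal line in strace output."""
    for lineno, line in enumerate(lines, start=1):
        m = SIGNAL_RE.search(line)
        if m:
            event = "stopped" if m.group(1) else "delivered"
            yield lineno, event, m.group(2)


def scan_trace_files(pattern):
    """Scan every file matching pattern (e.g. "virsh.bad.*") and return
    {filename: [(lineno, event, signal), ...]} for files with any hits."""
    results = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, errors="replace") as fh:
            hits = list(find_signals(fh))
        if hits:
            results[path] = hits
    return results
```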
>>>>
>>>> The hell?! This is new to me also.
>>>>
>>>>> https://www.gnu.org/software/libc/manual/html_node/Job-Control-Signals.html
>>>>>
>>>>> Macro: int SIGTTOU
>>>>>
>>>>>        This is similar to SIGTTIN, but is generated when a process in a
>>>>> background job attempts to write to the terminal or set its modes.
>>>>> Again, the default action is to stop the process. SIGTTOU is only
>>>>> generated for an attempt to write to the terminal if the TOSTOP output
>>>>> mode is set; see Output Modes.
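The quoted text implies that TOSTOP only matters for writes. A background job that tries to set terminal modes gets SIGTTOU regardless of TOSTOP. "Background job" here means a process whose group is not the foreground process group of its controlling terminal. Below is a small Python sketch of that check. The name `job_state` is hypothetical. The check itself uses tcgetpgrp(), which is permitted from the background and does not raise SIGTTOU.

```python
import os


def job_state(fd):
    """Return 'foreground', 'background', or None for a file descriptor.

    None means fd is not this process's controlling terminal (not a tty at
    all, or some other tty), so SIGTTOU/SIGTTIN job control does not apply.
    Otherwise we are a background job when our process group differs from
    the terminal's foreground process group.
    """
    if not os.isatty(fd):
        return None
    try:
        fg = os.tcgetpgrp(fd)
    except OSError:
        # ENOTTY: a terminal, but not our controlling terminal.
        return None
    return "foreground" if fg == os.getpgrp() else "background"
```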
>>>>>
>>>>>
>>>>> Maybe this has something to do with the buffer settings in the perl
>>>>> script(?). It might be worth trying a version that doesn't fiddle with
>>>>> the outputs and buffer settings.
>>>>
>>>> I tried removing the $|, and then I rewrote the script entirely in
>>>> bash; it still hangs. I also tried 'virsh --connect <method> list
>>>> --all' where <method> was qemu:///system, qemu:///session, and
>>>> ssh+qemu:///root@localhost/system; all of them hang, in bash or perl.
>>>>
>>>>> I don't know which difference between your environment and mine is
>>>>> relevant here; I can't reproduce the issue using your test
>>>>> script. It works perfectly fine for me.
>>>>>
>>>>> Can you run `stty -a | grep tostop`? If there's a minus sign
>>>>> ("-tostop"), it's disabled; if it's present without a minus sign
>>>>> ("tostop"), it's enabled, as best I can tell.
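Below is a sketch of the same check done programmatically with Python's standard `termios` module. TOSTOP lives in the local-modes field, which is index 3 of the list `tcgetattr()` returns. `tostop_enabled` is a hypothetical helper.

```python
import os
import termios


def tostop_enabled(fd):
    """Return True/False for the TOSTOP local-mode flag on a tty fd, or
    None if fd is not a terminal. Like `stty -a | grep tostop` for that fd."""
    if not os.isatty(fd):
        return None
    lflag = termios.tcgetattr(fd)[3]  # index 3 is the local modes (c_lflag)
    return bool(lflag & termios.TOSTOP)
```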
>>>>
>>>> -tostop is there
>>>>
>>>> ====
>>>> [root at mk-a07n02 ~]# stty -a | grep tostop
>>>> isig icanon iexten echo echoe echok -echonl -noflsh -xcase -tostop -echoprt
>>>> [root at mk-a07n02 ~]#
>>>> ====
>>>>
>>>>> I'm just spitballing here. It's disabled by default on my machine...
>>>>> but even when I enable it, crm_resource --validate works fine. It may
>>>>> be set differently when running under crm_resource.
>>>>
>>>> How do you enable it?
>>>
>>> With `stty tostop`
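Below is a sketch of the equivalent of `stty tostop` / `stty -tostop` using `termios`. `set_tostop` is a hypothetical name. The glibc text quoted above also applies here: changing the modes of the controlling terminal from a background job is itself one of the actions that raises SIGTTOU.

```python
import os
import termios


def set_tostop(fd, enabled=True):
    """Set or clear TOSTOP on a tty fd, like `stty tostop` / `stty -tostop`,
    and return the flag's new state as read back from the terminal."""
    attrs = termios.tcgetattr(fd)
    if enabled:
        attrs[3] |= termios.TOSTOP
    else:
        attrs[3] &= ~termios.TOSTOP
    termios.tcsetattr(fd, termios.TCSANOW, attrs)
    return bool(termios.tcgetattr(fd)[3] & termios.TOSTOP)


if __name__ == "__main__":
    # Demo on a fresh pseudo-terminal so the real console is left untouched.
    master, slave = os.openpty()
    print(set_tostop(slave, True), set_tostop(slave, False))
    os.close(master)
    os.close(slave)
```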
>>>
>>> It's 100% possible that this whole thing is a red herring by the way.
>>> I'm looking for anything that might explain the discrepancy. SIGTTOU
>>> may not be directly tied to the root cause.
>>
>> Appreciate the stab, didn't stop the hang though :(
>>
>> --
>> Madison Kelly
>> Alteeve's Niche!
>> Chief Technical Officer
>> c: +1-647-471-0951
>> https://alteeve.com/



