[ClusterLabs] Antw: [EXT] Re: RA hangs when called by crm_resource (resending text format)

Madison Kelly mkelly at alteeve.com
Wed Jan 11 02:31:37 EST 2023



On January 11, 2023 2:26:57 a.m. EST, Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de> wrote:
>>>> Madison Kelly <mkelly at alteeve.com> wrote on 11.01.2023 at 06:21 in message
><74df2c8e-1cff-ba07-7f4a-070be296b1fb at alteeve.com>:
>> On 2023-01-11 00:14, Madison Kelly wrote:
>>> Hi all,
>>> 
>>> Edit: Last message was in HTML format, sorry about that.
>>> 
>>>    I've got a hell of a weird problem, and I am absolutely stumped on 
>>> what's going on.
>>> 
>>>    The short of it is: if my RA is called from the command line, it's 
>>> fine. If a resource exists, monitor, enable, disable, all that stuff 
>>> works just fine. If I try to create a resource, it hangs on the validate 
>>> stage. Specifically, it hangs when 'pcs' calls:
>>> 
>>> crm_resource --validate --output-as xml --class ocf --agent server 
>>> --provider alteeve --option name=<resource_name>
>>> 
>>>    It hangs when it tries to make a shell call (to virsh, 
>>> specifically, but that doesn't matter). So to debug, I started stripping 
>>> my RA down further and further until I was left with the most basic 
>>> of programs:
>>> 
>>> https://pastebin.com/VtSpkwMr 
>>> 
>>>    That is literally the simplest program I could write that made the 
>>> shell call. The 'open()' call is where it hangs.
>>> 
>>> When I call it directly:
>>> 
>>> time /usr/lib/ocf/resource.d/alteeve/server --validate-all --server 
>>> srv04-test; echo rc:$?
>>> 
>>> ====
>>> real    0m0.061s
>>> user    0m0.037s
>>> sys    0m0.014s
>>> rc:0
>>> ====
>>> 
>>> It's just fine. I can see in the log the output from the 'virsh' call as 
>>> well. However, when I call it from crm_resource:
>>> 
>>> time crm_resource --validate --output-as xml --class ocf --agent server 
>>> --provider alteeve --option name=srv04-test; echo rc:$?
>>> 
>>> ====
>>> <pacemaker-result api-version="2.25" request="crm_resource --validate 
>>> --output-as xml --class ocf --agent server --provider alteeve --option 
>>> name=srv04-test">
>>>    <resource-agent-action action="validate" class="ocf" type="server" 
>>> provider="alteeve">
>>>      <overrides/>
>>>      <agent-status code="1" message="error" execution_code="2" 
>>> execution_message="Timed Out" reason="Resource agent did not exit within 
>>> specified timeout"/>
>>>    </resource-agent-action>
>>>    <status code="1" message="Error occurred">
>>>      <errors>
>>>        <error>crm_resource: Error performing operation: Error 
>>> occurred</error>
>>>      </errors>
>>>    </status>
>>> </pacemaker-result>
>>> 
>>> real    0m20.521s
>>> user    0m0.022s
>>> sys    0m0.010s
>>> rc:1
>>> ====
>>> 
>>> In the log file, I see (from line 20 of the super-simple-test-script):
>>> 
>>> ====
>>> Calling: [/usr/bin/virsh dumpxml --inactive srv04-test 2>&1; 
>>> /usr/bin/echo return_code:0 |]
>>> ====
>
>In the VirtualDomain RA I found a similar command (assuming that works):
> virsh $VIRSH_OPTIONS dumpxml --inactive --security-info ${DOMAIN_NAME} >
> ${CFGTMP}
>
>virsh is somewhat strange; libvirtd is running, right?

Yes. I can call the RA directly and then immediately call crm_resource, or do it in the reverse order, and the results are always the same.

Again, the same calls work fine when enabling, disabling, etc. So weird...
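
One way to narrow this down, sketched here as an assumption rather than a known fix: run the same shell call with stdin detached and a hard timeout, so a hang shows up as an exit code instead of a stalled validate. The helper name probe_call and the 10 second limit are made up for illustration; timeout(1) is from coreutils and exits with 124 when it has to kill the command.

```shell
#!/bin/bash
# Hypothetical debugging helper, not part of the RA.
# Usage: probe_call <seconds> <command> [args...]
probe_call() {
    local limit="$1"
    shift
    # Detach stdin and fold stderr into stdout, as the RA's shell call does.
    # timeout(1) returns 124 if the command is killed for running too long.
    timeout "$limit" "$@" </dev/null 2>&1
    echo "return_code:$?"
}

# Example (commented out so the snippet is safe to source):
# probe_call 10 /usr/bin/virsh list --all
```

If this is called from inside the RA while crm_resource runs the validate, a final line of return_code:124 would mean virsh itself never returned, while a normal exit would suggest the hang depends on the inherited stdin.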

>>> 
>>> Then nothing else.
>>> 
>>> The strace output is: https://pastebin.com/raw/UCEUdBeP 
>>> 
>>> Environment:
>>> 
>>> * selinux is permissive
>>> * Pacemaker 2.1.5-4.el8
>>> * pcs 0.10.15
>>> * Kernel 4.18.0-408.el8.x86_64
>>> * CentOS Stream release 8
>>> 
>>> Any help is appreciated, I am stumped. :/
>> 
>> After sending this, I tried having my "RA" call 'hostname', and that 
>> worked fine. I switched back to 'virsh list --all', and that hangs. So 
>> it seems to somehow be related to calling 'virsh' specifically.
>> 
>> -- 
>> Madison Kelly
>> Alteeve's Niche!
>> Chief Technical Officer
>> c: +1-647-471-0951
>> https://alteeve.com/ 
>> 
>> _______________________________________________
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users 
>> 
>> ClusterLabs home: https://www.clusterlabs.org/ 