[ClusterLabs] Antw: [EXT] Re: RA hangs when called by crm_resource (resending text format)

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Wed Jan 11 02:26:57 EST 2023


>>> Madison Kelly <mkelly at alteeve.com> wrote on 11.01.2023 at 06:21 in message
<74df2c8e-1cff-ba07-7f4a-070be296b1fb at alteeve.com>:
> On 2023-01-11 00:14, Madison Kelly wrote:
>> Hi all,
>> 
>> Edit: Last message was in HTML format, sorry about that.
>> 
>>    I've got a hell of a weird problem, and I am absolutely stumped on 
>> what's going on.
>> 
>>    The short of it is: if my RA is called from the command line, it's 
>> fine. If a resource exists, monitor, enable, disable, all that stuff 
>> works just fine. If I try to create a resource, it hangs on the validate 
>> stage. Specifically, it hangs when 'pcs' calls:
>> 
>> crm_resource --validate --output-as xml --class ocf --agent server --provider alteeve --option name=<resource_name>
>> 
>>    More precisely, it hangs when it tries to make a shell call (to virsh 
>> in this case, but that doesn't matter). To debug, I kept stripping my RA 
>> down until I was left with the most basic of programs:
>> 
>> https://pastebin.com/VtSpkwMr 
>> 
>>    That is literally the simplest program I could write that made the 
>> shell call. The 'open()' call is where it hangs.
>> 
>> When I call it directly:
>> 
>> time /usr/lib/ocf/resource.d/alteeve/server --validate-all --server srv04-test; echo rc:$?
>> 
>> ====
>> real    0m0.061s
>> user    0m0.037s
>> sys    0m0.014s
>> rc:0
>> ====
>> 
>> It's just fine. I can see in the log the output from the 'virsh' call as 
>> well. However, when I call it from crm_resource:
>> 
>> time crm_resource --validate --output-as xml --class ocf --agent server --provider alteeve --option name=srv04-test; echo rc:$?
>> 
>> ====
>> <pacemaker-result api-version="2.25" request="crm_resource --validate --output-as xml --class ocf --agent server --provider alteeve --option name=srv04-test">
>>    <resource-agent-action action="validate" class="ocf" type="server" provider="alteeve">
>>      <overrides/>
>>      <agent-status code="1" message="error" execution_code="2" execution_message="Timed Out" reason="Resource agent did not exit within specified timeout"/>
>>    </resource-agent-action>
>>    <status code="1" message="Error occurred">
>>      <errors>
>>        <error>crm_resource: Error performing operation: Error occurred</error>
>>      </errors>
>>    </status>
>> </pacemaker-result>
>> 
>> real    0m20.521s
>> user    0m0.022s
>> sys    0m0.010s
>> rc:1
>> ====
>> 
>> In the log file, I see (from line 20 of the super-simple-test-script):
>> 
>> ====
>> Calling: [/usr/bin/virsh dumpxml --inactive srv04-test 2>&1; /usr/bin/echo return_code:0 |]
>> ====

In the VirtualDomain RA I found a similar command (assuming that one works):
 virsh $VIRSH_OPTIONS dumpxml --inactive --security-info ${DOMAIN_NAME} > ${CFGTMP}

virsh is somewhat strange; libvirtd is running, right?
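
One way to narrow this down might be to run the same virsh command under a hard timeout with stdin detached, so that a hang shows up as an exit code instead of a stuck process. This is only a sketch: run_with_timeout is a made-up helper name, it assumes the GNU coreutils 'timeout' command is installed, and the 5-second limit in the example is arbitrary.

```shell
# Hypothetical helper (not part of any RA): run a command with a time limit
# and with stdin redirected from /dev/null, so it cannot block on input.
# Usage: run_with_timeout SECONDS COMMAND [ARGS...]
run_with_timeout() {
    rwt_limit="$1"
    shift
    timeout "$rwt_limit" "$@" </dev/null
    rwt_rc=$?
    # GNU timeout exits with 124 when the time limit was reached.
    if [ "$rwt_rc" -eq 124 ]; then
        echo "timed out after ${rwt_limit}s: $*" >&2
    fi
    return "$rwt_rc"
}

# Example call, commented out because it needs a libvirt host:
# run_with_timeout 5 /usr/bin/virsh dumpxml --inactive srv04-test
```

If libvirtd is managed by systemd (an assumption, but likely on CentOS Stream 8), 'systemctl is-active libvirtd' should answer the question above.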

>> 
>> Then nothing else.
>> 
>> The strace output is: https://pastebin.com/raw/UCEUdBeP 
>> 
>> Environment:
>> 
>> * selinux is permissive
>> * Pacemaker 2.1.5-4.el8
>> * pcs 0.10.15
>> * Kernel 4.18.0-408.el8.x86_64
>> * CentOS Stream release 8
>> 
>> Any help is appreciated, I am stumped. :/
> 
> After sending this, I tried having my "RA" call 'hostname', and that 
> worked fine. I switched back to 'virsh list --all', and that hangs. So 
> it seems to be related to calling 'virsh' specifically.
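
Since the same agent behaves differently depending on who starts it, comparing the execution context of both runs may help. A rough sketch, assuming a POSIX shell: snapshot_context and the output path are made-up names, and the function would have to be called from the agent (or from a wrapper around it) before the virsh call.

```shell
# Hypothetical helper: write the caller's identity, whether stdin is a
# terminal, and the sorted environment to the file given as $1.
snapshot_context() {
    snap_out="$1"
    {
        echo "uid=$(id -u) gid=$(id -g)"
        if [ -t 0 ]; then
            echo "stdin=tty"
        else
            echo "stdin=not-a-tty"
        fi
        env | sort
    } >"$snap_out"
}

# Example call (the path is an arbitrary choice):
# snapshot_context "/tmp/ra-context.$$"
```

Running it once directly and once via crm_resource, then comparing the two files with diff, would show which variables or settings differ between the two cases.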
> 
> -- 
> Madison Kelly
> Alteeve's Niche!
> Chief Technical Officer
> c: +1-647-471-0951
> https://alteeve.com/ 
> 





