[ClusterLabs] RA hangs when called by crm_resource (resending text format)

Madison Kelly mkelly at alteeve.com
Wed Jan 11 00:21:25 EST 2023


On 2023-01-11 00:14, Madison Kelly wrote:
> Hi all,
> 
> Edit: My last message was in HTML format; sorry about that.
> 
>    I've got a hell of a weird problem, and I am absolutely stumped on 
> what's going on.
> 
>    The short of it is: if my RA is called from the command line, it's
> fine. If the resource already exists, monitor, enable, disable, and all
> the rest work just fine. If I try to create a resource, though, it hangs
> at the validate stage. Specifically, it hangs when 'pcs' calls:
> 
> crm_resource --validate --output-as xml --class ocf --agent server --provider alteeve --option name=<resource_name>
> 
>    More precisely, it hangs when it tries to make a shell call (to
> virsh in this case, but that doesn't seem to matter). So to debug, I
> kept stripping my RA down, simpler and simpler, until I was left with
> the most basic of programs:
> 
> https://pastebin.com/VtSpkwMr
> 
>    That is literally the simplest program I could write that made the 
> shell call. The 'open()' call is where it hangs.
> 
> When I call it directly:
> 
> time /usr/lib/ocf/resource.d/alteeve/server --validate-all --server srv04-test; echo rc:$?
> 
> ====
> real    0m0.061s
> user    0m0.037s
> sys    0m0.014s
> rc:0
> ====
> 
> It works just fine, and I can see the output of the 'virsh' call in the
> log as well. However, when I call it from crm_resource:
> 
> time crm_resource --validate --output-as xml --class ocf --agent server --provider alteeve --option name=srv04-test; echo rc:$?
> 
> ====
> <pacemaker-result api-version="2.25" request="crm_resource --validate --output-as xml --class ocf --agent server --provider alteeve --option name=srv04-test">
>    <resource-agent-action action="validate" class="ocf" type="server" provider="alteeve">
>      <overrides/>
>      <agent-status code="1" message="error" execution_code="2" execution_message="Timed Out" reason="Resource agent did not exit within specified timeout"/>
>    </resource-agent-action>
>    <status code="1" message="Error occurred">
>      <errors>
>        <error>crm_resource: Error performing operation: Error occurred</error>
>      </errors>
>    </status>
> </pacemaker-result>
> 
> real    0m20.521s
> user    0m0.022s
> sys    0m0.010s
> rc:1
> ====
> 
> In the log file, I see this (from line 20 of the super-simple test script):
> 
> ====
> Calling: [/usr/bin/virsh dumpxml --inactive srv04-test 2>&1; /usr/bin/echo return_code:0 |]
> ====
> 
> Then nothing else.
> 
> The strace output is: https://pastebin.com/raw/UCEUdBeP
> 
> Environment:
> 
> * SELinux is permissive
> * Pacemaker 2.1.5-4.el8
> * pcs 0.10.15
> * Kernel 4.18.0-408.el8.x86_64
> * CentOS Stream release 8
> 
> Any help is appreciated, I am stumped. :/

After sending this, I tried having my "RA" call 'hostname', and that
worked fine. I switched back to 'virsh list --all', and that hung. So
it somehow seems to be related to calling 'virsh' specifically.
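One way to narrow this down further is to make the shell call itself bounded, so the agent fails fast with its own log message instead of sitting there for the roughly 20 seconds it took crm_resource to time out. Below is a minimal Python sketch of the same `command 2>&1; echo return_code:$?` pattern seen in the logged command; it is not the pastebin script (which isn't reproduced here), and the function name `run_with_rc` and the 5-second default are hypothetical. It also points the child's stdin at /dev/null, which rules out the command blocking on an inherited stdin; whether that is what virsh is doing under crm_resource is not established.

```python
import subprocess


def run_with_rc(command, timeout=5.0):
    """Run `command` through /bin/sh and return (output, return_code).

    Mirrors the `cmd 2>&1; echo return_code:$?` idiom: the command's stderr
    is merged into stdout and the shell appends the command's exit status.
    Returns (None, None) if the shell does not finish within `timeout` seconds.
    """
    wrapped = f"{command} 2>&1; echo return_code:$?"
    try:
        result = subprocess.run(
            ["/bin/sh", "-c", wrapped],
            stdin=subprocess.DEVNULL,  # reads hit EOF at once; nothing to wait on
            stdout=subprocess.PIPE,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # run() kills /bin/sh itself; a forked grandchild (e.g. virsh)
        # may keep running and would need separate cleanup.
        return None, None

    # Split on the last marker: everything before it is the command's output.
    output, marker, rc = result.stdout.rpartition("return_code:")
    if not marker:
        # No marker: the command ended the shell itself (e.g. with `exit`).
        return result.stdout.rstrip("\n"), None
    return output.rstrip("\n"), int(rc)
```

For example, `run_with_rc("/usr/bin/virsh list --all")` either returns the output and exit status or `(None, None)` after the timeout, which the agent can log and turn into an error exit. Separately, the logged command already contains the literal `return_code:0` rather than `$?`, which suggests `$?` was expanded by the calling language before the shell saw it; in this sketch the literal `$?` is passed through to `/bin/sh`.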

-- 
Madison Kelly
Alteeve's Niche!
Chief Technical Officer
c: +1-647-471-0951
https://alteeve.com/