[Pacemaker] crmd thinks lsb returns error on monitor

Pavlos Parissis pavlos.parissis at gmail.com
Mon Oct 11 05:23:54 EDT 2010


On 10 October 2010 17:40, Andrew Beekhof <andrew at beekhof.net> wrote:
> On Sun, Oct 10, 2010 at 12:47 AM, Pavlos Parissis
> <pavlos.parissis at gmail.com> wrote:
>> Hi,
>>
>> My resource is not started because I get this
>>
>> 00:44:27 crmd: [3141]: WARN: status_from_rc: Action 16
>> (pbx_02_monitor_0) on node-02 failed (target: 7 vs. rc: 5): Error
>>
>> but when I run the status action manually I get 3, which is OK
>> because the application is stopped
>>
>> [root@node-02 ~]# /etc/init.d/znd-pbx_02 status
>> pbx_02 is stopped
>> [root@node-02 ~]# echo $?
>> 3
>>
>> why does crm get error in this case?
>
> I imagine because when pacemaker ran it, the script didn't return 3.
>
Pacemaker got 5 because the script returns 5 when the application is
not available on the system, which happens only when the fs is not
active. What actually happened in this particular case is that the
start actions on the fs resource and on the resource that holds the
application ran within the same second. I am pretty sure that the
application resource started too quickly, so that when the LSB script
was executed the fs was not yet available, even though the fs resource
returned 0 on start and on the first monitor.
This issue doesn't always happen, and if I put a sleep in the LSB
script for the application resource I don't run into it.
The resources are in a group with the order ip, fs, app.
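
A bounded wait is one possible alternative to a fixed sleep. The
following is only a minimal sketch: the helper name wait_for_binary and
the idea of polling for the application's executable are assumptions
for illustration, not part of the actual znd-pbx_02 script.

```shell
#!/bin/sh
# Hypothetical helper: wait up to $2 seconds for the executable $1 to
# appear (for example, once the filesystem holding it is mounted).
# Returns 0 as soon as the file is executable, 1 if the timeout expires.
wait_for_binary() {
    binary=$1
    timeout=$2
    waited=0
    while [ ! -x "$binary" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

The start action could call something like "wait_for_binary /path/to/app 10"
before launching the application, so it only waits as long as needed.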
I also removed exit code 5 from the LSB script, since it confuses the
cluster when the monitor action takes place on the slave node.
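
As a rough illustration of that change, a status function could report
3 (not running) instead of 5 when the executable is missing. The paths,
the pid file, and the function name pbx_status below are hypothetical;
the real script's contents are not shown in this thread.

```shell
#!/bin/sh
# Hypothetical locations; adjust to the real application layout.
PBX_BIN=${PBX_BIN:-/opt/pbx_02/bin/pbx}
PBX_PIDFILE=${PBX_PIDFILE:-/var/run/pbx_02.pid}

pbx_status() {
    # If the executable is missing, the shared fs is most likely not
    # mounted on this node, so report "stopped" (3) rather than 5.
    if [ ! -x "$PBX_BIN" ]; then
        echo "pbx_02 is stopped"
        return 3
    fi
    # Running only if the pid file exists and the process is alive.
    if [ -f "$PBX_PIDFILE" ] && kill -0 "$(cat "$PBX_PIDFILE")" 2>/dev/null; then
        echo "pbx_02 is running"
        return 0
    fi
    echo "pbx_02 is stopped"
    return 3
}
```

With this shape, a monitor run on a node where the fs is not mounted
returns 3, which Pacemaker treats as "not running" for an LSB resource.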

Cheers,
Pavlos



