[Pacemaker] hangs pending
Andrey Groshev
greenx at yandex.ru
Tue Feb 25 09:30:47 UTC 2014
21.02.2014, 12:04, "Andrey Groshev" <greenx at yandex.ru>:
> 21.02.2014, 05:53, "Andrew Beekhof" <andrew at beekhof.net>:
>
>> On 19 Feb 2014, at 7:53 pm, Andrey Groshev <greenx at yandex.ru> wrote:
>>> 19.02.2014, 09:49, "Andrew Beekhof" <andrew at beekhof.net>:
>>>> On 19 Feb 2014, at 4:18 pm, Andrey Groshev <greenx at yandex.ru> wrote:
>>>>> 19.02.2014, 09:08, "Andrew Beekhof" <andrew at beekhof.net>:
>>>>>> On 19 Feb 2014, at 4:00 pm, Andrey Groshev <greenx at yandex.ru> wrote:
>>>>>>> 19.02.2014, 06:48, "Andrew Beekhof" <andrew at beekhof.net>:
>>>>>>>> On 18 Feb 2014, at 11:05 pm, Andrey Groshev <greenx at yandex.ru> wrote:
>>>>>>>>> Hi all, and Andrew!
>>>>>>>>>
>>>>>>>>> Today was a good day - I killed a lot, and got shot at a lot.
>>>>>>>>> In general I am happy (almost like an elephant) :)
>>>>>>>>> Besides the resources, eight processes on the node matter to me: corosync, pacemakerd, cib, stonithd, lrmd, attrd, pengine, crmd.
>>>>>>>>> I killed them with different signals (4, 6, 11 and even 9).
>>>>>>>>> The behavior does not depend on the signal number - that's good.
>>>>>>>>> If STONITH sends a reboot to the node, it reboots and rejoins the cluster - that's good too.
>>>>>>>>> But the behavior differs depending on which daemon is killed.
>>>>>>>>>
>>>>>>>>> They fall into four groups:
>>>>>>>>> 1. corosync, cib - STONITH works 100% of the time.
>>>>>>>>> Killing with any signal triggers STONITH and a reboot.
>>>>>>>>>
>>>>>>>>> 2. lrmd, crmd - STONITH behaves strangely.
>>>>>>>>> Sometimes STONITH is called, with the corresponding reaction.
>>>>>>>>> Sometimes the daemon restarts and the resources restart, with a long delay for MS:pgsql.
>>>>>>>>> Once, after crmd restarted, pgsql did not restart.
>>>>>>>>>
>>>>>>>>> 3. stonithd, attrd, pengine - STONITH is not needed.
>>>>>>>>> These daemons simply restart; the resources stay running.
>>>>>>>>>
>>>>>>>>> 4. pacemakerd - nothing happens.
>>>>>>>>> After that I can kill any process of the third group, and they do not restart.
>>>>>>>>> Generally, don't touch corosync, cib and maybe lrmd, crmd.
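
For reference, the test matrix described above (eight daemons, signals 4, 6, 11 and 9) can be written down as a dry run. This is only a sketch: kill_matrix is a made-up helper name, and it prints pkill commands instead of running them, so nothing is killed. The -x flag (exact process-name match) is my addition, not something used in the tests above.

```shell
#!/bin/sh
# Dry run only: prints the commands, does not send any signal.
# kill_matrix is a hypothetical helper name, not part of Pacemaker.
DAEMONS="corosync pacemakerd cib stonithd lrmd attrd pengine crmd"
SIGNALS="4 6 11 9"

kill_matrix() {
    for daemon in $DAEMONS; do
        for sig in $SIGNALS; do
            # -x asks pkill for an exact process-name match
            echo "pkill -$sig -x $daemon"
        done
    done
}

kill_matrix
```

Running one printed line at a time and saving the logs after each run would keep every outcome tied to one daemon and one signal.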
>>>>>>>>>
>>>>>>>>> What do you think about this?
>>>>>>>>> The main question of this thread is settled.
>>>>>>>>> But this varied behavior is another big problem.
>>>>>>>>>
>>>>>>>>> I forgot the logs: http://send2me.ru/pcmk-Tue-18-Feb-2014.tar.bz2
>>>>>>>> Which of the various conditions above do the logs cover?
>>>>>>> All of them, over the whole day.
>>>>>> Are you trying to torture me?
>>>>>> Can you give me a rough idea what happened when?
>>>>> No, it is 8 processes times 4 signals, plus repeated experiments with unknown outcomes :)
>>>>> It is easier to run new experiments and collect separate logs for each one.
>>>>> Which variant is more interesting?
>>>> The long delay in restarting pgsql.
>>>> Everything else seems correct.
>>> It didn't even try to start pgsql.
>>> The logs contain three tests.
>>> kill -s 4 <lrmd pid>:
>>> 1. STONITH
>>> 2. STONITH
>>> 3. hangs
>> It's waiting on a value for default_ping_set
>>
>> It seems we're calling monitor for pingCheck, but for some reason it's not performing an update:
>>
>> # grep 2632.*lrmd.*pingCheck /Users/beekhof/Downloads/pcmk-Wed-19-Feb-2014/dev-cluster2-node2.unix.tensor.ru/corosync.log
>> Feb 19 10:49:58 [2632] dev-cluster2-node2.unix.tensor.ru lrmd: info: process_lrmd_get_rsc_info: Resource 'pingCheck' not found (3 active resources)
>> Feb 19 10:49:58 [2632] dev-cluster2-node2.unix.tensor.ru lrmd: info: process_lrmd_get_rsc_info: Resource 'pingCheck:3' not found (3 active resources)
>> Feb 19 10:49:58 [2632] dev-cluster2-node2.unix.tensor.ru lrmd: info: process_lrmd_rsc_register: Added 'pingCheck' to the rsc list (4 active resources)
>> Feb 19 10:49:58 [2632] dev-cluster2-node2.unix.tensor.ru lrmd: debug: log_execute: executing - rsc:pingCheck action:monitor call_id:19
>> Feb 19 10:50:00 [2632] dev-cluster2-node2.unix.tensor.ru lrmd: debug: operation_finished: pingCheck_monitor_0:2658 - exited with rc=0
>> Feb 19 10:50:00 [2632] dev-cluster2-node2.unix.tensor.ru lrmd: debug: operation_finished: pingCheck_monitor_0:2658:stderr [ -- empty -- ]
>> Feb 19 10:50:00 [2632] dev-cluster2-node2.unix.tensor.ru lrmd: debug: operation_finished: pingCheck_monitor_0:2658:stdout [ -- empty -- ]
>> Feb 19 10:50:00 [2632] dev-cluster2-node2.unix.tensor.ru lrmd: debug: log_finished: finished - rsc:pingCheck action:monitor call_id:19 pid:2658 exit-code:0 exec-time:2039ms queue-time:0ms
>> Feb 19 10:50:00 [2632] dev-cluster2-node2.unix.tensor.ru lrmd: debug: log_execute: executing - rsc:pingCheck action:monitor call_id:20
>> Feb 19 10:50:02 [2632] dev-cluster2-node2.unix.tensor.ru lrmd: debug: operation_finished: pingCheck_monitor_10000:2816 - exited with rc=0
>> Feb 19 10:50:02 [2632] dev-cluster2-node2.unix.tensor.ru lrmd: debug: operation_finished: pingCheck_monitor_10000:2816:stderr [ -- empty -- ]
>> Feb 19 10:50:02 [2632] dev-cluster2-node2.unix.tensor.ru lrmd: debug: operation_finished: pingCheck_monitor_10000:2816:stdout [ -- empty -- ]
>>
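
The same filtering can be wrapped in a small function and used to count how many pingCheck monitors were started and how many exited. This is a rough sketch: lrmd_ops is a made-up name, and the sample file holds four lines copied from the excerpt above rather than a full corosync.log.

```shell
#!/bin/sh
# lrmd_ops is a hypothetical helper name.
# $1 = lrmd PID, $2 = resource name, $3 = log file
lrmd_ops() {
    grep -F "[$1]" "$3" | grep -F " lrmd: " | grep -F "$2"
}

# Four lines taken from the log excerpt quoted in this thread.
log=$(mktemp)
cat > "$log" <<'EOF'
Feb 19 10:49:58 [2632] dev-cluster2-node2.unix.tensor.ru lrmd: debug: log_execute: executing - rsc:pingCheck action:monitor call_id:19
Feb 19 10:50:00 [2632] dev-cluster2-node2.unix.tensor.ru lrmd: debug: operation_finished: pingCheck_monitor_0:2658 - exited with rc=0
Feb 19 10:50:00 [2632] dev-cluster2-node2.unix.tensor.ru lrmd: debug: log_execute: executing - rsc:pingCheck action:monitor call_id:20
Feb 19 10:50:02 [2632] dev-cluster2-node2.unix.tensor.ru lrmd: debug: operation_finished: pingCheck_monitor_10000:2816 - exited with rc=0
EOF

started=$(lrmd_ops 2632 pingCheck "$log" | grep -c 'executing - ')
finished=$(lrmd_ops 2632 pingCheck "$log" | grep -c 'exited with rc=0')
echo "monitors started: $started, finished with rc=0: $finished"
rm -f "$log"
```

On a real log the third argument would be the path to corosync.log on the node in question.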
>> Could you add:
>>
>> export OCF_TRACE_RA=1
>>
>> to the top of the ping agent and retest?
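
One possible way to add that line without opening an editor is sketched below. Assumptions: add_trace is a made-up helper name, and "top" is taken to mean directly after the #! line. The demonstration runs on a stand-in file; the location of the real ping agent depends on the installation (commonly somewhere under the OCF resource.d tree), so check the path and keep a backup before touching it.

```shell
#!/bin/sh
# add_trace is a hypothetical helper name.
# It inserts the export right after line 1 (the #! line) of the script in $1.
add_trace() {
    tmp=$(mktemp) || return 1
    awk 'NR == 1 { print; print "export OCF_TRACE_RA=1"; next } { print }' "$1" > "$tmp" &&
        cat "$tmp" > "$1"   # overwrite in place so the file mode is kept
    rm -f "$tmp"
}

# Demonstration on a stand-in file, not on the real agent.
agent=$(mktemp)
printf '%s\n' '#!/bin/sh' '# stand-in for the ping agent' > "$agent"
add_trace "$agent"
second_line=$(sed -n '2p' "$agent")
echo "$second_line"
rm -f "$agent"
```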
>
> Today it worked on the fourth try.
> I even wondered whether it matters how the process is killed (kill -s 4 pid or pkill -4 lrmd).
> Logs: http://send2me.ru/pcmk-Fri-21-Feb-2014.tar.bz2
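
On the kill -s 4 pid versus pkill -4 lrmd question: both forms deliver signal 4 (SIGILL); they differ only in how the target is chosen (by PID or by process name). A small sketch that shows the signal name and what the shell reports for a process killed this way, using a throwaway sleep instead of lrmd:

```shell
#!/bin/sh
ulimit -c 0                    # avoid leaving a core file behind

# Signal 4 is SIGILL; kill -l maps the number to its name.
signame=$(kill -l 4)
echo "signal 4 is SIG$signame"

# Send signal 4 to a throwaway process and read back how it died.
sleep 30 &
victim=$!
kill -s 4 "$victim"
wait "$victim"
status=$?
echo "exit status: $status"    # 128 + signal number when killed by a signal
```

This says nothing about how Pacemaker reacts; it only shows that the signal itself is the same either way.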
Hi,
You haven't looked at it yet?
>>> http://send2me.ru/pcmk-Wed-19-Feb-2014.tar.bz2
>>>> _______________________________________________
>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>
>>>> Project Home: http://www.clusterlabs.org
>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>> Bugs: http://bugs.clusterlabs.org