[Pacemaker] strange error

Tue Jul 29 19:22:15 EDT 2014

On 30 Jul 2014, at 2:37 am, divinesecret <arvydas at artogama.lt> wrote:

> No DHCP.
> No NetworkManager.
> 
> For some reason findif intermittently fails to find eth1 (only eth1; there are resources on eth2 and eth3 with no such problem).
> 
> Any ideas?

IPaddr2(extVip51)[23854]: INFO: Bringing device eth1 up

^^^ Does that imply that the agent may also take it down under some conditions?
Perhaps look through the agent to see when that might happen, and whether it could be happening in your cluster.
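
One way to do that search is sketched below. It is only a rough helper, not part of the agent or of Pacemaker: it lists every non-comment line in a script that sets a link up or down in iproute2 style. The function name find_link_state_changes is made up for this example, and the default path is an assumption about where the IPaddr2 agent is installed; pass the real path as the first argument if yours differs.

```python
import os
import re
import sys

# Matches iproute2-style link state changes such as "ip link set eth1 down",
# including forms where the command or the device is a shell variable.
LINK_STATE = re.compile(r"\blink\s+set\b.*?\b(up|down)\b")


def find_link_state_changes(script_text):
    """Return (line_number, state, line) for each line that sets a link up or down."""
    hits = []
    for number, line in enumerate(script_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # skip shell comments
        match = LINK_STATE.search(stripped)
        if match:
            hits.append((number, match.group(1), stripped))
    return hits


if __name__ == "__main__":
    # Assumed install location of the agent; override with argv[1].
    default_path = "/usr/lib/ocf/resource.d/heartbeat/IPaddr2"
    path = sys.argv[1] if len(sys.argv) > 1 else default_path
    if os.path.isfile(path):
        with open(path, errors="replace") as handle:
            for number, state, line in find_link_state_changes(handle.read()):
                print(f"{path}:{number}: [{state}] {line}")
    else:
        print(f"agent script not found: {path}")
```

Any "down" hit is only a starting point; you would still need to read the surrounding code to see which action and which conditions reach it.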

> 
> 2014-07-10 01:26, Andrew Beekhof wrote:
>> Is NetworkManager present?  Using dhcp for that interface?
>> On 9 Jul 2014, at 7:03 pm, divinesecret <arvydas at artogama.lt> wrote:
>>> Hi,
>>> I just wanted to ask whether anyone has encountered this situation.
>>> Suddenly the cluster fails:
>>> Jul  9 04:17:58 sdcsispprxfe1 IPaddr2(extVip51)[17292]: ERROR: Unknown interface [eth1] No such device.
>>> Jul  9 04:17:58 sdcsispprxfe1 IPaddr2(extVip51)[17292]: ERROR: [findif] failed
>>> Jul  9 04:17:58 sdcsispprxfe1 crmd[2116]:   notice: process_lrm_event: LRM operation extVip51_monitor_20000 (call=57, rc=6, cib-update=2151, confirmed=false) not configured
>>> Jul  9 04:17:58 sdcsispprxfe1 crmd[2116]:  warning: update_failcount: Updating failcount for extVip51 on sdcsispprxfe1 after failed monitor: rc=6 (update=value++, time=1404868678)
>>> Jul  9 04:17:58 sdcsispprxfe1 crmd[2116]:   notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
>>> Jul  9 04:17:58 sdcsispprxfe1 attrd[2114]:   notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-extVip51 (1)
>>> Jul  9 04:17:58 sdcsispprxfe1 pengine[2115]:   notice: unpack_config: On loss of CCM Quorum: Ignore
>>> Jul  9 04:17:58 sdcsispprxfe1 attrd[2114]:   notice: attrd_perform_update: Sent update 42: fail-count-extVip51=1
>>> Jul  9 04:17:58 sdcsispprxfe1 attrd[2114]:   notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-extVip51 (1404868678)
>>> Jul  9 04:17:58 sdcsispprxfe1 pengine[2115]:    error: unpack_rsc_op: Preventing extVip51 from re-starting anywhere in the cluster : operation monitor failed 'not configured' (rc=6)
>>> Jul  9 04:17:58 sdcsispprxfe1 pengine[2115]:  warning: unpack_rsc_op: Processing failed op monitor for extVip51 on sdcsispprxfe1: not configured (6)
>>> A restart was issued, and then:
>>> IPaddr2(extVip51)[23854]: INFO: Bringing device eth1 up
>>> ....
>>> Version: 1.1.10-14.el6_5.3-368c726
>>> CentOS 6.5
>>> (Other logs don't show eth1 going down or anything similar.)
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org

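For reference when reading the quoted log: rc=6 is the standard OCF exit code OCF_ERR_CONFIGURED, which is why pengine reports "Preventing extVip51 from re-starting anywhere in the cluster" after a single failed monitor. The snippet below is a minimal sketch for pulling those fields out of crmd lines. It assumes the "LRM operation ..." message format shown in this thread, and parse_lrm_event is a hypothetical helper name, not a Pacemaker API.

```python
import re

# Standard OCF resource agent exit codes (subset).
OCF_RC = {
    0: "OCF_SUCCESS",
    1: "OCF_ERR_GENERIC",
    2: "OCF_ERR_ARGS",
    3: "OCF_ERR_UNIMPLEMENTED",
    4: "OCF_ERR_PERM",
    5: "OCF_ERR_INSTALLED",
    6: "OCF_ERR_CONFIGURED",
    7: "OCF_NOT_RUNNING",
}

# Assumes the crmd message format seen in the log quoted in this thread.
LRM_EVENT = re.compile(r"LRM operation (\S+) \(call=(\d+), rc=(\d+)")


def parse_lrm_event(line):
    """Return a dict describing an LRM operation line, or None if it does not match."""
    match = LRM_EVENT.search(line)
    if match is None:
        return None
    # The operation key looks like <resource>_<operation>_<interval in ms>.
    parts = match.group(1).rsplit("_", 2)
    if len(parts) != 3 or not parts[2].isdigit():
        return None
    rc = int(match.group(3))
    return {
        "resource": parts[0],
        "operation": parts[1],
        "interval_ms": int(parts[2]),
        "call": int(match.group(2)),
        "rc": rc,
        "rc_name": OCF_RC.get(rc, "UNKNOWN"),
    }


if __name__ == "__main__":
    sample = (
        "Jul  9 04:17:58 sdcsispprxfe1 crmd[2116]:   notice: process_lrm_event: "
        "LRM operation extVip51_monitor_20000 (call=57, rc=6, cib-update=2151, "
        "confirmed=false) not configured"
    )
    print(parse_lrm_event(sample))
```

Running this over the full log would show whether other resources ever return rc=6, or whether only extVip51 on eth1 does.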