[ClusterLabs] [Problem] The SNMP trap which has been already started is transmitted.
Andrew Beekhof
andrew at beekhof.net
Mon Aug 17 02:30:06 CEST 2015
> On 4 Aug 2015, at 7:36 pm, renayama19661014 at ybb.ne.jp wrote:
>
> Hi Andrew,
>
> Thank you for comments.
>
>>> However, a trap of crm_mon is sent to an SNMP manager.
>>
>> Are you using the built-in SNMP logic or using -E to give crm_mon a script which
>> is then producing the trap?
>> (I’m trying to figure out who could be turning the monitor action into a start)
>
>
> I used the built-in SNMP.
> I started as a daemon with -d option.
Is it running on both nodes or just snmp1?
There is no logic in crm_mon that would have remapped the monitor to start, so my working theory is that it's a duplicate of an old event.
Can you tell which node the trap is being sent from?
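
One way to check the sender is to pull the source address out of the snmptrapd log on the SNMP manager. The following is only a minimal sketch, assuming each trap is logged on a single line in the format quoted further down in this thread (with `#011`/`#012` as syslog escapes for tab and newline). The function name `parse_trap` and the returned dictionary keys are hypothetical, not part of Pacemaker or Net-SNMP.

```python
import re

# Sender and receiver as snmptrapd logs them: [UDP: [src]:port->[dst]]
SENDER_RE = re.compile(
    r"\[UDP:\s*\[(?P<src>[^\]]+)\]:(?P<port>\d+)->\[(?P<dst>[^\]]+)\]\]"
)
# PACEMAKER-MIB varbinds, either 'STRING: "..."' or 'INTEGER: n'
FIELD_RE = re.compile(
    r'PACEMAKER-MIB::pacemakerNotification(\w+) = '
    r'(?:STRING: "([^"]*)"|INTEGER: (-?\d+))'
)


def parse_trap(line):
    """Return the sender address and PACEMAKER-MIB fields of one log line.

    Returns None when the line does not look like a logged trap.
    """
    sender = SENDER_RE.search(line)
    if sender is None:
        return None
    info = {
        "source_ip": sender.group("src"),
        "source_port": int(sender.group("port")),
    }
    for name, text, number in FIELD_RE.findall(line):
        # findall yields "" for the alternative that did not match
        info[name] = text if number == "" else int(number)
    return info


# Sample line abbreviated from the trap log quoted in this thread.
sample = (
    "Jul 27 14:41:39 SNMP-MANAGER snmptrapd[4521]: 2015-07-27 14:41:39 snmp1 "
    "[UDP: [192.168.40.100]:35265->[192.168.40.2]]:#012"
    "SNMPv2-MIB::snmpTrapOID.0 = OID: PACEMAKER-MIB::pacemakerNotification#011"
    'PACEMAKER-MIB::pacemakerNotificationResource = STRING: "prmDummy"#011'
    'PACEMAKER-MIB::pacemakerNotificationNode = STRING: "snmp1"#011'
    'PACEMAKER-MIB::pacemakerNotificationOperation = STRING: "start"#011'
    "PACEMAKER-MIB::pacemakerNotificationReturnCode = INTEGER: 0"
)

trap = parse_trap(sample)
print(trap["source_ip"], trap["Resource"], trap["Operation"])
```

Comparing `source_ip` against the addresses of snmp1 and snmp2 would show which crm_mon instance emitted the "start" trap; mapping addresses to nodes is left to whoever knows the network layout.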
>
>
> Best Regards,
> Hideo Yamauchi.
>
>
> ----- Original Message -----
>> From: Andrew Beekhof <andrew at beekhof.net>
>> To: renayama19661014 at ybb.ne.jp; Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
>> Cc:
>> Date: 2015/8/4, Tue 14:15
>> Subject: Re: [ClusterLabs] [Problem] The SNMP trap which has been already started is transmitted.
>>
>>
>>> On 27 Jul 2015, at 4:18 pm, renayama19661014 at ybb.ne.jp wrote:
>>>
>>> Hi All,
>>>
>>> The transmission of the SNMP trap of crm_mon seems to have a problem.
>>> I identified a problem on latest Pacemaker and Pacemaker1.1.13.
>>>
>>>
>>> Step 1) I constitute a cluster and send simple CLI file.
>>>
>>> [root at snmp1 ~]# crm_mon -1
>>> Last updated: Mon Jul 27 14:40:37 2015 Last change: Mon Jul 27 14:40:29 2015 by root via cibadmin on snmp1
>>> Stack: corosync
>>> Current DC: snmp1 (version 1.1.13-3d781d3) - partition with quorum
>>> 2 nodes and 1 resource configured
>>>
>>> Online: [ snmp1 snmp2 ]
>>>
>>> prmDummy (ocf::heartbeat:Dummy): Started snmp1
>>>
>>> Step 2) I stop a node of the standby once.
>>>
>>> [root at snmp2 ~]# stop pacemaker
>>> pacemaker stop/waiting
>>>
>>>
>>> Step 3) I start a node of the standby again.
>>> [root at snmp2 ~]# start pacemaker
>>> pacemaker start/running, process 2284
>>>
>>> Step 4) The indication of crm_mon does not change in particular.
>>> [root at snmp1 ~]# crm_mon -1
>>> Last updated: Mon Jul 27 14:45:12 2015 Last change: Mon Jul 27 14:40:29 2015 by root via cibadmin on snmp1
>>> Stack: corosync
>>> Current DC: snmp1 (version 1.1.13-3d781d3) - partition with quorum
>>> 2 nodes and 1 resource configured
>>>
>>> Online: [ snmp1 snmp2 ]
>>>
>>> prmDummy (ocf::heartbeat:Dummy): Started snmp1
>>>
>>>
>>> In addition, nothing changes for the resource that was started on the snmp1 node.
>>>
>>> -------
>>> Jul 27 14:41:39 snmp1 crmd[29116]: notice: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
>>> Jul 27 14:41:39 snmp1 cib[29111]: info: Completed cib_modify operation for section status: OK (rc=0, origin=snmp1/attrd/11, version=0.4.20)
>>> Jul 27 14:41:39 snmp1 attrd[29114]: info: Update 11 for probe_complete: OK (0)
>>> Jul 27 14:41:39 snmp1 attrd[29114]: info: Update 11 for probe_complete[snmp1]=true: OK (0)
>>> Jul 27 14:41:39 snmp1 attrd[29114]: info: Update 11 for probe_complete[snmp2]=true: OK (0)
>>> Jul 27 14:41:39 snmp1 cib[29202]: info: Wrote version 0.4.0 of the CIB to disk (digest: a1f1920279fe0b1466a79cab09fa77d6)
>>> Jul 27 14:41:39 snmp1 pengine[29115]: notice: On loss of CCM Quorum: Ignore
>>> Jul 27 14:41:39 snmp1 pengine[29115]: info: Node snmp2 is online
>>> Jul 27 14:41:39 snmp1 pengine[29115]: info: Node snmp1 is online
>>> Jul 27 14:41:39 snmp1 pengine[29115]: info: prmDummy#011(ocf::heartbeat:Dummy):#011Started snmp1
>>> Jul 27 14:41:39 snmp1 pengine[29115]: info: Leave prmDummy#011(Started snmp1)
>>> -------
>>>
>>> However, a trap of crm_mon is sent to an SNMP manager.
>>
>> Are you using the built-in SNMP logic or using -E to give crm_mon a script which
>> is then producing the trap?
>> (I’m trying to figure out who could be turning the monitor action into a start)
>>
>>> The resource is not restarted, but an SNMP trap reporting that the resource started is sent.
>>>
>>> -------
>>> Jul 27 14:41:39 SNMP-MANAGER snmptrapd[4521]: 2015-07-27 14:41:39 snmp1 [UDP: [192.168.40.100]:35265->[192.168.40.2]]:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (1437975699) 166 days, 10:22:36.99#011SNMPv2-MIB::snmpTrapOID.0 = OID: PACEMAKER-MIB::pacemakerNotification#011PACEMAKER-MIB::pacemakerNotificationResource = STRING: "prmDummy"#011PACEMAKER-MIB::pacemakerNotificationNode = STRING: "snmp1"#011PACEMAKER-MIB::pacemakerNotificationOperation = STRING: "start"#011PACEMAKER-MIB::pacemakerNotificationDescription = STRING: "OK"#011PACEMAKER-MIB::pacemakerNotificationReturnCode = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationTargetReturnCode = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationStatus = INTEGER: 0
>>> Jul 27 14:41:39 SNMP-MANAGER snmptrapd[4521]: 2015-07-27 14:41:39 snmp1 [UDP: [192.168.40.100]:35265->[192.168.40.2]]:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (1437975699) 166 days, 10:22:36.99#011SNMPv2-MIB::snmpTrapOID.0 = OID: PACEMAKER-MIB::pacemakerNotification#011PACEMAKER-MIB::pacemakerNotificationResource = STRING: "prmDummy"#011PACEMAKER-MIB::pacemakerNotificationNode = STRING: "snmp1"#011PACEMAKER-MIB::pacemakerNotificationOperation = STRING: "monitor"#011PACEMAKER-MIB::pacemakerNotificationDescription = STRING: "OK"#011PACEMAKER-MIB::pacemakerNotificationReturnCode = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationTargetReturnCode = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationStatus = INTEGER: 0
>>> -------
>>>
>>> The CIB diff produced by stopping and starting the node seems to be the problem.
>>> Because of this diff, crm_mon sends an unnecessary SNMP trap.
>>> -------
>>> Jul 27 14:41:39 snmp1 cib[29111]: info: + /cib: @num_updates=19
>>> Jul 27 14:41:39 snmp1 cib[29111]: info: + /cib/status/node_state[@id='3232238190']: @crm-debug-origin=do_update_resource
>>> Jul 27 14:41:39 snmp1 cib[29111]: info: ++ /cib/status/node_state[@id='3232238190']/lrm[@id='3232238190']/lrm_resources: <lrm_resource id="prmDummy" type="Dummy" class="ocf" provider="heartbeat"/>
>>> Jul 27 14:41:39 snmp1 cib[29111]: info: ++ <lrm_rsc_op id="prmDummy_last_0" operation_key="prmDummy_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.10" transition-key="6:6:7:34187f48-1f81-49c8-846e-ff3ed4c8f787" transition-magic="0:7;6:6:7:34187f48-1f81-49c8-846e-ff3ed4c8f787" on_node="snmp2" call-id="5" rc-code="7" op-status="0" interval="0" last-run="1437975699" last-rc-change="1437975699" exec-time="18" queue-ti
>>> Jul 27 14:41:39 snmp1 cib[29111]: info: ++ </lrm_resource>
>>> -------
>>>
>>> I registered this problem with Bugzilla.
>>> * http://bugs.clusterlabs.org/show_bug.cgi?id=5245
>>> * The log attached it to Bugzilla.
>>>
>>> Best Regards,
>>> Hideo Yamauchi.
>>>
>>> _______________________________________________
>>> Users mailing list: Users at clusterlabs.org
>>> http://clusterlabs.org/mailman/listinfo/users
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
>>
>