[ClusterLabs] [Problem] An SNMP trap is transmitted for a resource that has already started.
Andrew Beekhof
andrew at beekhof.net
Mon Aug 31 03:58:53 CEST 2015
> On 18 Aug 2015, at 10:54 am, renayama19661014 at ybb.ne.jp wrote:
>
> Hi Andrew,
>
>
>>>> I used the built-in SNMP.
>
>>>> I started it as a daemon with the -d option.
>>>
>>> Is it running on both nodes or just snmp1?
>
>
> On both nodes.
>
> [root at snmp1 ~]# ps -ef |grep crm_mon
> root 4923 1 0 09:42 ? 00:00:00 crm_mon -d -S 192.168.40.2 -W -p /tmp/ClusterMon-upstart.pid
> [root at snmp2 ~]# ps -ef |grep crm_mon
> root 4860 1 0 09:42 ? 00:00:00 crm_mon -d -S 192.168.40.2 -W -p /tmp/ClusterMon-upstart.pid
>
>
>>> Because there is no logic in crm_mon that would have remapped the monitor
>>> to start, so my working theory is that it's a duplicate of an old event.
>>> Can you tell which node the trap is being sent from?
>
>
> The trap is transmitted by the snmp1 node.
Ok, it's probably being triggered when we regenerate the status section in response to snmp2 rejoining.
We could try detecting those mass changes and filtering them out until someone gets a chance to do proper event notifications.
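Roughly along these lines, perhaps. To be clear, this is only a sketch of the idea; the structure and helper names below are hypothetical, not crm_mon's actual internals. The point is to remember the newest operation already reported for each resource and drop anything that is not newer:

#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical record of the newest operation already reported per
 * resource.  crm_mon's real data structures differ; this only sketches
 * the filtering idea. */
struct last_event {
    char rsc_id[64];
    int call_id;            /* lrm_rsc_op call-id */
    time_t last_rc_change;  /* lrm_rsc_op last-rc-change */
};

#define MAX_TRACKED 128
static struct last_event seen[MAX_TRACKED];
static int n_seen = 0;

/* Return true only if this operation is genuinely newer than anything
 * already reported for the resource; entries replayed by a rewrite of
 * the status section compare equal or older and are suppressed. */
static bool should_send_trap(const char *rsc_id, int call_id, time_t rc_change)
{
    for (int i = 0; i < n_seen; i++) {
        if (strcmp(seen[i].rsc_id, rsc_id) != 0) {
            continue;
        }
        if (rc_change < seen[i].last_rc_change
            || (rc_change == seen[i].last_rc_change
                && call_id <= seen[i].call_id)) {
            return false;   /* already reported: suppress */
        }
        seen[i].call_id = call_id;
        seen[i].last_rc_change = rc_change;
        return true;
    }
    if (n_seen < MAX_TRACKED) {
        snprintf(seen[n_seen].rsc_id, sizeof(seen[n_seen].rsc_id), "%s", rsc_id);
        seen[n_seen].call_id = call_id;
        seen[n_seen].last_rc_change = rc_change;
        n_seen++;
    }
    return true;            /* first sighting of this resource */
}

A real version would have to key on (node, resource) rather than resource alone, since call-ids are only unique per node, but that is the shape of the filter.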
>
> The trap is not sent from the snmp2 node that rebooted.
>
>
> Aug 18 09:44:37 SNMP-MANAGER snmptrapd[1334]: 2015-08-18 09:44:37 snmp1 [UDP: [192.168.40.100]:59668->[192.168.40.2]]:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (1439858677) 166 days, 15:36:26.77#011SNMPv2-MIB::snmpTrapOID.0 = OID: PACEMAKER-MIB::pacemakerNotification#011PACEMAKER-MIB::pacemakerNotificationResource = STRING: "prmDummy"#011PACEMAKER-MIB::pacemakerNotificationNode = STRING: "snmp1"#011PACEMAKER-MIB::pacemakerNotificationOperation = STRING: "start"#011PACEMAKER-MIB::pacemakerNotificationDescription = STRING: "OK"#011PACEMAKER-MIB::pacemakerNotificationReturnCode = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationTargetReturnCode = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationStatus = INTEGER: 0
> Aug 18 09:44:37 SNMP-MANAGER snmptrapd[1334]: 2015-08-18 09:44:37 snmp1 [UDP: [192.168.40.100]:59668->[192.168.40.2]]:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (1439858677) 166 days, 15:36:26.77#011SNMPv2-MIB::snmpTrapOID.0 = OID: PACEMAKER-MIB::pacemakerNotification#011PACEMAKER-MIB::pacemakerNotificationResource = STRING: "prmDummy"#011PACEMAKER-MIB::pacemakerNotificationNode = STRING: "snmp1"#011PACEMAKER-MIB::pacemakerNotificationOperation = STRING: "monitor"#011PACEMAKER-MIB::pacemakerNotificationDescription = STRING: "OK"#011PACEMAKER-MIB::pacemakerNotificationReturnCode = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationTargetReturnCode = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationStatus = INTEGER: 0
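Right, and both traps carry an identical timestamp, which also fits the replay theory.

If you want to capture exactly what crm_mon hands over per event, a tiny external agent passed with -E that appends each notification to a file makes duplicates easy to spot. A minimal sketch, assuming the CRM_notify_* environment variables that crm_mon exports to external agents (check your version if the names differ):

/* notify_log.c - minimal crm_mon external agent for "crm_mon -E".
 * Appends one line per notification so replayed events stand out. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static const char *env(const char *name)
{
    const char *value = getenv(name);
    return value ? value : "(unset)";
}

int main(void)
{
    FILE *log = fopen("/tmp/crm_mon_notify.log", "a");

    if (log == NULL) {
        return 1;
    }
    fprintf(log, "%ld rsc=%s node=%s op=%s rc=%s target_rc=%s status=%s desc=%s\n",
            (long) time(NULL),
            env("CRM_notify_rsc"),
            env("CRM_notify_node"),
            env("CRM_notify_task"),
            env("CRM_notify_rc"),
            env("CRM_notify_target_rc"),
            env("CRM_notify_status"),
            env("CRM_notify_desc"));
    fclose(log);
    return 0;
}

Compile that, run "crm_mon -d -E /path/to/notify_log" alongside (or instead of) -S, and compare what arrives at the moment snmp2 rejoins.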
>
>
> Best Regards,
> Hideo Yamauchi.
>
>
>
>
> ----- Original Message -----
>> From: "renayama19661014 at ybb.ne.jp" <renayama19661014 at ybb.ne.jp>
>> To: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
>> Cc:
>> Date: 2015/8/17, Mon 10:05
>> Subject: Re: [ClusterLabs] [Problem] An SNMP trap is transmitted for a resource that has already started.
>>
>> Hi Andrew,
>>
>> Thank you for comments.
>>
>>
>> I will confirm it tomorrow.
>> I am on vacation today.
>>
>> Best Regards,
>> Hideo Yamauchi.
>>
>>
>> ----- Original Message -----
>>> From: Andrew Beekhof <andrew at beekhof.net>
>>> To: renayama19661014 at ybb.ne.jp; Cluster Labs - All topics related to
>>> open-source clustering welcomed <users at clusterlabs.org>
>>> Cc:
>>> Date: 2015/8/17, Mon 09:30
>>> Subject: Re: [ClusterLabs] [Problem] An SNMP trap is transmitted for a
>>> resource that has already started.
>>>
>>>
>>>> On 4 Aug 2015, at 7:36 pm, renayama19661014 at ybb.ne.jp wrote:
>>>>
>>>> Hi Andrew,
>>>>
>>>> Thank you for comments.
>>>>
>>>>>> However, a trap from crm_mon is sent to the SNMP manager.
>>>>>
>>>>> Are you using the built-in SNMP logic or using -E to give crm_mon a
>>>>> script which is then producing the trap?
>>>>> (I’m trying to figure out who could be turning the monitor action into
>>>>> a start)
>>>>
>>>>
>>>> I used the built-in SNMP.
>>>> I started it as a daemon with the -d option.
>>>
>>> Is it running on both nodes or just snmp1?
>>> Because there is no logic in crm_mon that would have remapped the monitor
>>> to start, so my working theory is that it's a duplicate of an old event.
>>> Can you tell which node the trap is being sent from?
>>>
>>>>
>>>>
>>>> Best Regards,
>>>> Hideo Yamauchi.
>>>>
>>>>
>>>> ----- Original Message -----
>>>>> From: Andrew Beekhof <andrew at beekhof.net>
>>>>> To: renayama19661014 at ybb.ne.jp; Cluster Labs - All topics related to
>>>>> open-source clustering welcomed <users at clusterlabs.org>
>>>>> Cc:
>>>>> Date: 2015/8/4, Tue 14:15
>>>>> Subject: Re: [ClusterLabs] [Problem] An SNMP trap is transmitted for a
>>>>> resource that has already started.
>>>>>
>>>>>
>>>>>> On 27 Jul 2015, at 4:18 pm, renayama19661014 at ybb.ne.jp wrote:
>>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> The SNMP trap transmission of crm_mon seems to have a problem.
>>>>>> I identified the problem on the latest Pacemaker and on Pacemaker 1.1.13.
>>>>>>
>>>>>>
>>>>>> Step 1) I set up a cluster and load a simple CLI configuration file.
>>>>>>
>>>>>> [root at snmp1 ~]# crm_mon -1
>>>>>> Last updated: Mon Jul 27 14:40:37 2015          Last change: Mon Jul 27 14:40:29 2015 by root via cibadmin on snmp1
>>>>>> Stack: corosync
>>>>>> Current DC: snmp1 (version 1.1.13-3d781d3) - partition with quorum
>>>>>> 2 nodes and 1 resource configured
>>>>>>
>>>>>> Online: [ snmp1 snmp2 ]
>>>>>>
>>>>>> prmDummy (ocf::heartbeat:Dummy): Started snmp1
>>>>>>
>>>>>> Step 2) I stop the standby node once.
>>>>>>
>>>>>> [root at snmp2 ~]# stop pacemaker
>>>>>> pacemaker stop/waiting
>>>>>>
>>>>>>
>>>>>> Step 3) I start the standby node again.
>>>>>> [root at snmp2 ~]# start pacemaker
>>>>>> pacemaker start/running, process 2284
>>>>>>
>>>>>> Step 4) The output of crm_mon does not change.
>>>>>> [root at snmp1 ~]# crm_mon -1
>>>>>> Last updated: Mon Jul 27 14:45:12 2015          Last change: Mon Jul 27 14:40:29 2015 by root via cibadmin on snmp1
>>>>>> Stack: corosync
>>>>>> Current DC: snmp1 (version 1.1.13-3d781d3) - partition with quorum
>>>>>> 2 nodes and 1 resource configured
>>>>>>
>>>>>> Online: [ snmp1 snmp2 ]
>>>>>>
>>>>>> prmDummy (ocf::heartbeat:Dummy): Started snmp1
>>>>>>
>>>>>>
>>>>>> In addition, nothing changes for the resource started on the snmp1 node.
>>>>>>
>>>>>> -------
>>>>>> Jul 27 14:41:39 snmp1 crmd[29116]: notice: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
>>>>>> Jul 27 14:41:39 snmp1 cib[29111]: info: Completed cib_modify operation for section status: OK (rc=0, origin=snmp1/attrd/11, version=0.4.20)
>>>>>> Jul 27 14:41:39 snmp1 attrd[29114]: info: Update 11 for probe_complete: OK (0)
>>>>>> Jul 27 14:41:39 snmp1 attrd[29114]: info: Update 11 for probe_complete[snmp1]=true: OK (0)
>>>>>> Jul 27 14:41:39 snmp1 attrd[29114]: info: Update 11 for probe_complete[snmp2]=true: OK (0)
>>>>>> Jul 27 14:41:39 snmp1 cib[29202]: info: Wrote version 0.4.0 of the CIB to disk (digest: a1f1920279fe0b1466a79cab09fa77d6)
>>>>>> Jul 27 14:41:39 snmp1 pengine[29115]: notice: On loss of CCM Quorum: Ignore
>>>>>> Jul 27 14:41:39 snmp1 pengine[29115]: info: Node snmp2 is online
>>>>>> Jul 27 14:41:39 snmp1 pengine[29115]: info: Node snmp1 is online
>>>>>> Jul 27 14:41:39 snmp1 pengine[29115]: info: prmDummy#011(ocf::heartbeat:Dummy):#011Started snmp1
>>>>>> Jul 27 14:41:39 snmp1 pengine[29115]: info: Leave   prmDummy#011(Started snmp1)
>>>>>> -------
>>>>>>
>>>>>> However, a trap from crm_mon is sent to the SNMP manager.
>>>>>
>>>>> Are you using the built-in SNMP logic or using -E to give crm_mon a
>>>>> script which is then producing the trap?
>>>>> (I’m trying to figure out who could be turning the monitor action into
>>>>> a start)
>>>>>
>>>>>> The resource does not restart, but an SNMP trap saying the resource started is sent.
>>>>>>
>>>>>> -------
>>>>>> Jul 27 14:41:39 SNMP-MANAGER snmptrapd[4521]: 2015-07-27 14:41:39 snmp1 [UDP: [192.168.40.100]:35265->[192.168.40.2]]:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (1437975699) 166 days, 10:22:36.99#011SNMPv2-MIB::snmpTrapOID.0 = OID: PACEMAKER-MIB::pacemakerNotification#011PACEMAKER-MIB::pacemakerNotificationResource = STRING: "prmDummy"#011PACEMAKER-MIB::pacemakerNotificationNode = STRING: "snmp1"#011PACEMAKER-MIB::pacemakerNotificationOperation = STRING: "start"#011PACEMAKER-MIB::pacemakerNotificationDescription = STRING: "OK"#011PACEMAKER-MIB::pacemakerNotificationReturnCode = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationTargetReturnCode = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationStatus = INTEGER: 0
>>>>>> Jul 27 14:41:39 SNMP-MANAGER snmptrapd[4521]: 2015-07-27 14:41:39 snmp1 [UDP: [192.168.40.100]:35265->[192.168.40.2]]:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (1437975699) 166 days, 10:22:36.99#011SNMPv2-MIB::snmpTrapOID.0 = OID: PACEMAKER-MIB::pacemakerNotification#011PACEMAKER-MIB::pacemakerNotificationResource = STRING: "prmDummy"#011PACEMAKER-MIB::pacemakerNotificationNode = STRING: "snmp1"#011PACEMAKER-MIB::pacemakerNotificationOperation = STRING: "monitor"#011PACEMAKER-MIB::pacemakerNotificationDescription = STRING: "OK"#011PACEMAKER-MIB::pacemakerNotificationReturnCode = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationTargetReturnCode = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationStatus = INTEGER: 0
>>>>>> -------
>>>>>>
>>>>>> A CIB difference produced by stopping and starting the node seems to
>>>>>> cause the problem.
>>>>>> Because of this difference, crm_mon transmits an unnecessary SNMP trap.
>>>>>> -------
>>>>>> Jul 27 14:41:39 snmp1 cib[29111]: info: + /cib: @num_updates=19
>>>>>> Jul 27 14:41:39 snmp1 cib[29111]: info: + /cib/status/node_state[@id='3232238190']: @crm-debug-origin=do_update_resource
>>>>>> Jul 27 14:41:39 snmp1 cib[29111]: info: ++ /cib/status/node_state[@id='3232238190']/lrm[@id='3232238190']/lrm_resources: <lrm_resource id="prmDummy" type="Dummy" class="ocf" provider="heartbeat"/>
>>>>>> Jul 27 14:41:39 snmp1 cib[29111]: info: ++ <lrm_rsc_op id="prmDummy_last_0" operation_key="prmDummy_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.10" transition-key="6:6:7:34187f48-1f81-49c8-846e-ff3ed4c8f787" transition-magic="0:7;6:6:7:34187f48-1f81-49c8-846e-ff3ed4c8f787" on_node="snmp2" call-id="5" rc-code="7" op-status="0" interval="0" last-run="1437975699" last-rc-change="1437975699" exec-time="18" queue-ti
>>>>>> Jul 27 14:41:39 snmp1 cib[29111]: info: ++ </lrm_resource>
>>>>>> -------
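As an aside, decoding the transition-magic on that re-added lrm_rsc_op confirms that nothing actually started. A sketch of the format as I understand it, op-status:rc;action:transition:target-rc:uuid (see decode_transition_magic() in the Pacemaker sources for the authoritative parser):

/* decode_magic.c - decode a Pacemaker transition-magic string. */
#include <stdio.h>

int main(void)
{
    const char *magic = "0:7;6:6:7:34187f48-1f81-49c8-846e-ff3ed4c8f787";
    int op_status, rc, action, transition, target_rc;
    char uuid[37];

    if (sscanf(magic, "%d:%d;%d:%d:%d:%36s",
               &op_status, &rc, &action, &transition, &target_rc, uuid) != 6) {
        fprintf(stderr, "unparseable: %s\n", magic);
        return 1;
    }
    /* Prints: op-status=0 rc=7 target-rc=7, i.e. the probe on snmp2
     * found the resource not running, exactly as expected. */
    printf("op-status=%d rc=%d action=%d transition=%d target-rc=%d uuid=%s\n",
           op_status, rc, action, transition, target_rc, uuid);
    return 0;
}

rc-code 7 is OCF "not running", so the entry is just snmp2's probe result being written back, not a new operation on snmp1.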
>>>>>>
>>>>>> I registered this problem with Bugzilla.
>>>>>> * http://bugs.clusterlabs.org/show_bug.cgi?id=5245
>>>>>> * The log is attached to Bugzilla.
>>>>>>
>>>>>> Best Regards,
>>>>>> Hideo Yamauchi.
>>>>>>
>>>>>
>>>>
>>>
>>
>>
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org