[ClusterLabs] [Problem] The SNMP trap which has been already started is transmitted.
renayama19661014 at ybb.ne.jp
Mon Aug 17 03:05:15 CEST 2015
Hi Andrew,
Thank you for comments.
I will confirm it tomorrow.
I am on vacation today.
Best Regards,
Hideo Yamauchi.
----- Original Message -----
> From: Andrew Beekhof <andrew at beekhof.net>
> To: renayama19661014 at ybb.ne.jp; Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
> Cc:
> Date: 2015/8/17, Mon 09:30
> Subject: Re: [ClusterLabs] [Problem] The SNMP trap which has been already started is transmitted.
>
>
>> On 4 Aug 2015, at 7:36 pm, renayama19661014 at ybb.ne.jp wrote:
>>
>> Hi Andrew,
>>
>> Thank you for comments.
>>
>>>> However, a trap of crm_mon is sent to an SNMP manager.
>>>
>>> Are you using the built-in SNMP logic or using -E to give crm_mon a script which is then producing the trap?
>>> (I’m trying to figure out who could be turning the monitor action into a start)
>>
>>
>> I used the built-in SNMP support.
>> I started crm_mon as a daemon with the -d option.
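For context, crm_mon's built-in SNMP support in the 1.1.x series is normally enabled with options along these lines (a sketch only; the exact invocation and trap destination are not shown in this thread):

```shell
# Hypothetical invocation; the addresses and community string are made up.
# -d / --daemonize       run crm_mon in the background
# -S / --snmp-traps      address of the SNMP manager to send traps to
# -C / --snmp-community  community string for the traps, if required
crm_mon -d -S 192.168.40.2 -C public
```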
>
> Is it running on both nodes or just snmp1?
> Because there is no logic in crm_mon that would have remapped the monitor to start, so my working theory is that it's a duplicate of an old event.
> Can you tell which node the trap is being sent from?
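One way to check which node the traps come from is to watch the trap port on the SNMP manager (a sketch; it assumes the default trap port 162, which is not stated in the thread):

```shell
# Run on the SNMP manager; shows the source IP of each incoming trap.
# Assumes traps arrive on the default UDP port 162.
tcpdump -n -i any udp port 162
# snmptrapd's own log also records the sender, e.g.
# "[UDP: [192.168.40.100]:35265->[192.168.40.2]]" in the excerpt below,
# where 192.168.40.100 is the snmp1 node.
```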
>
>>
>>
>> Best Regards,
>> Hideo Yamauchi.
>>
>>
>> ----- Original Message -----
>>> From: Andrew Beekhof <andrew at beekhof.net>
>>> To: renayama19661014 at ybb.ne.jp; Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
>>> Cc:
>>> Date: 2015/8/4, Tue 14:15
>>> Subject: Re: [ClusterLabs] [Problem] The SNMP trap which has been already started is transmitted.
>>>
>>>
>>>> On 27 Jul 2015, at 4:18 pm, renayama19661014 at ybb.ne.jp wrote:
>>>>
>>>> Hi All,
>>>>
>>>> There seems to be a problem with crm_mon's SNMP trap transmission.
>>>> I confirmed the problem on the latest Pacemaker and on Pacemaker 1.1.13.
>>>>
>>>>
>>>> Step 1) I construct a cluster and load a simple CLI file.
>>>>
>>>> [root at snmp1 ~]# crm_mon -1
>>>> Last updated: Mon Jul 27 14:40:37 2015          Last change: Mon Jul 27 14:40:29 2015 by root via cibadmin on snmp1
>>>> Stack: corosync
>>>> Current DC: snmp1 (version 1.1.13-3d781d3) - partition with quorum
>>>> 2 nodes and 1 resource configured
>>>>
>>>> Online: [ snmp1 snmp2 ]
>>>>
>>>> prmDummy (ocf::heartbeat:Dummy): Started snmp1
>>>>
>>>> Step 2) I stop the standby node once.
>>>>
>>>> [root at snmp2 ~]# stop pacemaker
>>>> pacemaker stop/waiting
>>>>
>>>>
>>>> Step 3) I start the standby node again.
>>>> [root at snmp2 ~]# start pacemaker
>>>> pacemaker start/running, process 2284
>>>>
>>>> Step 4) The crm_mon display shows no particular change.
>>>> [root at snmp1 ~]# crm_mon -1
>>>> Last updated: Mon Jul 27 14:45:12 2015          Last change: Mon Jul 27 14:40:29 2015 by root via cibadmin on snmp1
>>>> Stack: corosync
>>>> Current DC: snmp1 (version 1.1.13-3d781d3) - partition with quorum
>>>> 2 nodes and 1 resource configured
>>>>
>>>> Online: [ snmp1 snmp2 ]
>>>>
>>>> prmDummy (ocf::heartbeat:Dummy): Started snmp1
>>>>
>>>>
>>>> In addition, nothing changes for the resource started on the snmp1 node.
>>>>
>>>> -------
>>>> Jul 27 14:41:39 snmp1 crmd[29116]: notice: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
>>>> Jul 27 14:41:39 snmp1 cib[29111]: info: Completed cib_modify operation for section status: OK (rc=0, origin=snmp1/attrd/11, version=0.4.20)
>>>> Jul 27 14:41:39 snmp1 attrd[29114]: info: Update 11 for probe_complete: OK (0)
>>>> Jul 27 14:41:39 snmp1 attrd[29114]: info: Update 11 for probe_complete[snmp1]=true: OK (0)
>>>> Jul 27 14:41:39 snmp1 attrd[29114]: info: Update 11 for probe_complete[snmp2]=true: OK (0)
>>>> Jul 27 14:41:39 snmp1 cib[29202]: info: Wrote version 0.4.0 of the CIB to disk (digest: a1f1920279fe0b1466a79cab09fa77d6)
>>>> Jul 27 14:41:39 snmp1 pengine[29115]: notice: On loss of CCM Quorum: Ignore
>>>> Jul 27 14:41:39 snmp1 pengine[29115]: info: Node snmp2 is online
>>>> Jul 27 14:41:39 snmp1 pengine[29115]: info: Node snmp1 is online
>>>> Jul 27 14:41:39 snmp1 pengine[29115]: info: prmDummy#011(ocf::heartbeat:Dummy):#011Started snmp1
>>>> Jul 27 14:41:39 snmp1 pengine[29115]: info: Leave prmDummy#011(Started snmp1)
>>>> -------
>>>>
>>>> However, a trap of crm_mon is sent to an SNMP manager.
>>>
>>> Are you using the built-in SNMP logic or using -E to give crm_mon a script which is then producing the trap?
>>> (I’m trying to figure out who could be turning the monitor action into a start)
>>>
>>>> The resource does not restart, but an SNMP trap saying the resource started is sent.
>>>>
>>>> -------
>>>> Jul 27 14:41:39 SNMP-MANAGER snmptrapd[4521]: 2015-07-27 14:41:39 snmp1 [UDP: [192.168.40.100]:35265->[192.168.40.2]]:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (1437975699) 166 days, 10:22:36.99#011SNMPv2-MIB::snmpTrapOID.0 = OID: PACEMAKER-MIB::pacemakerNotification#011PACEMAKER-MIB::pacemakerNotificationResource = STRING: "prmDummy"#011PACEMAKER-MIB::pacemakerNotificationNode = STRING: "snmp1"#011PACEMAKER-MIB::pacemakerNotificationOperation = STRING: "start"#011PACEMAKER-MIB::pacemakerNotificationDescription = STRING: "OK"#011PACEMAKER-MIB::pacemakerNotificationReturnCode = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationTargetReturnCode = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationStatus = INTEGER: 0
>>>> Jul 27 14:41:39 SNMP-MANAGER snmptrapd[4521]: 2015-07-27 14:41:39 snmp1 [UDP: [192.168.40.100]:35265->[192.168.40.2]]:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (1437975699) 166 days, 10:22:36.99#011SNMPv2-MIB::snmpTrapOID.0 = OID: PACEMAKER-MIB::pacemakerNotification#011PACEMAKER-MIB::pacemakerNotificationResource = STRING: "prmDummy"#011PACEMAKER-MIB::pacemakerNotificationNode = STRING: "snmp1"#011PACEMAKER-MIB::pacemakerNotificationOperation = STRING: "monitor"#011PACEMAKER-MIB::pacemakerNotificationDescription = STRING: "OK"#011PACEMAKER-MIB::pacemakerNotificationReturnCode = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationTargetReturnCode = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationStatus = INTEGER: 0
>>>> -------
>>>>
>>>> The CIB diff produced by stopping and starting the node seems to cause the problem.
>>>> Because of this diff, crm_mon transmits an unnecessary SNMP trap.
>>>> -------
>>>> Jul 27 14:41:39 snmp1 cib[29111]: info: + /cib: @num_updates=19
>>>> Jul 27 14:41:39 snmp1 cib[29111]: info: + /cib/status/node_state[@id='3232238190']: @crm-debug-origin=do_update_resource
>>>> Jul 27 14:41:39 snmp1 cib[29111]: info: ++ /cib/status/node_state[@id='3232238190']/lrm[@id='3232238190']/lrm_resources: <lrm_resource id="prmDummy" type="Dummy" class="ocf" provider="heartbeat"/>
>>>> Jul 27 14:41:39 snmp1 cib[29111]: info: ++ <lrm_rsc_op id="prmDummy_last_0" operation_key="prmDummy_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.10" transition-key="6:6:7:34187f48-1f81-49c8-846e-ff3ed4c8f787" transition-magic="0:7;6:6:7:34187f48-1f81-49c8-846e-ff3ed4c8f787" on_node="snmp2" call-id="5" rc-code="7" op-status="0" interval="0" last-run="1437975699" last-rc-change="1437975699" exec-time="18" queue-ti
>>>> Jul 27 14:41:39 snmp1 cib[29111]: info: ++ </lrm_resource>
>>>> -------
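The diff above re-adds an lrm_rsc_op describing an old probe result (on_node=snmp2, call-id=5, rc-code=7, last-rc-change=1437975699). Purely to illustrate the deduplication idea, and not crm_mon's actual internals: a trap consumer could suppress such replayed entries by keying on those identifying attributes, since a replayed event carries the same key as the original.

```shell
# Illustrative sketch: deduplicate operation events by their identity
# (node, call-id, last-rc-change). awk prints each distinct key only once,
# so the replayed copy of the same lrm_rsc_op is dropped.
printf '%s\n' \
  'snmp2 call-id=5 last-rc-change=1437975699 rc-code=7' \
  'snmp2 call-id=5 last-rc-change=1437975699 rc-code=7' \
  | awk '!seen[$0]++'
# prints the key once, not twice
```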
>>>>
>>>> I registered this problem with Bugzilla.
>>>> * http://bugs.clusterlabs.org/show_bug.cgi?id=5245
>>>> * I attached the log to Bugzilla.
>>>>
>>>> Best Regards,
>>>> Hideo Yamauchi.
>>>>
>>>> _______________________________________________
>>>> Users mailing list: Users at clusterlabs.org
>>>> http://clusterlabs.org/mailman/listinfo/users
>>>>
>>>> Project Home: http://www.clusterlabs.org
>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>> Bugs: http://bugs.clusterlabs.org
>>>
>>
>