[ClusterLabs] Coming in 1.1.15: Event-driven alerts

Klaus Wenninger kwenning at redhat.com
Wed May 11 05:13:03 UTC 2016


On 05/10/2016 11:19 PM, renayama19661014 at ybb.ne.jp wrote:
> Hi All,
>
> After all, our members need control over the transmission order of the SNMP traps.
>
> We will make a patch that controls the transmission order and intend to send it.
>
> The patch will probably add the "ordered" attribute that we proposed by email before.
Actually, I still don't think that simply serializing the calls to the
snmptrap tool is a good solution to the problem of losing the order of
traps arriving at some management station:

- makes things worse in case of traps coming from multiple nodes
- doesn't help when the order is lost on the network.

Anyway I see 2 other scenarios where a certain degree of serialization might
be helpful:

- alert agent-scripts that can't handle being called concurrently
- performance issues that might arise on systems that lack the needed
  performance headroom, and/or where the agent scripts in place
  require significant effort, and/or where a lot of resources/events
  trigger a vast number of alerts being handled in parallel

So I could imagine the introduction of a meta-attribute that specifies a
queue to be used for serialization.

- 'none' is default and leads to the behavior we have at the moment.
- any other queue-name leads to the instantiation of an additional queue

This approach should allow nearly any kind of serialization you can think of
with as little impact as needed.
e.g. if the agent doesn't cope with concurrent calls, you use a queue per
agent, leading to all recipients being handled in a serialized way (and of
course the different alerts as well). All the other agents still run
in parallel.
e.g. you can have a separate queue for a single recipient, leading to
the alerts being sent there being serialized.
e.g. if the performance impact should be kept at a minimal level, you
would use a single queue for all agents and all recipients.
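
To make the queue idea a bit more concrete, here is a minimal shell sketch.
It is not how Pacemaker implements anything; the helper name run_in_queue,
the ALERT_QUEUE_DIR variable and the lock-file naming are all made up for
illustration. It relies on flock(1) from util-linux, so it only ensures that
calls sharing a queue name never overlap. It makes no promise that waiting
calls are run in arrival order.

```shell
#!/bin/bash
# Hypothetical helper (not part of Pacemaker): run a command under a named queue.
# Usage: run_in_queue QUEUE COMMAND [ARGS...]
run_in_queue() {
    local queue="$1"
    shift

    # 'none' keeps the current behavior: no serialization at all.
    if [ "$queue" = "none" ]; then
        "$@"
        return
    fi

    # One lock file per queue name; the directory is an assumption.
    local lockfile="${ALERT_QUEUE_DIR:-/tmp}/alert-queue-${queue}.lock"

    (
        # Block until we hold an exclusive lock on fd 9, then run the command.
        # The lock is released when the subshell exits and fd 9 is closed.
        flock 9 || exit 1
        "$@"
    ) 9>"$lockfile"
}
```

With something like this, a queue per agent would mean passing the agent's
id as the queue name, a queue per recipient would mean passing the recipient's
id, and using one shared name everywhere gives the "single queue for all
agents and all recipients" case.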
 
>
>
> Best Regards,
> Hideo Yamauchi.
>
>
> ----- Original Message -----
>> From: "renayama19661014 at ybb.ne.jp" <renayama19661014 at ybb.ne.jp>
>> To: "kwenning at redhat.com" <kwenning at redhat.com>; "users at clusterlabs.org" <users at clusterlabs.org>; Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
>> Cc: 
>> Date: 2016/4/28, Thu 22:43
>> Subject: Re: [ClusterLabs] Coming in 1.1.15: Event-driven alerts
>>
>> Hi Klaus,
>>
>> Because the script is executed asynchronously, I think that it is
>> difficult to set "uptime" by the method of the sample.
>> After all, we may need to request control of the transmission order.
>> # My earlier patch only controls the execution order of the async calls
>> # and does not put load on crmd.
>>
>> A one-week holiday begins in Japan tomorrow.
>> I will discuss this with our members after the vacation.
>>
>> Best Regards,
>> Hideo Yamauchi.
>>
>>
>>
>> ----- Original Message -----
>>>  From: Klaus Wenninger <kwenning at redhat.com>
>>>  To: users at clusterlabs.org
>>>  Cc: 
>>>  Date: 2016/4/28, Thu 03:14
>>>  Subject: Re: [ClusterLabs] Coming in 1.1.15: Event-driven alerts
>>>
>>>  On 04/27/2016 04:19 PM, renayama19661014 at ybb.ne.jp wrote:
>>>>   Hi All,
>>>>
>>>>   We have a request for a new SNMP function.
>>>>
>>>>
>>>>   The order of the traps is not right.
>>>>
>>>>   The order of the traps is sometimes not preserved.
>>>>   This is because the alert handling executes "path" asynchronously.
>>>>   I think that it is necessary to wait for the execution to complete
>>>>   for each "path" of "alerts".
>>>>
>>>>   The order of the traps differs from the actual stop order of the resources.
>>>  Writing the alerts to a local list and having the alert scripts called
>>>  in a serialized manner would lead to the snmptrap tool creating
>>>  timestamps in the order of occurrence of the alerts.
>>>  If the SNMP manager then orders the traps by timestamp, this would indeed
>>>  lead to seeing them in the order in which they occurred.
>>>
>>>  But this approach has a number of drawbacks:
>>>
>>>  - it works only when the traps are coming from one node, as there is no
>>>    way to serialize across nodes - at least none that would work under
>>>    all circumstances in which we want alerts to be delivered
>>>
>>>  - it distorts the created timestamps even further from the points in time
>>>    when the alerts were triggered - making the result in a multi-node
>>>    scenario even worse and making it hard to correlate with other sources
>>>    of information like logfiles
>>>
>>>  - if you imagine a scenario with multiple mechanisms for delivering an
>>>    alert plus multiple recipients, we couldn't use a single list; we would
>>>    need something more complicated to prevent unneeded delays, delays
>>>    coming from one of the delivery methods not working properly due to
>>>    e.g. a recipient that is not reachable, ...
>>>    (all solvable of course, but if it doesn't solve your problem in the
>>>    first place, why the effort?)
>>>
>>>  The alternative approach taken doesn't create the timestamps in the
>>>  scripts but provides timestamps to the scripts already.
>>>  This way it doesn't matter if the execution of the script is delayed.
>>>
>>>
>>>  A short example of how this approach could be used with snmp-traps:
>>>
>>>  edit pcmk_snmp_helper.sh:
>>>
>>>  ...
>>>  starttickfile="/var/run/starttick"
>>>
>>>  # hack to have a reference
>>>  # can have it e.g. in an attribute to be visible throughout the cluster
>>>  if [ ! -f ${starttickfile} ] ; then
>>>          echo ${CRM_alert_timestamp} > ${starttickfile}
>>>  fi
>>>
>>>  starttick=`cat ${starttickfile}`
>>>  # elapsed ticks since the reference timestamp
>>>  ticks=$(( ${CRM_alert_timestamp} - ${starttick} ))
>>>
>>>  if [[ ${CRM_alert_rc} != 0 && ${CRM_alert_task} == "monitor" ]] || [[ ${CRM_alert_task} != "monitor" ]] ; then
>>>      # This trap is compliant with PACEMAKER MIB
>>>      # https://github.com/ClusterLabs/pacemaker/blob/master/extra/PCMK-MIB.txt
>>>      /usr/bin/snmptrap -v 2c -c public ${CRM_alert_recipient} ${ticks} PACEMAKER-MIB::pacemakerNotificationTrap \
>>>          PACEMAKER-MIB::pacemakerNotificationNode s "${CRM_alert_node}" \
>>>          PACEMAKER-MIB::pacemakerNotificationResource s "${CRM_alert_rsc}" \
>>>          PACEMAKER-MIB::pacemakerNotificationOperation s "${CRM_alert_task}" \
>>>          PACEMAKER-MIB::pacemakerNotificationDescription s "${CRM_alert_desc}" \
>>>          PACEMAKER-MIB::pacemakerNotificationStatus i "${CRM_alert_status}" \
>>>          PACEMAKER-MIB::pacemakerNotificationReturnCode i ${CRM_alert_rc} \
>>>          PACEMAKER-MIB::pacemakerNotificationTargetReturnCode i ${CRM_alert_target_rc} && exit 0 || exit 1
>>>  fi
>>>
>>>  exit 0
>>>  ...
>>>
>>>  add a section to the cib:
>>>
>>>  cibadmin --create --xml-text '<configuration> <alerts>
>>>    <alert id="snmp_traps" path="/usr/share/pacemaker/tests/pcmk_snmp_helper.sh">
>>>      <meta_attributes id="meta_snmp_traps">
>>>        <nvpair id="snmp_timestamp" name="tstamp_format" value="%s%02N"/>
>>>      </meta_attributes>
>>>      <recipient id="trap_destination" value="192.168.123.3"/>
>>>    </alert>
>>>  </alerts> </configuration>'
>>>
>>>
>>>  This should solve the issue of getting the correct order after sorting
>>>  by timestamps, without the ugly side effects described above.
>>>
>>>  I hope I understood your scenario correctly and that this small example
>>>  roughly shows how I would suggest coping with the issue.
>>>
>>>  Regards,
>>>  Klaus  
>>>>   ----
>>>>   [root at rh72-01 ~]# grep Operation  /var/log/ha-log | grep stop
>>>>   Apr 25 18:48:48 rh72-01 crmd[28897]:  notice: Operation prmDummy1_stop_0: ok (node=rh72-01, call=33, rc=0, cib-update=56, confirmed=true)
>>>>   Apr 25 18:48:48 rh72-01 crmd[28897]:  notice: Operation prmDummy3_stop_0: ok (node=rh72-01, call=37, rc=0, cib-update=57, confirmed=true)
>>>>   Apr 25 18:48:48 rh72-01 crmd[28897]:  notice: Operation prmDummy4_stop_0: ok (node=rh72-01, call=39, rc=0, cib-update=58, confirmed=true)
>>>>   Apr 25 18:48:48 rh72-01 crmd[28897]:  notice: Operation prmDummy2_stop_0: ok (node=rh72-01, call=35, rc=0, cib-update=59, confirmed=true)
>>>>   Apr 25 18:48:48 rh72-01 crmd[28897]:  notice: Operation prmDummy5_stop_0: ok (node=rh72-01, call=41, rc=0, cib-update=60, confirmed=true)
>>>>   Apr 25 18:48:50 snmp-manager snmptrapd[6865]: 2016-04-25 18:48:50 <UNKNOWN> [UDP: [192.168.28.170]:40613->[192.168.28.189]:162]:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (25512486) 2 days, 22:52:04.86#011SNMPv2-MIB::snmpTrapOID.0 = OID: PACEMAKER-MIB::pacemakerNotificationTrap#011PACEMAKER-MIB::pacemakerNotificationNode = STRING: "rh72-01"#011PACEMAKER-MIB::pacemakerNotificationResource = STRING: "prmDummy3"#011PACEMAKER-MIB::pacemakerNotificationOperation = STRING: "stop"#011PACEMAKER-MIB::pacemakerNotificationDescription = STRING: "ok"#011PACEMAKER-MIB::pacemakerNotificationStatus = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationReturnCode = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationTargetReturnCode = INTEGER: 0
>>>>   Apr 25 18:48:50 snmp-manager snmptrapd[6865]: 2016-04-25 18:48:50 <UNKNOWN> [UDP: [192.168.28.170]:39581->[192.168.28.189]:162]:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (25512489) 2 days, 22:52:04.89#011SNMPv2-MIB::snmpTrapOID.0 = OID: PACEMAKER-MIB::pacemakerNotificationTrap#011PACEMAKER-MIB::pacemakerNotificationNode = STRING: "rh72-01"#011PACEMAKER-MIB::pacemakerNotificationResource = STRING: "prmDummy4"#011PACEMAKER-MIB::pacemakerNotificationOperation = STRING: "stop"#011PACEMAKER-MIB::pacemakerNotificationDescription = STRING: "ok"#011PACEMAKER-MIB::pacemakerNotificationStatus = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationReturnCode = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationTargetReturnCode = INTEGER: 0
>>>>   Apr 25 18:48:50 snmp-manager snmptrapd[6865]: 2016-04-25 18:48:50 <UNKNOWN> [UDP: [192.168.28.170]:37166->[192.168.28.189]:162]:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (25512490) 2 days, 22:52:04.90#011SNMPv2-MIB::snmpTrapOID.0 = OID: PACEMAKER-MIB::pacemakerNotificationTrap#011PACEMAKER-MIB::pacemakerNotificationNode = STRING: "rh72-01"#011PACEMAKER-MIB::pacemakerNotificationResource = STRING: "prmDummy1"#011PACEMAKER-MIB::pacemakerNotificationOperation = STRING: "stop"#011PACEMAKER-MIB::pacemakerNotificationDescription = STRING: "ok"#011PACEMAKER-MIB::pacemakerNotificationStatus = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationReturnCode = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationTargetReturnCode = INTEGER: 0
>>>>   Apr 25 18:48:50 snmp-manager snmptrapd[6865]: 2016-04-25 18:48:50 <UNKNOWN> [UDP: [192.168.28.170]:53502->[192.168.28.189]:162]:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (25512494) 2 days, 22:52:04.94#011SNMPv2-MIB::snmpTrapOID.0 = OID: PACEMAKER-MIB::pacemakerNotificationTrap#011PACEMAKER-MIB::pacemakerNotificationNode = STRING: "rh72-01"#011PACEMAKER-MIB::pacemakerNotificationResource = STRING: "prmDummy2"#011PACEMAKER-MIB::pacemakerNotificationOperation = STRING: "stop"#011PACEMAKER-MIB::pacemakerNotificationDescription = STRING: "ok"#011PACEMAKER-MIB::pacemakerNotificationStatus = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationReturnCode = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationTargetReturnCode = INTEGER: 0
>>>>   Apr 25 18:48:50 snmp-manager snmptrapd[6865]: 2016-04-25 18:48:50 <UNKNOWN> [UDP: [192.168.28.170]:45956->[192.168.28.189]:162]:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (25512497) 2 days, 22:52:04.97#011SNMPv2-MIB::snmpTrapOID.0 = OID: PACEMAKER-MIB::pacemakerNotificationTrap#011PACEMAKER-MIB::pacemakerNotificationNode = STRING: "rh72-01"#011PACEMAKER-MIB::pacemakerNotificationResource = STRING: "prmDummy5"#011PACEMAKER-MIB::pacemakerNotificationOperation = STRING: "stop"#011PACEMAKER-MIB::pacemakerNotificationDescription = STRING: "ok"#011PACEMAKER-MIB::pacemakerNotificationStatus = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationReturnCode = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationTargetReturnCode = INTEGER: 0
>>>>   ----
>>>>
>>>>   I think that the "timestamp" attribute exists because of this asynchronous change.
>>>>   The order of traps may be important to a user.
>>>>   I suggest adding an "ordered" attribute to the "alert" element.
>>>>    * ordered
>>>>       false : The present processing.
>>>>       true  : Control the transmission order of the trap.
>>>>
>>>>   ----
>>>>   <configuration>
>>>>     <alerts>
>>>>       <alert id="notify_9" path="/usr/share/pacemaker/tests/pcmk_alert_sample1.sh" ordered="true">
>>>>   (snip)
>>>>       </alert>
>>>>       <alert id="notify_9" path="/usr/share/pacemaker/tests/pcmk_alert_sample2.sh" ordered="false">
>>>>   (snip)
>>>>       </alert>
>>>>     </alerts>
>>>>   </configuration>
>>>>
>>>>   ----
>>>>
>>>>   I sent a patch to cope with this problem before.
>>>>   The former patch may be useful for the correction.
>>>>    * https://github.com/ClusterLabs/pacemaker/pull/847
>>>>
>>>>   I intend to write the patch if everybody agrees to the "ordered" attribute.
>>>>   Best Regards,
>>>>   Hideo Yamauchi.
>>>>
>>>>   _______________________________________________
>>>>   Users mailing list: Users at clusterlabs.org
>>>>   http://clusterlabs.org/mailman/listinfo/users
>>>>
>>>>   Project Home: http://www.clusterlabs.org
>>>>   Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>   Bugs: http://bugs.clusterlabs.org




