[ClusterLabs] Coming in 1.1.15: Event-driven alerts

renayama19661014 at ybb.ne.jp renayama19661014 at ybb.ne.jp
Wed May 11 21:28:59 UTC 2016


Hi Klaus,

Thank you for your comment.

I will look into your comment.
I think I will come back to you with another question.


Many thanks!
Hideo Yamauchi.


----- Original Message -----
> From: Klaus Wenninger <kwenning at redhat.com>
> To: users at clusterlabs.org
> Cc: 
> Date: 2016/5/11, Wed 14:13
> Subject: Re: [ClusterLabs] Coming in 1.1.15: Event-driven alerts
> 
> On 05/10/2016 11:19 PM, renayama19661014 at ybb.ne.jp wrote:
>>  Hi All,
>> 
>>  After all, our members need to control the order in which SNMP traps are
>>  sent.
>> 
>>  We will make a patch that controls the transmission order and intend to
>>  send it.
>> 
>>  With the patch we will probably add the "ordered" attribute that we
>>  proposed by email before.
> Actually I still don't think that simple serialization of the calls to the
> snmptrap tool is a good solution to the problem of losing the order of traps
> arriving at some management station:
> 
> - makes things worse in case of traps coming from multiple nodes
> - doesn't help when the order is lost on the network.
> 
> Anyway I see 2 other scenarios where a certain degree of serialization might
> be helpful:
> 
> - alert agent-scripts that can't handle being called concurrently
> - performance issues that might arise on some systems that lack the
>   performance-headroom needed and/or the agent-scripts in place
>   require significant effort and/or there are a lot of resources/events
>   that trigger a vast amount of alerts being handled in parallel
> 
> So I could imagine the introduction of a meta-attribute that specifies a
> queue to be used for serialization.
> 
> - 'none' is default and leads to the behavior we have at the moment.
> - any other queue-name leads to the instantiation of an additional queue
> 
> This approach should allow nearly any kind of serialization you can think of
> with as little impact as needed.
> e.g. if the agent can't cope with concurrent calls, you use one queue per
> agent, so that all of its recipients are handled in a serialized way (and of
> course the different alerts as well), while all the other agents run in
> parallel.
> e.g. you can have a separate queue for a single recipient, so that the
> alerts sent there are serialized.
> e.g. if the performance impact should be kept at a minimal level, you
> would use a single queue for all agents and all recipients.
> 
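The queue idea quoted above can be approximated inside an agent wrapper. This
is only a minimal sketch under stated assumptions: flock(1) from util-linux is
installed, and the function name run_in_queue, the lock-file location and the
queue names are hypothetical, not Pacemaker options. flock gives mutual
exclusion per queue; it does not promise that waiting callers run in arrival
order.

```shell
#!/bin/sh
# Hypothetical wrapper: serialize alert-agent calls per named queue.
# Lock files live in a directory chosen for illustration only.
lock_dir="${TMPDIR:-/tmp}"

run_in_queue() {
    queue="$1"
    shift
    if [ "$queue" = "none" ]; then
        # Default behaviour: run the command without any serialization.
        "$@"
    else
        # flock(1) holds an exclusive lock on the file while the command
        # runs, so calls that share a queue name never overlap.
        flock "${lock_dir}/alert-queue-${queue}.lock" "$@"
    fi
}

# Example call (the recipient and arguments are placeholders):
#   run_in_queue snmp /usr/bin/snmptrap -v 2c -c public 192.168.123.3 ...
```

Using one queue name per agent, per recipient, or for everything corresponds
to the three "e.g." cases in the quoted text.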
>> 
>> 
>>  Best Regards,
>>  Hideo Yamauchi.
>> 
>> 
>>  ----- Original Message -----
>>>  From: "renayama19661014 at ybb.ne.jp" <renayama19661014 at ybb.ne.jp>
>>>  To: "kwenning at redhat.com" <kwenning at redhat.com>; "users at clusterlabs.org" <users at clusterlabs.org>; Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
>>>  Cc: 
>>>  Date: 2016/4/28, Thu 22:43
>>>  Subject: Re: [ClusterLabs] Coming in 1.1.15: Event-driven alerts
>>> 
>>>  Hi Klaus,
>>> 
>>>  Because the script is executed asynchronously, I think that it is
>>>  difficult to set "uptime" by the method of the sample.
>>>  After all, we may request control of the transmission order.
>>>  # My earlier patch only controls the execution order of the async calls
>>>  # and does not put load on crmd.
>>> 
>>>  Japan begins a one-week holiday tomorrow.
>>>  I will discuss this with our members after the vacation.
>>> 
>>>  Best Regards,
>>>  Hideo Yamauchi.
>>> 
>>> 
>>> 
>>>  ----- Original Message -----
>>>>   From: Klaus Wenninger <kwenning at redhat.com>
>>>>   To: users at clusterlabs.org
>>>>   Cc: 
>>>>   Date: 2016/4/28, Thu 03:14
>>>>   Subject: Re: [ClusterLabs] Coming in 1.1.15: Event-driven alerts
>>>> 
>>>>   On 04/27/2016 04:19 PM, renayama19661014 at ybb.ne.jp wrote:
>>>>>    Hi All,
>>>>> 
>>>>>    We have a request for a new SNMP function.
>>>>> 
>>>>> 
>>>>>    The order of traps is not right.
>>>>> 
>>>>>    The order of the traps is sometimes not preserved.
>>>>>    This is because the alert handling executes "path" asynchronously.
>>>>>    I think that it is necessary to wait for the execution to complete
>>>>>    for each "path" of "alerts".
>>>>>     
>>>>>    The order of the traps differs from the real stop order of the
>>>>>    resources.
>>>>   Writing the alerts to a local list and having the alert scripts called
>>>>   in a serialized manner would lead to the snmptrap tool creating
>>>>   timestamps in the order of occurrence of the alerts.
>>>>   If the SNMP manager orders the traps by timestamp, this would indeed
>>>>   lead to seeing them in the order in which they occurred.
>>>> 
>>>>   But this approach has a number of drawbacks:
>>>> 
>>>>   - it works only when the traps come from one node, as there is no
>>>>     way to serialize across nodes - at least none that would work under
>>>>     all circumstances in which we want alerts to be delivered
>>>> 
>>>>   - it distorts the timestamps even further from the points in time
>>>>     when the alerts were triggered - making the result in a multi-node
>>>>     scenario even worse and making it hard to correlate with other
>>>>     sources of information such as logfiles
>>>> 
>>>>   - if you imagine a scenario with multiple mechanisms of delivering an
>>>>     alert + multiple recipients, we couldn't use a single list but would
>>>>     need something more complicated to prevent unneeded delays, delays
>>>>     coming from one of the delivery methods not working properly due to
>>>>     e.g. a recipient that is not reachable, ...
>>>>     (all solvable of course, but if it doesn't solve your problem in the
>>>>     first place, why the effort)
>>>> 
>>>>   The alternative approach taken doesn't create the timestamps in the
>>>>   scripts but provides the timestamps to the scripts.
>>>>   This way it doesn't matter if the execution of the script is delayed.
>>>> 
>>>> 
>>>>   A short example how this approach could be used with snmp-traps:
>>>> 
>>>>   edit pcmk_snmp_helper.sh:
>>>> 
>>>>   ...
>>>>   starttickfile="/var/run/starttick"
>>>> 
>>>>   # hack to have a reference
>>>>   # can have it e.g. in an attribute to be visible throughout the cluster
>>>>   if [ ! -f "${starttickfile}" ] ; then
>>>>           echo "${CRM_alert_timestamp}" > "${starttickfile}"
>>>>   fi
>>>> 
>>>>   starttick=$(cat "${starttickfile}")
>>>>   # ticks elapsed since the reference timestamp
>>>>   ticks=$(( ${CRM_alert_timestamp} - ${starttick} ))
>>>> 
>>>>   # send a trap for every non-monitor operation and for failed monitors
>>>>   if [[ ${CRM_alert_rc} != 0 && ${CRM_alert_task} == "monitor" ]] || [[ ${CRM_alert_task} != "monitor" ]] ; then
>>>>       # This trap is compliant with PACEMAKER MIB
>>>>       # https://github.com/ClusterLabs/pacemaker/blob/master/extra/PCMK-MIB.txt
>>>>       /usr/bin/snmptrap -v 2c -c public ${CRM_alert_recipient} ${ticks} PACEMAKER-MIB::pacemakerNotificationTrap \
>>>>           PACEMAKER-MIB::pacemakerNotificationNode s "${CRM_alert_node}" \
>>>>           PACEMAKER-MIB::pacemakerNotificationResource s "${CRM_alert_rsc}" \
>>>>           PACEMAKER-MIB::pacemakerNotificationOperation s "${CRM_alert_task}" \
>>>>           PACEMAKER-MIB::pacemakerNotificationDescription s "${CRM_alert_desc}" \
>>>>           PACEMAKER-MIB::pacemakerNotificationStatus i "${CRM_alert_status}" \
>>>>           PACEMAKER-MIB::pacemakerNotificationReturnCode i ${CRM_alert_rc} \
>>>>           PACEMAKER-MIB::pacemakerNotificationTargetReturnCode i ${CRM_alert_target_rc} && exit 0 || exit 1
>>>>   fi
>>>> 
>>>>   exit 0
>>>>   ...
>>>> 
>>>>   add a section to the cib:
>>>> 
>>>>   cibadmin --create --xml-text '<configuration> <alerts> <alert
>>>>   id="snmp_traps"
>>>>   path="/usr/share/pacemaker/tests/pcmk_snmp_helper.sh">
>>>>   <meta_attributes id="meta_snmp_traps"> <nvpair id="snmp_timestamp"
>>>>   name="tstamp_format" value="%s%02N"/> </meta_attributes> <recipient
>>>>   id="trap_destination" value="192.168.123.3"/> </alert> </alerts>
>>>>   </configuration>'
>>>> 
>>>> 
>>>>   This should solve the issue of correct order after being sorted by
>>>>   timestamps
>>>>   without having the ugly side-effects as described above.
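Sorting on the manager side, as described in the quoted text, could look
roughly like the sketch below. It assumes a hypothetical intermediate format
with one received trap per line, where the first field is the tick value that
the agent passed to snmptrap. The function name and the sample values are made
up for illustration; they are not snmptrapd output.

```shell
#!/bin/sh
# Hypothetical input format: "<ticks> <resource> <operation>", one trap per line.
# Sort numerically on the first field so that traps appear in the order in
# which the alerts were triggered, regardless of arrival order.
sort_traps_by_ticks() {
    sort -n -k1,1
}

# Sample data with made-up tick values, deliberately out of order.
printf '%s\n' \
    '12 prmDummy3 stop' \
    '5 prmDummy1 stop' \
    '9 prmDummy4 stop' | sort_traps_by_ticks
# prints:
#   5 prmDummy1 stop
#   9 prmDummy4 stop
#   12 prmDummy3 stop
```

The numeric sort matters here: a plain string sort would place "12" before
"5".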
>>>> 
>>>>   I hope I understood your scenario correctly and this small example
>>>>   points out how I roughly would suggest to cope with the issue.
>>>> 
>>>>   Regards,
>>>>   Klaus  
>>>>>    ----
>>>>>    [root at rh72-01 ~]# grep Operation  /var/log/ha-log | grep stop
>>>>>    Apr 25 18:48:48 rh72-01 crmd[28897]:  notice: Operation prmDummy1_stop_0: ok (node=rh72-01, call=33, rc=0, cib-update=56, confirmed=true)
>>>>>    Apr 25 18:48:48 rh72-01 crmd[28897]:  notice: Operation prmDummy3_stop_0: ok (node=rh72-01, call=37, rc=0, cib-update=57, confirmed=true)
>>>>>    Apr 25 18:48:48 rh72-01 crmd[28897]:  notice: Operation prmDummy4_stop_0: ok (node=rh72-01, call=39, rc=0, cib-update=58, confirmed=true)
>>>>>    Apr 25 18:48:48 rh72-01 crmd[28897]:  notice: Operation prmDummy2_stop_0: ok (node=rh72-01, call=35, rc=0, cib-update=59, confirmed=true)
>>>>>    Apr 25 18:48:48 rh72-01 crmd[28897]:  notice: Operation prmDummy5_stop_0: ok (node=rh72-01, call=41, rc=0, cib-update=60, confirmed=true)
>>>>>    Apr 25 18:48:50 snmp-manager snmptrapd[6865]: 2016-04-25 18:48:50 <UNKNOWN> [UDP: [192.168.28.170]:40613->[192.168.28.189]:162]:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (25512486) 2 days, 22:52:04.86#011SNMPv2-MIB::snmpTrapOID.0 = OID: PACEMAKER-MIB::pacemakerNotificationTrap#011PACEMAKER-MIB::pacemakerNotificationNode = STRING: "rh72-01"#011PACEMAKER-MIB::pacemakerNotificationResource = STRING: "prmDummy3"#011PACEMAKER-MIB::pacemakerNotificationOperation = STRING: "stop"#011PACEMAKER-MIB::pacemakerNotificationDescription = STRING: "ok"#011PACEMAKER-MIB::pacemakerNotificationStatus = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationReturnCode = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationTargetReturnCode = INTEGER: 0
>>>>>    Apr 25 18:48:50 snmp-manager snmptrapd[6865]: 2016-04-25 18:48:50 <UNKNOWN> [UDP: [192.168.28.170]:39581->[192.168.28.189]:162]:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (25512489) 2 days, 22:52:04.89#011SNMPv2-MIB::snmpTrapOID.0 = OID: PACEMAKER-MIB::pacemakerNotificationTrap#011PACEMAKER-MIB::pacemakerNotificationNode = STRING: "rh72-01"#011PACEMAKER-MIB::pacemakerNotificationResource = STRING: "prmDummy4"#011PACEMAKER-MIB::pacemakerNotificationOperation = STRING: "stop"#011PACEMAKER-MIB::pacemakerNotificationDescription = STRING: "ok"#011PACEMAKER-MIB::pacemakerNotificationStatus = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationReturnCode = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationTargetReturnCode = INTEGER: 0
>>>>>    Apr 25 18:48:50 snmp-manager snmptrapd[6865]: 2016-04-25 18:48:50 <UNKNOWN> [UDP: [192.168.28.170]:37166->[192.168.28.189]:162]:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (25512490) 2 days, 22:52:04.90#011SNMPv2-MIB::snmpTrapOID.0 = OID: PACEMAKER-MIB::pacemakerNotificationTrap#011PACEMAKER-MIB::pacemakerNotificationNode = STRING: "rh72-01"#011PACEMAKER-MIB::pacemakerNotificationResource = STRING: "prmDummy1"#011PACEMAKER-MIB::pacemakerNotificationOperation = STRING: "stop"#011PACEMAKER-MIB::pacemakerNotificationDescription = STRING: "ok"#011PACEMAKER-MIB::pacemakerNotificationStatus = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationReturnCode = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationTargetReturnCode = INTEGER: 0
>>>>>    Apr 25 18:48:50 snmp-manager snmptrapd[6865]: 2016-04-25 18:48:50 <UNKNOWN> [UDP: [192.168.28.170]:53502->[192.168.28.189]:162]:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (25512494) 2 days, 22:52:04.94#011SNMPv2-MIB::snmpTrapOID.0 = OID: PACEMAKER-MIB::pacemakerNotificationTrap#011PACEMAKER-MIB::pacemakerNotificationNode = STRING: "rh72-01"#011PACEMAKER-MIB::pacemakerNotificationResource = STRING: "prmDummy2"#011PACEMAKER-MIB::pacemakerNotificationOperation = STRING: "stop"#011PACEMAKER-MIB::pacemakerNotificationDescription = STRING: "ok"#011PACEMAKER-MIB::pacemakerNotificationStatus = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationReturnCode = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationTargetReturnCode = INTEGER: 0
>>>>>    Apr 25 18:48:50 snmp-manager snmptrapd[6865]: 2016-04-25 18:48:50 <UNKNOWN> [UDP: [192.168.28.170]:45956->[192.168.28.189]:162]:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (25512497) 2 days, 22:52:04.97#011SNMPv2-MIB::snmpTrapOID.0 = OID: PACEMAKER-MIB::pacemakerNotificationTrap#011PACEMAKER-MIB::pacemakerNotificationNode = STRING: "rh72-01"#011PACEMAKER-MIB::pacemakerNotificationResource = STRING: "prmDummy5"#011PACEMAKER-MIB::pacemakerNotificationOperation = STRING: "stop"#011PACEMAKER-MIB::pacemakerNotificationDescription = STRING: "ok"#011PACEMAKER-MIB::pacemakerNotificationStatus = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationReturnCode = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationTargetReturnCode = INTEGER: 0
>>>>>    ----
>>>>> 
>>>>>    I think that this change provides a "timestamp" attribute for the
>>>>>    async case.
>>>>>    The order of traps may be important to a user.
>>>>>    I suggest adding an "ordered" attribute to the "alert" element.
>>>>>     * ordered
>>>>>        false : The present processing.
>>>>>        true  : Control the transmission order of the trap.
>>>>> 
>>>>>    ----
>>>>>    <configuration>
>>>>>      <alerts>
>>>>>        <alert id="notify_9" path="/usr/share/pacemaker/tests/pcmk_alert_sample1.sh" ordered="true">
>>>>>    (snip)
>>>>>        </alert>
>>>>>        <alert id="notify_9" path="/usr/share/pacemaker/tests/pcmk_alert_sample2.sh" ordered="false">
>>>>>    (snip)
>>>>>        </alert>
>>>>>      </alerts>
>>>>>    </configuration>
>>>>> 
>>>>>    ----
>>>>> 
>>>>>    I sent a patch to cope with this problem before.
>>>>>    The former patch may be useful for this change.
>>>>>     * https://github.com/ClusterLabs/pacemaker/pull/847
>>>>> 
>>>>>    I intend to write the patch if everybody agrees to the "ordered"
>>>>>    attribute.
>>>>>    Best Regards,
>>>>>    Hideo Yamauchi.
>>>>> 
>>>>>    _______________________________________________
>>>>>    Users mailing list: Users at clusterlabs.org
>>>>>   http://clusterlabs.org/mailman/listinfo/users
>>>>> 
>>>>>    Project Home: http://www.clusterlabs.org
>>>>>    Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>    Bugs: http://bugs.clusterlabs.org



