[ClusterLabs] Antw: Re: Coming in 1.1.15: Event-driven alerts

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Thu Apr 28 06:33:45 UTC 2016


Hi!

I wonder: would passing the CIB generation (like 1.6.122) or a (local?) event sequence number to the notification script (SNMP trap) help?
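
To illustrate the idea, here is a rough sketch of how a receiver could order events by CIB generation. It assumes the generation is the usual admin_epoch.epoch.num_updates triple; the function name sort_by_generation and the sample records are made up for this example. The three fields have to be compared numerically, otherwise 1.6.122 would sort before 1.6.9.

```shell
#!/bin/sh
# Sort "<generation> <resource>" records by CIB generation, comparing the
# three dot-separated fields numerically so that 1.6.9 sorts before 1.6.122.
sort_by_generation() {
    sort -t. -k1,1n -k2,2n -k3,3n
}

# Hypothetical records in arrival order.
printf '%s\n' '1.6.122 prmDummy3' '1.6.9 prmDummy1' '1.10.0 prmDummy5' \
    | sort_by_generation
```

A plain per-node sequence number would only need "sort -n", but it would not be comparable across nodes.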

Regards,
Ulrich

>>> Klaus Wenninger <kwenning at redhat.com> wrote on 27.04.2016 at 20:14 in
message <57210183.6050503 at redhat.com>:
> On 04/27/2016 04:19 PM, renayama19661014 at ybb.ne.jp wrote:
>> Hi All,
>>
>> We have a request for a new SNMP function.
>>
>>
>> The order of traps is not right.
>>
>> The order of the traps is sometimes not preserved.
>> This is because the alert handling runs each "path" asynchronously.
>> I think it is necessary to wait for the execution to complete for each
>> "path" of "alerts".
>>
>> The order of the traps differs from the real stop order of the resources.
> Writing the alerts to a local list and having the alert scripts called
> in a serialized manner would lead to the snmptrap tool creating
> timestamps in the order in which the alerts occurred.
> If the SNMP manager orders the traps by timestamp, this would indeed
> lead to seeing them in the order in which they occurred.
> 
> But this approach has a number of drawbacks:
> 
> - it only works when the traps come from a single node, as there is no
>   way to serialize across nodes - at least none that would work under
>   all circumstances in which we want alerts to be delivered
> 
> - it moves the timestamps even further away from the points in time
>   when the alerts were triggered - making the result in a multi-node
>   scenario even worse and making it hard to correlate with other
>   sources of information like logfiles
> 
> - if you imagine a scenario with multiple mechanisms for delivering an
>   alert plus multiple recipients, we couldn't use a single list but
>   would need something more complicated to prevent unneeded delays,
>   delays coming from one of the delivery methods not working properly
>   due to e.g. a recipient that is not reachable, ...
>   (all solvable of course, but if it doesn't solve your problem in the
>   first place, why make the effort)
> 
> The alternative approach taken doesn't create the timestamps in the
> scripts but provides the timestamps to the scripts.
> This way it doesn't matter if the execution of a script is delayed.
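
As a small illustration of that point, the sketch below reorders records on the receiving side by the timestamp that came with each event. The function name order_by_event_time, the record layout and the timestamp values are all assumptions made up for this example; the resource names are taken from the log excerpt further down.

```shell
#!/bin/sh
# Restore event order from records of the form "<timestamp> <resource>",
# where the timestamp was assigned when the alert was triggered, not when
# the alert script happened to run.
order_by_event_time() {
    sort -n -k1,1
}

# Hypothetical arrival order at the receiver (timestamps are made up).
printf '%s\n' \
    '146160292802 prmDummy3' \
    '146160292803 prmDummy4' \
    '146160292801 prmDummy1' \
    '146160292804 prmDummy2' \
    '146160292805 prmDummy5' | order_by_event_time
```

With these made-up values the output lists prmDummy1, 3, 4, 2, 5, regardless of the order in which the records arrived.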
> 
> 
> A short example of how this approach could be used with SNMP traps:
> 
> edit pcmk_snmp_helper.sh:
> 
> ...
> starttickfile="/var/run/starttick"
> 
> # hack to have a reference
> # can have it e.g. in an attribute to be visible throughout the cluster
> if [ ! -f "${starttickfile}" ] ; then
>         echo "${CRM_alert_timestamp}" > "${starttickfile}"
> fi
> 
> starttick=$(cat "${starttickfile}")
> # ticks elapsed between the reference and this alert
> ticks=$(( ${CRM_alert_timestamp} - ${starttick} ))
> 
> # send a trap for every non-monitor operation and for failed monitors
> if [[ ${CRM_alert_rc} != 0 && ${CRM_alert_task} == "monitor" ]] || [[ ${CRM_alert_task} != "monitor" ]] ; then
>     # This trap is compliant with PACEMAKER MIB
>     # https://github.com/ClusterLabs/pacemaker/blob/master/extra/PCMK-MIB.txt
>     /usr/bin/snmptrap -v 2c -c public ${CRM_alert_recipient} ${ticks} \
>         PACEMAKER-MIB::pacemakerNotificationTrap \
>         PACEMAKER-MIB::pacemakerNotificationNode s "${CRM_alert_node}" \
>         PACEMAKER-MIB::pacemakerNotificationResource s "${CRM_alert_rsc}" \
>         PACEMAKER-MIB::pacemakerNotificationOperation s "${CRM_alert_task}" \
>         PACEMAKER-MIB::pacemakerNotificationDescription s "${CRM_alert_desc}" \
>         PACEMAKER-MIB::pacemakerNotificationStatus i "${CRM_alert_status}" \
>         PACEMAKER-MIB::pacemakerNotificationReturnCode i ${CRM_alert_rc} \
>         PACEMAKER-MIB::pacemakerNotificationTargetReturnCode i ${CRM_alert_target_rc} \
>         && exit 0 || exit 1
> fi
> 
> exit 0
> ...
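
The subtraction in the script above can be tried in isolation. This is only a sketch: it assumes the timestamp format yields epoch seconds followed by two fractional digits, i.e. hundredths of a second, which is also the unit of SNMP TimeTicks. The helper names ticks_between and ticks_to_seconds and the sample values are made up for this example, and the shell is assumed to have 64-bit arithmetic.

```shell
#!/bin/sh
# Difference between two timestamps given in hundredths of a second.
ticks_between() {
    echo $(( $2 - $1 ))
}

# Render a tick count as seconds with two decimals.
ticks_to_seconds() {
    printf '%d.%02d\n' $(( $1 / 100 )) $(( $1 % 100 ))
}

start=146160292800   # hypothetical content of the starttick file
now=146160292886     # hypothetical CRM_alert_timestamp
ticks=$(ticks_between "$start" "$now")
echo "ticks=$ticks seconds=$(ticks_to_seconds "$ticks")"
```

Because the value is relative to the first alert seen on a node, it is only comparable across nodes if the reference is shared, e.g. via a cluster attribute as hinted in the script comment.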
> 
> add a section to the cib:
> 
> cibadmin --create --xml-text '<configuration>
>   <alerts>
>     <alert id="snmp_traps" path="/usr/share/pacemaker/tests/pcmk_snmp_helper.sh">
>       <meta_attributes id="meta_snmp_traps">
>         <nvpair id="snmp_timestamp" name="tstamp_format" value="%s%02N"/>
>       </meta_attributes>
>       <recipient id="trap_destination" value="192.168.123.3"/>
>     </alert>
>   </alerts>
> </configuration>'
> 
> 
> This should solve the issue of correct ordering once the traps are
> sorted by timestamp, without the ugly side effects described above.
> 
> I hope I understood your scenario correctly and that this small example
> shows roughly how I would suggest coping with the issue.
> 
> Regards,
> Klaus  
>>
>> ----
>> [root at rh72-01 ~]# grep Operation  /var/log/ha-log | grep stop
>> Apr 25 18:48:48 rh72-01 crmd[28897]:  notice: Operation prmDummy1_stop_0: ok (node=rh72-01, call=33, rc=0, cib-update=56, confirmed=true)
>> Apr 25 18:48:48 rh72-01 crmd[28897]:  notice: Operation prmDummy3_stop_0: ok (node=rh72-01, call=37, rc=0, cib-update=57, confirmed=true)
>> Apr 25 18:48:48 rh72-01 crmd[28897]:  notice: Operation prmDummy4_stop_0: ok (node=rh72-01, call=39, rc=0, cib-update=58, confirmed=true)
>> Apr 25 18:48:48 rh72-01 crmd[28897]:  notice: Operation prmDummy2_stop_0: ok (node=rh72-01, call=35, rc=0, cib-update=59, confirmed=true)
>> Apr 25 18:48:48 rh72-01 crmd[28897]:  notice: Operation prmDummy5_stop_0: ok (node=rh72-01, call=41, rc=0, cib-update=60, confirmed=true)
>>
>> Apr 25 18:48:50 snmp-manager snmptrapd[6865]: 2016-04-25 18:48:50 <UNKNOWN> [UDP: [192.168.28.170]:40613->[192.168.28.189]:162]:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (25512486) 2 days, 22:52:04.86#011SNMPv2-MIB::snmpTrapOID.0 = OID: PACEMAKER-MIB::pacemakerNotificationTrap#011PACEMAKER-MIB::pacemakerNotificationNode = STRING: "rh72-01"#011PACEMAKER-MIB::pacemakerNotificationResource = STRING: "prmDummy3"#011PACEMAKER-MIB::pacemakerNotificationOperation = STRING: "stop"#011PACEMAKER-MIB::pacemakerNotificationDescription = STRING: "ok"#011PACEMAKER-MIB::pacemakerNotificationStatus = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationReturnCode = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationTargetReturnCode = INTEGER: 0
>> Apr 25 18:48:50 snmp-manager snmptrapd[6865]: 2016-04-25 18:48:50 <UNKNOWN> [UDP: [192.168.28.170]:39581->[192.168.28.189]:162]:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (25512489) 2 days, 22:52:04.89#011SNMPv2-MIB::snmpTrapOID.0 = OID: PACEMAKER-MIB::pacemakerNotificationTrap#011PACEMAKER-MIB::pacemakerNotificationNode = STRING: "rh72-01"#011PACEMAKER-MIB::pacemakerNotificationResource = STRING: "prmDummy4"#011PACEMAKER-MIB::pacemakerNotificationOperation = STRING: "stop"#011PACEMAKER-MIB::pacemakerNotificationDescription = STRING: "ok"#011PACEMAKER-MIB::pacemakerNotificationStatus = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationReturnCode = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationTargetReturnCode = INTEGER: 0
>> Apr 25 18:48:50 snmp-manager snmptrapd[6865]: 2016-04-25 18:48:50 <UNKNOWN> [UDP: [192.168.28.170]:37166->[192.168.28.189]:162]:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (25512490) 2 days, 22:52:04.90#011SNMPv2-MIB::snmpTrapOID.0 = OID: PACEMAKER-MIB::pacemakerNotificationTrap#011PACEMAKER-MIB::pacemakerNotificationNode = STRING: "rh72-01"#011PACEMAKER-MIB::pacemakerNotificationResource = STRING: "prmDummy1"#011PACEMAKER-MIB::pacemakerNotificationOperation = STRING: "stop"#011PACEMAKER-MIB::pacemakerNotificationDescription = STRING: "ok"#011PACEMAKER-MIB::pacemakerNotificationStatus = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationReturnCode = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationTargetReturnCode = INTEGER: 0
>> Apr 25 18:48:50 snmp-manager snmptrapd[6865]: 2016-04-25 18:48:50 <UNKNOWN> [UDP: [192.168.28.170]:53502->[192.168.28.189]:162]:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (25512494) 2 days, 22:52:04.94#011SNMPv2-MIB::snmpTrapOID.0 = OID: PACEMAKER-MIB::pacemakerNotificationTrap#011PACEMAKER-MIB::pacemakerNotificationNode = STRING: "rh72-01"#011PACEMAKER-MIB::pacemakerNotificationResource = STRING: "prmDummy2"#011PACEMAKER-MIB::pacemakerNotificationOperation = STRING: "stop"#011PACEMAKER-MIB::pacemakerNotificationDescription = STRING: "ok"#011PACEMAKER-MIB::pacemakerNotificationStatus = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationReturnCode = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationTargetReturnCode = INTEGER: 0
>> Apr 25 18:48:50 snmp-manager snmptrapd[6865]: 2016-04-25 18:48:50 <UNKNOWN> [UDP: [192.168.28.170]:45956->[192.168.28.189]:162]:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (25512497) 2 days, 22:52:04.97#011SNMPv2-MIB::snmpTrapOID.0 = OID: PACEMAKER-MIB::pacemakerNotificationTrap#011PACEMAKER-MIB::pacemakerNotificationNode = STRING: "rh72-01"#011PACEMAKER-MIB::pacemakerNotificationResource = STRING: "prmDummy5"#011PACEMAKER-MIB::pacemakerNotificationOperation = STRING: "stop"#011PACEMAKER-MIB::pacemakerNotificationDescription = STRING: "ok"#011PACEMAKER-MIB::pacemakerNotificationStatus = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationReturnCode = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationTargetReturnCode = INTEGER: 0
>>
>> ----
>>
>> I think the "timestamp" attribute exists in this change because of the
>> async handling.
>>
>> The order of traps may be important to a user.
>> I suggest adding an "ordered" attribute to the "alert" element.
>>
>>  * ordered
>>     false : The current behavior.
>>     true  : Control the transmission order of the traps.
>>
>> ----
>> <configuration>
>>   <alerts>
>>     <alert id="notify_9"
>> path="/usr/share/pacemaker/tests/pcmk_alert_sample1.sh" ordered="true">
>> (snip)
>>     </alert>
>>     <alert id="notify_9"
>> path="/usr/share/pacemaker/tests/pcmk_alert_sample2.sh" ordered="false">
>> (snip)
>>     </alert>
>>   </alerts>
>> </configuration>
>>
>> ----
>>
>> I sent a patch to cope with this problem before.
>> That earlier patch may be useful for the fix.
>>  * https://github.com/ClusterLabs/pacemaker/pull/847 
>>
>> I intend to write the patch if everybody agrees on the "ordered" attribute.
>>
>> Best Regards,
>> Hideo Yamauchi.
>>
>> _______________________________________________
>> Users mailing list: Users at clusterlabs.org 
>> http://clusterlabs.org/mailman/listinfo/users 
>>
>> Project Home: http://www.clusterlabs.org 
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf 
>> Bugs: http://bugs.clusterlabs.org 