[ClusterLabs] Antw: Re: Coming in 1.1.15: Event-driven alerts
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Thu Apr 28 06:33:45 UTC 2016
Hi!
I wonder: would passing the CIB generation (like 1.6.122) or a (local?) event sequence number to the notification script (SNMP trap) help?
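As a rough illustration of the local event sequence number idea (only a sketch: the counter file path and the function name next_seq are made up for this example and are nothing Pacemaker provides), an alert script could hand out a strictly increasing, node-local number like this:

```shell
#!/bin/bash
# Hypothetical helper: node-local, strictly increasing sequence number.
# The counter file location is an assumption made for this sketch.
seqfile="/var/run/alert_seq"

next_seq() {
    (
        # Take an exclusive lock so concurrently running alert scripts
        # cannot read and write the counter at the same time.
        flock -x 9
        n=$(cat "${seqfile}" 2>/dev/null || echo 0)
        n=$(( n + 1 ))
        echo "${n}" > "${seqfile}"
        echo "${n}"
    ) 9>>"${seqfile}.lock"
}
```

Such a number only orders the events of one node; it does not help to compare events coming from different nodes.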
Regards,
Ulrich
>>> Klaus Wenninger <kwenning at redhat.com> wrote on 27.04.2016 at 20:14 in
message <57210183.6050503 at redhat.com>:
> On 04/27/2016 04:19 PM, renayama19661014 at ybb.ne.jp wrote:
>> Hi All,
>>
>> We have a request for a new SNMP function.
>>
>>
>> The order of the traps is not right.
>>
>> The order of the traps is sometimes not preserved.
>> This is because the notification handling executes each "path" asynchronously.
>> I think it is necessary to wait for the execution to complete for each "path" of "alerts".
>>
>> The order of the traps differs from the actual stop order of the resources.
> Writing the alerts in a local list and having the alert-scripts called
> in a serialized manner would lead to the snmptrap-tool creating timestamps
> in the order of the occurrence of the alerts.
> With the snmp-manager ordering the traps by timestamp, this would indeed
> lead to seeing them in the order they had occurred.
>
> But this approach has a number of drawbacks:
>
> - it works only when the traps are coming from one node, as there is no
>   way to serialize over nodes - at least none that would work under all
>   circumstances in which we want alerts to be delivered
>
> - it distorts the timestamps created even further from the points in time
>   when the alerts were triggered - making the result in a multi-node
>   scenario even worse and making it hard to correlate with other sources
>   of information like logfiles
>
> - if you imagine a scenario with multiple mechanisms of delivering an
>   alert + multiple recipients, we couldn't use a single list but would
>   need something more complicated to prevent unneeded delays, delays
>   coming from one of the delivery methods not working properly due to
>   e.g. a recipient that is not reachable, ...
>   (all solvable of course, but if it doesn't solve your problem in the
>   first place, why the effort?)
>
> The alternative approach taken doesn't create the timestamps in the
> scripts but provides timestamps to the scripts already.
> This way it doesn't matter if the execution of the script is delayed.
>
>
> A short example of how this approach could be used with snmp-traps:
>
> edit pcmk_snmp_helper.sh:
>
> ...
> starttickfile="/var/run/starttick"
>
> # hack to have a reference
> # can have it e.g. in an attribute to be visible throughout the cluster
> if [ ! -f "${starttickfile}" ] ; then
>     echo "${CRM_alert_timestamp}" > "${starttickfile}"
> fi
>
> starttick=`cat "${starttickfile}"`
> # ticks elapsed since the reference timestamp
> ticks=`expr ${CRM_alert_timestamp} - ${starttick}`
>
> # send a trap for every non-monitor operation and for failed monitors
> if [[ ${CRM_alert_rc} != 0 && ${CRM_alert_task} == "monitor" ]] || [[ ${CRM_alert_task} != "monitor" ]] ; then
>     # This trap is compliant with PACEMAKER MIB
>     # https://github.com/ClusterLabs/pacemaker/blob/master/extra/PCMK-MIB.txt
>     /usr/bin/snmptrap -v 2c -c public ${CRM_alert_recipient} ${ticks} \
>         PACEMAKER-MIB::pacemakerNotificationTrap \
>         PACEMAKER-MIB::pacemakerNotificationNode s "${CRM_alert_node}" \
>         PACEMAKER-MIB::pacemakerNotificationResource s "${CRM_alert_rsc}" \
>         PACEMAKER-MIB::pacemakerNotificationOperation s "${CRM_alert_task}" \
>         PACEMAKER-MIB::pacemakerNotificationDescription s "${CRM_alert_desc}" \
>         PACEMAKER-MIB::pacemakerNotificationStatus i "${CRM_alert_status}" \
>         PACEMAKER-MIB::pacemakerNotificationReturnCode i ${CRM_alert_rc} \
>         PACEMAKER-MIB::pacemakerNotificationTargetReturnCode i ${CRM_alert_target_rc} && exit 0 || exit 1
> fi
>
> exit 0
> ...
>
> add a section to the cib:
>
> cibadmin --create --xml-text '<configuration> <alerts>
>   <alert id="snmp_traps" path="/usr/share/pacemaker/tests/pcmk_snmp_helper.sh">
>     <meta_attributes id="meta_snmp_traps">
>       <nvpair id="snmp_timestamp" name="tstamp_format" value="%s%02N"/>
>     </meta_attributes>
>     <recipient id="trap_destination" value="192.168.123.3"/>
>   </alert>
> </alerts> </configuration>'
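To spell out the arithmetic behind the helper script: assuming a tstamp_format of "%s%02N" expands to epoch seconds followed by two fractional digits, the timestamp counts hundredths of a second, which is also the unit of SNMP TimeTicks. The function name alert_ticks and the sample values are made up for this sketch:

```shell
#!/bin/bash
# Hypothetical helper: difference between a reference timestamp and an alert
# timestamp, both given as <epoch seconds><2 fractional digits> (assumed
# format). The result is in 1/100 s and can be passed to snmptrap as uptime.
alert_ticks() {
    local start="$1" now="$2"
    echo $(( now - start ))
}
```

For example, `alert_ticks 146178172800 146178172886` covers a span of 0.86 seconds between the reference and the alert.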
>
>
> This should solve the issue of correct order after being sorted by
> timestamps without having the ugly side-effects as described above.
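A rough sketch of the sorting step on the manager side, assuming the traps end up in an snmptrapd log with one line per trap, each containing "Timeticks: (N)" as in the log excerpt quoted in this thread. The function name is made up, and a real snmp-manager would most likely have its own way of doing this:

```shell
#!/bin/bash
# Hypothetical filter: read snmptrapd log lines on stdin and print them
# ordered by the value inside "Timeticks: (N)". Lines without it are dropped.
sort_traps_by_ticks() {
    awk 'match($0, /Timeticks: \([0-9]+\)/) {
             # "Timeticks: (" is 12 characters long; strip it and the ")"
             t = substr($0, RSTART + 12, RLENGTH - 13)
             print t "\t" $0
         }' | sort -n | cut -f2-
}
```

Usage would be something like `grep snmptrapd /var/log/messages | sort_traps_by_ticks`, where the log file location is again only an assumption.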
>
> I hope I understood your scenario correctly and this small example
> points out how I roughly would suggest to cope with the issue.
>
> Regards,
> Klaus
>>
>> ----
>> [root@rh72-01 ~]# grep Operation /var/log/ha-log | grep stop
>> Apr 25 18:48:48 rh72-01 crmd[28897]: notice: Operation prmDummy1_stop_0: ok (node=rh72-01, call=33, rc=0, cib-update=56, confirmed=true)
>> Apr 25 18:48:48 rh72-01 crmd[28897]: notice: Operation prmDummy3_stop_0: ok (node=rh72-01, call=37, rc=0, cib-update=57, confirmed=true)
>> Apr 25 18:48:48 rh72-01 crmd[28897]: notice: Operation prmDummy4_stop_0: ok (node=rh72-01, call=39, rc=0, cib-update=58, confirmed=true)
>> Apr 25 18:48:48 rh72-01 crmd[28897]: notice: Operation prmDummy2_stop_0: ok (node=rh72-01, call=35, rc=0, cib-update=59, confirmed=true)
>> Apr 25 18:48:48 rh72-01 crmd[28897]: notice: Operation prmDummy5_stop_0: ok (node=rh72-01, call=41, rc=0, cib-update=60, confirmed=true)
>>
>> Apr 25 18:48:50 snmp-manager snmptrapd[6865]: 2016-04-25 18:48:50 <UNKNOWN> [UDP: [192.168.28.170]:40613->[192.168.28.189]:162]:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (25512486) 2 days, 22:52:04.86#011SNMPv2-MIB::snmpTrapOID.0 = OID: PACEMAKER-MIB::pacemakerNotificationTrap#011PACEMAKER-MIB::pacemakerNotificationNode = STRING: "rh72-01"#011PACEMAKER-MIB::pacemakerNotificationResource = STRING: "prmDummy3"#011PACEMAKER-MIB::pacemakerNotificationOperation = STRING: "stop"#011PACEMAKER-MIB::pacemakerNotificationDescription = STRING: "ok"#011PACEMAKER-MIB::pacemakerNotificationStatus = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationReturnCode = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationTargetReturnCode = INTEGER: 0
>> Apr 25 18:48:50 snmp-manager snmptrapd[6865]: 2016-04-25 18:48:50 <UNKNOWN> [UDP: [192.168.28.170]:39581->[192.168.28.189]:162]:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (25512489) 2 days, 22:52:04.89#011SNMPv2-MIB::snmpTrapOID.0 = OID: PACEMAKER-MIB::pacemakerNotificationTrap#011PACEMAKER-MIB::pacemakerNotificationNode = STRING: "rh72-01"#011PACEMAKER-MIB::pacemakerNotificationResource = STRING: "prmDummy4"#011PACEMAKER-MIB::pacemakerNotificationOperation = STRING: "stop"#011PACEMAKER-MIB::pacemakerNotificationDescription = STRING: "ok"#011PACEMAKER-MIB::pacemakerNotificationStatus = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationReturnCode = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationTargetReturnCode = INTEGER: 0
>> Apr 25 18:48:50 snmp-manager snmptrapd[6865]: 2016-04-25 18:48:50 <UNKNOWN> [UDP: [192.168.28.170]:37166->[192.168.28.189]:162]:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (25512490) 2 days, 22:52:04.90#011SNMPv2-MIB::snmpTrapOID.0 = OID: PACEMAKER-MIB::pacemakerNotificationTrap#011PACEMAKER-MIB::pacemakerNotificationNode = STRING: "rh72-01"#011PACEMAKER-MIB::pacemakerNotificationResource = STRING: "prmDummy1"#011PACEMAKER-MIB::pacemakerNotificationOperation = STRING: "stop"#011PACEMAKER-MIB::pacemakerNotificationDescription = STRING: "ok"#011PACEMAKER-MIB::pacemakerNotificationStatus = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationReturnCode = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationTargetReturnCode = INTEGER: 0
>> Apr 25 18:48:50 snmp-manager snmptrapd[6865]: 2016-04-25 18:48:50 <UNKNOWN> [UDP: [192.168.28.170]:53502->[192.168.28.189]:162]:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (25512494) 2 days, 22:52:04.94#011SNMPv2-MIB::snmpTrapOID.0 = OID: PACEMAKER-MIB::pacemakerNotificationTrap#011PACEMAKER-MIB::pacemakerNotificationNode = STRING: "rh72-01"#011PACEMAKER-MIB::pacemakerNotificationResource = STRING: "prmDummy2"#011PACEMAKER-MIB::pacemakerNotificationOperation = STRING: "stop"#011PACEMAKER-MIB::pacemakerNotificationDescription = STRING: "ok"#011PACEMAKER-MIB::pacemakerNotificationStatus = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationReturnCode = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationTargetReturnCode = INTEGER: 0
>> Apr 25 18:48:50 snmp-manager snmptrapd[6865]: 2016-04-25 18:48:50 <UNKNOWN> [UDP: [192.168.28.170]:45956->[192.168.28.189]:162]:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (25512497) 2 days, 22:52:04.97#011SNMPv2-MIB::snmpTrapOID.0 = OID: PACEMAKER-MIB::pacemakerNotificationTrap#011PACEMAKER-MIB::pacemakerNotificationNode = STRING: "rh72-01"#011PACEMAKER-MIB::pacemakerNotificationResource = STRING: "prmDummy5"#011PACEMAKER-MIB::pacemakerNotificationOperation = STRING: "stop"#011PACEMAKER-MIB::pacemakerNotificationDescription = STRING: "ok"#011PACEMAKER-MIB::pacemakerNotificationStatus = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationReturnCode = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationTargetReturnCode = INTEGER: 0
>>
>> ----
>>
>> I think the "timestamp" attribute was added by this change to deal with the asynchronous handling.
>>
>> The order of traps may be important to a user.
>> I suggest adding an "ordered" attribute to the "alert" element.
>>
>> * ordered
>> false : The present processing.
>> true : Control the transmission order of the trap.
>>
>> ----
>> <configuration>
>> <alerts>
>> <alert id="notify_9"
>> path="/usr/share/pacemaker/tests/pcmk_alert_sample1.sh" ordered="true">
>> (snip)
>> </alert>
>> <alert id="notify_10"
>> path="/usr/share/pacemaker/tests/pcmk_alert_sample2.sh" ordered="false">
>> (snip)
>> </alert>
>> </alerts>
>> </configuration>
>>
>> ----
>>
>> I sent a patch to cope with this problem before.
>> The former patch may be useful for the correction.
>> * https://github.com/ClusterLabs/pacemaker/pull/847
>>
>> I intend to write the patch if everybody agrees to the "ordered" attribute.
>>
>> Best Regards,
>> Hideo Yamauchi.
>>
>> _______________________________________________
>> Users mailing list: Users at clusterlabs.org
>> http://clusterlabs.org/mailman/listinfo/users
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org