[ClusterLabs] Coming in 1.1.15: Event-driven alerts
Klaus Wenninger
kwenning at redhat.com
Wed Apr 27 18:14:27 UTC 2016
On 04/27/2016 04:19 PM, renayama19661014 at ybb.ne.jp wrote:
> Hi All,
>
> We have a request for a new SNMP function.
>
>
> The order of the traps is not right.
>
> The traps are sometimes not sent in the order in which the events occurred.
> This is because the alert handling runs each "path" asynchronously.
> I think it is necessary to wait for each "path" of "alerts" to finish executing before starting the next one.
>
> The order of the traps differs from the actual stop order of the resources.
Writing the alerts to a local list and calling the alert scripts in a
serialized manner would make the snmptrap tool create timestamps in the
order in which the alerts occurred.
If the SNMP manager then orders the traps by timestamp, you would indeed
see them in the order in which they occurred.
But this approach has a number of drawbacks:
- It only works when the traps come from a single node, as there is no
  way to serialize across nodes, at least none that would work under all
  the circumstances in which we want alerts to be delivered.
- It moves the timestamps even further away from the points in time at
  which the alerts were triggered. That makes the result in a multi-node
  scenario even worse and makes it hard to correlate the traps with other
  sources of information such as logfiles.
- In a scenario with multiple delivery mechanisms and multiple
  recipients, a single list would not be enough. We would need something
  more complicated to prevent unneeded delays, such as delays caused by
  one delivery method not working properly because, for example, a
  recipient is not reachable.
(All of this is solvable, of course, but why make the effort if it
doesn't solve your problem in the first place?)
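To make the second drawback concrete, here is a minimal sketch with
made-up numbers (the trigger times and the delivery cost are assumptions,
not measurements). It simulates a serialized sender in which each script
creates its own timestamp only when it finally gets to run:

```shell
#!/bin/sh
# Hypothetical scenario: three alerts triggered at t=0, 1 and 2
# (hundredths of a second); each delivery takes 50 hundredths.
delivery_cost=50
clock=0    # time at which the serialized sender becomes free again

for t in 0 1 2; do
    # The sender cannot start before the alert exists,
    # nor before the previous delivery has finished.
    if [ "$clock" -lt "$t" ]; then
        clock=$t
    fi
    stamped=$clock    # timestamp the script would create for itself
    echo "triggered=$t stamped=$stamped drift=$(( stamped - t ))"
    clock=$(( clock + delivery_cost ))
done
```

The order is preserved, but the third timestamp is already 98 hundredths
of a second away from the moment the alert was triggered.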
The alternative approach taken doesn't create the timestamps in the
scripts; instead it passes ready-made timestamps to the scripts.
This way it doesn't matter if the execution of a script is delayed.
A short example of how this approach could be used with SNMP traps:
edit pcmk_snmp_helper.sh:
...
starttickfile="/var/run/starttick"
# hack to have a reference
# can have it e.g. in an attribute to be visible throughout the cluster
if [ ! -f "${starttickfile}" ] ; then
    echo "${CRM_alert_timestamp}" > "${starttickfile}"
fi
starttick=$(cat "${starttickfile}")
# ticks elapsed since the reference timestamp
ticks=$(( CRM_alert_timestamp - starttick ))
if [[ ${CRM_alert_rc} != 0 && ${CRM_alert_task} == "monitor" ]] || [[ ${CRM_alert_task} != "monitor" ]] ; then
    # This trap is compliant with PACEMAKER MIB
    # https://github.com/ClusterLabs/pacemaker/blob/master/extra/PCMK-MIB.txt
    /usr/bin/snmptrap -v 2c -c public ${CRM_alert_recipient} ${ticks} PACEMAKER-MIB::pacemakerNotificationTrap \
        PACEMAKER-MIB::pacemakerNotificationNode s "${CRM_alert_node}" \
        PACEMAKER-MIB::pacemakerNotificationResource s "${CRM_alert_rsc}" \
        PACEMAKER-MIB::pacemakerNotificationOperation s "${CRM_alert_task}" \
        PACEMAKER-MIB::pacemakerNotificationDescription s "${CRM_alert_desc}" \
        PACEMAKER-MIB::pacemakerNotificationStatus i "${CRM_alert_status}" \
        PACEMAKER-MIB::pacemakerNotificationReturnCode i ${CRM_alert_rc} \
        PACEMAKER-MIB::pacemakerNotificationTargetReturnCode i ${CRM_alert_target_rc} && exit 0 || exit 1
fi
exit 0
...
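As a worked example of the tick calculation in the script above: assuming
a tstamp_format of "%s%02N" yields epoch seconds followed by two
fractional digits, the difference between two timestamps is already in
hundredths of a second, which is the unit of SNMP TimeTicks. The two
values below are made up for illustration:

```shell
#!/bin/sh
# Hypothetical values in the assumed "%s%02N" layout:
# epoch seconds followed by two digits of fractional seconds.
starttick=146178010000              # reference stored with the first alert
CRM_alert_timestamp=146178010250    # alert raised 2.5 seconds later

# Plain shell arithmetic; the result is in hundredths of a second.
ticks=$(( CRM_alert_timestamp - starttick ))
echo "ticks=${ticks}"
printf 'elapsed=%d.%02ds\n' $(( ticks / 100 )) $(( ticks % 100 ))
```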
add a section to the cib:
cibadmin --create --xml-text '<configuration> <alerts>
  <alert id="snmp_traps" path="/usr/share/pacemaker/tests/pcmk_snmp_helper.sh">
    <meta_attributes id="meta_snmp_traps">
      <nvpair id="snmp_timestamp" name="tstamp_format" value="%s%02N"/>
    </meta_attributes>
    <recipient id="trap_destination" value="192.168.123.3"/>
  </alert>
</alerts> </configuration>'
This should solve the issue of getting the correct order once the traps
are sorted by timestamp, without the ugly side effects described above.
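A rough sketch of the sorting step on the receiving side, assuming the
manager has reduced each trap to a "<ticks> <resource>" record (this
record layout and the tick values are made up for illustration; only the
resource names and their arrival order are taken from your log):

```shell
#!/bin/sh
# Hypothetical records in the order the manager received them.
received='103 prmDummy3
104 prmDummy4
101 prmDummy1
105 prmDummy2
107 prmDummy5'

# A numeric sort on the first field restores the order of occurrence.
ordered=$(printf '%s\n' "$received" | sort -n -k1,1)
printf '%s\n' "$ordered"
```

With these values the sorted list starts with prmDummy1 and continues
with prmDummy3, prmDummy4, prmDummy2 and prmDummy5, which matches the stop
order in your ha-log.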
I hope I understood your scenario correctly and that this small example
shows roughly how I would suggest coping with the issue.
Regards,
Klaus
>
> ----
> [root@rh72-01 ~]# grep Operation /var/log/ha-log | grep stop
> Apr 25 18:48:48 rh72-01 crmd[28897]: notice: Operation prmDummy1_stop_0: ok (node=rh72-01, call=33, rc=0, cib-update=56, confirmed=true)
> Apr 25 18:48:48 rh72-01 crmd[28897]: notice: Operation prmDummy3_stop_0: ok (node=rh72-01, call=37, rc=0, cib-update=57, confirmed=true)
> Apr 25 18:48:48 rh72-01 crmd[28897]: notice: Operation prmDummy4_stop_0: ok (node=rh72-01, call=39, rc=0, cib-update=58, confirmed=true)
> Apr 25 18:48:48 rh72-01 crmd[28897]: notice: Operation prmDummy2_stop_0: ok (node=rh72-01, call=35, rc=0, cib-update=59, confirmed=true)
> Apr 25 18:48:48 rh72-01 crmd[28897]: notice: Operation prmDummy5_stop_0: ok (node=rh72-01, call=41, rc=0, cib-update=60, confirmed=true)
>
> Apr 25 18:48:50 snmp-manager snmptrapd[6865]: 2016-04-25 18:48:50 <UNKNOWN> [UDP: [192.168.28.170]:40613->[192.168.28.189]:162]:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (25512486) 2 days, 22:52:04.86#011SNMPv2-MIB::snmpTrapOID.0 = OID: PACEMAKER-MIB::pacemakerNotificationTrap#011PACEMAKER-MIB::pacemakerNotificationNode = STRING: "rh72-01"#011PACEMAKER-MIB::pacemakerNotificationResource = STRING: "prmDummy3"#011PACEMAKER-MIB::pacemakerNotificationOperation = STRING: "stop"#011PACEMAKER-MIB::pacemakerNotificationDescription = STRING: "ok"#011PACEMAKER-MIB::pacemakerNotificationStatus = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationReturnCode = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationTargetReturnCode = INTEGER: 0
> Apr 25 18:48:50 snmp-manager snmptrapd[6865]: 2016-04-25 18:48:50 <UNKNOWN> [UDP: [192.168.28.170]:39581->[192.168.28.189]:162]:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (25512489) 2 days, 22:52:04.89#011SNMPv2-MIB::snmpTrapOID.0 = OID: PACEMAKER-MIB::pacemakerNotificationTrap#011PACEMAKER-MIB::pacemakerNotificationNode = STRING: "rh72-01"#011PACEMAKER-MIB::pacemakerNotificationResource = STRING: "prmDummy4"#011PACEMAKER-MIB::pacemakerNotificationOperation = STRING: "stop"#011PACEMAKER-MIB::pacemakerNotificationDescription = STRING: "ok"#011PACEMAKER-MIB::pacemakerNotificationStatus = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationReturnCode = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationTargetReturnCode = INTEGER: 0
> Apr 25 18:48:50 snmp-manager snmptrapd[6865]: 2016-04-25 18:48:50 <UNKNOWN> [UDP: [192.168.28.170]:37166->[192.168.28.189]:162]:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (25512490) 2 days, 22:52:04.90#011SNMPv2-MIB::snmpTrapOID.0 = OID: PACEMAKER-MIB::pacemakerNotificationTrap#011PACEMAKER-MIB::pacemakerNotificationNode = STRING: "rh72-01"#011PACEMAKER-MIB::pacemakerNotificationResource = STRING: "prmDummy1"#011PACEMAKER-MIB::pacemakerNotificationOperation = STRING: "stop"#011PACEMAKER-MIB::pacemakerNotificationDescription = STRING: "ok"#011PACEMAKER-MIB::pacemakerNotificationStatus = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationReturnCode = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationTargetReturnCode = INTEGER: 0
> Apr 25 18:48:50 snmp-manager snmptrapd[6865]: 2016-04-25 18:48:50 <UNKNOWN> [UDP: [192.168.28.170]:53502->[192.168.28.189]:162]:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (25512494) 2 days, 22:52:04.94#011SNMPv2-MIB::snmpTrapOID.0 = OID: PACEMAKER-MIB::pacemakerNotificationTrap#011PACEMAKER-MIB::pacemakerNotificationNode = STRING: "rh72-01"#011PACEMAKER-MIB::pacemakerNotificationResource = STRING: "prmDummy2"#011PACEMAKER-MIB::pacemakerNotificationOperation = STRING: "stop"#011PACEMAKER-MIB::pacemakerNotificationDescription = STRING: "ok"#011PACEMAKER-MIB::pacemakerNotificationStatus = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationReturnCode = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationTargetReturnCode = INTEGER: 0
> Apr 25 18:48:50 snmp-manager snmptrapd[6865]: 2016-04-25 18:48:50 <UNKNOWN> [UDP: [192.168.28.170]:45956->[192.168.28.189]:162]:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (25512497) 2 days, 22:52:04.97#011SNMPv2-MIB::snmpTrapOID.0 = OID: PACEMAKER-MIB::pacemakerNotificationTrap#011PACEMAKER-MIB::pacemakerNotificationNode = STRING: "rh72-01"#011PACEMAKER-MIB::pacemakerNotificationResource = STRING: "prmDummy5"#011PACEMAKER-MIB::pacemakerNotificationOperation = STRING: "stop"#011PACEMAKER-MIB::pacemakerNotificationDescription = STRING: "ok"#011PACEMAKER-MIB::pacemakerNotificationStatus = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationReturnCode = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationTargetReturnCode = INTEGER: 0
>
> ----
>
> I think this change provides the "timestamp" attribute to cope with the asynchronous execution.
>
> The order of traps may be important to a user.
> I suggest adding an "ordered" attribute to the "alert" element.
>
> * ordered
> false : The current behavior.
> true : Control the transmission order of the traps.
>
> ----
> <configuration>
> <alerts>
> <alert id="notify_9"
> path="/usr/share/pacemaker/tests/pcmk_alert_sample1.sh" ordered="true">
> (snip)
> </alert>
> <alert id="notify_9"
> path="/usr/share/pacemaker/tests/pcmk_alert_sample2.sh" ordered="false">
> (snip)
> </alert>
> </alerts>
> </configuration>
>
> ----
>
> I sent a patch to cope with this problem before.
> That earlier patch may be useful for this fix.
> * https://github.com/ClusterLabs/pacemaker/pull/847
>
> I intend to write the patch if everybody agrees to the "ordered" attribute.
>
> Best Regards,
> Hideo Yamauchi.