[ClusterLabs] Antw: Re: Coming in 1.1.15: Event-driven alerts

Klaus Wenninger kwenning at redhat.com
Thu Apr 28 09:11:50 UTC 2016


On 04/28/2016 08:33 AM, Ulrich Windl wrote:
> Hi!
>
> I wonder: would passing the CIB generation (like 1.6.122) or a (local?) event sequence number to the notification script (SNMP trap) help?

CRM_alert_node_sequence is there already, but as the name says it is just a
reference within one node, and for the case of SNMP you would have to feed it
somehow into snmptrap...

The CIB generation would be something cluster-wide, but only while the
cluster nodes can see each other. Alerts are often especially interesting
during exactly those periods when this is not the case. But definitely
something to think about...
And again it is something abstract that an alert-collection tool wouldn't
know about and thus would probably refuse to sort by.
Some kind of time value is probably something you'll find support for more
easily.
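
To illustrate the point about sorting by time, here is a minimal sketch. It
assumes an alert script has written one record per alert in the form
"<timestamp> <node> <node_sequence> <text>" to some collection point. The
record layout, the sample values and the function names are made up for this
example; only the CRM_alert_* variables come from pacemaker. The timestamp is
the primary key, and the per-node sequence number is used only to break ties
between alerts from the same node.

```shell
#!/bin/sh
# Sort alert records of the form "<timestamp> <node> <node_sequence> <text...>".
# Primary key: timestamp (numeric). Ties are broken by node name and then by
# the per-node sequence number, which is only meaningful within one node.
sort_alerts() {
    sort -k1,1n -k2,2 -k3,3n
}

# Hypothetical sample records, as an alert script might emit them with:
#   echo "${CRM_alert_timestamp} ${CRM_alert_node} ${CRM_alert_node_sequence} ${CRM_alert_desc}"
sample_alerts() {
    printf '%s\n' \
        '146157773090 node2 7 stop prmDummy4' \
        '146157772886 node1 12 stop prmDummy3' \
        '146157773090 node1 14 stop prmDummy2' \
        '146157773090 node1 13 stop prmDummy1'
}

sample_alerts | sort_alerts
```

With these sample records the node1 alert with the earlier timestamp comes
first, followed by the node1 alerts with sequence 13 and 14, and finally the
node2 alert. Across nodes the result is of course only as good as the clock
synchronization between them.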

Regards,
Klaus
>
> Regards,
> Ulrich
>
>>>> Klaus Wenninger <kwenning at redhat.com> wrote on 27.04.2016 at 20:14 in
> message <57210183.6050503 at redhat.com>:
>> On 04/27/2016 04:19 PM, renayama19661014 at ybb.ne.jp wrote:
>>> Hi All,
>>>
>>> We have a request for a new SNMP function.
>>>
>>>
>>> The order of traps is not right.
>>>
>>> The order of the traps is sometimes not preserved.
>>> This is because the alert handling executes each "path" asynchronously.
>>> I think it is necessary to wait for execution to complete for each "path"
>>> of "alerts".
>>>
>>> The order of the traps differs from the actual stop order of the resources.
>> Writing the alerts to a local list and having the alert scripts called in a
>> serialized manner would lead to the snmptrap tool creating timestamps in the
>> order of occurrence of the alerts.
>> With the SNMP manager ordering the traps by timestamp, this would indeed
>> lead to seeing them in the order in which they occurred.
>>
>> But this approach has a number of drawbacks:
>>
>> - it works only when the traps come from one node, as there is no way to
>>   serialize across nodes - at least none that would work under all
>>   circumstances in which we want alerts to be delivered
>>
>> - it pushes the timestamps even further away from the points in time when
>>   the alerts were triggered - making the result in a multi-node scenario
>>   even worse and making it hard to correlate with other sources of
>>   information such as logfiles
>>
>> - if you imagine a scenario with multiple mechanisms for delivering an
>>   alert plus multiple recipients, we couldn't use a single list; we would
>>   need something more complicated to prevent unneeded delays, e.g. delays
>>   caused by one of the delivery methods not working properly because a
>>   recipient is not reachable, ...
>>   (all solvable of course, but if it doesn't solve your problem in the
>>   first place, why make the effort)
>>
>> The alternative approach taken doesn't create the timestamps in the scripts
>> but provides timestamps to the scripts already.
>> This way it doesn't matter if the execution of the script is delayed.
>>
>>
>> A short example of how this approach could be used with SNMP traps:
>>
>> edit pcmk_snmp_helper.sh:
>>
>> ...
>> starttickfile="/var/run/starttick"
>>
>> # hack to have a reference
>> # can have it e.g. in an attribute to be visible throughout the cluster
>> if [ ! -f "${starttickfile}" ] ; then
>>         echo "${CRM_alert_timestamp}" > "${starttickfile}"
>> fi
>>
>> # offset of this alert from the reference, passed to snmptrap as uptime
>> starttick=$(cat "${starttickfile}")
>> ticks=$(( ${CRM_alert_timestamp} - ${starttick} ))
>>
>> # send a trap for every non-monitor operation and for failed monitors
>> if [[ ${CRM_alert_rc} != 0 && ${CRM_alert_task} == "monitor" ]] ||
>>    [[ ${CRM_alert_task} != "monitor" ]] ; then
>>     # This trap is compliant with PACEMAKER MIB
>>     # https://github.com/ClusterLabs/pacemaker/blob/master/extra/PCMK-MIB.txt
>>     /usr/bin/snmptrap -v 2c -c public "${CRM_alert_recipient}" "${ticks}" \
>>         PACEMAKER-MIB::pacemakerNotificationTrap \
>>         PACEMAKER-MIB::pacemakerNotificationNode s "${CRM_alert_node}" \
>>         PACEMAKER-MIB::pacemakerNotificationResource s "${CRM_alert_rsc}" \
>>         PACEMAKER-MIB::pacemakerNotificationOperation s "${CRM_alert_task}" \
>>         PACEMAKER-MIB::pacemakerNotificationDescription s "${CRM_alert_desc}" \
>>         PACEMAKER-MIB::pacemakerNotificationStatus i "${CRM_alert_status}" \
>>         PACEMAKER-MIB::pacemakerNotificationReturnCode i "${CRM_alert_rc}" \
>>         PACEMAKER-MIB::pacemakerNotificationTargetReturnCode i "${CRM_alert_target_rc}" \
>>         && exit 0 || exit 1
>> fi
>>
>> exit 0
>> ...
>>
>> add a section to the cib:
>>
>> cibadmin --create --xml-text '<configuration> <alerts>
>>   <alert id="snmp_traps" path="/usr/share/pacemaker/tests/pcmk_snmp_helper.sh">
>>     <meta_attributes id="meta_snmp_traps">
>>       <nvpair id="snmp_timestamp" name="tstamp_format" value="%s%02N"/>
>>     </meta_attributes>
>>     <recipient id="trap_destination" value="192.168.123.3"/>
>>   </alert>
>> </alerts> </configuration>'
>>
>>
>> This should solve the issue of correct order once the traps are sorted by
>> timestamp, without the ugly side effects described above.
>>
>> I hope I understood your scenario correctly and that this small example
>> shows roughly how I would suggest coping with the issue.
>>
>> Regards,
>> Klaus  
>>> ----
>>> [root at rh72-01 ~]# grep Operation  /var/log/ha-log | grep stop
>>> Apr 25 18:48:48 rh72-01 crmd[28897]:  notice: Operation prmDummy1_stop_0: ok (node=rh72-01, call=33, rc=0, cib-update=56, confirmed=true)
>>> Apr 25 18:48:48 rh72-01 crmd[28897]:  notice: Operation prmDummy3_stop_0: ok (node=rh72-01, call=37, rc=0, cib-update=57, confirmed=true)
>>> Apr 25 18:48:48 rh72-01 crmd[28897]:  notice: Operation prmDummy4_stop_0: ok (node=rh72-01, call=39, rc=0, cib-update=58, confirmed=true)
>>> Apr 25 18:48:48 rh72-01 crmd[28897]:  notice: Operation prmDummy2_stop_0: ok (node=rh72-01, call=35, rc=0, cib-update=59, confirmed=true)
>>> Apr 25 18:48:48 rh72-01 crmd[28897]:  notice: Operation prmDummy5_stop_0: ok (node=rh72-01, call=41, rc=0, cib-update=60, confirmed=true)
>>> Apr 25 18:48:50 snmp-manager snmptrapd[6865]: 2016-04-25 18:48:50 <UNKNOWN> [UDP: [192.168.28.170]:40613->[192.168.28.189]:162]:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (25512486) 2 days, 22:52:04.86#011SNMPv2-MIB::snmpTrapOID.0 = OID: PACEMAKER-MIB::pacemakerNotificationTrap#011PACEMAKER-MIB::pacemakerNotificationNode = STRING: "rh72-01"#011PACEMAKER-MIB::pacemakerNotificationResource = STRING: "prmDummy3"#011PACEMAKER-MIB::pacemakerNotificationOperation = STRING: "stop"#011PACEMAKER-MIB::pacemakerNotificationDescription = STRING: "ok"#011PACEMAKER-MIB::pacemakerNotificationStatus = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationReturnCode = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationTargetReturnCode = INTEGER: 0
>>> Apr 25 18:48:50 snmp-manager snmptrapd[6865]: 2016-04-25 18:48:50 <UNKNOWN> [UDP: [192.168.28.170]:39581->[192.168.28.189]:162]:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (25512489) 2 days, 22:52:04.89#011SNMPv2-MIB::snmpTrapOID.0 = OID: PACEMAKER-MIB::pacemakerNotificationTrap#011PACEMAKER-MIB::pacemakerNotificationNode = STRING: "rh72-01"#011PACEMAKER-MIB::pacemakerNotificationResource = STRING: "prmDummy4"#011PACEMAKER-MIB::pacemakerNotificationOperation = STRING: "stop"#011PACEMAKER-MIB::pacemakerNotificationDescription = STRING: "ok"#011PACEMAKER-MIB::pacemakerNotificationStatus = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationReturnCode = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationTargetReturnCode = INTEGER: 0
>>> Apr 25 18:48:50 snmp-manager snmptrapd[6865]: 2016-04-25 18:48:50 <UNKNOWN> [UDP: [192.168.28.170]:37166->[192.168.28.189]:162]:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (25512490) 2 days, 22:52:04.90#011SNMPv2-MIB::snmpTrapOID.0 = OID: PACEMAKER-MIB::pacemakerNotificationTrap#011PACEMAKER-MIB::pacemakerNotificationNode = STRING: "rh72-01"#011PACEMAKER-MIB::pacemakerNotificationResource = STRING: "prmDummy1"#011PACEMAKER-MIB::pacemakerNotificationOperation = STRING: "stop"#011PACEMAKER-MIB::pacemakerNotificationDescription = STRING: "ok"#011PACEMAKER-MIB::pacemakerNotificationStatus = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationReturnCode = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationTargetReturnCode = INTEGER: 0
>>> Apr 25 18:48:50 snmp-manager snmptrapd[6865]: 2016-04-25 18:48:50 <UNKNOWN> [UDP: [192.168.28.170]:53502->[192.168.28.189]:162]:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (25512494) 2 days, 22:52:04.94#011SNMPv2-MIB::snmpTrapOID.0 = OID: PACEMAKER-MIB::pacemakerNotificationTrap#011PACEMAKER-MIB::pacemakerNotificationNode = STRING: "rh72-01"#011PACEMAKER-MIB::pacemakerNotificationResource = STRING: "prmDummy2"#011PACEMAKER-MIB::pacemakerNotificationOperation = STRING: "stop"#011PACEMAKER-MIB::pacemakerNotificationDescription = STRING: "ok"#011PACEMAKER-MIB::pacemakerNotificationStatus = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationReturnCode = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationTargetReturnCode = INTEGER: 0
>>> Apr 25 18:48:50 snmp-manager snmptrapd[6865]: 2016-04-25 18:48:50 <UNKNOWN> [UDP: [192.168.28.170]:45956->[192.168.28.189]:162]:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (25512497) 2 days, 22:52:04.97#011SNMPv2-MIB::snmpTrapOID.0 = OID: PACEMAKER-MIB::pacemakerNotificationTrap#011PACEMAKER-MIB::pacemakerNotificationNode = STRING: "rh72-01"#011PACEMAKER-MIB::pacemakerNotificationResource = STRING: "prmDummy5"#011PACEMAKER-MIB::pacemakerNotificationOperation = STRING: "stop"#011PACEMAKER-MIB::pacemakerNotificationDescription = STRING: "ok"#011PACEMAKER-MIB::pacemakerNotificationStatus = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationReturnCode = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationTargetReturnCode = INTEGER: 0
>>> ----
>>>
>>> I think the "timestamp" attribute exists because of this asynchronous handling.
>>>
>>> The order of traps may be important to a user.
>>> I suggest adding an "ordered" attribute to the "alert" element.
>>>
>>>  * ordered
>>>     false : The current behaviour.
>>>     true  : Control the transmission order of the traps.
>>>
>>> ----
>>> <configuration>
>>>   <alerts>
>>>     <alert id="notify_9"
>>> path="/usr/share/pacemaker/tests/pcmk_alert_sample1.sh" ordered="true">
>>> (snip)
>>>     </alert>
>>>     <alert id="notify_9"
>>> path="/usr/share/pacemaker/tests/pcmk_alert_sample2.sh" ordered="false">
>>> (snip)
>>>     </alert>
>>>   </alerts>
>>> </configuration>
>>>
>>> ----
>>>
>>> I sent a patch to cope with this problem before.
>>> That earlier patch may be useful for the fix.
>>>  * https://github.com/ClusterLabs/pacemaker/pull/847 
>>>
>>> I intend to write the patch if everybody agrees to the "ordered" attribute.
>>>
>>> Best Regards,
>>> Hideo Yamauchi.
>>>
>>> _______________________________________________
>>> Users mailing list: Users at clusterlabs.org 
>>> http://clusterlabs.org/mailman/listinfo/users 
>>>
>>> Project Home: http://www.clusterlabs.org 
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf 
>>> Bugs: http://bugs.clusterlabs.org 
>>
>
>




