[ClusterLabs] Coming in 1.1.15: Event-driven alerts

renayama19661014 at ybb.ne.jp renayama19661014 at ybb.ne.jp
Thu Apr 28 13:43:02 UTC 2016


Hi Klaus,

Because the scripts are executed asynchronously, I think it is difficult to set "uptime" using the method in the sample.
In the end, we may still need to request control of the transmission order.
# My earlier patch only controls the execution order of the async calls; it does not add load to crmd.

A one-week holiday begins in Japan tomorrow.
I will discuss this with my team members after the vacation.

Best Regards,
Hideo Yamauchi.



----- Original Message -----
> From: Klaus Wenninger <kwenning at redhat.com>
> To: users at clusterlabs.org
> Cc: 
> Date: 2016/4/28, Thu 03:14
> Subject: Re: [ClusterLabs] Coming in 1.1.15: Event-driven alerts
> 
> On 04/27/2016 04:19 PM, renayama19661014 at ybb.ne.jp wrote:
>>  Hi All,
>> 
>>  We have a request for a new SNMP function.
>> 
>> 
>>  The order of traps is not right.
>> 
>>  The order of the traps is sometimes not preserved.
>>  This is because alert handling runs the "path" scripts asynchronously.
>>  I think it is necessary to wait for each execution to complete, per "path" of "alerts".
>>   
>>  The order of the traps differs from the actual stop order of the resources.
> Writing the alerts to a local list and having the alert scripts called
> in a serialized manner would lead to the snmptrap tool creating
> timestamps in the order of occurrence of the alerts.
> If the snmp-manager orders the traps by timestamp, this would indeed
> lead to seeing them in the order they occurred.
> 
> But this approach has a number of drawbacks:
> 
> - it works just when the traps are coming from one node as there is no
>   way to serialize over nodes - at least none that would work under all
>   circumstances we want alerts to be delivered
> 
> - it distorts the timestamps created even more from the points in time
>   when the alert had been triggered - making the result in a
>   multi-node-scenario even worse and making it hard to correlate with
>   other sources of information like logfiles
> 
> - if you imagine a scenario with multiple mechanisms of delivering an
>   alert + multiple recipients we couldn't use a single list but we would
>   need something more complicated to prevent unneeded delays, delays
>   coming from one of the delivery methods not working properly due to
>   e.g. a recipient that is not reachable, ...
>   (all solvable of course but if it doesn't solve your problem in the
>   first place why the effort)
> 
> The alternative approach taken doesn't create the timestamps in the
> scripts but provides timestamps to the scripts already.
> This way it doesn't matter if the execution of the script is delayed.
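
The point about delayed execution can be illustrated with a minimal sketch. The handler `record_alert`, the simulation `arrival_order`, and all timestamp values are hypothetical; only the variable names CRM_alert_timestamp and CRM_alert_rsc come from the example in this thread. The handlers "finish" in the order the traps were received, yet sorting on the supplied timestamp gives back the stop order from the crmd log.

```shell
#!/bin/sh
# Hypothetical handler: it reports the timestamp handed over by the cluster
# in CRM_alert_timestamp instead of reading the clock when it happens to run.
record_alert() {
    printf '%s %s\n' "${CRM_alert_timestamp}" "${CRM_alert_rsc}"
}

# Simulate handlers finishing in a different order than the events occurred.
# Timestamps are made up (hundredths of a second since the epoch).
arrival_order() {
    CRM_alert_timestamp=146158132801; CRM_alert_rsc=prmDummy3; record_alert
    CRM_alert_timestamp=146158132802; CRM_alert_rsc=prmDummy4; record_alert
    CRM_alert_timestamp=146158132800; CRM_alert_rsc=prmDummy1; record_alert
    CRM_alert_timestamp=146158132803; CRM_alert_rsc=prmDummy2; record_alert
    CRM_alert_timestamp=146158132804; CRM_alert_rsc=prmDummy5; record_alert
}

# Sorting on the supplied timestamp recovers the order of occurrence:
# prmDummy1, prmDummy3, prmDummy4, prmDummy2, prmDummy5
arrival_order | sort -n
```

The sketch only covers a single node; it says nothing about clocks that differ between nodes.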
> 
> 
> A short example of how this approach could be used with snmp-traps:
> 
> edit pcmk_snmp_helper.sh:
> 
> ...
> starttickfile="/var/run/starttick"
> 
> # hack to have a reference
> # can have it e.g. in an attribute to be visible throughout the cluster
> if [ ! -f ${starttickfile} ] ; then
>         echo ${CRM_alert_timestamp} > ${starttickfile}
> fi
> 
> starttick=`cat ${starttickfile}`
> # ticks elapsed since the reference timestamp (expr, not eval, does the subtraction)
> ticks=`expr ${CRM_alert_timestamp} - ${starttick}`
> 
> # send a trap for every non-monitor event, and for monitors only if they failed
> if [[ ${CRM_alert_rc} != 0 && ${CRM_alert_task} == "monitor" ]] || [[ ${CRM_alert_task} != "monitor" ]] ; then
>     # This trap is compliant with PACEMAKER MIB
>     # https://github.com/ClusterLabs/pacemaker/blob/master/extra/PCMK-MIB.txt
>     /usr/bin/snmptrap -v 2c -c public ${CRM_alert_recipient} ${ticks} PACEMAKER-MIB::pacemakerNotificationTrap \
>         PACEMAKER-MIB::pacemakerNotificationNode s "${CRM_alert_node}" \
>         PACEMAKER-MIB::pacemakerNotificationResource s "${CRM_alert_rsc}" \
>         PACEMAKER-MIB::pacemakerNotificationOperation s "${CRM_alert_task}" \
>         PACEMAKER-MIB::pacemakerNotificationDescription s "${CRM_alert_desc}" \
>         PACEMAKER-MIB::pacemakerNotificationStatus i "${CRM_alert_status}" \
>         PACEMAKER-MIB::pacemakerNotificationReturnCode i ${CRM_alert_rc} \
>         PACEMAKER-MIB::pacemakerNotificationTargetReturnCode i ${CRM_alert_target_rc} && exit 0 || exit 1
> fi
> 
> exit 0
> ...
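
To make the tick arithmetic in the script concrete, here is a small worked sketch. It assumes that the "%s%02N" timestamp format used in this example yields epoch seconds followed by two digits of fractional seconds, i.e. a count of hundredths of a second. Under that assumption the difference of two timestamps is already in the 1/100 s unit of SNMP TimeTicks. The helper `ticks_since` and both timestamp values are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, not part of Pacemaker: difference between two
# timestamps that both count hundredths of a second since the epoch.
ticks_since() {
    now="$1"
    start="$2"
    echo $(( now - start ))
}

# Made-up values: the reference tick and an alert 1.14 seconds later.
starttick=146158132800
alert_timestamp=146158132914

ticks_since "${alert_timestamp}" "${starttick}"    # prints 114, i.e. 1.14 s
```

If the format produced a different number of fractional digits, the result would need scaling before being passed to snmptrap as the uptime argument.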
> 
> add a section to the cib:
> 
> cibadmin --create --xml-text '<configuration> <alerts>
>   <alert id="snmp_traps" path="/usr/share/pacemaker/tests/pcmk_snmp_helper.sh">
>     <meta_attributes id="meta_snmp_traps">
>       <nvpair id="snmp_timestamp" name="tstamp_format" value="%s%02N"/>
>     </meta_attributes>
>     <recipient id="trap_destination" value="192.168.123.3"/>
>   </alert>
> </alerts> </configuration>'
> 
> 
> This should solve the issue of correct order after being sorted by
> timestamps without having the ugly side-effects as described above.
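
On the manager side, the sorting step could look roughly like the sketch below. It assumes the snmptrapd log layout shown in the log excerpt in this thread, where the uptime argument of snmptrap appears as "Timeticks: (N)". The helper `order_by_ticks` and the abbreviated log lines in `sample_log` are hypothetical and made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read snmptrapd log lines on stdin and print
# "<timeticks> <resource>" sorted by ascending timeticks.
order_by_ticks() {
    sed -n 's/.*Timeticks: (\([0-9]*\)).*pacemakerNotificationResource = STRING: "\([^"]*\)".*/\1 \2/p' |
        sort -n
}

# Abbreviated, made-up log lines in the order the traps arrived.
sample_log() {
    cat <<'EOF'
snmptrapd[6865]: ... = Timeticks: (101) 0:00:01.01#011...PACEMAKER-MIB::pacemakerNotificationResource = STRING: "prmDummy3"#011...
snmptrapd[6865]: ... = Timeticks: (102) 0:00:01.02#011...PACEMAKER-MIB::pacemakerNotificationResource = STRING: "prmDummy4"#011...
snmptrapd[6865]: ... = Timeticks: (100) 0:00:01.00#011...PACEMAKER-MIB::pacemakerNotificationResource = STRING: "prmDummy1"#011...
EOF
}

# Prints the resources in timestamp order: prmDummy1, prmDummy3, prmDummy4
sample_log | order_by_ticks
```

Lines that do not match the pattern are dropped silently, so other syslog entries in the same file would not disturb the result.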
> 
> I hope I understood your scenario correctly and this small example
> points out how I roughly would suggest to cope with the issue.
> 
> Regards,
> Klaus  
>> 
>>  ----
>>  [root at rh72-01 ~]# grep Operation  /var/log/ha-log | grep stop
>>  Apr 25 18:48:48 rh72-01 crmd[28897]:  notice: Operation prmDummy1_stop_0: ok (node=rh72-01, call=33, rc=0, cib-update=56, confirmed=true)
>>  Apr 25 18:48:48 rh72-01 crmd[28897]:  notice: Operation prmDummy3_stop_0: ok (node=rh72-01, call=37, rc=0, cib-update=57, confirmed=true)
>>  Apr 25 18:48:48 rh72-01 crmd[28897]:  notice: Operation prmDummy4_stop_0: ok (node=rh72-01, call=39, rc=0, cib-update=58, confirmed=true)
>>  Apr 25 18:48:48 rh72-01 crmd[28897]:  notice: Operation prmDummy2_stop_0: ok (node=rh72-01, call=35, rc=0, cib-update=59, confirmed=true)
>>  Apr 25 18:48:48 rh72-01 crmd[28897]:  notice: Operation prmDummy5_stop_0: ok (node=rh72-01, call=41, rc=0, cib-update=60, confirmed=true)
>> 
>>  Apr 25 18:48:50 snmp-manager snmptrapd[6865]: 2016-04-25 18:48:50 <UNKNOWN> [UDP: [192.168.28.170]:40613->[192.168.28.189]:162]:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (25512486) 2 days, 22:52:04.86#011SNMPv2-MIB::snmpTrapOID.0 = OID: PACEMAKER-MIB::pacemakerNotificationTrap#011PACEMAKER-MIB::pacemakerNotificationNode = STRING: "rh72-01"#011PACEMAKER-MIB::pacemakerNotificationResource = STRING: "prmDummy3"#011PACEMAKER-MIB::pacemakerNotificationOperation = STRING: "stop"#011PACEMAKER-MIB::pacemakerNotificationDescription = STRING: "ok"#011PACEMAKER-MIB::pacemakerNotificationStatus = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationReturnCode = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationTargetReturnCode = INTEGER: 0
>>  Apr 25 18:48:50 snmp-manager snmptrapd[6865]: 2016-04-25 18:48:50 <UNKNOWN> [UDP: [192.168.28.170]:39581->[192.168.28.189]:162]:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (25512489) 2 days, 22:52:04.89#011SNMPv2-MIB::snmpTrapOID.0 = OID: PACEMAKER-MIB::pacemakerNotificationTrap#011PACEMAKER-MIB::pacemakerNotificationNode = STRING: "rh72-01"#011PACEMAKER-MIB::pacemakerNotificationResource = STRING: "prmDummy4"#011PACEMAKER-MIB::pacemakerNotificationOperation = STRING: "stop"#011PACEMAKER-MIB::pacemakerNotificationDescription = STRING: "ok"#011PACEMAKER-MIB::pacemakerNotificationStatus = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationReturnCode = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationTargetReturnCode = INTEGER: 0
>>  Apr 25 18:48:50 snmp-manager snmptrapd[6865]: 2016-04-25 18:48:50 <UNKNOWN> [UDP: [192.168.28.170]:37166->[192.168.28.189]:162]:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (25512490) 2 days, 22:52:04.90#011SNMPv2-MIB::snmpTrapOID.0 = OID: PACEMAKER-MIB::pacemakerNotificationTrap#011PACEMAKER-MIB::pacemakerNotificationNode = STRING: "rh72-01"#011PACEMAKER-MIB::pacemakerNotificationResource = STRING: "prmDummy1"#011PACEMAKER-MIB::pacemakerNotificationOperation = STRING: "stop"#011PACEMAKER-MIB::pacemakerNotificationDescription = STRING: "ok"#011PACEMAKER-MIB::pacemakerNotificationStatus = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationReturnCode = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationTargetReturnCode = INTEGER: 0
>>  Apr 25 18:48:50 snmp-manager snmptrapd[6865]: 2016-04-25 18:48:50 <UNKNOWN> [UDP: [192.168.28.170]:53502->[192.168.28.189]:162]:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (25512494) 2 days, 22:52:04.94#011SNMPv2-MIB::snmpTrapOID.0 = OID: PACEMAKER-MIB::pacemakerNotificationTrap#011PACEMAKER-MIB::pacemakerNotificationNode = STRING: "rh72-01"#011PACEMAKER-MIB::pacemakerNotificationResource = STRING: "prmDummy2"#011PACEMAKER-MIB::pacemakerNotificationOperation = STRING: "stop"#011PACEMAKER-MIB::pacemakerNotificationDescription = STRING: "ok"#011PACEMAKER-MIB::pacemakerNotificationStatus = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationReturnCode = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationTargetReturnCode = INTEGER: 0
>>  Apr 25 18:48:50 snmp-manager snmptrapd[6865]: 2016-04-25 18:48:50 <UNKNOWN> [UDP: [192.168.28.170]:45956->[192.168.28.189]:162]:#012DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (25512497) 2 days, 22:52:04.97#011SNMPv2-MIB::snmpTrapOID.0 = OID: PACEMAKER-MIB::pacemakerNotificationTrap#011PACEMAKER-MIB::pacemakerNotificationNode = STRING: "rh72-01"#011PACEMAKER-MIB::pacemakerNotificationResource = STRING: "prmDummy5"#011PACEMAKER-MIB::pacemakerNotificationOperation = STRING: "stop"#011PACEMAKER-MIB::pacemakerNotificationDescription = STRING: "ok"#011PACEMAKER-MIB::pacemakerNotificationStatus = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationReturnCode = INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationTargetReturnCode = INTEGER: 0
>> 
>>  ----
>> 
>>  I think this change provides the "timestamp" attribute because of the async execution.
>> 
>>  The order of traps may be important to a user.
>>  I suggest adding an "ordered" attribute to the "alert" element.
>> 
>>   * ordered
>>      false : The present processing.
>>      true  : Control the transmission order of the trap.
>> 
>>  ----
>>  <configuration>
>>    <alerts>
>>      <alert id="notify_9" path="/usr/share/pacemaker/tests/pcmk_alert_sample1.sh" ordered="true">
>>  (snip)
>>      </alert>
>>      <alert id="notify_9" path="/usr/share/pacemaker/tests/pcmk_alert_sample2.sh" ordered="false">
>>  (snip)
>>      </alert>
>>    </alerts>
>>  </configuration>
>> 
>>  ----
>> 
>>  I previously sent a patch to address this problem.
>>  That earlier patch may be useful for the fix.
>>   * https://github.com/ClusterLabs/pacemaker/pull/847
>> 
>>  I intend to write the patch if everybody agrees to the "ordered" attribute.
>> 
>>  Best Regards,
>>  Hideo Yamauchi.
>> 
>>  _______________________________________________
>>  Users mailing list: Users at clusterlabs.org
>>  http://clusterlabs.org/mailman/listinfo/users
>> 
>>  Project Home: http://www.clusterlabs.org
>>  Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>  Bugs: http://bugs.clusterlabs.org
> 
> 
> 



