[ClusterLabs] About the log indicating RA execution

Ken Gaillot kgaillot at redhat.com
Thu Jul 2 11:09:07 EDT 2020

On Thu, 2020-07-02 at 18:12 +0900, 井上和徳 wrote:
> Hi all,
> We think it is desirable to output the log indicating the start and
> finish of RA execution to syslog on the same node. (End users
> monitoring the syslog are requesting that the output be on the same
> node.)
> Currently, the start[1] and finish[2] logs may be output by different
> nodes.
> Cluster Summary:
>   * Stack: corosync
>   * Current DC: r81-2 (version 2.0.4-556cef416) - partition with
> quorum
>   * Last updated: Thu Jul  2 12:42:17 2020
>   * Last change:  Thu Jul  2 12:42:13 2020 by hacluster via crmd on
> r81-2
>   * 2 nodes configured
>   * 3 resource instances configured
> Node List:
>   * Online: [ r81-1 r81-2 ]
> Full List of Resources:
>   * dummy1      (ocf::pacemaker:Dummy):  Started r81-1
>   * fence1-ipmilan      (stonith:fence_ipmilan):         Started r81-
> 2
>   * fence2-ipmilan      (stonith:fence_ipmilan):         Started r81-
> 1
> *1
> Jul  2 12:42:15 r81-2 pacemaker-controld[18009]:
>  notice: Initiating start operation dummy1_start_0 on r81-1
> *2
> Jul  2 12:42:15 r81-1 pacemaker-controld[10109]:
>  notice: Result of start operation for dummy1 on r81-1: ok

Some background for readers who might not be familiar:

The "Initiating" message comes from the Designated Controller (DC)
node, which decides what actions need to be done, then asks the
appropriate nodes to do them.

The "Result of" message comes from the node executing the action.

> As a suggestion,
> 1) change the following log levels to NOTICE and output the start and
>    finish logs to syslog on the node where RA was executed.
> Jul 02 12:42:15 r81-1 pacemaker-execd     [10106] (log_execute)
>   info: executing - rsc:dummy1 action:start call_id:10
> Jul 02 12:42:15 r81-1 pacemaker-execd     [10106] (log_finished)
>   info: dummy1 start (call 10, PID 10164) exited with status 0
> (execution time 91ms, queue time 0ms)
> 2) alternatively, change the following log levels to NOTICE and
>    output a log indicating the finish at the DC node.
> Jul 02 12:42:15 r81-2 pacemaker-controld  [18009]
> (process_graph_event)
>   info: Transition 2 action 7 (dummy1_start_0 on r81-1) confirmed: ok
> > rc=0 call-id=10

It's a tricky balance deciding which logs go to syslog (by default
that's "notice" level and higher) and which go only to the pacemaker
detail log. Pacemaker logs can already be overwhelming, but clustering
is inherently complex and lots of information is needed to understand
problems in depth.

We want enough information in the syslog for the average user to
understand what the cluster is doing and troubleshoot problems with the
clustered services, while the detail log helps with more complex
problems, especially with the cluster software itself.

Of your two recommendations, I like (2) better, because it reports the
complete round-trip of the whole execution process. If something goes
wrong on the DC side, (1) might not give any information about it.

However I'd want to reword the message in (2) to be more user-friendly, 
something like:

(in syslog)
Received result of start operation for dummy1 on r81-1: ok

(in detail log)
Received result of start operation for dummy1 on r81-1: ok | Transition
2 action 7 (dummy1_start_0) rc=0 call-id=10

> What do you think about this? (Do you have a better idea?)
> Best Regards,
> Kazunori INOUE
Ken Gaillot <kgaillot at redhat.com>

More information about the Users mailing list