<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
I hope I'll be able to explain the problem clearly and correctly.<br>
<br>
My setup (simplified): I have two cloned resources, a filesystem
mount and a process that writes to that filesystem. The filesystem
is Gluster, so it's OK to clone it. I also have a mandatory ordering
constraint "start gluster-mount-clone then start
writer-process-clone". I don't have a STONITH device, so I've
disabled STONITH by setting stonith-enabled=false.<br>
<br>
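For reference, the relevant configuration looks roughly like this
(pcs syntax; the agents, paths, and parameter values are
simplified/illustrative, only the resource names, the ordering
constraint, and stonith-enabled=false match the actual setup):<br>
<tt>pcs resource create gluster-mount ocf:heartbeat:Filesystem \<br>
&nbsp;&nbsp;&nbsp;&nbsp;device="node1.example.org:/volume" directory="/mnt/gluster" \<br>
&nbsp;&nbsp;&nbsp;&nbsp;fstype="glusterfs" clone<br>
pcs resource create writer-process ocf:heartbeat:anything \<br>
&nbsp;&nbsp;&nbsp;&nbsp;binfile="/usr/local/bin/writer" clone<br>
pcs constraint order start gluster-mount-clone then start writer-process-clone<br>
pcs property set stonith-enabled=false<br>
</tt><br>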
The problem: Sometimes Gluster freezes for a while, which causes
the gluster-mount resource's monitor with OCF_CHECK_LEVEL=20 to
time out (it is unable to write the status file). When this happens,
the cluster tries to recover by restarting the writer-process
resource. But the writer-process is writing to the frozen filesystem,
which makes it uninterruptible; not even SIGKILL works. Then the
stop operation times out, and with STONITH disabled, on-fail defaults
to block (don't perform any further operations on the resource):<br>
<tt>warning: Forcing writer-process-clone away from
node1.example.org after 1000000 failures (max=1000000)<br>
</tt>After that, the cluster continues the recovery by
restarting the gluster-mount resource on that node, which usually
succeeds. As a consequence of the remount, the uninterruptible
system call in the writer process fails, signals are finally
delivered, and the writer-process is terminated. But the cluster
doesn't know about that!<br>
<br>
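For context, the depth-20 monitor on the mount is configured along
these lines (the interval and timeout values here are illustrative,
not the exact ones):<br>
<tt>pcs resource op add gluster-mount monitor interval=30s timeout=20s OCF_CHECK_LEVEL=20<br>
</tt><br>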
I thought I could solve this by setting the failure-timeout meta
attribute on the writer-process resource, but it only made things
worse. The documentation states: "Stop failures are slightly
different and crucial. ... If a resource fails to stop and STONITH
is not enabled, then the cluster has no way to continue and will not
try to start the resource elsewhere, but will try to stop it again
after the failure timeout.", but I'm seeing something different.
When the policy engine runs at the next
cluster-recheck-interval, the following lines are written to the syslog:<br>
<tt>crmd[11852]: notice: State transition S_IDLE ->
S_POLICY_ENGINE</tt><tt><br>
</tt><tt>pengine[11851]: notice: Clearing expired failcount for
writer-process:1 on node1.example.org</tt><tt><br>
</tt><tt>pengine[11851]: notice: Clearing expired failcount for
writer-process:1 on node1.example.org</tt><tt><br>
</tt><tt>pengine[11851]: notice: Ignoring expired calculated
failure writer-process_stop_0 (rc=1,
magic=2:1;64:557:0:2169780b-ca1f-483e-ad42-118b7c7c1a7d) on
node1.example.org</tt><tt><br>
</tt><tt>pengine[11851]: notice: Clearing expired failcount for
writer-process:1 on node1.example.org</tt><tt><br>
</tt><tt>pengine[11851]: notice: Ignoring expired calculated
failure writer-process_stop_0 (rc=1,
magic=2:1;64:557:0:2169780b-ca1f-483e-ad42-118b7c7c1a7d) on
node1.example.org</tt><tt><br>
</tt><tt>pengine[11851]: warning: Processing failed op monitor for
gluster-mount:1 on node1.example.org: unknown error (1)</tt><tt><br>
</tt><tt>pengine[11851]: notice: Calculated transition 564, saving
inputs in /var/lib/pacemaker/pengine/pe-input-362.bz2</tt><tt><br>
</tt><tt>crmd[11852]: notice: Transition 564 (Complete=2,
Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-362.bz2): Complete</tt><tt><br>
</tt><tt>crmd[11852]: notice: State transition S_TRANSITION_ENGINE
-> S_IDLE</tt><tt><br>
</tt><tt>crmd[11852]: notice: State transition S_IDLE ->
S_POLICY_ENGINE</tt><tt><br>
</tt><tt>crmd[11852]: warning: No reason to expect node 3 to be down</tt><tt><br>
</tt><tt>crmd[11852]: warning: No reason to expect node 1 to be down</tt><tt><br>
</tt><tt>crmd[11852]: warning: No reason to expect node 1 to be down</tt><tt><br>
</tt><tt>crmd[11852]: warning: No reason to expect node 3 to be down</tt><tt><br>
</tt><tt>pengine[11851]: warning: Processing failed op stop for
writer-process:1 on node1.example.org: unknown error (1)</tt><tt><br>
</tt><tt>pengine[11851]: warning: Processing failed op monitor for
gluster-mount:1 on node1.example.org: unknown error (1)</tt><tt><br>
</tt><tt>pengine[11851]: warning: Forcing writer-process-clone away
from node1.example.org after 1000000 failures (max=1000000)</tt><tt><br>
</tt><tt>pengine[11851]: warning: Forcing writer-process-clone away
from node1.example.org after 1000000 failures (max=1000000)</tt><tt><br>
</tt><tt>pengine[11851]: warning: Forcing writer-process-clone away
from node1.example.org after 1000000 failures (max=1000000)</tt><tt><br>
</tt><tt>pengine[11851]: notice: Calculated transition 565, saving
inputs in /var/lib/pacemaker/pengine/pe-input-363.bz2</tt><tt><br>
</tt><tt>pengine[11851]: notice: Ignoring expired calculated
failure writer-process_stop_0 (rc=1,
magic=2:1;64:557:0:2169780b-ca1f-483e-ad42-118b7c7c1a7d) on
node1.example.org</tt><tt><br>
</tt><tt>pengine[11851]: warning: Processing failed op monitor for
gluster-mount:1 on node1.example.org: unknown error (1)</tt><tt><br>
</tt><tt>crmd[11852]: notice: Transition 566 (Complete=0,
Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-364.bz2): Complete</tt><tt><br>
</tt><tt>crmd[11852]: notice: State transition S_TRANSITION_ENGINE
-> S_IDLE</tt><tt><br>
</tt><tt>pengine[11851]: notice: Calculated transition 566, saving
inputs in /var/lib/pacemaker/pengine/pe-input-364.bz2</tt><br>
<br>
Then, after each subsequent cluster-recheck-interval:<br>
<tt>crmd[11852]: notice: State transition S_IDLE ->
S_POLICY_ENGINE</tt><tt><br>
</tt><tt>pengine[11851]: notice: Ignoring expired calculated
failure writer-process_stop_0 (rc=1,
magic=2:1;64:557:0:2169780b-ca1f-483e-ad42-118b7c7c1a7d) on
node1.example.org</tt><tt><br>
</tt><tt>pengine[11851]: warning: Processing failed op monitor for
gluster-mount:1 on node1.example.org: unknown error (1)</tt><tt><br>
</tt><tt>crmd[11852]: notice: Transition 567 (Complete=0,
Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-364.bz2): Complete</tt><tt><br>
</tt><tt>crmd[11852]: notice: State transition S_TRANSITION_ENGINE
-> S_IDLE</tt><br>
<br>
And crm_mon happily shows the writer-process as Started,
although it is not running. This is very confusing. Could anyone
please explain what is going on here?<br>
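<br>
P.S. The failure-timeout mentioned above was set with something like
the following (the exact value is not important here):<br>
<tt>pcs resource meta writer-process-clone failure-timeout=10min<br>
</tt>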
</body>
</html>