[ClusterLabs] state file not created for Stateful resource agent

ashutosh tiwari ashutosh.kvas at gmail.com
Wed Mar 21 01:16:29 EDT 2018


Hi,

Thanks for the prompt reply.

Creating the directory "/var/run/state" in the Stateful update solves the issue.

This requirement for the directory to preexist should apply to both nodes,
right?
But the issue is observed only on the node where the cloned resource is
"not" created with pcs resource create.

Regards,
Ashutosh


On Sat, 2018-03-17 at 15:35 +0530, ashutosh tiwari wrote:
> Hi,
>
>
> We have a two-node active/standby cluster with a dummy Stateful
> resource (pacemaker/Stateful).
>
> We observed that when one node is up with the master resource and the
> other node is booted up, the state file for the dummy resource is not
> created on the node coming up.
>
> /cib/status/node_state[@id='2']/transient_attributes[@id='2']/instanc
> e_attributes[@id='status-2']: <nvpair id="status-2-master-unicloud"
> name="master-unicloud" value="5"/>
> Mar 17 12:22:29 [24875] tigana       lrmd:   notice:
> operation_finished:        unicloud_start_0:25729:stderr [
> /usr/lib/ocf/resource.d/pw/uc: line 94: /var/run/uc/role: No such
> file or directory ]

The resource agent is ocf:pw:uc -- I assume this is a local
customization of the ocf:pacemaker:Stateful agent?

It looks to me like the /var/run/uc directory is not being created on
the second node. /var/run is a memory filesystem, so it's wiped at
every reboot, and any directories need to be created (as root) before
they are used, every boot.
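
If the agent is yours to edit, one option is to recreate the directory
at the top of its start action. A rough sketch (the variable name and
error handling here are illustrative, not the actual agent's code):

  # Ensure the state directory exists before the agent uses it;
  # adjust the path, owner, and mode to what the agent expects.
  STATE_DIR="/var/run/uc"
  if [ ! -d "$STATE_DIR" ]; then
      mkdir -p "$STATE_DIR" || exit $OCF_ERR_GENERIC
  fi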

ocf:pacemaker:Stateful puts its state file directly in /var/run to
avoid needing to create any directories. You can change that by setting
the "state" parameter, but in that case you have to make sure the
directory you specify exists beforehand.
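
For example (a sketch only; the resource name and state path here are
made up), something like

  pcs resource create stateful-test ocf:pacemaker:Stateful \
      state=/var/run/uc/role --master

would require /var/run/uc to exist on every node before each start.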

> This issue is not observed in case the secondary does not wait for CIB
> sync and starts the resource on the secondary as well.
>
> We are in the process of upgrading from CentOS 6 to CentOS 7; we never
> observed this issue with CentOS 6 releases.
>
> Attributes for the clone resource: master-max=1 master-node-max=1 clone-
> max=2 clone-node-max=1
>
> The setup under observation is:
>
> CentOS Linux release 7.4.1708 (Core)
> corosync-2.4.0-9.el7.x86_64
> pacemaker-1.1.16-12.el7.x86_64.

On Wed, Mar 21, 2018 at 9:09 AM, <users-request at clusterlabs.org> wrote:

> Send Users mailing list submissions to
>         users at clusterlabs.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         https://lists.clusterlabs.org/mailman/listinfo/users
> or, via email, send a message with subject or body 'help' to
>         users-request at clusterlabs.org
>
> You can reach the person managing the list at
>         users-owner at clusterlabs.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Users digest..."
>
>
> Today's Topics:
>
>    1. Re: state file not created for Stateful resource agent
>       (Ken Gaillot)
>    2.  symmetric-cluster=false doesn't work (George Melikov)
>    3. Re: state file not created for Stateful resource agent
>       (Jehan-Guillaume de Rorthais)
>    4. Error observed while starting cluster (Roshni Chatterjee)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 20 Mar 2018 13:00:49 -0500
> From: Ken Gaillot <kgaillot at redhat.com>
> To: Cluster Labs - All topics related to open-source clustering
>         welcomed <users at clusterlabs.org>
> Subject: Re: [ClusterLabs] state file not created for Stateful
>         resource agent
> Message-ID: <1521568849.5401.7.camel at redhat.com>
> Content-Type: text/plain; charset="UTF-8"
>
> On Sat, 2018-03-17 at 15:35 +0530, ashutosh tiwari wrote:
> > Hi,
> >
> >
> > We have a two-node active/standby cluster with a dummy Stateful
> > resource (pacemaker/Stateful).
> >
> > We observed that when one node is up with the master resource and the
> > other node is booted up, the state file for the dummy resource is not
> > created on the node coming up.
> >
> > /cib/status/node_state[@id='2']/transient_attributes[@id='2']/instanc
> > e_attributes[@id='status-2']: <nvpair id="status-2-master-unicloud"
> > name="master-unicloud" value="5"/>
> > Mar 17 12:22:29 [24875] tigana       lrmd:   notice:
> > operation_finished:        unicloud_start_0:25729:stderr [
> > /usr/lib/ocf/resource.d/pw/uc: line 94: /var/run/uc/role: No such
> > file or directory ]
>
> The resource agent is ocf:pw:uc -- I assume this is a local
> customization of the ocf:pacemaker:Stateful agent?
>
> It looks to me like the /var/run/uc directory is not being created on
> the second node. /var/run is a memory filesystem, so it's wiped at
> every reboot, and any directories need to be created (as root) before
> they are used, every boot.
>
> ocf:pacemaker:Stateful puts its state file directly in /var/run to
> avoid needing to create any directories. You can change that by setting
> the "state" parameter, but in that case you have to make sure the
> directory you specify exists beforehand.
>
> > This issue is not observed in case the secondary does not wait for CIB
> > sync and starts the resource on the secondary as well.
> >
> > We are in the process of upgrading from CentOS 6 to CentOS 7; we never
> > observed this issue with CentOS 6 releases.
> >
> > Attributes for the clone resource: master-max=1 master-node-max=1 clone-
> > max=2 clone-node-max=1
> >
> > The setup under observation is:
> >
> > CentOS Linux release 7.4.1708 (Core)
> > corosync-2.4.0-9.el7.x86_64
> > pacemaker-1.1.16-12.el7.x86_64.
> >
> >
> > Thanks and Regards,
> > Ashutosh
> >
> > _______________________________________________
> > Users mailing list: Users at clusterlabs.org
> > https://lists.clusterlabs.org/mailman/listinfo/users
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.
> > pdf
> > Bugs: http://bugs.clusterlabs.org
> --
> Ken Gaillot <kgaillot at redhat.com>
>
>
> ------------------------------
>
> Message: 2
> Date: Tue, 20 Mar 2018 22:03:18 +0300
> From: George Melikov <mail at gmelikov.ru>
> To: Cluster Labs - All topics related to open-source clustering
>         welcomed <users at clusterlabs.org>
> Subject: [ClusterLabs]  symmetric-cluster=false doesn't work
> Message-ID: <260041521572598 at web47j.yandex.ru>
> Content-Type: text/plain; charset="us-ascii"
>
> An HTML attachment was scrubbed...
> URL: <https://lists.clusterlabs.org/pipermail/users/
> attachments/20180320/a4e12349/attachment-0001.html>
>
> ------------------------------
>
> Message: 3
> Date: Tue, 20 Mar 2018 21:18:29 +0100
> From: Jehan-Guillaume de Rorthais <jgdr at dalibo.com>
> To: Cluster Labs - All topics related to open-source clustering
>         welcomed <users at clusterlabs.org>
> Subject: Re: [ClusterLabs] state file not created for Stateful
>         resource agent
> Message-ID: <20180320211829.46cfc4c5 at firost>
> Content-Type: text/plain; charset=UTF-8
>
> On Tue, 20 Mar 2018 13:00:49 -0500
> Ken Gaillot <kgaillot at redhat.com> wrote:
>
> > On Sat, 2018-03-17 at 15:35 +0530, ashutosh tiwari wrote:
> > > Hi,
> > >
> > >
> > > We have a two-node active/standby cluster with a dummy Stateful
> > > resource (pacemaker/Stateful).
> > >
> > > We observed that when one node is up with the master resource and the
> > > other node is booted up, the state file for the dummy resource is not
> > > created on the node coming up.
> > >
> > > /cib/status/node_state[@id='2']/transient_attributes[@id='2']/instanc
> > > e_attributes[@id='status-2']: <nvpair id="status-2-master-unicloud"
> > > name="master-unicloud" value="5"/>
> > > Mar 17 12:22:29 [24875] tigana       lrmd:   notice:
> > > operation_finished:        unicloud_start_0:25729:stderr [
> > > /usr/lib/ocf/resource.d/pw/uc: line 94: /var/run/uc/role: No such
> > > file or directory ]
> >
> > The resource agent is ocf:pw:uc -- I assume this is a local
> > customization of the ocf:pacemaker:Stateful agent?
> >
> > It looks to me like the /var/run/uc directory is not being created on
> > the second node. /var/run is a memory filesystem, so it's wiped at
> > every reboot, and any directories need to be created (as root) before
> > they are used, every boot.
> >
> > ocf:pacemaker:Stateful puts its state file directly in /var/run to
> > avoid needing to create any directories. You can change that by setting
> > the "state" parameter, but in that case you have to make sure the
> > directory you specify exists beforehand.
>
> Another way to create the folder at each boot is to ask systemd.
>
> E.g.:
>
>   cat <<EOF > /etc/tmpfiles.d/ocf-pw-uc.conf
>   # Directory for ocf:pw:uc resource agent
>   d /var/run/uc 0700 root root - -
>   EOF
>
> Adjust the rights and owner to suit your needs.
>
> To take this file into account immediately, without rebooting the
> server, run the following command:
>
>   systemd-tmpfiles --create /etc/tmpfiles.d/ocf-pw-uc.conf
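>
> Afterwards you can verify the result with something like:
>
>   ls -ld /var/run/uc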
>
> Regards,
> --
> Jehan-Guillaume de Rorthais
> Dalibo
>
>
> ------------------------------
>
> Message: 4
> Date: Tue, 20 Mar 2018 05:59:12 +0000
> From: Roshni Chatterjee <roshni.chatterjee at india.nec.com>
> To: "users at clusterlabs.org" <users at clusterlabs.org>
> Subject: [ClusterLabs] Error observed while starting cluster
> Message-ID:
>         <OSAPR01MB1683C4A00B05E1D0C60D4B27A0AB0 at OSAPR01MB1683.
> jpnprd01.prod.outlook.com>
>
> Content-Type: text/plain; charset="iso-2022-jp"
>
> Hi,
>
> Error observed in pacemaker and pcs status
> Error: cluster is not currently running on this node
>
> I have built the source code of corosync (2.4.2) and pacemaker (1.1.16)
> and have followed the steps below for building a 2-node cluster.
>
>
> 1. Download the source code of corosync and pacemaker (versions as
> mentioned above) and compile.
>
> 2. Install pcsd using "yum install pcs"
>
> 3. Allow cluster services through the firewall using #firewall-cmd
> --permanent --add-service=high-availability
>
> 4. Start and enable pcsd: #systemctl start pcsd and #systemctl enable
> pcsd
>
> 5. Change the password for user hacluster
>
> 6. pcs cluster auth pcmk3 node2
>
> 7. pcs cluster setup --name mycluster pcmk3 node2
>
> 8. pcs cluster start --all
>
> 9. pcs status
>
>
> It is observed that no error is received till step 8. At step 9, when
> pcs status is checked, the error is received (shown below).
> [root at node2 ~]# pacemakerd --features
> Pacemaker 1.1.16 (Build: 94ff4df51a)
> Supporting v3.0.11:  agent-manpages libqb-logging libqb-ipc nagios
> corosync-native atomic-attrd acls
> [root at node2 ~]# pcs cluster start --all
> pcmk3: Starting Cluster...
> node2: Starting Cluster...
> [root at node2 ~]# pcs status
> Error: cluster is not currently running on this node
>
> On checking the pacemaker status, the following issue is found:
> [root at pcmk3 ~]# systemctl pacemaker status -l
> Unknown operation 'pacemaker'.
> [root at pcmk3 ~]# systemctl status pacemaker -l
> ● pacemaker.service - Pacemaker High Availability Cluster Manager
>    Loaded: loaded (/usr/lib/systemd/system/pacemaker.service; disabled;
> vendor preset: disabled)
>    Active: active (running) since Tue 2018-03-20 10:55:44 IST; 13min ago
>      Docs: man:pacemakerd
>            http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/
> Pacemaker_Explained/index.html
> Main PID: 26932 (pacemakerd)
>    CGroup: /system.slice/pacemaker.service
>            ├─26932 /usr/sbin/pacemakerd -f
>            ├─26933 /usr/libexec/pacemaker/cib
>            ├─26934 /usr/libexec/pacemaker/stonithd
>            ├─26935 /usr/libexec/pacemaker/lrmd
>            ├─26936 /usr/libexec/pacemaker/attrd
>            └─26937 /usr/libexec/pacemaker/pengine
>
> Mar 20 10:55:45 pcmk3 pacemakerd[26932]:   notice: Respawning failed child
> process: crmd
> Mar 20 10:55:45 pcmk3 pacemakerd[26932]:    error: The crmd process
> (27035) exited: Key has expired (127)
> Mar 20 10:55:45 pcmk3 pacemakerd[26932]:   notice: Respawning failed child
> process: crmd
> Mar 20 10:55:45 pcmk3 pacemakerd[26932]:    error: The crmd process
> (27036) exited: Key has expired (127)
> Mar 20 10:55:45 pcmk3 pacemakerd[26932]:   notice: Respawning failed child
> process: crmd
> Mar 20 10:55:45 pcmk3 pacemakerd[26932]:    error: The crmd process
> (27037) exited: Key has expired (127)
> Mar 20 10:55:45 pcmk3 pacemakerd[26932]:   notice: Respawning failed child
> process: crmd
> Mar 20 10:55:45 pcmk3 pacemakerd[26932]:    error: The crmd process
> (27038) exited: Key has expired (127)
> Mar 20 10:55:45 pcmk3 pacemakerd[26932]:    error: Child respawn count
> exceeded by crmd
> Mar 20 10:56:21 pcmk3 cib[26933]:    error: Operation ignored, cluster
> configuration is invalid. Please repair and restart: Update does not
> conform to the configured schema
> [root at pcmk3 ~]#
>
> Corosync.log
> Mar 20 10:55:45 [26932] pcmk3 pacemakerd:     info: start_child:
> Forked child 27035 for process crmd
> Mar 20 10:55:45 [26932] pcmk3 pacemakerd:     info: mcp_cpg_deliver:
> Ignoring process list sent by peer for local node
> Mar 20 10:55:45 [26932] pcmk3 pacemakerd:     info: mcp_cpg_deliver:
> Ignoring process list sent by peer for local node
> Mar 20 10:55:45 [26932] pcmk3 pacemakerd:    error: pcmk_child_exit:
> The crmd process (27035) exited: Key has expired (127)
> Mar 20 10:55:45 [26932] pcmk3 pacemakerd:   notice: pcmk_process_exit:
> Respawning failed child process: crmd
> Mar 20 10:55:45 [26932] pcmk3 pacemakerd:     info: start_child:
> Using uid=189 and group=189 for process crmd
> Mar 20 10:55:45 [26932] pcmk3 pacemakerd:     info: start_child:
> Forked child 27036 for process crmd
> Mar 20 10:55:45 [26932] pcmk3 pacemakerd:     info: mcp_cpg_deliver:
> Ignoring process list sent by peer for local node
> Mar 20 10:55:45 [26932] pcmk3 pacemakerd:     info: mcp_cpg_deliver:
> Ignoring process list sent by peer for local node
> Mar 20 10:55:45 [26932] pcmk3 pacemakerd:    error: pcmk_child_exit:
> The crmd process (27036) exited: Key has expired (127)
> Mar 20 10:55:45 [26932] pcmk3 pacemakerd:   notice: pcmk_process_exit:
> Respawning failed child process: crmd
> Mar 20 10:55:45 [26932] pcmk3 pacemakerd:     info: start_child:
> Using uid=189 and group=189 for process crmd
> Mar 20 10:55:45 [26932] pcmk3 pacemakerd:     info: start_child:
> Forked child 27037 for process crmd
> Mar 20 10:55:45 [26932] pcmk3 pacemakerd:     info: mcp_cpg_deliver:
> Ignoring process list sent by peer for local node
> Mar 20 10:55:45 [26932] pcmk3 pacemakerd:     info: mcp_cpg_deliver:
> Ignoring process list sent by peer for local node
> Mar 20 10:55:45 [26932] pcmk3 pacemakerd:    error: pcmk_child_exit:
> The crmd process (27037) exited: Key has expired (127)
> Mar 20 10:55:45 [26932] pcmk3 pacemakerd:   notice: pcmk_process_exit:
> Respawning failed child process: crmd
>
>
> Regards,
> Roshni
>
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: <https://lists.clusterlabs.org/pipermail/users/
> attachments/20180320/983cf2bc/attachment.html>
>
> ------------------------------
>
> Subject: Digest Footer
>
> _______________________________________________
> Users mailing list
> Users at clusterlabs.org
> https://lists.clusterlabs.org/mailman/listinfo/users
>
>
> ------------------------------
>
> End of Users Digest, Vol 38, Issue 43
> *************************************
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20180321/97095f0f/attachment-0002.html>

