[Pacemaker] How can I wait for a device to be ready?

Wed May 20 00:03:01 EDT 2015

> On 15 May 2015, at 3:55 am, Carlos Xavier <cbastos at connection.com.br> wrote:
> 
> Hi.
> 
> I am doing some tests with OCFS2 running on an AOE shared disk.
> The tests are running on openSUSE 12.3 with the following packages:
> ocfs2-tools-o2cb-1.8.2-4.8.1.x86_64
> ocfs2-tools-1.8.2-4.8.1.x86_64
> corosync-1.4.3-4.1.1.x86_64
> libcorosync4-1.4.3-4.1.1.x86_64
> libopenais3-1.1.4-15.1.1.x86_64
> openais-1.1.4-15.1.1.x86_64
> aoetools-35-3.1.x86_64
> 
> With the availability of the device in mind, I created a ping resource so that the filesystem is mounted only after the
> disk provider is reachable.
> 
> This is my test configuration. It works fine if I start openais by hand after the system is up with all modules loaded.
> 
> node cluster-1
> node cluster-2
> primitive p_ping ocf:pacemaker:ping \
>        params name="p_ping" host_list="172.31.0.199" multiplier="1000" debug="true" \
>        op start interval="0" timeout="60" \
>        op monitor interval="10s" timeout="60"
> primitive resDLM ocf:pacemaker:controld \
>        op monitor interval="120s"
> primitive resFS_BACKUP ocf:heartbeat:Filesystem \
>        params device="/dev/etherd/e4.1p1" directory="/backup" fstype="ocfs2" options="rw,noatime" \
>        op monitor interval="120s"
> primitive resO2CB ocf:ocfs2:o2cb \
>        op monitor interval="120s"
> clone cl_ping p_ping \
>        meta target-role="Started"
> clone cloneDLM resDLM \
>        meta globally-unique="false" interleave="true" target-role="Started"
> clone cloneFS_BACKUP resFS_BACKUP \
>        meta interleave="true" ordered="true" target-role="Started"
> clone cloneO2CB resO2CB \
>        meta globally-unique="false" interleave="true" target-role="Started"
> colocation colFS_BACKUP-PING inf: cloneFS_BACKUP cl_ping
> colocation colO2CBDLM inf: cloneO2CB cloneDLM
> colocation colPING-O2CB inf: cl_ping cloneO2CB
> order ordDLMO2CB 0: cloneDLM cloneO2CB
> order ordO2CB-PING 0: cloneO2CB cl_ping
> order ordPING-FS_BACKUP 0: cl_ping cloneFS_BACKUP
> property $id="cib-bootstrap-options" \
>        dc-version="1.1.7-61a079313275f3e9d0e85671f62c721d32ce3563" \
>        cluster-infrastructure="openais" \
>        expected-quorum-votes="2" \
>        stonith-enabled="false" \
>        no-quorum-policy="ignore" \
>        last-lrm-refresh="1431606872"
> rsc_defaults $id="rsc-options" \
>        resource-stickiness="100"
> #vim:set syntax=pcmk
> 
> However, if I let the machine reboot and openais starts during the boot process, it won't work, because the AOE disk appears on
> the system well after the Pacemaker cluster has started, and the ping resource does not hold back the start sequence.

Modify the pacemaker unit file to include Requires=whatever.inserts.the.aoe.disk perhaps?
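
Note that Requires= on its own does not order the two units, so an After= line naming the same unit would be needed as well. If nothing on the system exposes the disk as a unit, another option might be a small wait loop that runs before the cluster stack starts, for example from an ExecStartPre= line in a drop-in for the cluster service. A rough sketch follows; the function name wait_for_device and the 120-second timeout are my own placeholders, not something Pacemaker or aoetools ships:

```shell
#!/bin/sh
# Hypothetical helper: block until a path appears or a timeout expires.
# Usage: wait_for_device PATH [TIMEOUT_SECONDS]   (default timeout: 60)
wait_for_device() {
    dev=$1
    timeout=${2:-60}
    waited=0
    # -e accepts any file type; use -b instead to insist on a block device.
    while [ ! -e "$dev" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "timed out after ${timeout}s waiting for $dev" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Example call, using the device path from the configuration above:
# wait_for_device /dev/etherd/e4.1p1 120 || exit 1
```

The function returns non-zero on timeout, so a caller can refuse to start the cluster rather than let the Filesystem probe run against a missing device.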

> 
> 2015-05-14T06:52:31.704941-03:00 cluster-2 lrmd: [1272]: info: rsc:resDLM:1 probe[2] (pid 1363)
> 2015-05-14T06:52:31.706465-03:00 cluster-2 lrmd: [1272]: info: rsc:resO2CB:1 probe[3] (pid 1364)
> 2015-05-14T06:52:31.708459-03:00 cluster-2 lrmd: [1272]: info: rsc:p_ping:1 probe[4] (pid 1365)
> 2015-05-14T06:52:31.710244-03:00 cluster-2 lrmd: [1272]: info: rsc:resFS_BACKUP:1 probe[5] (pid 1366)
> 2015-05-14T06:52:31.976245-03:00 cluster-2 lrmd: [1272]: info: operation monitor[4] on p_ping:1 for client 1275: pid 1365 exited with return code 7
> 2015-05-14T06:52:31.998091-03:00 cluster-2 lrmd: [1272]: info: operation monitor[2] on resDLM:1 for client 1275: pid 1363 exited with return code 7
> 2015-05-14T06:52:31.998122-03:00 cluster-2 crmd: [1275]: info: process_lrm_event: LRM operation p_ping:1_monitor_0 (call=4, rc=7, cib-update=8, confirmed=true) not running
> 2015-05-14T06:52:31.998129-03:00 cluster-2 lrmd: [1272]: info: RA output: (resDLM:1:monitor:stderr) dlm_controld.pcmk: no process found
> 2015-05-14T06:52:31.999380-03:00 cluster-2 o2cb(resO2CB:1)[1364]: [1400]: INFO: configfs not laoded
> 2015-05-14T06:52:32.000362-03:00 cluster-2 Filesystem(resFS_BACKUP:1)[1366]: [1401]: WARNING: Couldn't find device [/dev/etherd/e4.1p1]. Expected /dev/??? to exist
> 2015-05-14T06:52:32.020263-03:00 cluster-2 lrmd: [1272]: info: operation monitor[3] on resO2CB:1 for client 1275: pid 1364 exited with return code 7
> 2015-05-14T06:52:32.020297-03:00 cluster-2 crmd: [1275]: info: process_lrm_event: LRM operation resDLM:1_monitor_0 (call=2, rc=7, cib-update=9, confirmed=true) not running
> 2015-05-14T06:52:32.040191-03:00 cluster-2 lrmd: [1272]: info: operation monitor[5] on resFS_BACKUP:1 for client 1275: pid 1366 exited with return code 7
> 2015-05-14T06:52:32.041092-03:00 cluster-2 crmd: [1275]: info: process_lrm_event: LRM operation resO2CB:1_monitor_0 (call=3, rc=7, cib-update=10, confirmed=true) not running
> 2015-05-14T06:52:32.065341-03:00 cluster-2 crmd: [1275]: info: process_lrm_event: LRM operation resFS_BACKUP:1_monitor_0 (call=5, rc=7, cib-update=11, confirmed=true) not running
> 2015-05-14T06:52:32.069975-03:00 cluster-2 attrd: [1273]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
> 2015-05-14T06:52:32.071782-03:00 cluster-2 lrmd: [1272]: info: rsc:resDLM:1 start[6] (pid 1455)
> 2015-05-14T06:52:32.093820-03:00 cluster-2 lrmd: [1272]: info: RA output: (resDLM:1:start:stderr) dlm_controld.pcmk: no process found
> 2015-05-14T06:52:32.118738-03:00 cluster-2 systemd[1]: Mounting Configuration File System...
> 2015-05-14T06:52:32.124161-03:00 cluster-2 systemd[1]: Mounted Configuration File System.
> 2015-05-14T06:52:32.125353-03:00 cluster-2 mount[1468]: mount: configfs is already mounted or /sys/kernel/config busy
> 2015-05-14T06:52:32.126138-03:00 cluster-2 systemd[1]: sys-kernel-config.mount mount process exited, code=exited status=32
> 
> .
> .
> .
> 
> 2015-05-14T08:54:26.319506-03:00 cluster-2 systemd-logind[509]: New session 1 of user root.
> 2015-05-14T08:54:28.494462-03:00 cluster-2 kernel: [   65.504178] aoe: e4.1: setting 1024 byte data frames
> 2015-05-14T08:54:28.494489-03:00 cluster-2 kernel: [   65.504359] aoe: 5cd998b17867 e4.1 v400f has 1953525168 sectors
> 2015-05-14T08:54:28.495400-03:00 cluster-2 kernel: [   65.505138]  etherd/e4.1: p1
> 
> So the question is: how can I make Pacemaker wait for a device to be ready before trying to use it?
> 
> Regards,
> Carlos. 