[ClusterLabs] Re: Re: Pacemaker and OCFS2 on stand alone mode

Carlos Xavier cbastos at connection.com.br
Thu Jul 7 19:18:34 UTC 2016



Hi

> 
> dlm_tool dump ?
>

It gives an empty return for this command, since we do not have the dlm resource started yet.
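
A quick way to confirm that is to check for the daemon itself (a minimal check, assuming the dlm userland shipped with this stack):

apolo:~ # ps ax | grep [d]lm_controld   # no output here means the daemon is not running
apolo:~ # dlm_tool ls                   # likewise prints nothing while dlm_controld is down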

I then tried the dlm_controld.pcmk command, and this is the result:

apolo:~ # dlm_controld.pcmk -D
cluster-dlm[4616]: main: dlm_controld master started

1467918891 dlm_controld master started
cluster-dlm[4616]: 2016/07/07_16:14:51 info: config_find_next: Processing additional service options...
cluster-dlm[4616]: 2016/07/07_16:14:51 info: get_config_opt: Found 'openais_clm' for option: name
cluster-dlm[4616]: 2016/07/07_16:14:51 info: config_find_next: Processing additional service options...
cluster-dlm[4616]: 2016/07/07_16:14:51 info: get_config_opt: Found 'openais_evt' for option: name
cluster-dlm[4616]: 2016/07/07_16:14:51 info: config_find_next: Processing additional service options...
cluster-dlm[4616]: 2016/07/07_16:14:51 info: get_config_opt: Found 'openais_ckpt' for option: name
cluster-dlm[4616]: 2016/07/07_16:14:51 info: config_find_next: Processing additional service options...
cluster-dlm[4616]: 2016/07/07_16:14:51 info: get_config_opt: Found 'openais_amf_v2' for option: name
cluster-dlm[4616]: 2016/07/07_16:14:51 info: config_find_next: Processing additional service options...
cluster-dlm[4616]: 2016/07/07_16:14:51 info: get_config_opt: Found 'openais_msg' for option: name
cluster-dlm[4616]: 2016/07/07_16:14:51 info: config_find_next: Processing additional service options...
cluster-dlm[4616]: 2016/07/07_16:14:51 info: get_config_opt: Found 'openais_lck' for option: name
cluster-dlm[4616]: 2016/07/07_16:14:51 info: config_find_next: Processing additional service options...
cluster-dlm[4616]: 2016/07/07_16:14:51 info: get_config_opt: Found 'openais_tmr' for option: name
cluster-dlm[4616]: 2016/07/07_16:14:51 info: config_find_next: Processing additional service options...
cluster-dlm[4616]: 2016/07/07_16:14:51 info: get_config_opt: Found 'pacemaker' for option: name
cluster-dlm[4616]: 2016/07/07_16:14:51 info: get_config_opt: Found '0' for option: ver
cluster-dlm[4616]: 2016/07/07_16:14:51 info: get_cluster_type: Detected an active 'classic openais (with plugin)' cluster
cluster-dlm[4616]: 2016/07/07_16:14:51 info: init_ais_connection_classic: Creating connection to our Corosync plugin
cluster-dlm[4616]: 2016/07/07_16:14:51 info: init_ais_connection_classic: AIS connection established
cluster-dlm[4616]: 2016/07/07_16:14:51 info: get_ais_nodeid: Server details: id=16845322 uname=apolo cname=pcmk
cluster-dlm[4616]: 2016/07/07_16:14:51 info: init_ais_connection_once: Connection to 'classic openais (with plugin)': established
cluster-dlm[4616]: 2016/07/07_16:14:51 info: crm_new_peer: Node apolo now has id: 16845322
cluster-dlm[4616]: 2016/07/07_16:14:51 info: crm_new_peer: Node 16845322 is now known as apolo
1467918891 Is dlm missing from kernel? No misc devices found.
1467918891 /sys/kernel/config/dlm/cluster/comms: opendir failed: 2
1467918891 /sys/kernel/config/dlm/cluster/spaces: opendir failed: 2
1467918891 No /sys/kernel/config, is configfs loaded?
1467918891 shutdown
1467918891 /sys/kernel/config/dlm/cluster/comms: opendir failed: 2
1467918891 /sys/kernel/config/dlm/cluster/spaces: opendir failed: 2
cluster-dlm[4616]: 2016/07/07_16:14:51 notice: terminate_ais_connection: Disconnecting from AIS
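
The "No misc devices found" and "is configfs loaded?" messages suggest the kernel side of DLM is missing on this node. A rough sketch of loading it by hand (module names assumed from a typical stack of this vintage):

apolo:~ # modprobe configfs                          # provides /sys/kernel/config
apolo:~ # mount -t configfs none /sys/kernel/config  # usually mounted automatically once loaded
apolo:~ # modprobe dlm                               # provides the DLM misc devices
apolo:~ # ls /sys/kernel/config/dlm                  # dlm_controld later creates cluster/comms and cluster/spaces here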



I even tried removing these constraints from the cluster configuration, to see if the DLM resource would come up, but with no good result:

colocation colDLMDRBD inf: cloneDLM msDRBD_01:Master
order ordDRBDDLM 0: msDRBD_01:promote cloneDLM:start
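
For reference, with the crm shell that removal looks roughly like this (assuming crmsh and the constraint IDs above):

apolo:~ # crm configure delete colDLMDRBD
apolo:~ # crm configure delete ordDRBDDLM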


Thank you.
Carlos

 
> 2016-07-07 18:57 GMT+02:00 Carlos Xavier <cbastos at connection.com.br>:
> > Thank you for the fast reply
> >
> >>
> >> have you configured the stonith and drbd stonith handler?
> >>
> >
> > Yes, they were configured.
> > The cluster was running fine for more than 4 years, until we lost one host to a power supply failure.
> > Now I need to access the files on the host that is working.
> >
> >> 2016-07-07 16:43 GMT+02:00 Carlos Xavier <cbastos at connection.com.br>:
> >> > Hi.
> >> > We had a Pacemaker cluster running an OCFS2 filesystem over a DRBD
> >> > device, and we completely lost one of the hosts.
> >> > Now I need some help to recover the data on the remaining machine.
> >> > I was able to load the DRBD module by hand and bring up the devices
> >> > using the drbdadm command line:
> >> > apolo:~ # modprobe drbd
> >> > apolo:~ # cat /proc/drbd
> >> > version: 8.3.9 (api:88/proto:86-95)
> >> > srcversion: A67EB2D25C5AFBFF3D8B788
> >> >
> >> > apolo:~ # drbd-overview
> >> >   0:backup
> >> >   1:export
> >> > apolo:~ # drbdadm attach backup
> >> > apolo:~ # drbdadm attach export
> >> > apolo:~ # drbd-overview
> >> >   0:backup  StandAlone Secondary/Unknown UpToDate/DUnknown r-----
> >> >   1:export  StandAlone Secondary/Unknown UpToDate/DUnknown r-----
> >> > apolo:~ # drbdadm primary backup
> >> > apolo:~ # drbdadm primary export
> >> > apolo:~ # drbd-overview
> >> >   0:backup  StandAlone Primary/Unknown UpToDate/DUnknown r-----
> >> >   1:export  StandAlone Primary/Unknown UpToDate/DUnknown r-----
> >> >
> >> > We have these resources and constraints configured:
> >> > primitive resDLM ocf:pacemaker:controld \
> >> >         op monitor interval="120s"
> >> > primitive resDRBD_0 ocf:linbit:drbd \
> >> >         params drbd_resource="backup" \
> >> >         operations $id="resDRBD_0-operations" \
> >> >         op start interval="0" timeout="240" \
> >> >         op stop interval="0" timeout="100" \
> >> >         op monitor interval="20" role="Master" timeout="20" \
> >> >         op monitor interval="30" role="Slave" timeout="20"
> >> > primitive resDRBD_1 ocf:linbit:drbd \
> >> >         params drbd_resource="export" \
> >> >         operations $id="resDRBD_1-operations" \
> >> >         op start interval="0" timeout="240" \
> >> >         op stop interval="0" timeout="100" \
> >> >         op monitor interval="20" role="Master" timeout="20" \
> >> >         op monitor interval="30" role="Slave" timeout="20"
> >> > primitive resFS_BACKUP ocf:heartbeat:Filesystem \
> >> >         params device="/dev/drbd/by-res/backup" directory="/backup" fstype="ocfs2" options="rw,noatime" \
> >> >         op monitor interval="120s"
> >> > primitive resFS_EXPORT ocf:heartbeat:Filesystem \
> >> >         params device="/dev/drbd/by-res/export" directory="/export" fstype="ocfs2" options="rw,noatime" \
> >> >         op monitor interval="120s"
> >> > primitive resO2CB ocf:ocfs2:o2cb \
> >> >         op monitor interval="120s"
> >> > group DRBD_01 resDRBD_0 resDRBD_1
> >> > ms msDRBD_01 DRBD_01 \
> >> >         meta resource-stickiness="100" notify="true" master-max="2" interleave="true" target-role="Started"
> >> > clone cloneDLM resDLM \
> >> >         meta globally-unique="false" interleave="true" target-role="Started"
> >> > clone cloneFS_BACKUP resFS_BACKUP \
> >> >         meta interleave="true" ordered="true" target-role="Started"
> >> > clone cloneFS_EXPORT resFS_EXPORT \
> >> >         meta interleave="true" ordered="true" target-role="Started"
> >> > clone cloneO2CB resO2CB \
> >> >         meta globally-unique="false" interleave="true" target-role="Started"
> >> > colocation colDLMDRBD inf: cloneDLM msDRBD_01:Master
> >> > colocation colFS_BACKUP-O2CB inf: cloneFS_BACKUP cloneO2CB
> >> > colocation colFS_EXPORT-O2CB inf: cloneFS_EXPORT cloneO2CB
> >> > colocation colO2CBDLM inf: cloneO2CB cloneDLM
> >> > order ordDLMO2CB 0: cloneDLM cloneO2CB
> >> > order ordDRBDDLM 0: msDRBD_01:promote cloneDLM:start
> >> > order ordO2CB-FS_BACKUP 0: cloneO2CB cloneFS_BACKUP
> >> > order ordO2CB-FS_EXPORT 0: cloneO2CB cloneFS_EXPORT
> >> >
> >> > As the DRBD devices were brought up by hand, Pacemaker doesn't
> >> > recognize that they are up, so it doesn't start the DLM resource and
> >> > all resources that depend on it stay stopped.
> >> > Is there any way I can circumvent this issue?
> >> > Is it possible to bring up the OCFS2 resources in standalone mode?
> >> > Please, any help will be very welcome.
> >> >
> >> > Best regards,
> >> > Carlos.





