[Pacemaker] Help with DRBD resources on Pacemaker

Thu Aug 23 17:07:05 EDT 2012

Hi.

I configured one cluster using one DRBD resource and it works fine,
although I had to downgrade the kernel to version 2.6.37.6 to get it
stable.
Now I'm configuring another cluster, but it requires two DRBD resources.
They were created, and if I use the system init script to start them, both
come up:

apolo:~ # cat /proc/drbd
version: 8.3.9 (api:88/proto:86-95)
srcversion: A67EB2D25C5AFBFF3D8B788
 0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:0 dw:0 dr:672 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
 1: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:0 dw:0 dr:680 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

When I try to start them using the cluster stack, I get inconsistent results
like this one:

apolo:~ # cat /proc/drbd
version: 8.3.9 (api:88/proto:86-95)
srcversion: A67EB2D25C5AFBFF3D8B788
 0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:0 dw:0 dr:672 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
 1: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
    ns:0 nr:0 dw:0 dr:680 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

crm(live)resource# status
 Master/Slave Set: msDRBD_0 [resDRBD_0]
     Masters: [ apolo diana ]
 Master/Slave Set: msDRBD_1 [resDRBD_1]
     Masters: [ apolo ]
     Stopped: [ resDRBD_1:1 ]

This was solved with a cleanup of the resource resDRBD_1:1:

crm(live)resource# cleanup resDRBD_1:1
Cleaning up resDRBD_1:1 on apolo
Cleaning up resDRBD_1:1 on diana
Waiting for 3 replies from the CRMd... OK
crm(live)resource# status
 Master/Slave Set: msDRBD_0 [resDRBD_0]
     Masters: [ apolo diana ]
 Master/Slave Set: msDRBD_1 [resDRBD_1]
     Masters: [ apolo diana ]

apolo:~ # cat /proc/drbd
version: 8.3.9 (api:88/proto:86-95)
srcversion: A67EB2D25C5AFBFF3D8B788
 0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:0 dw:0 dr:672 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
 1: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:0 dw:0 dr:680 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
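
A state like the earlier one, where one instance of msDRBD_1 was listed as
Stopped, could also be spotted by a script instead of by eye. The following
is only a sketch: the function names are made up for illustration, and the
parsing assumes the "Master/Slave Set" layout that crm printed above.

```python
import re

# Header line of a master/slave set, e.g. " Master/Slave Set: msDRBD_1 [resDRBD_1]"
SET_LINE = re.compile(r"^\s*Master/Slave Set:\s+(?P<ms>\S+)\s+\[(?P<prim>[^\]]+)\]")
# State line under a set, e.g. "     Stopped: [ resDRBD_1:1 ]"
STATE_LINE = re.compile(r"^\s*(?P<state>Masters|Slaves|Stopped):\s+\[(?P<items>[^\]]*)\]")

# Sample text copied from the crm status output shown earlier in this post.
SAMPLE_STATUS = """\
 Master/Slave Set: msDRBD_0 [resDRBD_0]
     Masters: [ apolo diana ]
 Master/Slave Set: msDRBD_1 [resDRBD_1]
     Masters: [ apolo ]
     Stopped: [ resDRBD_1:1 ]
"""


def parse_crm_status(text):
    """Return {set_name: {state: [items]}} for each master/slave set found."""
    sets = {}
    current = None
    for line in text.splitlines():
        header = SET_LINE.match(line)
        if header:
            current = sets.setdefault(header.group("ms"), {})
            continue
        state = STATE_LINE.match(line)
        if state and current is not None:
            current[state.group("state")] = state.group("items").split()
    return sets


def sets_with_stopped(sets):
    """Return the sorted names of sets that list at least one Stopped instance."""
    return sorted(name for name, states in sets.items() if states.get("Stopped"))


for name in sets_with_stopped(parse_crm_status(SAMPLE_STATUS)):
    print(name, "has stopped instances")
```

This only looks at what crm reports, so it has to be combined with a look at
/proc/drbd to know the real DRBD state.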

To test the configuration a little more, I took the resources down and
tried to bring them up again:

crm(live)resource# stop resDRBD_1
crm(live)resource# stop resDRBD_0
crm(live)resource# status
 Master/Slave Set: msDRBD_0 [resDRBD_0]
     Stopped: [ resDRBD_0:0 resDRBD_0:1 ]
 Master/Slave Set: msDRBD_1 [resDRBD_1]
     Stopped: [ resDRBD_1:0 resDRBD_1:1 ]
crm(live)resource# start resDRBD_1
crm(live)resource# start resDRBD_0
crm(live)resource# status
 Master/Slave Set: msDRBD_0 [resDRBD_0]
     Masters: [ apolo diana ]
 Master/Slave Set: msDRBD_1 [resDRBD_1]
     Masters: [ apolo diana ]

crm says everything is OK, but on the command line what you see is a
split brain:

apolo:~ # cat /proc/drbd
version: 8.3.9 (api:88/proto:86-95)
srcversion: A67EB2D25C5AFBFF3D8B788
 0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----
    ns:0 nr:0 dw:0 dr:672 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
 1: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----
    ns:0 nr:0 dw:0 dr:680 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
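
Since the crm status does not show this, the connection state has to be read
from /proc/drbd itself. A rough sketch of such a check follows; it assumes
the 8.3-style /proc/drbd format shown above, and the function names are
hypothetical, not part of any DRBD or Pacemaker tool.

```python
import re

# Device status line of an 8.3-style /proc/drbd, e.g.
#  0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----
DEVICE_LINE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)\s+ro:(?P<ro>\S+)\s+ds:(?P<ds>\S+)"
)

# Sample text copied from the /proc/drbd output shown in this post.
SAMPLE_PROC_DRBD = """\
version: 8.3.9 (api:88/proto:86-95)
srcversion: A67EB2D25C5AFBFF3D8B788
 0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----
    ns:0 nr:0 dw:0 dr:672 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
 1: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----
    ns:0 nr:0 dw:0 dr:680 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
"""


def parse_drbd_status(text):
    """Return {minor: {"cs": ..., "ro": ..., "ds": ...}} for each device line."""
    devices = {}
    for line in text.splitlines():
        m = DEVICE_LINE.match(line)
        if m:
            devices[int(m.group("minor"))] = {
                "cs": m.group("cs"),
                "ro": m.group("ro"),
                "ds": m.group("ds"),
            }
    return devices


def not_connected(devices):
    """Return the sorted minors whose connection state is not Connected."""
    return sorted(minor for minor, st in devices.items() if st["cs"] != "Connected")


devices = parse_drbd_status(SAMPLE_PROC_DRBD)
for minor in not_connected(devices):
    st = devices[minor]
    print("drbd%d: cs=%s ro=%s ds=%s" % (minor, st["cs"], st["ro"], st["ds"]))
```

On a real node the text would come from reading /proc/drbd instead of the
sample string. The check flags every state other than Connected, so it would
also report WFConnection, as in the earlier output.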

These are my DRBD resource configurations and the cluster configuration:
https://dl.dropbox.com/u/96446079/backup.res
https://dl.dropbox.com/u/96446079/export.res
https://dl.dropbox.com/u/96446079/crm_resources.txt

Can you help me fix this issue?

Best regards,
Carlos Xavier.