[Pacemaker] CLVM & Pacemaker & Corosync on Ubuntu Oneiric Server
Vladislav Bogdanov
bubble at hoster-ok.com
Wed Nov 30 11:22:23 UTC 2011
30.11.2011 14:08, Vadim Bulst wrote:
> Hello,
>
> first of all I'd like to ask you a general question:
>
> Has somebody successfully set up a clvm cluster with pacemaker and run
> it in production mode?
I will say yes after I finally resolve the remaining dlm & fencing issues.
>
> Now back to the concrete problem:
>
> I configured two interfaces for corosync:
>
> root at bbzclnode04:~# corosync-cfgtool -s
> Printing ring status.
> Local node ID 897624256
> RING ID 0
> id = 192.168.128.53
> status = ring 0 active with no faults
> RING ID 1
> id = 192.168.129.23
> status = ring 1 active with no faults
>
> RRP set to passive
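
(For anyone following along: a two-ring passive RRP setup in corosync.conf
looks roughly like the sketch below; the multicast addresses and ports are
placeholders, only the bindnetaddr values correspond to the networks shown
above.)

totem {
        version: 2
        rrp_mode: passive
        interface {
                ringnumber: 0
                bindnetaddr: 192.168.128.0
                mcastaddr: 226.94.1.1
                mcastport: 5405
        }
        interface {
                ringnumber: 1
                bindnetaddr: 192.168.129.0
                mcastaddr: 226.94.1.2
                mcastport: 5407
        }
}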
>
> I also made some changes to my cib:
>
> node bbzclnode04
> node bbzclnode06
> node bbzclnode07
> primitive clvm ocf:lvm2:clvmd \
> params daemon_timeout="30" \
> meta target-role="Started"
Please instruct clvmd to use the corosync stack instead of openais (-I
corosync); otherwise it uses the LCK service, which is not mature and
with which I have observed major problems.
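
For reference, a minimal sketch of how that could look in the resource
definition, assuming your copy of ocf:lvm2:clvmd exposes a parameter for
extra daemon arguments (the daemon_options name below is an assumption -
check "crm ra info ocf:lvm2:clvmd" for the real parameter list):

primitive clvm ocf:lvm2:clvmd \
        params daemon_timeout="30" daemon_options="-I corosync" \
        meta target-role="Started"

If the agent has no such parameter, the -I corosync switch can also be
added to the clvmd startup options of your distribution, or the agent
can be patched to pass it.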
> primitive dlm ocf:pacemaker:controld \
> meta target-role="Started"
> group dlm-clvm dlm clvm
> clone dlm-clvm-clone dlm-clvm \
> meta interleave="true" ordered="true"
> property $id="cib-bootstrap-options" \
> dc-version="1.1.5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
> cluster-infrastructure="openais" \
> expected-quorum-votes="3" \
> no-quorum-policy="ignore" \
> stonith-enabled="false" \
> last-lrm-refresh="1322643084"
>
> I cleaned and restarted the resources - nothing! :
>
> crm(live)resource# cleanup dlm-clvm-clone
> Cleaning up dlm:0 on bbzclnode04
> Cleaning up dlm:0 on bbzclnode06
> Cleaning up dlm:0 on bbzclnode07
> Cleaning up clvm:0 on bbzclnode04
> Cleaning up clvm:0 on bbzclnode06
> Cleaning up clvm:0 on bbzclnode07
> Cleaning up dlm:1 on bbzclnode04
> Cleaning up dlm:1 on bbzclnode06
> Cleaning up dlm:1 on bbzclnode07
> Cleaning up clvm:1 on bbzclnode04
> Cleaning up clvm:1 on bbzclnode06
> Cleaning up clvm:1 on bbzclnode07
> Cleaning up dlm:2 on bbzclnode04
> Cleaning up dlm:2 on bbzclnode06
> Cleaning up dlm:2 on bbzclnode07
> Cleaning up clvm:2 on bbzclnode04
> Cleaning up clvm:2 on bbzclnode06
> Cleaning up clvm:2 on bbzclnode07
> Waiting for 19 replies from the CRMd................... OK
>
> crm_mon:
>
> ============
> Last updated: Wed Nov 30 10:15:09 2011
> Stack: openais
> Current DC: bbzclnode04 - partition with quorum
> Version: 1.1.5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
> 3 Nodes configured, 3 expected votes
> 1 Resources configured.
> ============
>
> Online: [ bbzclnode04 bbzclnode06 bbzclnode07 ]
>
>
> Failed actions:
> clvm:1_start_0 (node=bbzclnode06, call=11, rc=1, status=complete):
> unknown error
> clvm:0_start_0 (node=bbzclnode04, call=11, rc=1, status=complete):
> unknown error
> clvm:2_start_0 (node=bbzclnode07, call=11, rc=1, status=complete):
> unknown error
>
>
> When I look in the log, there is a message which tells me that maybe
> another clvmd process is already running - but that is not the case.
>
> "clvmd could not create local socket Another clvmd is probably already
> running"
>
> Or is it a permission problem - writing to the filesystem? Is there a
> way to get rid of it?
You can try to run it manually under strace. It will show you what happens.
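
Something along these lines (just a sketch; the local socket path differs
between builds, so check both common locations):

# is another clvmd really running?
pgrep -fl clvmd
# stale socket file left over from a previous run?
ls -l /var/run/clvmd.sock /var/run/lvm/clvmd.sock 2>/dev/null
# trace clvmd and its children, then look at the failing system call
strace -f -o /tmp/clvmd.trace clvmd -I corosync

If pgrep shows nothing but a stale socket file is there, removing it
before retrying may already tell you more.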
>
> Shall I use a different distro - or install from source?
>
>
> Am 24.11.2011 22:59, schrieb Andreas Kurz:
>> Hello,
>>
>> On 11/24/2011 10:12 PM, Vadim Bulst wrote:
>>> Hi Andreas,
>>>
>>> I changed my cib:
>>>
>>> node bbzclnode04
>>> node bbzclnode06
>>> node bbzclnode07
>>> primitive clvm ocf:lvm2:clvmd \
>>> params daemon_timeout="30"
>>> primitive dlm ocf:pacemaker:controld
>>> group g_lock dlm clvm
>>> clone g_lock-clone g_lock \
>>> meta interleave="true"
>>> property $id="cib-bootstrap-options" \
>>> dc-version="1.1.5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
>>> cluster-infrastructure="openais" \
>>> expected-quorum-votes="3" \
>>> no-quorum-policy="ignore" \
>>> stonith-enabled="false" \
>>> last-lrm-refresh="1322049979"
>>>
>>> but no luck at all.
>> I assume you did at least a cleanup on clvm and it still does not work
>> ... next step would be to grep for ERROR in your cluster log and look
>> for other suspicious messages to find out why clvm is not that motivated
>> to start.
>>
>>> "And use Corosync 1.4.x with redundant rings and automatic ring recovery
>>> feature enabled."
>>>
>>> I have two interfaces per server - they are bonded together and bridged
>>> for virtualization. Only one untagged VLAN. I tried to give a tagged
>>> VLAN bridge an address, but it didn't work. My network conf looks like this:
>> One or two extra NICs are quite affordable today, to build e.g. a direct
>> connection between the nodes (if possible).
>>
>> Regards,
>> Andreas
>>
>
>
> --
> Best regards
>
> Vadim Bulst
> System Administrator, BBZ
>
> Biotechnologisch-Biomedizinisches Zentrum
> Universität Leipzig
> Deutscher Platz 5, 04103 Leipzig
> Tel.: 0341 97 - 31 307
> Fax : 0341 97 - 31 309
>
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org