[Pacemaker] drbd 8.3.2 stacked resources controlled by pacemaker

Dan Frincu dfrincu at streamwide.ro
Tue Sep 22 03:41:36 EDT 2009


Hello all,

I'm trying to build an architecture that I'm not 100% sure will work, so 
I need some input on the matter. The design is:
- 4 servers (currently running as Xen virtual machines for testing)
- the servers will be divided between two separate geographical 
locations, 2 servers in site A and 2 servers in site B
- each pair of servers at the same site will form a cluster
- the two clusters will be linked by a stacked DRBD resource 
(stacked-on-top-of)

For this setup I'm using drbd-8.3.2-3.x86_64.rpm, 
heartbeat-3.0.0-33.2.x86_64.rpm, openais-0.80.5-15.1.x86_64.rpm, 
resource-agents-1.0-31.4.x86_64.rpm and their respective dependencies.

One question that popped into my mind: pacemaker (via openais) uses 
multicast, so if the connection between the two sites goes over the 
public internet, does it require multicast routing? What are the 
bandwidth requirements for this kind of setup? I'm assuming the 
high-bandwidth part will be the DRBD replication, but I'm also 
interested in any specific requirements for latency, delay and 
(probably) jitter on the multicast connection, since it uses (correct 
me if I'm wrong) UDP as a transport.

So far I've configured all 4 virtual machines with DRBD and created 
both "normal" and stacked resources, and everything seems to be in 
order; the problem is that I don't know how pacemaker deals with 
stacked resources.

I mean, the goal here is to have one service, let's say apache, run on 
a device on the stacked resource and be handled by pacemaker, so that 
the apache server either runs on the primary stacked resource in site A 
or fails over to the stacked resource in site B; and underneath all 
that, if one of the two servers in site A fails, the cluster switches 
to the other one, and the same should happen in site B.

I see it like a RAID-1 "array" on top of two RAID-1 "arrays", if you 
understand the analogy.

cat /etc/ais/openais.conf
aisexec {
        user: root
        group: root
}

amf {
        mode: disabled
}

logging {
        to_stderr: yes
        debug: off
        timestamp: on
        to_file: no
        to_syslog: yes
        syslog_facility: daemon
}

totem {
        version: 2
        token: 3000
        token_retransmits_before_loss_const: 10
        join: 60
        consensus: 1500
        vsftype: none
        max_messages: 20
        clear_node_high_bit: yes
        secauth: off
        threads: 0
        # nodeid: 1234
        rrp_mode: passive

        interface {
                ringnumber: 0
                bindnetaddr: 192.168.1.0
                mcastaddr: 226.94.1.1
                mcastport: 5405
        }

        interface {
                ringnumber: 1
                bindnetaddr: 172.16.0.0
                mcastaddr: 226.94.1.1
                mcastport: 5406
        }
}

service {
        ver: 0
        name: pacemaker
        use_mgmtd: no
}

A question on the openais.conf: do these two "rings" have to be on 
separate subnets, or can they share the same subnet? On the same issue, 
should one interface declaration be tied to the subnet where DRBD 
replication runs and the other to the pacemaker traffic, or doesn't it 
matter which goes where?

In /etc/drbd.conf the stacked resources look like this:

resource "st1" {
  protocol C;

  on srv1 {
    device     /dev/drbd0;
    disk       /dev/xvda3;
    address    172.16.0.1:7700;
    flexible-meta-disk  internal;
  }

  on srv2 {
    device    /dev/drbd0;
    disk      /dev/xvda3;
    address   172.16.0.2:7700;
    flexible-meta-disk  internal;
  }
}

resource "st2" {
  protocol C;

  on srv3 {
    device     /dev/drbd0;
    disk       /dev/xvda3;
    address    172.16.0.3:7701;
    flexible-meta-disk  internal;
  }

  on srv4 {
    device     /dev/drbd0;
    disk       /dev/xvda3;
    address    172.16.0.4:7701;
    flexible-meta-disk  internal;
  }
}

resource "stacked" {
  protocol      C;
  stacked-on-top-of st1 {
    device      /dev/drbd10;
    address     172.16.0.1:7704;
  }

  stacked-on-top-of st2 {
    device      /dev/drbd10;
    address     172.16.0.3:7704;
  }
}

The output of cat /proc/drbd shows that both the normal and the stacked 
resources are up and running:

 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
    ns:1 nr:0 dw:1 dr:110 al:1 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

10: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

Right now, in order to start the resources I have to perform the whole 
process manually, beginning with /etc/init.d/drbd start, which the docs 
say isn't recommended if I want to use OCF-style resources, which I do.
I first start drbd, which brings up the normal resources; these stay in 
a "cs:Connected ro:Secondary/Secondary" state until I run "drbdadm 
primary all". Then I have to run "drbdadm --stacked up stacked" on the 
cluster members I've assigned as primary, after which the stacked 
resource also sits in a "cs:Connected ro:Secondary/Secondary" state 
until I, again, promote one side to primary.
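
Spelled out, the manual sequence I'm doing looks roughly like this (the 
hostnames are the srv1..srv4 from drbd.conf above; which node gets 
promoted first is of course arbitrary):

```shell
# On all four nodes: load the module and bring up the lower-level resources
/etc/init.d/drbd start

# Promote the lower-level resources, one node per site
drbdadm primary st1              # on srv1 (site A)
drbdadm primary st2              # on srv3 (site B)

# On srv1 and srv3: bring up the stacked resource on top of st1/st2
drbdadm --stacked up stacked

# On srv1 only: promote the stacked resource on the primary site
drbdadm --stacked primary stacked
```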
So the question is: how do I get pacemaker to start both the normal and 
the stacked resources, promote them to primary in the right order, and 
then start a web server on whichever stacked resource becomes primary?
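
To make the question concrete, here is the kind of configuration I 
imagine as a starting point, written in crm shell syntax. This is only 
a rough sketch: all resource names (ms_drbd_st1, p_drbd_stacked, etc.) 
are made up, it assumes the ocf:linbit:drbd agent can manage the 
stacked resource at all (which is exactly what I'm unsure about), and 
it ignores the two-separate-clusters aspect entirely:

```
# Lower-level DRBD resource st1, managed as a master/slave set
primitive p_drbd_st1 ocf:linbit:drbd \
        params drbd_resource="st1" \
        op monitor interval="30s"
ms ms_drbd_st1 p_drbd_st1 \
        meta master-max="1" master-node-max="1" \
        clone-max="2" clone-node-max="1" notify="true"

# Stacked resource; must only run where st1 is Master
primitive p_drbd_stacked ocf:linbit:drbd \
        params drbd_resource="stacked" \
        op monitor interval="30s"
ms ms_drbd_stacked p_drbd_stacked \
        meta master-max="1" clone-max="1" clone-node-max="1" notify="true"
colocation c_stacked_on_st1 inf: ms_drbd_stacked ms_drbd_st1:Master
order o_st1_then_stacked inf: ms_drbd_st1:promote ms_drbd_stacked:start

# Filesystem and apache on top of the stacked device
primitive p_fs ocf:heartbeat:Filesystem \
        params device="/dev/drbd10" directory="/var/www" fstype="ext3"
primitive p_apache ocf:heartbeat:apache \
        params configfile="/etc/httpd/conf/httpd.conf"
group g_web p_fs p_apache
colocation c_web_on_stacked inf: g_web ms_drbd_stacked:Master
order o_stacked_then_web inf: ms_drbd_stacked:promote g_web:start
```

Does something along these lines make sense, or does the stacked layer 
need to be handled outside pacemaker?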

Any relevant documentation links or examples are more than welcome, as 
well as ideas and suggestions.

Thank you in advance,
Dan.

-- 
Dan FRINCU
Streamwide Romania
E-mail: dfrincu at streamwide.ro




