[Pacemaker] cloned IPaddr2 on 4 nodes
Vladimir Legeza
vladimir.legeza at gmail.com
Tue Nov 2 12:19:45 EDT 2010
Hi everybody,
I've fixed my problem by adding the following code to the application
consistency-checking script.
Please let me know if you find a better solution.
P.S. Thanks for the replies.
Vladimir
...
check_marker(){
    # Hardcoded to avoid extra forks.
    MARKER_COUNT=8
    NODE_COUNT=4
    # Operate only on the first online node (from the "Online:" list).
    # Hostnames are strings, so compare with != rather than the arithmetic -eq.
    if [[ "$(hostname)" != "$(crm status | grep Online: | awk '{print $3}')" ]]; then
        exit 0
    fi
    # Number of started FMS servers
    ONLINE_FMS=$(crm_resource --resource StreamFMS --locate | sed -e "s/\(.*\)://g" | wc -w)
    # Current value of the clone-node-max meta attribute
    CLONE_NODE_MAX=$(crm configure show xml | grep StreamIP-meta_attributes-clone-node-max | sed -e "s/^\(.*\)value=\"//" -e "s/\"\/>//")
    # How many nodes currently hold at least one marker
    ONLINE_MARKER_COUNT=$(crm status | grep ClusterIP: | grep '(ocf::heartbeat:IPaddr2):' | grep -v Stopped | awk '{print $4}' | sort | uniq | wc -l)
    # ceil(MARKER_COUNT / ONLINE_FMS): a is the truncated quotient, b the exact one
    REQUIRED_MARKER_COUNT=$(echo -e "a=$(echo "$MARKER_COUNT/$ONLINE_FMS" | bc)\nb=$(echo "$MARKER_COUNT/$ONLINE_FMS" | bc -l)\nif (a < b) {print ++a} else {print a}" | bc -l)
    if [ $((ONLINE_MARKER_COUNT * CLONE_NODE_MAX)) -lt "$MARKER_COUNT" ]; then
        # Means: some markers are not allocated!
        # Avoid packet loss:
        crm_resource --resource StreamIP --set-parameter=clone-node-max --meta \
            --parameter-value="$REQUIRED_MARKER_COUNT"
        exit $?
    elif [ "$ONLINE_MARKER_COUNT" -lt "$ONLINE_FMS" ]; then
        # Means: some nodes do not get any workload
        crm_resource --resource StreamIP --set-parameter=clone-node-max --meta \
            --parameter-value="$REQUIRED_MARKER_COUNT"
        exit $?
    elif [ "$REQUIRED_MARKER_COUNT" -eq "$CLONE_NODE_MAX" ]; then
        # Guard against division by zero below
        if [ $((ONLINE_FMS - 1)) -gt 0 ]; then
            # ceil(MARKER_COUNT / (ONLINE_FMS - 1)): the limit needed with one node fewer
            PREDICT_CLONE_NODE_MAX=$(echo -e "a=$(echo "$MARKER_COUNT/($ONLINE_FMS - 1)" | bc)\nb=$(echo "$MARKER_COUNT/($ONLINE_FMS - 1)" | bc -l)\nif (a < b) {print ++a} else {print a}" | bc -l)
            crm_resource --resource StreamIP --set-parameter=clone-node-max --meta \
                --parameter-value="$PREDICT_CLONE_NODE_MAX"
            exit $?
        fi
    fi
}
...
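The bc pipeline in the script computes the smallest integer not below MARKER_COUNT / ONLINE_FMS, i.e. the lowest clone-node-max that still lets every marker be placed. A minimal fork-free sketch of the same rounding with plain shell arithmetic, assuming a non-negative integer dividend and a positive integer divisor (the helper name ceil_div is hypothetical, not part of the original script):

```shell
#!/bin/sh
# ceil_div A B: print ceil(A / B) for non-negative integer A and positive integer B.
# Hypothetical helper; uses only shell arithmetic, so no bc process is forked.
ceil_div() {
    if [ "$2" -le 0 ]; then
        echo "ceil_div: divisor must be positive" >&2
        return 1
    fi
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Example: 8 markers over 4, 3 and 2 online FMS nodes.
for fms in 4 3 2; do
    echo "$fms nodes -> clone-node-max $(ceil_div 8 "$fms")"
done
```

With 8 markers this prints 2, 3 and 4 for 4, 3 and 2 nodes; the value 3 is the one set by hand with `crm resource meta StreamIP set clone-node-max 3` later in the thread.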
On Fri, Oct 29, 2010 at 4:40 PM, Dan Frincu <dfrincu at streamwide.ro> wrote:
> Hi,
>
> Vladimir Legeza wrote:
>
> Hello,
>
> On Fri, Oct 29, 2010 at 12:35 PM, Dan Frincu <dfrincu at streamwide.ro> wrote:
>
>> Hi,
>>
>>
>> Vladimir Legeza wrote:
>>
>> Hello folks.
>>
>> I'm trying to set up four IP-balanced nodes, but I haven't found the right
>> way to balance the load between the nodes when some of them have failed.
>>
>> I've done:
>>
>> [root at node1 ~]# crm configure show
>> node node1
>> node node2
>> node node3
>> node node4
>> primitive ClusterIP ocf:heartbeat:IPaddr2 \
>> params ip="10.138.10.252" cidr_netmask="32" clusterip_hash="sourceip-sourceport" \
>> op monitor interval="30s"
>> clone StreamIP ClusterIP \
>> meta globally-unique="true" clone-max="8" clone-node-max="2" target-role="Started" notify="true" ordered="true" interleave="true"
>> property $id="cib-bootstrap-options" \
>> dc-version="1.0.9-0a40fd0cb9f2fcedef9d1967115c912314c57438" \
>> cluster-infrastructure="openais" \
>> expected-quorum-votes="4" \
>> no-quorum-policy="ignore" \
>> stonith-enabled="false"
>>
>> *When all the nodes are up and running:*
>>
>> [root at node1 ~]# crm status
>> ============
>> Last updated: Thu Oct 28 17:26:13 2010
>> Stack: openais
>> Current DC: node2 - partition with quorum
>> Version: 1.0.9-0a40fd0cb9f2fcedef9d1967115c912314c57438
>> 4 Nodes configured, 4 expected votes
>> 2 Resources configured.
>> ============
>>
>> Online: [ node1 node2 node3 node4 ]
>>
>> Clone Set: StreamIP (unique)
>> ClusterIP:0 (ocf::heartbeat:IPaddr2): Started node1
>> ClusterIP:1 (ocf::heartbeat:IPaddr2): Started node1
>> ClusterIP:2 (ocf::heartbeat:IPaddr2): Started node2
>> ClusterIP:3 (ocf::heartbeat:IPaddr2): Started node2
>> ClusterIP:4 (ocf::heartbeat:IPaddr2): Started node3
>> ClusterIP:5 (ocf::heartbeat:IPaddr2): Started node3
>> ClusterIP:6 (ocf::heartbeat:IPaddr2): Started node4
>> ClusterIP:7 (ocf::heartbeat:IPaddr2): Started node4
>> Everything is OK and each node takes 1/4 of all traffic - wonderful.
>> But we get 25% traffic loss if one of them goes down:
>>
>> Isn't this supposed to be normal behavior in a load-balancing situation:
>> 4 nodes receive 25% of the traffic each, one node goes down, and the load
>> balancer notices the failure and directs 33.33% of the traffic to each of
>> the remaining nodes?
>>
>>
> The only way I see to achieve 33.3% is to decrease the *clone-max* parameter
> value (it should be a multiple of the number of online nodes), and
> *clone-max* would also have to be changed on the fly (automatically).
>
> hmm... The idea is very interesting. =8- )
>
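As the thread shows (2 of 8 markers per node gives each node 1/4 of the traffic), a node's share is its marker count divided by the total number of markers, assuming the hash spreads connections evenly. The sketch below, which assumes a perfectly even placement and uses placeholder labels node1..nodeN rather than the nodes Pacemaker would actually pick, prints that split. For 8 markers on 3 nodes it gives 3/3/2, i.e. 37.5%/37.5%/25%, which matches the placement shown further down after node1 is put in standby with clone-node-max=3; an even 33.3% split needs a marker count that is a multiple of the node count, such as 6.

```shell
#!/bin/sh
# even_split MARKERS NODES: print the most even distribution of MARKERS hash
# buckets over NODES nodes and each node's share of the traffic.
# Assumes every bucket receives the same amount of traffic; the node labels
# are placeholders, not the nodes Pacemaker would actually pick.
even_split() {
    markers=$1
    nodes=$2
    base=$((markers / nodes))    # markers every node gets
    extra=$((markers % nodes))   # this many nodes get one more
    k=0
    while [ "$k" -lt "$nodes" ]; do
        n=$base
        if [ "$k" -lt "$extra" ]; then
            n=$((n + 1))
        fi
        # Share in tenths of a percent, truncated (integer arithmetic only).
        tenths=$((n * 1000 / markers))
        echo "node$((k + 1)): $n markers, $((tenths / 10)).$((tenths % 10))%"
        k=$((k + 1))
    done
}

even_split 8 3   # one node of four lost, 8 markers
even_split 6 3   # marker count divisible by the node count
```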
>> Just out of curiosity.
>>
>> [root at node1 ~]# crm node standby node1
>> [root at node1 ~]# crm status
>> ============
>> Last updated: Thu Oct 28 17:30:01 2010
>> Stack: openais
>> Current DC: node2 - partition with quorum
>> Version: 1.0.9-0a40fd0cb9f2fcedef9d1967115c912314c57438
>> 4 Nodes configured, 4 expected votes
>> 2 Resources configured.
>> ============
>>
>> Node node1: standby
>> Online: [ node2 node3 node4 ]
>>
>> Clone Set: StreamIP (unique)
>> ClusterIP:0 (ocf::heartbeat:IPaddr2): Stopped
>> ClusterIP:1 (ocf::heartbeat:IPaddr2): Stopped
>> ClusterIP:2 (ocf::heartbeat:IPaddr2): Started node2
>> ClusterIP:3 (ocf::heartbeat:IPaddr2): Started node2
>> ClusterIP:4 (ocf::heartbeat:IPaddr2): Started node3
>> ClusterIP:5 (ocf::heartbeat:IPaddr2): Started node3
>> ClusterIP:6 (ocf::heartbeat:IPaddr2): Started node4
>> ClusterIP:7 (ocf::heartbeat:IPaddr2): Started node4
>>
>> *I found a solution (to prevent the loss): set clone-node-max to 3*
>>
>> [root at node1 ~]# crm resource meta StreamIP set clone-node-max 3
>> [root at node1 ~]# crm status
>> ============
>> Last updated: Thu Oct 28 17:35:05 2010
>> Stack: openais
>> Current DC: node2 - partition with quorum
>> Version: 1.0.9-0a40fd0cb9f2fcedef9d1967115c912314c57438
>> 4 Nodes configured, 4 expected votes
>> 2 Resources configured.
>> ============
>>
>> Node node1: standby
>> Online: [ node2 node3 node4 ]
>>
>> Clone Set: StreamIP (unique)
>> ClusterIP:0 (ocf::heartbeat:IPaddr2): Started node2
>> ClusterIP:1 (ocf::heartbeat:IPaddr2): Started node3
>> ClusterIP:2 (ocf::heartbeat:IPaddr2): Started node2
>> ClusterIP:3 (ocf::heartbeat:IPaddr2): Started node2
>> ClusterIP:4 (ocf::heartbeat:IPaddr2): Started node3
>> ClusterIP:5 (ocf::heartbeat:IPaddr2): Started node3
>> ClusterIP:6 (ocf::heartbeat:IPaddr2): Started node4
>> ClusterIP:7 (ocf::heartbeat:IPaddr2): Started node4
>>
>> *The problem is that nothing changes when node1 comes back online.*
>>
>> [root at node1 ~]# crm node online node1
>> [root at node1 ~]# crm status
>> ============
>> Last updated: Thu Oct 28 17:37:43 2010
>> Stack: openais
>> Current DC: node2 - partition with quorum
>> Version: 1.0.9-0a40fd0cb9f2fcedef9d1967115c912314c57438
>> 4 Nodes configured, 4 expected votes
>> 2 Resources configured.
>> ============
>>
>> Online: [ node1 node2 node3 node4 ]
>>
>> Clone Set: StreamIP (unique)
>> ClusterIP:0 (ocf::heartbeat:IPaddr2): Started node2
>> ClusterIP:1 (ocf::heartbeat:IPaddr2): Started node3
>> ClusterIP:2 (ocf::heartbeat:IPaddr2): Started node2
>> ClusterIP:3 (ocf::heartbeat:IPaddr2): Started node2
>> ClusterIP:4 (ocf::heartbeat:IPaddr2): Started node3
>> ClusterIP:5 (ocf::heartbeat:IPaddr2): Started node3
>> ClusterIP:6 (ocf::heartbeat:IPaddr2): Started node4
>> ClusterIP:7 (ocf::heartbeat:IPaddr2): Started node4
>> There is NO TRAFFIC on node1.
>> If I set clone-node-max back to 2, all nodes revert to the original state.
>>
>>
>>
>> So, my question is: how can I avoid such "hand-made" changes (or is it
>> possible to automate *clone-node-max* adjustments)?
>>
>> Thanks!
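One way to automate the adjustment is to recompute clone-node-max as ceil(clone-max / online nodes) and apply it only when it differs from the current value. That rule raises it to 3 when one of the four nodes leaves and lowers it back to 2 when the node returns, which the thread reports is enough to restore the original placement. The sketch below is a hypothetical dry run and a simpler rule than the check_marker script at the start of this thread (it does not pre-raise the limit ahead of the next failure): plan_clone_node_max is an illustrative name, its inputs are passed as arguments instead of being read from the cluster, and it only prints the crm_resource command in the form that script uses rather than running it.

```shell
#!/bin/sh
# plan_clone_node_max CLONE_MAX ONLINE_NODES CURRENT
# Hypothetical dry run: print the command that would set clone-node-max to
# ceil(CLONE_MAX / ONLINE_NODES), or nothing if CURRENT already matches.
plan_clone_node_max() {
    clone_max=$1
    online=$2
    current=$3
    if [ "$online" -le 0 ]; then
        echo "no online nodes, nothing to do" >&2
        return 1
    fi
    wanted=$(( (clone_max + online - 1) / online ))
    if [ "$wanted" -ne "$current" ]; then
        echo "crm_resource --resource StreamIP --set-parameter=clone-node-max --meta --parameter-value=$wanted"
    fi
}

plan_clone_node_max 8 3 2   # node1 in standby: prints a command raising it to 3
plan_clone_node_max 8 4 3   # node1 back online: prints a command lowering it to 2
plan_clone_node_max 8 4 2   # already balanced: prints nothing
```

How often to run such a check (from a monitor, a cron job, or the application's own consistency script) is left open here.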
>>
>> You could use location constraints for the clones, something like:
>>
>> location StreamIP_0_on_node1 StreamIP:0 200: node1
>> location StreamIP_0_on_node2 StreamIP:0 100: node2
>>
>> This way if node1 is up, it will run there, but if node1 fails it will
>> move to node2. And if you don't define resource stickiness, when node1 comes
>> back online, the resource migrates back to it.
>>
>
> I already tried that, but such a configuration does not seem to be
> acceptable:
>
> crm(live)configure# location location_marker_0 StreamIP:0 200: node1
> crm(live)configure# commit
> element rsc_location: Relax-NG validity error : Expecting an element rule, got nothing
> element rsc_location: Relax-NG validity error : Element constraints has extra content: rsc_location
> element configuration: Relax-NG validity error : Invalid sequence in interleave
> element configuration: Relax-NG validity error : Element configuration failed to validate content
> element cib: Relax-NG validity error : Element cib failed to validate content
> crm_verify[20887]: 2010/10/29_16:00:21 ERROR: main: CIB did not pass DTD/schema validation
> Errors found during check: config not valid
>
> Here the issue is with the name of the resource in the location
> constraint: the name is StreamIP, and it seems it doesn't allow
> referencing child clone instances, only the parent clone. This is probably
> the expected behavior in this case.
>
> Now you've got me thinking about how such a setup would work. The way I see
> it, there's probably a better way of doing this: create 8 clusterip
> resources, clusterip{1..8}.
>
> For each pair of clusterip resources (1+2, 3+4, etc.), set two location
> constraints with a score of 2x on the preferred node (location
> clusterip1_on_node1 clusterip1 200: node1, location clusterip2_on_node1
> clusterip2 200: node1) and 6 location constraints with a score of x for the
> other nodes.
>
> This way, you have 2 clusterip resources always preferring one node, with
> failover to any of the other 3 available nodes if the current node fails.
> Failback is possible when the node comes back online due to the larger score
> preference for that node.
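The constraint layout described above can be generated rather than typed by hand. A minimal sketch, assuming primitives named clusterip1 through clusterip8 already exist, the nodes are node1 to node4, and x = 100 (so the preferred node gets 200, as in the example constraint): it prints crm shell `location` lines that could be reviewed and then entered in `crm configure`. It covers only the constraints, not the primitives themselves.

```shell
#!/bin/sh
# gen_constraints: print crm shell location constraints for clusterip1..8.
# Pair (clusterip1, clusterip2) prefers node1, (3, 4) prefers node2, and so on.
# Scores are assumptions: 200 ("2x") on the preferred node, 100 ("x") elsewhere.
gen_constraints() {
    nodes="node1 node2 node3 node4"
    i=1
    for home in $nodes; do
        for rsc in "clusterip$i" "clusterip$((i + 1))"; do
            for node in $nodes; do
                if [ "$node" = "$home" ]; then
                    score=200
                else
                    score=100
                fi
                echo "location ${rsc}_on_${node} ${rsc} ${score}: ${node}"
            done
        done
        i=$((i + 2))
    done
}

gen_constraints
```

Each pair ends up with 2 constraints at 200 and 6 at 100, 32 lines in total; the first line is exactly `location clusterip1_on_node1 clusterip1 200: node1`.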
>
> I know this will result in a rather complex set of resources and
> constraints, so maybe someone has a better / simpler vision of this.
>
> Regards,
>
> Dan
>
>
>
>> I haven't tested this, but it should give you a general idea about how it
>> could be implemented.
>>
>> Regards,
>>
>> Dan
>>
>
>> ------------------------------
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>>
>>
>> --
>> Dan FRINCU
>> Systems Engineer
>> CCNA, RHCE
>> Streamwide Romania
>>
>>
>>
>>
>>
>
>
>
>
>
>