[Pacemaker] Required resources for stateful clones
Eamon Roque
Eamon.Roque at lex-com.net
Fri May 20 09:09:37 EDT 2011
>
> On Fri, May 20, 2011 at 3:42 AM, Eamon Roque <Eamon.Roque at lex-com.net> wrote:
>
> > Hi,
> >
> >
> > >> On Thu, May 19, 2011 at 5:05 AM, Eamon Roque <Eamon.Roque at lex-com.net> wrote:
> >
> > >> Hi,
> > >>
> > >> I've put together a two-node cluster running a database without shared
> > >> storage. The nodes replicate data between themselves; replication is
> > >> handled by the database itself.
> > >>
> > >> I have one resource for the database and one for the IP address. I then
> > >> created a stateful (master/slave) clone from the database resource and
> > >> added a colocation rule and an ordering rule joining the msPostgres clone
> > >> and the IP:
> > >>
> > >> node pgsqltest1
> > >> node pgsqltest2
> > >> primitive Postgres-IP ocf:heartbeat:IPaddr2 \
> > >>     params ip="10.19.57.234" cidr_netmask="32" \
> > >>     op monitor interval="30s" \
> > >>     meta is-managed="false"
> > >> primitive resPostgres ocf:heartbeat:pgsql \
> > >>     params pgctl="/opt/PostgreSQL/9.0/bin/pg_ctl" pgdata="/opt/PostgreSQL/9.0/data" psql="/opt/PostgreSQL/9.0/bin/psql" pgdba="postgres" \
> > >>     op monitor interval="1min" \
> > >>     meta is-managed="false"
> > >> ms msPostgres resPostgres \
> > >>     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="started"
> > >> colocation colPostgres inf: Postgres-IP msPostgres:Master
> > >> order ordPostgres inf: msPostgres:promote Postgres-IP:start
> > >> property $id="cib-bootstrap-options" \
> > >>     dc-version="1.1.2-2e096a41a5f9e184a1c1537c82c6da1093698eb5" \
> > >>     cluster-infrastructure="openais" \
> > >>     expected-quorum-votes="2" \
> > >>     stonith-enabled="false" \
> > >>     no-quorum-policy="ignore" \
> > >>     last-lrm-refresh="1302707146"
> > >> rsc_defaults $id="rsc-options" \
> > >>     resource-stickiness="200"
> > >> op_defaults $id="op_defaults-options" \
> > >>     record-pending="false"
> > >>
> > >> The stock pgsql agent doesn't support this functionality, but I've put
> > >> together my own, using the mysql agent as a model. Before running the
> > >> script through ocf-tester, I unmanage the postgres resource.
> > >>
> >
> > > Could you show how you implemented promote/demote for pgsql?
> >
> > Sure, let's start with the ultra-simple "promote" function:
> >
> > #
> > # These variables are defined higher up in the file, but they will probably
> > # help with understanding the error of my ways.
> > #
> >
> > CRM_MASTER="${HA_SBIN_DIR}/crm_master"
> > ATTRD_UPDATER="${HA_SBIN_DIR}/attrd_updater"
> >
> > pgsql_promote() {
> >     local output
> >     local rc
> >     local CHECK_PG_SQL
> >     local COMPLETE_STANDBY_QUERY
> >     local PROMOTE_SCORE_HIGH
> >     local MOD_PSQL_M_FORMAT
> >
> >     PROMOTE_SCORE_HIGH=1000
> >     CHECK_PG_SQL="SELECT pg_is_in_recovery()"
> >     MOD_PSQL_M_FORMAT="$OCF_RESKEY_psql -Atc"
> >     COMPLETE_STANDBY_QUERY="$MOD_PSQL_M_FORMAT \"$CHECK_PG_SQL\""
> >
> >     output=$(su - $OCF_RESKEY_pgdba -c "$COMPLETE_STANDBY_QUERY" 2>&1)
> >     echo $output
> >
> >     rc=$?
> >
> >     case $output in
> >         f)
> >             ocf_log debug "PostgreSQL Node is running in Master mode..."
> >             return $OCF_RUNNING_MASTER
> >             ;;
> >
> >         t)
> >             ocf_log debug "PostgreSQL Node is in Hot_Standby mode..."
> >             return $OCF_SUCCESS
> >             ;;
> >
> >         *)
> >             ocf_log err "Critical error in $CHECK_PG_SQL: $output"
> >             return $OCF_ERR_GENERIC
> >             ;;
> >     esac
> >
> >     #
> >     # "Real" promotion is handled here.
> >     # The trigger file is created and we check for "recovery.conf" on the host.
> >     # If we can't find it, then the file will be copied from the HA-Config into
> >     # postgres' data folder.
> >     #
> >
> >     if ! touch $OCF_RESKEY_trigger_file; then
> >         ocf_log err "$OCF_RESKEY_trigger_file could not be created!"
> >         return $OCF_ERR_GENERIC
> >     fi
> >
> >     if [ ! -f $OCF_RESKEY_recovery_conf ]; then
> >         ocf_log err "$OCF_RESKEY_recovery_conf doesn't exist!"
> >         cp $OCF_RESKEY_recovery_conf_ersatz $OCF_RESKEY_pgdata
> >         return $OCF_SUCCESS
> >     fi
>
>
> Why do you need this? As far as I know, when you switch a standby database
> to primary using a trigger file, recovery.conf gets renamed to recovery.done.
> If you rename it back, the DB will be put into standby mode after a restart.
> We are talking about streaming replication, right?
>
>
Right. The order is wrong. According to the Binary Replication tutorial on
the postgres wiki, when I perform a failover with a trigger file, it wants
to find a "recovery.conf", which it then processes (checking the archive
for missing updates etc.) and renames (after noticing the trigger file).
I assumed that this would work in exactly the same way with Streaming
Replication.
Am I wrong?
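
For reference, the sequence described here (a standby whose recovery.conf names a trigger file, the trigger file appearing, the server leaving recovery) can be polled from a script. What follows is a minimal sketch, not the agent's actual code: promote_via_trigger and check_recovery are hypothetical names, the 30-second default timeout is an arbitrary assumption, and the check is expected to print the result of "SELECT pg_is_in_recovery()" (t while in recovery, f once promoted).

```shell
#!/bin/sh
# Sketch under assumptions: promote_via_trigger is a hypothetical helper.
#   $1 = path of the trigger file named in recovery.conf (trigger_file = '...')
#   $2 = name of a function/command that prints "t" or "f"
#        (the output of: psql -Atc "SELECT pg_is_in_recovery()")
#   $3 = timeout in seconds (default 30, an arbitrary choice)
promote_via_trigger() {
    trigger_file=$1
    check_cmd=$2
    timeout=${3:-30}
    elapsed=0

    # Creating the trigger file asks the standby to end recovery.
    touch "$trigger_file" || return 1

    # Poll until the server reports that it has left recovery, or give up.
    while [ "$elapsed" -lt "$timeout" ]; do
        state=$("$check_cmd" 2>/dev/null)
        if [ "$state" = "f" ]; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}

# Example check, assuming the OCF_RESKEY_* variables from the agent are set:
check_recovery() {
    su - "$OCF_RESKEY_pgdba" -c "$OCF_RESKEY_psql -Atc 'SELECT pg_is_in_recovery()'"
}
# promote_via_trigger "$OCF_RESKEY_trigger_file" check_recovery 60
```

Unlike an unbounded "while :" loop, this returns non-zero once the timeout expires, so the caller can log an error instead of hanging.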
> >
> >
> >     # If both files exist or can be created, then the failover fun can start.
> >
> >     ocf_log info "$OCF_RESKEY_trigger_file was created."
> >     ocf_log info "$OCF_RESKEY_recovery_conf exists and can be copied to the correct location."
> >
> >     # Sometimes, the master needs a bit of time to take the reins. So...
> >
> >     while :
> >     do
> >         pgsql_monitor warn
> >         rc=$?
> >
> >         if [ $rc -eq $OCF_RUNNING_MASTER ]; then
> >             break;
> >         fi
> >
> >         ocf_log debug "Postgres Server could not be promoted. Please wait..."
> >
> >         sleep 1
> >
> >     done
> >
> >     ocf_log info "Postgres Server has been promoted. Please check on the previous master."
> >
> >     #################################
> >     # Attributes Update:            #
> >     #################################
> >
> >     $ATTRD_UPDATER -n $PGSQL_STATUS_NAME -v \"PRI\" || exit $(echo "Eh! Attrd_updater is not working!")
> >
> >     #############################################
> >     # Resource stickiness pumped up to 1000 :   #
> >     #############################################
> >
> >     $CRM_MASTER -v $PROMOTE_SCORE_HIGH || exit $(echo "crm_master could not change the Master's status!")
> >
> >     ############
> >     # Success! #
> >     ############
> >
> >     return $OCF_SUCCESS
> >
> > }
> >
> >
> >
> > ######################################################################
> >
> > Thanks!
> >
> >
> And what about demote? Switching a standby into primary using trigger files
> changes the TIMELINE in the DB, and that invalidates all other standby
> databases as well as the previous master database. After that you have to
> restore them from a fresh backup made on the new master. This particular
> behavior has stopped me from implementing Master/Slave functionality in the
> pgsql RA so far.
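
The timeline change mentioned here can be observed with pg_controldata, which prints a "Latest checkpoint's TimeLineID" line. A minimal sketch of reading and comparing it follows; timeline_of and timelines_differ are hypothetical helper names, and treating a mismatch as "this node needs a fresh backup" simply restates the behaviour described above rather than anything checked against a particular PostgreSQL release.

```shell
#!/bin/sh
# Hypothetical helper: extract the timeline ID from pg_controldata output on stdin.
# The "." in the pattern stands in for the apostrophe in "checkpoint's".
timeline_of() {
    awk -F: '/^Latest checkpoint.s TimeLineID/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Hypothetical helper: succeeds (returns 0) when the two timeline IDs differ.
timelines_differ() {
    [ "$1" -ne "$2" ]
}

# Usage, assuming pg_controldata is on PATH and PGDATA points at the data directory:
# local_tl=$(pg_controldata "$PGDATA" | timeline_of)
# if timelines_differ "$local_tl" "$master_tl"; then
#     echo "timeline mismatch: re-seed this node from a fresh base backup"
# fi
```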
>
> BTW, why is pgsql set to is-managed="false" in your configuration? With this
> setting the cluster will keep monitoring it but won't take any other actions,
> AFAIK.
Demote? Well, seeing as neither promote nor demote actually worked for me,
I thought I would start small.
As far as the trigger file switching goes, you're of course completely
right. This behavior isn't really a big deal in my environment, as it's
meant more as a test and we want to bring the demoted servers back up
manually, but I can see that it would cause a lot of problems in a more
complex environment. When I tested the failover functionality without
pacemaker, I had to perform a fresh backup even if I waited less than 30s
to bring the old master back up as a standby.
I guess that with 9.1 this will be easier...
I unmanaged the resources so that my test agent would handle them. Is this
incorrect?
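
One possible reason the promote action never gets going is visible in the function quoted earlier: every branch of the case statement returns, so the trigger-file step after it is never reached, and rc=$? is read after echo, so it holds echo's exit status rather than the query's. A minimal sketch of a different ordering follows. It is an illustration only: next_promote_step and fake_query are hypothetical names, and fake_query stands in for the real psql call so that the snippet runs without a database.

```shell
#!/bin/sh
# Hypothetical helper: map the query result to the next step of a promote action.
#   $1 = output of "SELECT pg_is_in_recovery()"
#   $2 = exit status of the query
next_promote_step() {
    if [ "$2" -ne 0 ]; then
        echo error
        return
    fi
    case $1 in
        f) echo already_master ;;   # already a primary: nothing left to do
        t) echo promote ;;          # standby: carry on to the trigger-file step
        *) echo error ;;            # unexpected output
    esac
}

# Stand-in for the real psql call (an assumption for this sketch).
fake_query() { echo t; }

output=$(fake_query 2>&1)
rc=$?                      # capture the status before running anything else
step=$(next_promote_step "$output" "$rc")
echo "next step: $step"
```

The point of the design is that only the "already master" and error cases end the action early; the standby case falls through to the actual promotion.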
>
>
> Éamon
> >
> >
> >
> > >> Unfortunately, promote/demote doesn't work. ocf-tester tries to use
> > >> "crm_attribute -N pgsql1 -n master-pgrql-replication-agent -l reboot -v 100",
> > >> but the (unmanaged) resources don't accept the score change.
> > >>
> > >> I'm pretty sure that I just need to be hit with a clue stick and would
> > >> be grateful for any help.
> > >>
> > >> Thanks,
> > >>
> > >> Éamon
> > >>
> >
> >
> >
> > --
> > Serge Dubrouski.
> > _______________________________________________
> > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
> >
> >
>
>
> --
> Serge Dubrouski.