<br><br><div class="gmail_quote">On Mon, Dec 12, 2011 at 5:32 AM, Takatoshi MATSUO <span dir="ltr"><<a href="mailto:matsuo.tak@gmail.com">matsuo.tak@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

Hello<br>

<br>

2011/12/12 Serge Dubrouski <<a href="mailto:sergeyfd@gmail.com">sergeyfd@gmail.com</a>>:<br>

<div><div class="h5">><br>

><br>

> On Thu, Dec 8, 2011 at 10:34 PM, Takatoshi MATSUO <<a href="mailto:matsuo.tak@gmail.com">matsuo.tak@gmail.com</a>><br>

> wrote:<br>

>><br>

>> Hi Attila<br>

>><br>

>> 2011/12/8 Attila Megyeri <<a href="mailto:amegyeri@minerva-soft.com">amegyeri@minerva-soft.com</a>>:<br>

>> > Hi Takatoshi,<br>

>> ><br>

>> > One strange thing I noticed and could probably be improved.<br>

>> > When there is data inconsistency, I have the following node properties:<br>

>> ><br>

>> > * Node psql2:<br>

>> >    + default_ping_set                  : 100<br>

>> >    + master-postgresql:1               : -INFINITY<br>

>> >    + pgsql-data-status                 : DISCONNECT<br>

>> >    + pgsql-status                      : HS:alone<br>

>> > * Node psql1:<br>

>> >    + default_ping_set                  : 100<br>

>> >    + master-postgresql:0               : 1000<br>

>> >    + master-postgresql:1               : -INFINITY<br>

>> >    + pgsql-data-status                 : LATEST<br>

>> >    + pgsql-master-baseline             : 58:000000004B000020<br>

>> >    + pgsql-status                      : PRI<br>

>> ><br>

>> > This is fine, and understandable - but I can see this only if I do a<br>

>> > crm_mon -A.<br>

>> ><br>

>> > My problem is, that CRM shows the following:<br>

>> ><br>

>> > Master/Slave Set: db-ms-psql [postgresql]<br>

>> >     Masters: [ psql1 ]<br>

>> >     Slaves: [ psql2 ]<br>

>> ><br>

>> > So if I monitor the system from crm_mon, HAWK or other tools - I have no<br>

>> > indication at all that the slave is running in an inconsistent mode.<br>

>> ><br>

>> > I would expect the RA to stop the psql2 node in such cases, because:<br>

>> > - It is running, but has out-of-date data, therefore no one will use<br>

>> > it (the slave IP points to the master as well, which is good)<br>

>> > - In CRM status everything looks perfect, even though it is NOT perfect<br>

>> > and admin intervention is required.<br>

>> ><br>

>> ><br>

>> > Shouldn't the disconnected PSQL server be stopped instead?<br>

>><br>

>> hmm..<br>

>> It's better not to stop the PGSQL server.<br>

>> The RA cannot know whether PGSQL is disconnected because of<br>

>> data inconsistency, a network outage,<br>

>> startup, and so on.<br>

><br>

><br>

> Why does it matter? If the state is degraded and inconsistent and there is<br>

> no way to fix it from inside of the RA, RA should probably stop it.<br>

<br>

</div></div>In this case, the HS's data may be consistent, but the primary doesn't have enough WAL files or<br>

the HS doesn't have enough WAL archives to enter replication mode.<br>

Unfortunately, this RA doesn't calculate the number of WAL files.<br></blockquote><div><br>Honestly, I don't know how to handle this better. Pacemaker doesn't have a concept of a degraded node state.<br><br></div><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">


<div class="im"><br>

<br>

> Let's say that there is pgpool running in front of the cluster, keeping an<br>

> inconsistent node up would lead to SQL queries being routed to it and<br>

> possibly returning wrong results.<br>

><br>

<br>

</div>It doesn't happen in my sample configuration.<br>

vip-slave runs on the master when the slave is not "HS:sync".<br></blockquote><div><br>So you have a VIP for each slave node?<br> <br></div><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">


<div class="im"><br>

>><br>

>><br>

>><br>

>> How about using a dummy RA, in the same way as vip-slave?<br>

>> -------------------------------------------<br>

>> primitive runningSlaveOK ocf:heartbeat:Dummy<br>

>> .....(snip)<br>

>><br>

>> location rsc_location-dummy runningSlaveOK \<br>

>>     rule  200: pgsql-status eq "HS:sync"<br>

>> -------------------------------------------<br>

<br>

><br>

> That probably fixes the visibility issue. What about notifications on the DISCONNECT<br>

> state? How would an administrator know that the cluster is inconsistent? Maybe the<br>

> better option in this case would be colocating a MailTo resource with<br>

> "HS:alone"?<br>

<br>

</div>Yes, it's a good idea if you want to receive notifications.<br>
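</blockquote><div><br>A minimal, untested sketch of the MailTo idea, following the same pattern as the dummy resource above. It assumes the ocf:heartbeat:MailTo agent is installed and the nodes can send mail. The resource name, the constraint name and the address are made up for illustration:<br><br>
-------------------------------------------<br>
primitive notifyAlone ocf:heartbeat:MailTo \<br>
    params email="admin@example.com" subject="pgsql slave is HS:alone" \<br>
    op monitor interval="10s"<br>
<br>
location rsc_location-notify notifyAlone \<br>
    rule  200: pgsql-status eq "HS:alone" \<br>
    rule  -inf: not_defined pgsql-status \<br>
    rule  -inf: pgsql-status ne "HS:alone"<br>
-------------------------------------------<br>
<br>
MailTo sends a message when it starts and when it stops, so this should produce one mail when a node enters "HS:alone" and another when it leaves that state. The -inf rules are there to keep the resource stopped while no node is in "HS:alone".<br><br></div><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">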

<div class="HOEnZb"><div class="h5"><br>

<br>

Regards,<br>

Takatoshi MATSUO<br>

<br>

_______________________________________________<br>

Pacemaker mailing list: <a href="mailto:Pacemaker@oss.clusterlabs.org">Pacemaker@oss.clusterlabs.org</a><br>

<a href="http://oss.clusterlabs.org/mailman/listinfo/pacemaker" target="_blank">http://oss.clusterlabs.org/mailman/listinfo/pacemaker</a><br>

<br>

Project Home: <a href="http://www.clusterlabs.org" target="_blank">http://www.clusterlabs.org</a><br>

Getting started: <a href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf" target="_blank">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a><br>

Bugs: <a href="http://bugs.clusterlabs.org" target="_blank">http://bugs.clusterlabs.org</a><br>

</div></div></blockquote></div><br><br clear="all"><br>-- <br>Serge Dubrouski.<br>