[ClusterLabs] Antw: Replicated PGSQL woes
Israel Brewster
israel at ravnalaska.net
Fri Oct 14 16:33:05 UTC 2016
On Oct 13, 2016, at 11:36 PM, Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de> wrote:
>
>>>> Israel Brewster <israel at ravnalaska.net> wrote on 13.10.2016 at 19:04 in
> message <34091524-D35E-4E28-9C3E-DDA6C6A1E362 at ravnalaska.net>:
> [...]
>> Oct 13 08:29:39 CentTest1 crmd[30096]: notice: State transition S_IDLE ->
>> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL
>> origin=abort_transition_graph ]
>> Oct 13 08:29:39 CentTest1 pengine[30095]: notice: On loss of CCM Quorum:
>> Ignore
>> Oct 13 08:29:39 CentTest1 pengine[30095]: notice: Stop
>> virtual_ip (centtest2.ravnalaska.net)
>> Oct 13 08:29:39 CentTest1 pengine[30095]: notice: Demote
>> pgsql_96:0 (Master -> Stopped centtest2.ravnalaska.net)
>> Oct 13 08:29:39 CentTest1 pengine[30095]: notice: Calculated Transition
>> 193: /var/lib/pacemaker/pengine/pe-input-500.bz2
>
>> Oct 13 08:29:39 CentTest1 crmd[30096]: notice: Initiating action 43:
>> notify pgsql_96_pre_notify_demote_0 on centtest2.ravnalaska.net
>> Oct 13 08:29:39 CentTest1 crmd[30096]: notice: Initiating action 45:
>> notify pgsql_96_pre_notify_demote_0 on centtest1.ravnalaska.net (local)
>
> The above section looks wrong, because if one resource is master and the other is slave, both cannot be demoted (AFAIK). I'm also surprised that the cluster tries to demote a failed master; maybe you have no fencing configured?
Well, technically it's not a "failure" the way I'm testing; it's a clean shutdown. So no fencing is needed, because the system knows I shut down that node. Effectively, I manually fenced the node. FWIW, I've also tried a complete shutdown of the node (not just the cluster software, but the actual OS), and the other machine still never gets promoted. From further investigation, it *looks* like the cluster doesn't think the other machine is replicating properly, and so doesn't trust it to become master.
And no, I don't have fencing configured yet. I know it is important, but these are just test VMs without fencing hardware, and I'm trying to get the basic operation working first. The final deployment will, of course, have proper fencing (once I figure out how to make *that* work, but that's a different subject).
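
To check the replication suspicion, one option is to look at the node attributes the pgsql resource agent publishes, as shown by `crm_mon -A1`. The following is only a minimal sketch, not a definitive tool. It assumes the attributes are named `pgsql_96-data-status`, `pgsql_96-status` and `master-pgsql_96` (derived from the resource name in the logs) and that the output uses the `* Node name:` / `+ attribute : value` layout. The sample text is hypothetical and was not captured from this cluster.

```python
import re

# HYPOTHETICAL sample in the layout "crm_mon -A1" is assumed to use for
# node attributes. The values are illustrative only, not real output.
SAMPLE = """\
Node Attributes:
* Node centtest1.ravnalaska.net:
    + master-pgsql_96                   : -INFINITY
    + pgsql_96-data-status              : DISCONNECT
    + pgsql_96-status                   : HS:alone
"""

NODE_RE = re.compile(r"^\*\s+Node\s+(\S+?):\s*$")
ATTR_RE = re.compile(r"^\s*\+\s+(\S+)\s*:\s*(.+?)\s*$")


def parse_node_attributes(text):
    """Return {node_name: {attribute: value}} from crm_mon-style text."""
    nodes = {}
    current = None
    for line in text.splitlines():
        node_match = NODE_RE.match(line)
        if node_match:
            # Start collecting attributes for this node.
            current = nodes.setdefault(node_match.group(1), {})
            continue
        attr_match = ATTR_RE.match(line)
        if attr_match and current is not None:
            current[attr_match.group(1)] = attr_match.group(2)
    return nodes


def replication_report(nodes, resource="pgsql_96"):
    """Pick out the data status and master score for each node."""
    return {
        name: (
            attrs.get(resource + "-data-status", "unknown"),
            attrs.get("master-" + resource, "unknown"),
        )
        for name, attrs in nodes.items()
    }


if __name__ == "__main__":
    for node, (status, score) in replication_report(
        parse_node_attributes(SAMPLE)
    ).items():
        print(node, "data-status:", status, "master score:", score)
```

On a live cluster the same functions could be fed the text from running `crm_mon -A1` instead of `SAMPLE`. A surviving node whose data status is something like `DISCONNECT`, or whose master score is `-INFINITY`, would fit the theory that the cluster refuses to promote it because it does not consider the replica current.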
>
>> Oct 13 08:29:39 CentTest1 crmd[30096]: notice: Operation
>> pgsql_96_notify_0: ok (node=centtest1.ravnalaska.net, call=230, rc=0,
>> cib-update=0, confirmed=true)
>> Oct 13 08:29:39 CentTest1 crmd[30096]: notice: Initiating action 6: demote
>> pgsql_96_demote_0 on centtest2.ravnalaska.net
>
> "action 6": Where does it come from? We had 43 and 45!
>
> [...]
>
> Ulrich