[Pacemaker] Pacemaker Digest, Vol 58, Issue 3

Deepshikha Singh deepshikhasingh at drishti-soft.com
Tue Sep 4 05:29:46 EDT 2012


Hi,

Does Pacemaker also work over a WAN? If so, how?


Thank you

On Mon, Sep 3, 2012 at 3:30 PM, <pacemaker-request at oss.clusterlabs.org> wrote:

> Today's Topics:
>
>    1. Pacemaker 1.1.6 order possible bug ? (Tomáš Vavřička)
>    2. Two c72f5ca stonithd coredumps (Vladislav Bogdanov)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 03 Sep 2012 07:41:25 +0200
> From: Tomáš Vavřička <vavricka at ttc.cz>
> To: pacemaker at oss.clusterlabs.org
> Subject: [Pacemaker]  Pacemaker 1.1.6 order possible bug ?
> Message-ID: <50444305.4090501 at ttc.cz>
> Content-Type: text/plain; charset=UTF-8; format=flowed
>
> Hello,
>
> Sorry if I am sending the same question twice, but the message did not
> appear on the mailing list.
>
> I have a problem with orders in pacemaker 1.1.6 and corosync 1.4.1.
>
> The order below works for failover, but it does not work when one
> cluster node starts up (DRBD stays in the Slave state and ms_toponet is
> started before DRBD gets promoted).
>
> order o_start inf: ms_drbd_postgres:promote postgres:start ms_toponet:promote monitor_cluster:start
>
> The order below does not work for failover (it kills the slave toponet app
> and starts it again), but it works correctly when the cluster starts up.
>
> order o_start inf: ms_drbd_postgres:promote postgres:start ms_toponet:start ms_toponet:promote monitor_cluster:start
>
> I want Pacemaker to behave as it did in version 1.0.12:
> * when the toponet master app is killed, move the postgres resource to the
> other node and promote ms_toponet and ms_drbd_postgres to Master
> * when one node is starting, promote DRBD to Master if it is UpToDate
>
> Am I doing something wrong?
>
> It looks to me as if Pacemaker ignores some orders (Pacemaker should wait
> for DRBD promotion before starting the toponet app, but the toponet app is
> started right after DRBD starts as slave). I tried to solve this with
> different orders combined with symmetrical=false, with split orders, and
> with separate orders for start and stop, but had no success at all (it
> seems to ignore the symmetrical=false directive completely).
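>
> For illustration, the split-order variant mentioned above could look
> roughly like the sketch below. The constraint IDs (o_drbd_pg,
> o_pg_toponet, o_toponet_mon) are hypothetical names, and the exact
> constraints that were tried are not shown in this message, so this is
> only an assumed shape and not the actual configuration:
>
> ```
> # assumption: one two-resource order per dependency; IDs are made up
> order o_drbd_pg inf: ms_drbd_postgres:promote postgres:start symmetrical=false
> order o_pg_toponet inf: postgres:start ms_toponet:promote symmetrical=false
> order o_toponet_mon inf: ms_toponet:promote monitor_cluster:start symmetrical=false
> ```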
>
> Pacemaker 1.1.7 does not work for me, because its on-fail directive is
> broken.
>
> crm_mon output:
>
> ============
> Last updated: Fri Aug 31 14:51:11 2012
> Last change: Fri Aug 31 14:50:27 2012 by hacluster via crmd on toponet30
> Stack: openais
> Current DC: toponet30 - partition WITHOUT quorum
> Version: 1.1.6-b988976485d15cb702c9307df55512d323831a5e
> 2 Nodes configured, 2 expected votes
> 10 Resources configured.
> ============
>
> Online: [ toponet30 toponet31 ]
>
> st_primary      (stonith:external/xen0):        Started toponet30
> st_secondary    (stonith:external/xen0):        Started toponet31
>   Master/Slave Set: ms_drbd_postgres
>       Masters: [ toponet30 ]
>       Slaves: [ toponet31 ]
>   Resource Group: postgres
>       pg_fs      (ocf::heartbeat:Filesystem):    Started toponet30
>       PGIP       (ocf::heartbeat:IPaddr2):       Started toponet30
>       postgresql (ocf::heartbeat:pgsql): Started toponet30
> monitor_cluster (ocf::heartbeat:monitor_cluster):       Started toponet30
>   Master/Slave Set: ms_toponet
>       Masters: [ toponet30 ]
>       Slaves: [ toponet31 ]
>
> configuration:
>
> node toponet30
> node toponet31
> primitive PGIP ocf:heartbeat:IPaddr2 \
>          params ip="192.168.100.3" cidr_netmask="29" \
>          op monitor interval="5s"
> primitive drbd_postgres ocf:linbit:drbd \
>          params drbd_resource="postgres" \
>          op start interval="0" timeout="240s" \
>          op stop interval="0" timeout="120s" \
>          op monitor interval="5s" role="Master" timeout="10s" \
>          op monitor interval="10s" role="Slave" timeout="20s"
> primitive monitor_cluster ocf:heartbeat:monitor_cluster \
>          op monitor interval="30s" \
>          op start interval="0" timeout="30s" \
>          meta target-role="Started"
> primitive pg_fs ocf:heartbeat:Filesystem \
>          params device="/dev/drbd0" directory="/var/lib/pgsql" fstype="ext3"
> primitive postgresql ocf:heartbeat:pgsql \
>          op start interval="0" timeout="80s" \
>          op stop interval="0" timeout="60s" \
>          op monitor interval="10s" timeout="10s" depth="0"
> primitive st_primary stonith:external/xen0 \
>          op start interval="0" timeout="60s" \
>          params hostlist="toponet31:/etc/xen/vm/toponet31" dom0="172.16.103.54"
> primitive st_secondary stonith:external/xen0 \
>          op start interval="0" timeout="60s" \
>          params hostlist="toponet30:/etc/xen/vm/toponet30" dom0="172.16.103.54"
> primitive toponet ocf:heartbeat:toponet \
>          op start interval="0" timeout="180s" \
>          op stop interval="0" timeout="60s" \
>          op monitor interval="10s" role="Master" timeout="20s" on-fail="standby" \
>          op monitor interval="20s" role="Slave" timeout="40s" \
>          op promote interval="0" timeout="120s" \
>          op demote interval="0" timeout="120s"
> group postgres pg_fs PGIP postgresql
> ms ms_drbd_postgres drbd_postgres \
>          meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Master"
> ms ms_toponet toponet \
>          meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" target-role="Master"
> location loc_st_pri st_primary -inf: toponet31
> location loc_st_sec st_secondary -inf: toponet30
> location master-prefer-node1 postgres 100: toponet30
> colocation pg_on_drbd inf: monitor_cluster ms_toponet:Master postgres ms_drbd_postgres:Master
> order o_start inf: ms_drbd_postgres:start ms_drbd_postgres:promote postgres:start ms_toponet:start ms_toponet:promote monitor_cluster:start
> property $id="cib-bootstrap-options" \
>          dc-version="1.1.6-b988976485d15cb702c9307df55512d323831a5e" \
>          cluster-infrastructure="openais" \
>          expected-quorum-votes="2" \
>          no-quorum-policy="ignore" \
>          stonith-enabled="true"
> rsc_defaults $id="rsc-options" \
>          resource-stickiness="5000"
>
>
>
> ------------------------------
>
> Message: 2
> Date: Mon, 03 Sep 2012 10:03:28 +0300
> From: Vladislav Bogdanov <bubble at hoster-ok.com>
> To: pacemaker at oss.clusterlabs.org
> Subject: [Pacemaker] Two c72f5ca stonithd coredumps
> Message-ID: <50445640.1060505 at hoster-ok.com>
> Content-Type: text/plain; charset=UTF-8
>
> Hi Andrew, all,
>
> as I wrote before, I caught two paths where stonithd (c72f5ca) dumps core.
> Here are gdb backtraces for them (sorry for posting them inline, I was
> requested to do that ASAP and I hope it is not yet too late for 1.1.8 ;)
> ). Some vars are optimized out, but I hope that doesn't matter. If some
> more information is needed please just request it.
>
> First one is:
> ...
> Core was generated by `/usr/libexec/pacemaker/stonithd'.
> Program terminated with signal 11, Segmentation fault.
> ...
> (gdb) bt
> #0  0x00007f4aec6cdb51 in __strlen_sse2 () from /lib64/libc.so.6
> #1  0x00007f4aec6cd866 in strdup () from /lib64/libc.so.6
> #2  0x000000000040c6f6 in create_remote_stonith_op (client=0x1871120
> "2194f1b8-5722-49c3-bed1-c8fecc78ca02", request=0x1884840, peer=<value
> optimized out>)
>     at remote.c:313
> #3  0x000000000040cf40 in initiate_remote_stonith_op (client=<value
> optimized out>, request=0x1884840, manual_ack=0) at remote.c:336
> #4  0x000000000040a2be in stonith_command (client=0x1870a80, id=<value
> optimized out>, flags=<value optimized out>, request=0x1884840, remote=0x0)
>     at commands.c:1380
> #5  0x0000000000403252 in st_ipc_dispatch (c=0x18838d0, data=<value
> optimized out>, size=329) at main.c:142
> #6  0x00007f4aebaf8d64 in ?? () from /usr/lib64/libqb.so.0
> #7  0x00007f4aebaf908e in qb_ipcs_dispatch_connection_request () from
> /usr/lib64/libqb.so.0
> #8  0x00007f4aee26fda5 in gio_read_socket (gio=<value optimized out>,
> condition=G_IO_IN, data=0x18732f0) at mainloop.c:353
> #9  0x00007f4aebf8ef0e in g_main_context_dispatch () from
> /lib64/libglib-2.0.so.0
> #10 0x00007f4aebf92938 in ?? () from /lib64/libglib-2.0.so.0
> #11 0x00007f4aebf92d55 in g_main_loop_run () from /lib64/libglib-2.0.so.0
> #12 0x0000000000403a98 in main (argc=<value optimized out>,
> argv=0x7fffa3443148) at main.c:890
> (gdb) bt full
> #0  0x00007f4aec6cdb51 in __strlen_sse2 () from /lib64/libc.so.6
> No symbol table info available.
> #1  0x00007f4aec6cd866 in strdup () from /lib64/libc.so.6
> No symbol table info available.
> #2  0x000000000040c6f6 in create_remote_stonith_op (client=0x1871120
> "2194f1b8-5722-49c3-bed1-c8fecc78ca02", request=0x1884840, peer=<value
> optimized out>)
>     at remote.c:313
>         nodeid = <value optimized out>
>         node = 0x1871790
>         op = 0x187e2e0
>         dev = <value optimized out>
>         __func__ = "create_remote_stonith_op"
>         __PRETTY_FUNCTION__ = "create_remote_stonith_op"
> #3  0x000000000040cf40 in initiate_remote_stonith_op (client=<value
> optimized out>, request=0x1884840, manual_ack=0) at remote.c:336
>         query = 0x0
>         client_id = 0x1871120 "2194f1b8-5722-49c3-bed1-c8fecc78ca02"
>         op = 0x0
>         __func__ = "initiate_remote_stonith_op"
>         __PRETTY_FUNCTION__ = "initiate_remote_stonith_op"
> #4  0x000000000040a2be in stonith_command (client=0x1870a80, id=<value
> optimized out>, flags=<value optimized out>, request=0x1884840, remote=0x0)
>     at commands.c:1380
>         alternate_host = <value optimized out>
>         dev = <value optimized out>
>         target = 0x1883f40 "1074005258"
>         call_options = 4610
>         rc = -95
>         is_reply = 0
>         always_reply = 0
>         reply = 0x0
>         data = 0x0
>         op = 0x187e550 "st_fence"
>         client_id = 0x1874cb0 "2194f1b8-5722-49c3-bed1-c8fecc78ca02"
>         __func__ = "stonith_command"
>         __PRETTY_FUNCTION__ = "stonith_command"
>         __FUNCTION__ = "stonith_command"
> #5  0x0000000000403252 in st_ipc_dispatch (c=0x18838d0, data=<value
> optimized out>, size=329) at main.c:142
>         id = 4
>         flags = 1
>         request = 0x1884840
>         client = 0x1870a80
>         __FUNCTION__ = "st_ipc_dispatch"
>         __func__ = "st_ipc_dispatch"
>         __PRETTY_FUNCTION__ = "st_ipc_dispatch"
> #6  0x00007f4aebaf8d64 in ?? () from /usr/lib64/libqb.so.0
> No symbol table info available.
> #7  0x00007f4aebaf908e in qb_ipcs_dispatch_connection_request () from
> /usr/lib64/libqb.so.0
> No symbol table info available.
> #8  0x00007f4aee26fda5 in gio_read_socket (gio=<value optimized out>,
> condition=G_IO_IN, data=0x18732f0) at mainloop.c:353
>         adaptor = 0x18732f0
>         fd = 15
>         __func__ = "gio_read_socket"
> #9  0x00007f4aebf8ef0e in g_main_context_dispatch () from
> /lib64/libglib-2.0.so.0
> No symbol table info available.
> #10 0x00007f4aebf92938 in ?? () from /lib64/libglib-2.0.so.0
> No symbol table info available.
> #11 0x00007f4aebf92d55 in g_main_loop_run () from /lib64/libglib-2.0.so.0
> No symbol table info available.
> #12 0x0000000000403a98 in main (argc=<value optimized out>,
> argv=0x7fffa3443148) at main.c:890
>         flag = <value optimized out>
>         lpc = 0
>         argerr = 0
>         option_index = 0
>         cluster = {uuid = 0x176da90 "1090782474", uname = 0x176dac0
> "vd01-b", nodeid = 1090782474, cs_dispatch = 0x404050
> <stonith_peer_ais_callback>,
>           destroy = 0x404230 <stonith_peer_ais_destroy>}
>         actions = {0x40e3fb "reboot", 0x40e402 "off", 0x40ea75 "list",
> 0x40e406 "monitor", 0x40e40e "status"}
>         __func__ = "main"
>
>
>
> Second is (segfault in CRM_ASSERT()):
> ...
> Core was generated by `/usr/libexec/pacemaker/stonithd'.
> Program terminated with signal 11, Segmentation fault.
> #0  stonith_command (client=0x0, id=0, flags=0, request=0xb342f0,
> remote=0xb39cf0 "vd01-d") at commands.c:1258
> 1258    commands.c: No such file or directory.
>         in commands.c
> ...
> (gdb) bt
> #0  stonith_command (client=0x0, id=0, flags=0, request=0xb342f0,
> remote=0xb39cf0 "vd01-d") at commands.c:1258
> #1  0x00000000004040e4 in stonith_peer_callback (kind=<value optimized
> out>, from=<value optimized out>,
>     data=0x7fffa5327cc8 "<st-reply
> st_origin=\"stonith_construct_async_reply\" t=\"stonith-ng\"
> st_op=\"st_notify\" st_device_id=\"manual_ack\"
> st_remote_op=\"25917710-8972-4c40-b783-9648749396a4\"
> st_clientid=\"936ea671-61ba-4258-8e12-"...) at main.c:218
> #2  stonith_peer_ais_callback (kind=<value optimized out>, from=<value
> optimized out>,
>     data=0x7fffa5327cc8 "<st-reply
> st_origin=\"stonith_construct_async_reply\" t=\"stonith-ng\"
> st_op=\"st_notify\" st_device_id=\"manual_ack\"
> st_remote_op=\"25917710-8972-4c40-b783-9648749396a4\"
> st_clientid=\"936ea671-61ba-4258-8e12-"...) at main.c:254
> #3  0x00007f92ded376ca in ais_dispatch_message (handle=<value optimized
> out>, groupName=<value optimized out>, nodeid=<value optimized out>,
>     pid=<value optimized out>, msg=0x7fffa5327a78, msg_len=<value
> optimized out>) at corosync.c:551
> #4  pcmk_cpg_deliver (handle=<value optimized out>, groupName=<value
> optimized out>, nodeid=<value optimized out>, pid=<value optimized out>,
>     msg=0x7fffa5327a78, msg_len=<value optimized out>) at corosync.c:619
> #5  0x00007f92de91ceaf in cpg_dispatch (handle=7749363892505018368,
> dispatch_types=<value optimized out>) at cpg.c:412
> #6  0x00007f92ded34a42 in pcmk_cpg_dispatch (user_data=<value optimized
> out>) at corosync.c:577
> #7  0x00007f92def61d27 in mainloop_gio_callback (gio=<value optimized
> out>, condition=G_IO_IN, data=0xb2d400) at mainloop.c:535
> #8  0x00007f92dcc7ff0e in g_main_context_dispatch () from
> /lib64/libglib-2.0.so.0
> #9  0x00007f92dcc83938 in ?? () from /lib64/libglib-2.0.so.0
> #10 0x00007f92dcc83d55 in g_main_loop_run () from /lib64/libglib-2.0.so.0
> #11 0x0000000000403a98 in main (argc=<value optimized out>,
> argv=0x7fffa5427de8) at main.c:890
> (gdb) bt full
> #0  stonith_command (client=0x0, id=0, flags=0, request=0xb342f0,
> remote=0xb39cf0 "vd01-d") at commands.c:1258
>         call_options = 4104
>         rc = -95
>         is_reply = 1
>         always_reply = 0
>         reply = 0x0
>         data = 0x0
>         op = 0xb34370 "st_notify"
>         client_id = 0xb3ddc0 "936ea671-61ba-4258-8e12-98542a541b23"
>         __func__ = "stonith_command"
>         __PRETTY_FUNCTION__ = "stonith_command"
>         __FUNCTION__ = "stonith_command"
> #1  0x00000000004040e4 in stonith_peer_callback (kind=<value optimized
> out>, from=<value optimized out>,
>     data=0x7fffa5327cc8 "<st-reply
> st_origin=\"stonith_construct_async_reply\" t=\"stonith-ng\"
> st_op=\"st_notify\" st_device_id=\"manual_ack\"
> st_remote_op=\"25917710-8972-4c40-b783-9648749396a4\"
> st_clientid=\"936ea671-61ba-4258-8e12-"...) at main.c:218
>         remote = 0xb39cf0 "vd01-d"
> #2  stonith_peer_ais_callback (kind=<value optimized out>, from=<value
> optimized out>,
>     data=0x7fffa5327cc8 "<st-reply
> st_origin=\"stonith_construct_async_reply\" t=\"stonith-ng\"
> st_op=\"st_notify\" st_device_id=\"manual_ack\"
> st_remote_op=\"25917710-8972-4c40-b783-9648749396a4\"
> st_clientid=\"936ea671-61ba-4258-8e12-"...) at main.c:254
>         xml = 0xb342f0
>         __func__ = "stonith_peer_ais_callback"
> #3  0x00007f92ded376ca in ais_dispatch_message (handle=<value optimized
> out>, groupName=<value optimized out>, nodeid=<value optimized out>,
>     pid=<value optimized out>, msg=0x7fffa5327a78, msg_len=<value
> optimized out>) at corosync.c:551
>         data = 0x7fffa5327cc8 "<st-reply
> st_origin=\"stonith_construct_async_reply\" t=\"stonith-ng\"
> st_op=\"st_notify\" st_device_id=\"manual_ack\"
> st_remote_op=\"25917710-8972-4c40-b783-9648749396a4\"
> st_clientid=\"936ea671-61ba-4258-8e12-"...
>         uncompressed = 0x0
>         xml = 0x0
> #4  pcmk_cpg_deliver (handle=<value optimized out>, groupName=<value
> optimized out>, nodeid=<value optimized out>, pid=<value optimized out>,
>     msg=0x7fffa5327a78, msg_len=<value optimized out>) at corosync.c:619
>         ais_msg = 0x7fffa5327a78
>         __func__ = "pcmk_cpg_deliver"
> #5  0x00007f92de91ceaf in cpg_dispatch (handle=7749363892505018368,
> dispatch_types=<value optimized out>) at cpg.c:412
>         timeout = 0
>         error = <value optimized out>
>         cpg_inst = 0xb2cd90
>         res_cpg_confchg_callback = <value optimized out>
>         res_cpg_deliver_callback = 0x7fffa53279c0
>         res_cpg_totem_confchg_callback = <value optimized out>
>         cpg_inst_copy = {c = 0xb2cdf0, finalize = 0, context = 0x0,
> {model_data = {model = CPG_MODEL_V1}, model_v1_data = {model =
> CPG_MODEL_V1,
>               cpg_deliver_fn = 0x7f92ded37300 <pcmk_cpg_deliver>,
> cpg_confchg_fn = 0x7f92ded33fb0 <pcmk_cpg_membership>,
> cpg_totem_confchg_fn = 0,
>               flags = 0}}, iteration_list_head = {next = 0xb2cdd0, prev
> = 0xb2cdd0}}
>         dispatch_data = 0x7fffa53279c0
>         member_list = {{nodeid = 1090782474, pid = 4965, reason = 0},
> {nodeid = 1107559690, pid = 3544, reason = 0}, {nodeid = 1124336906, pid
> = 4487,
>             reason = 3544}, {nodeid = 0, pid = 0, reason = 0} <repeats
> 125 times>}
>         left_list = {{nodeid = 0, pid = 0, reason = 0} <repeats 128 times>}
>         joined_list = {{nodeid = 1107559690, pid = 3544, reason = 1},
> {nodeid = 0, pid = 0, reason = 0} <repeats 127 times>}
>         group_name = {length = 11, value = "stonith-ng", '\000' <repeats
> 117 times>}
>         left_list_start = <value optimized out>
>         joined_list_start = <value optimized out>
>         i = <value optimized out>
>         ring_id = {nodeid = 0, seq = 0}
>         totem_member_list = {0 <repeats 128 times>}
>         errno_res = <value optimized out>
>         dispatch_buf =
>
> "\005\000\000\000\000\000\000\000W\004\000\000\000\000\000\000\240\224\327)\204\177\000\000\v\000\000\000\000\000\000\000stonith-ng",
> '\000' <repeats 118 times>"\237,
>
> \003\000\000\000\000\000\000\n\005\004C\000\000\000\000\207\021\000\000\204\177\000\000\000\000\000\000\000\000\000\000\237\003\000\000\000\000\000\000\001\000\000\000\000\000\000\000\001",
> '\000' <repeats 19 times>, "\t", '\000' <repeats 263 times>,
>
> "\n\005\004C\207\021\000\000\000\000\000\000\t\000\000\000\006\000\000\000vd01-d",
> '\000' <repeats 250 times>, "O\001\000\000\000\000\000\000<st-reply
> st_origin=\"stonith_construct_asyn"...
> #6  0x00007f92ded34a42 in pcmk_cpg_dispatch (user_data=<value optimized
> out>) at corosync.c:577
>         rc = 0
>         __func__ = "pcmk_cpg_dispatch"
> #7  0x00007f92def61d27 in mainloop_gio_callback (gio=<value optimized
> out>, condition=G_IO_IN, data=0xb2d400) at mainloop.c:535
>         keep = 1
>         client = 0xb2d400
>         __func__ = "mainloop_gio_callback"
> #8  0x00007f92dcc7ff0e in g_main_context_dispatch () from
> /lib64/libglib-2.0.so.0
> No symbol table info available.
> #9  0x00007f92dcc83938 in ?? () from /lib64/libglib-2.0.so.0
> No symbol table info available.
> #10 0x00007f92dcc83d55 in g_main_loop_run () from /lib64/libglib-2.0.so.0
> No symbol table info available.
> #11 0x0000000000403a98 in main (argc=<value optimized out>,
> argv=0x7fffa5427de8) at main.c:890
>         flag = <value optimized out>
>         lpc = 0
>         argerr = 0
>         option_index = 0
>         cluster = {uuid = 0xb2da90 "1090782474", uname = 0xb2dac0
> "vd01-b", nodeid = 1090782474, cs_dispatch = 0x404050
> <stonith_peer_ais_callback>,
>           destroy = 0x404230 <stonith_peer_ais_destroy>}
>         actions = {0x40e3fb "reboot", 0x40e402 "off", 0x40ea75 "list",
> 0x40e406 "monitor", 0x40e40e "status"}
>         __func__ = "main"
>
> Best,
> Vladislav