[ClusterLabs] Antw: Re: single node fails to start the ocfs2 resource

Klaus Wenninger kwenning at redhat.com
Tue Mar 13 11:18:24 EDT 2018


On 03/13/2018 03:43 PM, Muhammad Sharfuddin wrote:
> Thanks a lot for the explanation. But other than the ocfs2 resource
> group, this cluster starts all other resources
>
> on a single node without any issue, simply because of the
> "no-quorum-policy=ignore" option.

Yes, I know. And what I tried to point out is that "no-quorum-policy=ignore"
is dangerous for services that do require a resource-manager. If you don't
have any of those, go with a systemd startup.

Regards,
Klaus

>
> -- 
> Regards,
> Muhammad Sharfuddin
>
> On 3/13/2018 7:32 PM, Klaus Wenninger wrote:
>> On 03/13/2018 02:30 PM, Muhammad Sharfuddin wrote:
>>> Yes, by saying pacemaker, I meant corosync as well.
>>>
>>> Is there any fix? Or can't a two-node cluster run ocfs2 resources
>>> when one node is offline?
>> Actually there can't be a "fix" as 2 nodes are just not enough
>> for a partial-cluster to be quorate in the classical sense
>> (more votes than half of the cluster nodes).
>>
>> So to still be able to use it we have this 2-node config that
>> permanently sets quorum. But to avoid running into issues on
>> startup, it requires both nodes to have seen each
>> other at least once.
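
The 2-node arrangement Klaus describes maps to corosync's votequorum options; a minimal sketch of the relevant corosync.conf section (values assumed, not taken from the poster's configuration):

```
quorum {
    provider: corosync_votequorum
    # two_node: 1 lets a 2-node cluster stay quorate with a single
    # vote, and implicitly enables wait_for_all: 1 -- after a full
    # cluster stop, a node only becomes quorate again once both
    # nodes have seen each other at least once.
    two_node: 1
}
```

With wait_for_all in effect, a lone node booting after a full shutdown stays inquorate, which is exactly why dlm_controld reports "wait for quorum" in the logs later in this thread.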
>>
>> So this is definitely nothing that is specific to ocfs2.
>> It just looks specific to ocfs2 because you've disabled
>> quorum for pacemaker.
>> To be honest, doing this you wouldn't need a resource-manager
>> at all and could just start up your services using systemd.
>>
>> If you don't want a full 3rd node, and still want to handle cases
>> where one node doesn't come up after a full shutdown of
>> all nodes, you probably could go for a setup with qdevice.
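
A qdevice-based sketch of that suggestion, assuming a third host running corosync-qnetd is available (the hostname is hypothetical):

```
quorum {
    provider: corosync_votequorum
    device {
        model: net
        votes: 1
        net {
            host: qnetd.example.com   # hypothetical arbitrator host
            algorithm: ffsplit        # grants the vote to exactly one side
            tls: on
        }
    }
}
```

The ffsplit algorithm gives the extra vote to exactly one partition in a 50:50 split, so a single surviving node can stay quorate without resorting to no-quorum-policy=ignore.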
>>
>> Regards,
>> Klaus
>>
>>> -- 
>>> Regards,
>>> Muhammad Sharfuddin
>>>
>>> On 3/13/2018 6:16 PM, Klaus Wenninger wrote:
>>>> On 03/13/2018 02:03 PM, Muhammad Sharfuddin wrote:
>>>>> Hi,
>>>>>
>>>>> 1 - if I put a node (node2) offline, ocfs2 resources keep running on
>>>>> the online node (node1)
>>>>>
>>>>> 2 - while node2 was offline, via the cluster I stopped/started the
>>>>> ocfs2 resource group successfully many times in a row.
>>>>>
>>>>> 3 - while node2 was offline, I restarted the pacemaker service on
>>>>> node1 and then tried to start the ocfs2 resource group; dlm started
>>>>> but the ocfs2 file system resource did not start.
>>>>>
>>>>> Nutshell:
>>>>>
>>>>> a - both nodes must be online to start the ocfs2 resource.
>>>>>
>>>>> b - if one crashes or goes offline (gracefully), the ocfs2 resource
>>>>> keeps running on the other/surviving node.
>>>>>
>>>>> c - while one node is offline, we can stop/start the ocfs2 resource
>>>>> group on the surviving node, but after we stop and restart the
>>>>> pacemaker service, the ocfs2 file system resource does not start,
>>>>> with the following info in the logs:
>>>> From the logs I would say the startup of dlm_controld times out
>>>> because it is waiting for quorum - which doesn't happen because of
>>>> wait-for-all.
>>>> The question is whether you really just stopped pacemaker or stopped
>>>> corosync as well.
>>>> In the latter case I would say it is the expected behavior.
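
Klaus's question - whether corosync was stopped along with pacemaker - can be checked directly on the affected node (a sketch; both commands ship with the corosync/pacemaker packages):

```shell
# Were both daemons restarted, or only pacemaker?
systemctl status corosync pacemaker

# Quorum state as corosync sees it; with two_node set, the Flags line
# typically shows "2Node" plus "WaitForAll", and "Quorate: No" here
# would explain dlm_controld blocking on "wait for quorum".
corosync-quorumtool -s
```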
>>>>
>>>> Regards,
>>>> Klaus
>>>>  
>>>>> lrmd[4317]:   notice: executing - rsc:p-fssapmnt action:start
>>>>> call_id:53
>>>>> Filesystem(p-fssapmnt)[5139]: INFO: Running start for
>>>>> /dev/mapper/sapmnt on /sapmnt
>>>>> kernel: [  706.162676] dlm: Using TCP for communications
>>>>> kernel: [  706.162916] dlm: BFA9FF042AA045F4822C2A6A06020EE9: joining
>>>>> the lockspace group...
>>>>> dlm_controld[5105]: 759 fence work wait for quorum
>>>>> dlm_controld[5105]: 764 BFA9FF042AA045F4822C2A6A06020EE9 wait for
>>>>> quorum
>>>>> lrmd[4317]:  warning: p-fssapmnt_start_0 process (PID 5139) timed out
>>>>> lrmd[4317]:  warning: p-fssapmnt_start_0:5139 - timed out after
>>>>> 60000ms
>>>>> lrmd[4317]:   notice: finished - rsc:p-fssapmnt action:start
>>>>> call_id:53 pid:5139 exit-code:1 exec-time:60002ms queue-time:0ms
>>>>> kernel: [  766.056514] dlm: BFA9FF042AA045F4822C2A6A06020EE9: group
>>>>> event done -512 0
>>>>> kernel: [  766.056528] dlm: BFA9FF042AA045F4822C2A6A06020EE9: group
>>>>> join failed -512 0
>>>>> crmd[4320]:   notice: Result of stop operation for p-fssapmnt on
>>>>> pipci001: 0 (ok)
>>>>> crmd[4320]:   notice: Initiating stop operation dlm_stop_0 locally on
>>>>> pipci001
>>>>> lrmd[4317]:   notice: executing - rsc:dlm action:stop call_id:56
>>>>> dlm_controld[5105]: 766 shutdown ignored, active lockspaces
>>>>> lrmd[4317]:  warning: dlm_stop_0 process (PID 5326) timed out
>>>>> lrmd[4317]:  warning: dlm_stop_0:5326 - timed out after 100000ms
>>>>> lrmd[4317]:   notice: finished - rsc:dlm action:stop call_id:56
>>>>> pid:5326 exit-code:1 exec-time:100003ms queue-time:0ms
>>>>> crmd[4320]:    error: Result of stop operation for dlm on pipci001:
>>>>> Timed Out
>>>>> crmd[4320]:  warning: Action 15 (dlm_stop_0) on pipci001 failed
>>>>> (target: 0 vs. rc: 1): Error
>>>>> crmd[4320]:   notice: Transition aborted by operation dlm_stop_0
>>>>> 'modify' on pipci001: Event failed
>>>>> crmd[4320]:  warning: Action 15 (dlm_stop_0) on pipci001 failed
>>>>> (target: 0 vs. rc: 1): Error
>>>>> pengine[4319]:   notice: Watchdog will be used via SBD if fencing is
>>>>> required
>>>>> pengine[4319]:   notice: On loss of CCM Quorum: Ignore
>>>>> pengine[4319]:  warning: Processing failed op stop for dlm:0 on
>>>>> pipci001: unknown error (1)
>>>>> pengine[4319]:  warning: Processing failed op stop for dlm:0 on
>>>>> pipci001: unknown error (1)
>>>>> pengine[4319]:  warning: Cluster node pipci001 will be fenced: dlm:0
>>>>> failed there
>>>>> pengine[4319]:  warning: Processing failed op start for p-fssapmnt:0
>>>>> on pipci001: unknown error (1)
>>>>> pengine[4319]:   notice: Stop of failed resource dlm:0 is implicit
>>>>> after pipci001 is fenced
>>>>> pengine[4319]:   notice:  * Fence pipci001
>>>>> pengine[4319]:   notice: Stop    sbd-stonith#011(pipci001)
>>>>> pengine[4319]:   notice: Stop    dlm:0#011(pipci001)
>>>>> crmd[4320]:   notice: Requesting fencing (reboot) of node pipci001
>>>>> stonith-ng[4316]:   notice: Client crmd.4320.4c2f757b wants to fence
>>>>> (reboot) 'pipci001' with device '(any)'
>>>>> stonith-ng[4316]:   notice: Requesting peer fencing (reboot) of
>>>>> pipci001
>>>>> stonith-ng[4316]:   notice: sbd-stonith can fence (reboot) pipci001:
>>>>> dynamic-list
>>>>>
>>>>>
>>>>> -- 
>>>>> Regards,
>>>>> Muhammad Sharfuddin | +923332144823 | nds.com.pk
>>>>>
>>>>> On 3/13/2018 1:04 PM, Ulrich Windl wrote:
>>>>>> Hi!
>>>>>>
>>>>>> I'd recommend this:
>>>>>> Cleanly boot your nodes, avoiding any manual operation with cluster
>>>>>> resources. Keep the logs.
>>>>>> Then start your tests, keeping the logs for each.
>>>>>> Try to fix issues by reading the logs and adjusting the cluster
>>>>>> configuration, and not by starting commands that the cluster should
>>>>>> start.
>>>>>>
>>>>>> We had a 2-node OCFS2 cluster running for quite some time with
>>>>>> SLES11, but now the cluster has three nodes. To me the output of
>>>>>> "crm_mon -1Arfj" combined with having set record-pending=true was
>>>>>> very valuable for finding problems.
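
Ulrich's monitoring advice can be reproduced as follows (a sketch in crmsh syntax, matching the crm commands used elsewhere in this thread):

```shell
# Record operations while they are still in flight, so pending
# actions show up in status output:
crm configure op_defaults record-pending=true

# One-shot status dump with node attributes, failcounts, inactive
# resources, and operations, as suggested above:
crm_mon -1Arfj
```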
>>>>>>
>>>>>> Regards,
>>>>>> Ulrich
>>>>>>
>>>>>>
>>>>>>>>> Muhammad Sharfuddin <M.Sharfuddin at nds.com.pk> schrieb am
>>>>>>>>> 13.03.2018 um 08:43 in
>>>>>> Nachricht <7b773ae9-4209-d246-b5c0-2c8b67e623b3 at nds.com.pk>:
>>>>>>> Dear Klaus,
>>>>>>>
>>>>>>> If I understand you properly, it's a fencing issue, and whatever I
>>>>>>> am facing is "natural" or "by design" in a two-node cluster where
>>>>>>> quorum is incomplete.
>>>>>>>
>>>>>>> I am quite convinced that you have pointed this out correctly
>>>>>>> because, when I start the dlm resource via the cluster and then try
>>>>>>> to mount the ocfs2 file system manually from the command line, the
>>>>>>> mount command hangs and the following events are reported in the
>>>>>>> logs:
>>>>>>>
>>>>>>>         kernel: [62622.864828] ocfs2: Registered cluster interface
>>>>>>> user
>>>>>>>         kernel: [62622.884427] dlm: Using TCP for communications
>>>>>>>         kernel: [62622.884750] dlm:
>>>>>>> BFA9FF042AA045F4822C2A6A06020EE9:
>>>>>>> joining the lockspace group...
>>>>>>>         dlm_controld[17655]: 62627 fence work wait for quorum
>>>>>>>         dlm_controld[17655]: 62680 BFA9FF042AA045F4822C2A6A06020EE9
>>>>>>> wait
>>>>>>> for quorum
>>>>>>>
>>>>>>> and then the following messages keep being reported every 5-10
>>>>>>> minutes, until I kill the mount.ocfs2 process:
>>>>>>>
>>>>>>>         dlm_controld[17655]: 62627 fence work wait for quorum
>>>>>>>         dlm_controld[17655]: 62680 BFA9FF042AA045F4822C2A6A06020EE9
>>>>>>> wait
>>>>>>> for quorum
>>>>>>>
>>>>>>> I am also very confused, because yesterday I did the same and was
>>>>>>> able to mount the ocfs2 file system manually from the command line
>>>>>>> (at least once), then unmount the file system manually and stop the
>>>>>>> dlm resource from the cluster; after that, the complete ocfs2
>>>>>>> resource stack (dlm, file systems) started/stopped successfully via
>>>>>>> the cluster even when only one machine was online.
>>>>>>>
>>>>>>> In a two-node cluster with ocfs2 resources, can't we run the ocfs2
>>>>>>> resources when quorum is incomplete (one node offline)?
>>>>>>>
>>>>>>> -- 
>>>>>>> Regards,
>>>>>>> Muhammad Sharfuddin
>>>>>>>
>>>>>>> On 3/12/2018 5:58 PM, Klaus Wenninger wrote:
>>>>>>>> On 03/12/2018 01:44 PM, Muhammad Sharfuddin wrote:
>>>>>>>>> Hi Klaus,
>>>>>>>>>
>>>>>>>>> primitive sbd-stonith stonith:external/sbd \
>>>>>>>>>             op monitor interval=3000 timeout=20 \
>>>>>>>>>             op start interval=0 timeout=240 \
>>>>>>>>>             op stop interval=0 timeout=100 \
>>>>>>>>>             params sbd_device="/dev/mapper/sbd" \
>>>>>>>>>             meta target-role=Started
>>>>>>>> Makes more sense now.
>>>>>>>> Using pcmk_delay_max would probably be useful here
>>>>>>>> to prevent a fence-race.
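
Klaus's pcmk_delay_max suggestion, applied to the stonith primitive quoted just above, would look roughly like this (the 15-second value is an assumption, not from the poster's config):

```
primitive sbd-stonith stonith:external/sbd \
        op monitor interval=3000 timeout=20 \
        op start interval=0 timeout=240 \
        op stop interval=0 timeout=100 \
        params sbd_device="/dev/mapper/sbd" pcmk_delay_max=15 \
        meta target-role=Started
```

A random delay of up to 15 seconds before executing a fence request makes it unlikely that, in a split-brain, both nodes shoot each other at the same instant.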
>>>>>>>> That stonith-resource was not in your resource-list below ...
>>>>>>>>
>>>>>>>>> property cib-bootstrap-options: \
>>>>>>>>>             have-watchdog=true \
>>>>>>>>>             stonith-enabled=true \
>>>>>>>>>             no-quorum-policy=ignore \
>>>>>>>>>             stonith-timeout=90 \
>>>>>>>>>             startup-fencing=true
>>>>>>>> You've set no-quorum-policy=ignore for pacemaker.
>>>>>>>> Whether this is a good idea or not in your setup is
>>>>>>>> written on another page.
>>>>>>>> But isn't dlm interfacing directly with corosync, so
>>>>>>>> that it would get the quorum state from there?
>>>>>>>> As you probably have two_node set on a 2-node cluster,
>>>>>>>> this would - after both nodes went down - wait for all
>>>>>>>> nodes to come up first.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Klaus
>>>>>>>>
>>>>>>>>> # ps -eaf |grep sbd
>>>>>>>>> root      6129     1  0 17:35 ?        00:00:00 sbd: inquisitor
>>>>>>>>> root      6133  6129  0 17:35 ?        00:00:00 sbd: watcher:
>>>>>>>>> /dev/mapper/sbd - slot: 1 - uuid:
>>>>>>>>> 6e80a337-95db-4608-bd62-d59517f39103
>>>>>>>>> root      6134  6129  0 17:35 ?        00:00:00 sbd: watcher:
>>>>>>>>> Pacemaker
>>>>>>>>> root      6135  6129  0 17:35 ?        00:00:00 sbd: watcher:
>>>>>>>>> Cluster
>>>>>>>>>
>>>>>>>>> This cluster does not start the ocfs2 resources when I first
>>>>>>>>> intentionally crash (reboot) both nodes and then try to start the
>>>>>>>>> ocfs2 resource while one node is offline.
>>>>>>>>>
>>>>>>>>> To fix the issue I have one permanent solution: bring the other
>>>>>>>>> (offline) node online and things get fixed automatically, i.e.
>>>>>>>>> the ocfs2 resources mount.
>>>>>>>>>
>>>>>>>>> -- 
>>>>>>>>> Regards,
>>>>>>>>> Muhammad Sharfuddin
>>>>>>>>>
>>>>>>>>> On 3/12/2018 5:25 PM, Klaus Wenninger wrote:
>>>>>>>>>> Hi Muhammad!
>>>>>>>>>>
>>>>>>>>>> Could you be a little more elaborate about your fencing setup?
>>>>>>>>>> I read that you are using SBD but I don't see any
>>>>>>>>>> sbd fencing resource.
>>>>>>>>>> In case you wanted to use watchdog-fencing with SBD, this
>>>>>>>>>> would require the stonith-watchdog-timeout property to be set.
>>>>>>>>>> But watchdog-fencing relies on quorum (without 2-node trickery)
>>>>>>>>>> and thus wouldn't work on a 2-node cluster anyway.
>>>>>>>>>>
>>>>>>>>>> Didn't read through the whole thread - so I might be missing
>>>>>>>>>> something ...
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Klaus
>>>>>>>>>>
>>>>>>>>>> On 03/12/2018 12:51 PM, Muhammad Sharfuddin wrote:
>>>>>>>>>>> Hello Gang,
>>>>>>>>>>>
>>>>>>>>>>> as reported previously, the cluster was fixed to start the
>>>>>>>>>>> ocfs2 resources by
>>>>>>>>>>>
>>>>>>>>>>> a) crm resource start dlm
>>>>>>>>>>>
>>>>>>>>>>> b) mount/umount the ocfs2 file system manually. (this step was
>>>>>>>>>>> the
>>>>>>>>>>> fix)
>>>>>>>>>>>
>>>>>>>>>>> and then starting the clone group (which includes dlm and the
>>>>>>>>>>> ocfs2 file systems) worked fine:
>>>>>>>>>>>
>>>>>>>>>>> c) crm resource start base-clone.
>>>>>>>>>>>
>>>>>>>>>>> Now I crashed the nodes intentionally and then kept only one
>>>>>>>>>>> node online; again the cluster stopped starting the ocfs2
>>>>>>>>>>> resources. I again tried to follow your instructions, i.e.
>>>>>>>>>>>
>>>>>>>>>>> i) crm resource start dlm
>>>>>>>>>>>
>>>>>>>>>>> then tried to mount the ocfs2 file system manually, which hung
>>>>>>>>>>> this time (previously, mounting manually helped):
>>>>>>>>>>>
>>>>>>>>>>> # cat /proc/3966/stack
>>>>>>>>>>> [<ffffffffa039f18e>] do_uevent+0x7e/0x200 [dlm]
>>>>>>>>>>> [<ffffffffa039fe0a>] new_lockspace+0x80a/0xa70 [dlm]
>>>>>>>>>>> [<ffffffffa03a02d9>] dlm_new_lockspace+0x69/0x160 [dlm]
>>>>>>>>>>> [<ffffffffa038e758>] user_cluster_connect+0xc8/0x350
>>>>>>>>>>> [ocfs2_stack_user]
>>>>>>>>>>> [<ffffffffa03c2872>] ocfs2_cluster_connect+0x192/0x240
>>>>>>>>>>> [ocfs2_stackglue]
>>>>>>>>>>> [<ffffffffa045eefc>] ocfs2_dlm_init+0x31c/0x570 [ocfs2]
>>>>>>>>>>> [<ffffffffa04a9983>] ocfs2_fill_super+0xb33/0x1200 [ocfs2]
>>>>>>>>>>> [<ffffffff8120e130>] mount_bdev+0x1a0/0x1e0
>>>>>>>>>>> [<ffffffff8120ea1a>] mount_fs+0x3a/0x170
>>>>>>>>>>> [<ffffffff81228bf2>] vfs_kern_mount+0x62/0x110
>>>>>>>>>>> [<ffffffff8122b123>] do_mount+0x213/0xcd0
>>>>>>>>>>> [<ffffffff8122bed5>] SyS_mount+0x85/0xd0
>>>>>>>>>>> [<ffffffff81614b0a>] entry_SYSCALL_64_fastpath+0x1e/0xb6
>>>>>>>>>>> [<ffffffffffffffff>] 0xffffffffffffffff
>>>>>>>>>>>
>>>>>>>>>>> I killed the mount.ocfs2 process, stopped the dlm resource
>>>>>>>>>>> (crm resource stop dlm), and then tried to start it again
>>>>>>>>>>> (crm resource start dlm); previously dlm always started
>>>>>>>>>>> successfully, but this time it didn't, so I checked the
>>>>>>>>>>> dlm_controld process:
>>>>>>>>>>>
>>>>>>>>>>> cat /proc/3754/stack
>>>>>>>>>>> [<ffffffff8121dc55>] poll_schedule_timeout+0x45/0x60
>>>>>>>>>>> [<ffffffff8121f0bc>] do_sys_poll+0x38c/0x4f0
>>>>>>>>>>> [<ffffffff8121f2dd>] SyS_poll+0x5d/0xe0
>>>>>>>>>>> [<ffffffff81614b0a>] entry_SYSCALL_64_fastpath+0x1e/0xb6
>>>>>>>>>>> [<ffffffffffffffff>] 0xffffffffffffffff
>>>>>>>>>>>
>>>>>>>>>>> Nutshell:
>>>>>>>>>>>
>>>>>>>>>>> 1 - this cluster is configured to run when a single node is
>>>>>>>>>>> online
>>>>>>>>>>>
>>>>>>>>>>> 2 - this cluster does not start the ocfs2 resources after a
>>>>>>>>>>> crash when only one node is online.
>>>>>>>>>>>
>>>>>>>>>>> -- 
>>>>>>>>>>> Regards,
>>>>>>>>>>> Muhammad Sharfuddin | +923332144823 | nds.com.pk
>>>>>>>>>>>
>>>>>>>>>>> On 3/12/2018 12:41 PM, Gang He wrote:
>>>>>>>>>>>>> Hello Gang,
>>>>>>>>>>>>>
>>>>>>>>>>>>> to follow your instructions, I started the dlm resource via:
>>>>>>>>>>>>>
>>>>>>>>>>>>>            crm resource start dlm
>>>>>>>>>>>>>
>>>>>>>>>>>>> then mounted/unmounted the ocfs2 file system manually (which
>>>>>>>>>>>>> seems to have fixed the situation).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Now the resources are getting started properly on a single
>>>>>>>>>>>>> node. I am happy as the issue is fixed, but at the same time
>>>>>>>>>>>>> I am lost, because I have no idea how things got fixed here
>>>>>>>>>>>>> (merely by mounting/unmounting the ocfs2 file systems).
>>>>>>>>>>>> From your description,
>>>>>>>>>>>> I suspect the DLM resource does not work normally under that
>>>>>>>>>>>> situation.
>>>>>>>>>>>> Yan/Bin, do you have any comments about two-node clusters?
>>>>>>>>>>>> Which configuration settings will affect corosync quorum/DLM?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks
>>>>>>>>>>>> Gang
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> -- 
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Muhammad Sharfuddin
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 3/12/2018 10:59 AM, Gang He wrote:
>>>>>>>>>>>>>> Hello Muhammad,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Usually, ocfs2 resource startup failure is caused by the
>>>>>>>>>>>>>> mount command timing out (or hanging).
>>>>>>>>>>>>>> A simple debugging method is:
>>>>>>>>>>>>>> remove the ocfs2 resource from crm first,
>>>>>>>>>>>>>> then mount this file system manually and see if the mount
>>>>>>>>>>>>>> command times out or hangs.
>>>>>>>>>>>>>> If the command hangs, please check where the mount.ocfs2
>>>>>>>>>>>>>> process is hung via the "cat /proc/xxx/stack" command.
>>>>>>>>>>>>>> If the back trace stops in the DLM kernel module, the root
>>>>>>>>>>>>>> cause is usually a cluster configuration problem.
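
Gang's debugging method above, spelled out as a command sequence (a sketch; the device and mount point are taken from the Filesystem resource configuration quoted in this thread):

```shell
# 1. With the ocfs2 resource removed from the cluster, mount by hand
#    in the background so the shell stays usable if it hangs:
mount -t ocfs2 /dev/mapper/sapmnt /sapmnt &

# 2. If the mount does not return, find the mount.ocfs2 process and
#    dump its kernel stack (needs root):
pid=$(pidof mount.ocfs2)
cat /proc/$pid/stack
# A trace ending in new_lockspace / dlm_new_lockspace points at DLM,
# i.e. usually a quorum/cluster configuration problem rather than ocfs2.
```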
>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>> Gang
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 3/12/2018 7:32 AM, Gang He wrote:
>>>>>>>>>>>>>>>> Hello Muhammad,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I think this problem is not in ocfs2; the cause looks like
>>>>>>>>>>>>>>>> missing cluster quorum.
>>>>>>>>>>>>>>>> For a two-node cluster (unlike a three-node cluster), if
>>>>>>>>>>>>>>>> one node is offline, quorum will be lost by default.
>>>>>>>>>>>>>>>> So, you should configure the two-node related quorum
>>>>>>>>>>>>>>>> settings according to the pacemaker manual.
>>>>>>>>>>>>>>>> Then DLM can work normally, and the ocfs2 resource can
>>>>>>>>>>>>>>>> start up.
>>>>>>>>>>>>>>> Yes, it's configured accordingly; no-quorum-policy is set
>>>>>>>>>>>>>>> to "ignore".
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> property cib-bootstrap-options: \
>>>>>>>>>>>>>>>                  have-watchdog=true \
>>>>>>>>>>>>>>>                  stonith-enabled=true \
>>>>>>>>>>>>>>>                  stonith-timeout=80 \
>>>>>>>>>>>>>>>                  startup-fencing=true \
>>>>>>>>>>>>>>>                  no-quorum-policy=ignore
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>> Gang
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This two-node cluster starts resources when both nodes
>>>>>>>>>>>>>>>>> are online but does not start the ocfs2 resources when
>>>>>>>>>>>>>>>>> one node is offline. E.g., if I gracefully stop the
>>>>>>>>>>>>>>>>> cluster resources, then stop the pacemaker service on
>>>>>>>>>>>>>>>>> either node, and try to start the ocfs2 resource on the
>>>>>>>>>>>>>>>>> online node, it fails.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> logs:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> pipci001 pengine[17732]:   notice: Start
>>>>>>>>>>>>>>>>> dlm:0#011(pipci001)
>>>>>>>>>>>>>>>>> pengine[17732]:   notice: Start
>>>>>>>>>>>>>>>>> p-fssapmnt:0#011(pipci001)
>>>>>>>>>>>>>>>>> pengine[17732]:   notice: Start
>>>>>>>>>>>>>>>>> p-fsusrsap:0#011(pipci001)
>>>>>>>>>>>>>>>>> pipci001 pengine[17732]:   notice: Calculated
>>>>>>>>>>>>>>>>> transition 2,
>>>>>>>>>>>>>>>>> saving
>>>>>>>>>>>>>>>>> inputs in /var/lib/pacemaker/pengine/pe-input-339.bz2
>>>>>>>>>>>>>>>>> pipci001 crmd[17733]:   notice: Processing graph 2
>>>>>>>>>>>>>>>>> (ref=pe_calc-dc-1520613202-31) derived from
>>>>>>>>>>>>>>>>> /var/lib/pacemaker/pengine/pe-input-339.bz2
>>>>>>>>>>>>>>>>> crmd[17733]:   notice: Initiating start operation
>>>>>>>>>>>>>>>>> dlm_start_0
>>>>>>>>>>>>>>>>> locally on
>>>>>>>>>>>>>>>>> pipci001
>>>>>>>>>>>>>>>>> lrmd[17730]:   notice: executing - rsc:dlm action:start
>>>>>>>>>>>>>>>>> call_id:69
>>>>>>>>>>>>>>>>> dlm_controld[19019]: 4575 dlm_controld 4.0.7 started
>>>>>>>>>>>>>>>>> lrmd[17730]:   notice: finished - rsc:dlm action:start
>>>>>>>>>>>>>>>>> call_id:69
>>>>>>>>>>>>>>>>> pid:18999 exit-code:0 exec-time:1082ms queue-time:1ms
>>>>>>>>>>>>>>>>> crmd[17733]:   notice: Result of start operation for
>>>>>>>>>>>>>>>>> dlm on
>>>>>>>>>>>>>>>>> pipci001: 0 (ok)
>>>>>>>>>>>>>>>>> crmd[17733]:   notice: Initiating monitor operation
>>>>>>>>>>>>>>>>> dlm_monitor_60000
>>>>>>>>>>>>>>>>> locally on pipci001
>>>>>>>>>>>>>>>>> crmd[17733]:   notice: Initiating start operation
>>>>>>>>>>>>>>>>> p-fssapmnt_start_0
>>>>>>>>>>>>>>>>> locally on pipci001
>>>>>>>>>>>>>>>>> lrmd[17730]:   notice: executing - rsc:p-fssapmnt
>>>>>>>>>>>>>>>>> action:start
>>>>>>>>>>>>>>>>> call_id:71
>>>>>>>>>>>>>>>>> Filesystem(p-fssapmnt)[19052]: INFO: Running start for
>>>>>>>>>>>>>>>>> /dev/mapper/sapmnt on /sapmnt
>>>>>>>>>>>>>>>>> kernel: [ 4576.529938] dlm: Using TCP for communications
>>>>>>>>>>>>>>>>> kernel: [ 4576.530233] dlm:
>>>>>>>>>>>>>>>>> BFA9FF042AA045F4822C2A6A06020EE9:
>>>>>>>>>>>>>>>>> joining
>>>>>>>>>>>>>>>>> the lockspace group.
>>>>>>>>>>>>>>>>> dlm_controld[19019]: 4629 fence work wait for quorum
>>>>>>>>>>>>>>>>> dlm_controld[19019]: 4634
>>>>>>>>>>>>>>>>> BFA9FF042AA045F4822C2A6A06020EE9
>>>>>>>>>>>>>>>>> wait
>>>>>>>>>>>>>>>>> for quorum
>>>>>>>>>>>>>>>>> lrmd[17730]:  warning: p-fssapmnt_start_0 process (PID
>>>>>>>>>>>>>>>>> 19052)
>>>>>>>>>>>>>>>>> timed out
>>>>>>>>>>>>>>>>> kernel: [ 4636.418223] dlm:
>>>>>>>>>>>>>>>>> BFA9FF042AA045F4822C2A6A06020EE9:
>>>>>>>>>>>>>>>>> group
>>>>>>>>>>>>>>>>> event done -512 0
>>>>>>>>>>>>>>>>> kernel: [ 4636.418227] dlm:
>>>>>>>>>>>>>>>>> BFA9FF042AA045F4822C2A6A06020EE9:
>>>>>>>>>>>>>>>>> group join
>>>>>>>>>>>>>>>>> failed -512 0
>>>>>>>>>>>>>>>>> lrmd[17730]:  warning: p-fssapmnt_start_0:19052 -
>>>>>>>>>>>>>>>>> timed out
>>>>>>>>>>>>>>>>> after 60000ms
>>>>>>>>>>>>>>>>> lrmd[17730]:   notice: finished - rsc:p-fssapmnt
>>>>>>>>>>>>>>>>> action:start
>>>>>>>>>>>>>>>>> call_id:71
>>>>>>>>>>>>>>>>> pid:19052 exit-code:1 exec-time:60002ms queue-time:0ms
>>>>>>>>>>>>>>>>> kernel: [ 4636.420628] ocfs2: Unmounting device
>>>>>>>>>>>>>>>>> (254,1) on
>>>>>>>>>>>>>>>>> (node 0)
>>>>>>>>>>>>>>>>> crmd[17733]:    error: Result of start operation for
>>>>>>>>>>>>>>>>> p-fssapmnt on
>>>>>>>>>>>>>>>>> pipci001: Timed Out
>>>>>>>>>>>>>>>>> crmd[17733]:  warning: Action 11 (p-fssapmnt_start_0) on
>>>>>>>>>>>>>>>>> pipci001 failed
>>>>>>>>>>>>>>>>> (target: 0 vs. rc: 1): Error
>>>>>>>>>>>>>>>>> crmd[17733]:   notice: Transition aborted by operation
>>>>>>>>>>>>>>>>> p-fssapmnt_start_0 'modify' on pipci001: Event failed
>>>>>>>>>>>>>>>>> crmd[17733]:  warning: Action 11 (p-fssapmnt_start_0) on
>>>>>>>>>>>>>>>>> pipci001 failed
>>>>>>>>>>>>>>>>> (target: 0 vs. rc: 1): Error
>>>>>>>>>>>>>>>>> crmd[17733]:   notice: Transition 2 (Complete=5,
>>>>>>>>>>>>>>>>> Pending=0,
>>>>>>>>>>>>>>>>> Fired=0,
>>>>>>>>>>>>>>>>> Skipped=0, Incomplete=6,
>>>>>>>>>>>>>>>>> Source=/var/lib/pacemaker/pengine/pe-input-339.bz2):
>>>>>>>>>>>>>>>>> Complete
>>>>>>>>>>>>>>>>> pengine[17732]:   notice: Watchdog will be used via
>>>>>>>>>>>>>>>>> SBD if
>>>>>>>>>>>>>>>>> fencing is
>>>>>>>>>>>>>>>>> required
>>>>>>>>>>>>>>>>> pengine[17732]:   notice: On loss of CCM Quorum: Ignore
>>>>>>>>>>>>>>>>> pengine[17732]:  warning: Processing failed op start for
>>>>>>>>>>>>>>>>> p-fssapmnt:0 on
>>>>>>>>>>>>>>>>> pipci001: unknown error (1)
>>>>>>>>>>>>>>>>> pengine[17732]:  warning: Processing failed op start for
>>>>>>>>>>>>>>>>> p-fssapmnt:0 on
>>>>>>>>>>>>>>>>> pipci001: unknown error (1)
>>>>>>>>>>>>>>>>> pengine[17732]:  warning: Forcing base-clone away from
>>>>>>>>>>>>>>>>> pipci001
>>>>>>>>>>>>>>>>> after
>>>>>>>>>>>>>>>>> 1000000 failures (max=2)
>>>>>>>>>>>>>>>>> pengine[17732]:  warning: Forcing base-clone away from
>>>>>>>>>>>>>>>>> pipci001
>>>>>>>>>>>>>>>>> after
>>>>>>>>>>>>>>>>> 1000000 failures (max=2)
>>>>>>>>>>>>>>>>> pengine[17732]:   notice: Stop    dlm:0#011(pipci001)
>>>>>>>>>>>>>>>>> pengine[17732]:   notice: Stop
>>>>>>>>>>>>>>>>> p-fssapmnt:0#011(pipci001)
>>>>>>>>>>>>>>>>> pengine[17732]:   notice: Calculated transition 3, saving
>>>>>>>>>>>>>>>>> inputs in
>>>>>>>>>>>>>>>>> /var/lib/pacemaker/pengine/pe-input-340.bz2
>>>>>>>>>>>>>>>>> pengine[17732]:   notice: Watchdog will be used via
>>>>>>>>>>>>>>>>> SBD if
>>>>>>>>>>>>>>>>> fencing is
>>>>>>>>>>>>>>>>> required
>>>>>>>>>>>>>>>>> pengine[17732]:   notice: On loss of CCM Quorum: Ignore
>>>>>>>>>>>>>>>>> pengine[17732]:  warning: Processing failed op start for
>>>>>>>>>>>>>>>>> p-fssapmnt:0 on
>>>>>>>>>>>>>>>>> pipci001: unknown error (1)
>>>>>>>>>>>>>>>>> pengine[17732]:  warning: Processing failed op start for
>>>>>>>>>>>>>>>>> p-fssapmnt:0 on
>>>>>>>>>>>>>>>>> pipci001: unknown error (1)
>>>>>>>>>>>>>>>>> pengine[17732]:  warning: Forcing base-clone away from
>>>>>>>>>>>>>>>>> pipci001
>>>>>>>>>>>>>>>>> after
>>>>>>>>>>>>>>>>> 1000000 failures (max=2)
>>>>>>>>>>>>>>>>> pipci001 pengine[17732]:  warning: Forcing base-clone
>>>>>>>>>>>>>>>>> away
>>>>>>>>>>>>>>>>> from
>>>>>>>>>>>>>>>>> pipci001
>>>>>>>>>>>>>>>>> after 1000000 failures (max=2)
>>>>>>>>>>>>>>>>> pengine[17732]:   notice: Stop    dlm:0#011(pipci001)
>>>>>>>>>>>>>>>>> pengine[17732]:   notice: Stop
>>>>>>>>>>>>>>>>> p-fssapmnt:0#011(pipci001)
>>>>>>>>>>>>>>>>> pengine[17732]:   notice: Calculated transition 4, saving
>>>>>>>>>>>>>>>>> inputs in
>>>>>>>>>>>>>>>>> /var/lib/pacemaker/pengine/pe-input-341.bz2
>>>>>>>>>>>>>>>>> crmd[17733]:   notice: Processing graph 4
>>>>>>>>>>>>>>>>> (ref=pe_calc-dc-1520613263-36)
>>>>>>>>>>>>>>>>> derived from /var/lib/pacemaker/pengine/pe-input-341.bz2
>>>>>>>>>>>>>>>>> crmd[17733]:   notice: Initiating stop operation
>>>>>>>>>>>>>>>>> p-fssapmnt_stop_0
>>>>>>>>>>>>>>>>> locally on pipci001
>>>>>>>>>>>>>>>>> lrmd[17730]:   notice: executing - rsc:p-fssapmnt
>>>>>>>>>>>>>>>>> action:stop
>>>>>>>>>>>>>>>>> call_id:72
>>>>>>>>>>>>>>>>> Filesystem(p-fssapmnt)[19189]: INFO: Running stop for
>>>>>>>>>>>>>>>>> /dev/mapper/sapmnt
>>>>>>>>>>>>>>>>> on /sapmnt
>>>>>>>>>>>>>>>>> pipci001 lrmd[17730]:   notice: finished - rsc:p-fssapmnt
>>>>>>>>>>>>>>>>> action:stop
>>>>>>>>>>>>>>>>> call_id:72 pid:19189 exit-code:0 exec-time:83ms
>>>>>>>>>>>>>>>>> queue-time:0ms
>>>>>>>>>>>>>>>>> pipci001 crmd[17733]:   notice: Result of stop operation
>>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>> p-fssapmnt
>>>>>>>>>>>>>>>>> on pipci001: 0 (ok)
>>>>>>>>>>>>>>>>> crmd[17733]:   notice: Initiating stop operation
>>>>>>>>>>>>>>>>> dlm_stop_0
>>>>>>>>>>>>>>>>> locally on
>>>>>>>>>>>>>>>>> pipci001
>>>>>>>>>>>>>>>>> pipci001 lrmd[17730]:   notice: executing - rsc:dlm
>>>>>>>>>>>>>>>>> action:stop
>>>>>>>>>>>>>>>>> call_id:74
>>>>>>>>>>>>>>>>> pipci001 dlm_controld[19019]: 4636 shutdown ignored,
>>>>>>>>>>>>>>>>> active
>>>>>>>>>>>>>>>>> lockspaces
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> resource configuration:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> primitive p-fssapmnt Filesystem \
>>>>>>>>>>>>>>>>>                  params device="/dev/mapper/sapmnt"
>>>>>>>>>>>>>>>>> directory="/sapmnt"
>>>>>>>>>>>>>>>>> fstype=ocfs2 \
>>>>>>>>>>>>>>>>>                  op monitor interval=20 timeout=40 \
>>>>>>>>>>>>>>>>>                  op start timeout=60 interval=0 \
>>>>>>>>>>>>>>>>>                  op stop timeout=60 interval=0
>>>>>>>>>>>>>>>>> primitive dlm ocf:pacemaker:controld \
>>>>>>>>>>>>>>>>>                  op monitor interval=60 timeout=60 \
>>>>>>>>>>>>>>>>>                  op start interval=0 timeout=90 \
>>>>>>>>>>>>>>>>>                  op stop interval=0 timeout=100
>>>>>>>>>>>>>>>>> clone base-clone base-group \
>>>>>>>>>>>>>>>>>                  meta interleave=true target-role=Started
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> cluster properties:
>>>>>>>>>>>>>>>>> property cib-bootstrap-options: \
>>>>>>>>>>>>>>>>>                  have-watchdog=true \
>>>>>>>>>>>>>>>>>                  stonith-enabled=true \
>>>>>>>>>>>>>>>>>                  stonith-timeout=80 \
>>>>>>>>>>>>>>>>>                  startup-fencing=true \
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Software versions:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> kernel version: 4.4.114-94.11-default
>>>>>>>>>>>>>>>>> pacemaker-1.1.16-4.8.x86_64
>>>>>>>>>>>>>>>>> corosync-2.3.6-9.5.1.x86_64
>>>>>>>>>>>>>>>>> ocfs2-kmp-default-4.4.114-94.11.3.x86_64
>>>>>>>>>>>>>>>>> ocfs2-tools-1.8.5-1.35.x86_64
>>>>>>>>>>>>>>>>> dlm-kmp-default-4.4.114-94.11.3.x86_64
>>>>>>>>>>>>>>>>> libdlm3-4.0.7-1.28.x86_64
>>>>>>>>>>>>>>>>> libdlm-4.0.7-1.28.x86_64
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> -- 
>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>> Muhammad Sharfuddin
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>>> This email has been checked for viruses by Avast
>>>>>>>>>>>>>>>>> antivirus
>>>>>>>>>>>>>>>>> software.
>>>>>>>>>>>>>>>>> https://www.avast.com/antivirus
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>>> Users mailing list: Users at clusterlabs.org
>>>>>>>>>>>>>>>>> https://lists.clusterlabs.org/mailman/listinfo/users
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Project Home: http://www.clusterlabs.org
>>>>>>>>>>>>>>>>> Getting started:
>>>>>>>>>>>>>>>>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>>>>>>>>>>>>> Bugs: http://bugs.clusterlabs.org
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -- 
>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>> Muhammad Sharfuddin
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>
>>>>>>
>>  
>>
>
>
>

-- 
Klaus Wenninger

Senior Software Engineer, EMEA ENG Base Operating Systems

Red Hat

kwenning at redhat.com   



