[ClusterLabs] pcs stonith fence - Error: unable to fence
Ken Gaillot
kgaillot at redhat.com
Mon Jan 20 11:06:05 EST 2020
On Sat, 2020-01-18 at 22:20 +0000, Strahil Nikolov wrote:
> Sorry for the spam.
> I figured out that I forgot to specify the domain for 'drbd1', and
> that is why it reacted like that.
> The strange thing is that pcs allows me to fence a node that is not
> in the cluster :)
>
> Do you think that this behaviour is a bug?
> If yes, I can open an issue upstream
>
>
> Best Regards,
> Strahil Nikolov
Leaving pcs out of the picture for a moment: from pacemaker's point of
view, the stonith_admin command just passes along what the user
requested, and the fencing daemon decides whether the request is valid,
failing it appropriately if not. So technically it's not a bug.
However, I see two possible areas of improvement:
- The status display should show not just that the request failed, but
why. There is a project already planned to show why fencing was
initiated, so this would be a good addition to that. It's just a matter
of having developer time to do it.
- Since pcs is at a higher level than stonith_admin, it could
require "--force" if a given node isn't in the cluster configuration;
a rough sketch of such a check follows below. Feel free to file an
upstream request for that.
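
As a minimal sketch of what such a guard could look like (hypothetical;
pcs does not do this today), a wrapper could compare the requested name
against the configured cluster nodes before fencing:

    # Hypothetical pre-fencing guard: only fence names that crm_node -l
    # reports as configured cluster nodes (output: "id name [state]").
    node="drbd1.localdomain"
    if crm_node -l | awk '{print $2}' | grep -qx "$node"; then
        pcs stonith fence "$node"
    else
        echo "refusing to fence unknown node $node; use --force to override" >&2
        exit 1
    fi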
> On Sunday, 19 January 2020 at 00:01:11 GMT+2, Strahil Nikolov <
> hunter86_bg at yahoo.com> wrote:
>
> Hi All,
>
>
> I am building a test cluster with fence_rhevm stonith agent on RHEL
> 7.7 and oVirt 4.3.
> When I fenced drbd3 from drbd1 using 'pcs stonith fence drbd3', the
> fence action was successful.
>
> So then I decided to test fencing in the opposite direction, and it
> partially failed.
>
>
> 1. In oVirt the machine was powered off and then powered on properly,
> so the communication with the engine is OK.
> 2. The command on drbd3 to fence drbd1 got stuck and was then reported
> as a failure, even though the VM was reset.
>
>
>
> Now 'pcs status' is reporting the following:
> Failed Fencing Actions:
> * reboot of drbd1 failed: delegate=drbd3.localdomain,
> client=stonith_admin.1706, origin=drbd3.localdomain,
> last-failed='Sat Jan 18 23:18:24 2020'
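
For what it's worth, the fencer keeps its own per-target record of such
attempts, which can be queried with stonith_admin (assuming a pacemaker
recent enough to record it; the target name below is just the one from
the status output above):

    # Show the fencing history recorded for drbd1; '*' lists all targets.
    stonith_admin --history drbd1
    stonith_admin --history '*'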
>
> My stonith is configured as follows:
> Stonith Devices:
> Resource: ovirt_FENCE (class=stonith type=fence_rhevm)
> Attributes: ipaddr=engine.localdomain login=fencerdrbd at internal
> passwd=I_have_replaced_that
> pcmk_host_map=drbd1.localdomain:drbd1;drbd2.localdomain:drbd2;drbd3.localdomain:drbd3
> power_wait=3 ssl=1 ssl_secure=1
> Operations: monitor interval=60s (ovirt_FENCE-monitor-interval-60s)
> Fencing Levels:
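
For reference, a device with the attributes shown above would have been
created with something along these lines (reconstructed from the output;
the password placeholder is kept as in the original):

    # Quoting the host map keeps the shell from splitting on semicolons.
    pcs stonith create ovirt_FENCE fence_rhevm \
        ipaddr=engine.localdomain login='fencerdrbd@internal' \
        passwd=I_have_replaced_that \
        pcmk_host_map='drbd1.localdomain:drbd1;drbd2.localdomain:drbd2;drbd3.localdomain:drbd3' \
        power_wait=3 ssl=1 ssl_secure=1 \
        op monitor interval=60s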
>
>
>
> Do I need to add some other settings to the fence_rhevm stonith
> agent?
>
>
> Manually running the status command from drbd2/drbd3 is OK:
>
>
> [root at drbd3 ~]# fence_rhevm -o status --ssl --ssl-secure -a
> engine.localdomain --username='fencerdrbd at internal'
> --password=I_have_replaced_that -n drbd1
> Status: ON
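
Since the status action works, the reboot path can be exercised
directly with the same agent and credentials, which helps separate
agent problems from fencer problems; this is just the status command
above with the action changed:

    # Power-cycle drbd1 through the oVirt engine, bypassing pacemaker.
    fence_rhevm -o reboot --ssl --ssl-secure -a engine.localdomain \
        --username='fencerdrbd@internal' \
        --password=I_have_replaced_that -n drbd1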
>
> I'm attaching the logs from drbd2 (the DC) and drbd3.
>
>
> Thanks in advance for your suggestions.
>
>
> Best Regards,
> Strahil Nikolov
--
Ken Gaillot <kgaillot at redhat.com>