[ClusterLabs] Antw: Re: big trouble with a DRBD resource

Thu Aug 10 08:20:03 EDT 2017

>>> Lars Ellenberg <lars.ellenberg at linbit.com> wrote on 10.08.2017 at 14:11 in
message <20170810121025.GB22663 at soda.linbit>:
> On Wed, Aug 09, 2017 at 06:48:01PM +0200, Lentes, Bernd wrote:
>> 
>> 
>> ----- On 8 Aug 2017 at 15:36, Lars Ellenberg lars.ellenberg at linbit.com wrote:
>>  
>> > crm shell in "auto-commit"?
>> > never seen that.
>> 
>> I googled for "crmsh autocommit pacemaker" and found this:
>> https://github.com/ClusterLabs/crmsh/blob/master/ChangeLog
>> See line 650. I don't know what that means.
>> > 
>> > You are sure you did not forget this necessary piece?
>> > ms WebDataClone WebData \
>> >    meta master-max="1" master-node-max="1" clone-max="2" \
>> >    clone-node-max="1" notify="true"
>> 
>> I didn't get that far. I followed this guide
>> (http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html-single/Clusters_from_Scratch/index.html#_configure_the_cluster_for_drbd),
>> but didn't use the shadow cib.
> 
> if you use crmsh "interactively",
> crmsh does implicitly use a shadow cib,
> and will only commit changes once you "commit",
> see "crm configure help commit"
> 
> At least that's my experience with crmsh for the last nine years or so.

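I think the point is: if you work inside an interactive "crm configure"
session, you need a "commit" before exiting. If you pass the complete command
on the shell command line (e.g. "crm configure primitive ..."), it is applied
immediately, so no separate commit is needed.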
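
To make the difference concrete, here is a minimal sketch of a way to keep the
primitive and its ms wrapper in one change set, so that the primitive never
reaches the live CIB on its own. The resource names follow the Clusters from
Scratch guide; drbd_resource=wwwdata, the monitor interval, the helper name
drbd_batch and the APPLY switch are my assumptions for illustration, not
something taken from Bernd's setup.

```shell
#!/bin/sh
# Sketch only: names follow the Clusters from Scratch guide;
# drbd_resource=wwwdata and interval=60s are assumed values.

# Emit the whole change set as one batch: primitive, ms wrapper,
# then a single explicit commit at the end.
drbd_batch() {
    cat <<'EOF'
primitive WebData ocf:linbit:drbd \
    params drbd_resource=wwwdata \
    op monitor interval=60s
ms WebDataClone WebData \
    meta master-max="1" master-node-max="1" clone-max="2" \
    clone-node-max="1" notify="true"
commit
EOF
}

# By default only print the batch for review. Feed it to crmsh only
# when APPLY=1 is set and the crm binary is actually installed.
if [ "${APPLY:-0}" = "1" ] && command -v crm >/dev/null 2>&1; then
    drbd_batch | crm configure
else
    drbd_batch
fi
```

Run as-is, this only prints the batch. By contrast, two separate one-shot
calls ("crm configure primitive ..." followed later by "crm configure ms ...")
would each take effect immediately, which is how a DRBD primitive can end up
active without its ms resource.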

Regards,
Ulrich

> 
>> The cluster is in testing, not in production, so I thought "nothing
>> severe can happen". I misjudged. My error.
>> After I configured the primitive without the ms clone, my ClusterMon
>> resource reacted promptly and sent 20000 SNMP traps to my management
>> station in 193 seconds, which triggered 20000 e-mails ...
>> I understand now that the cluster was missing the ms clone configuration.
>> But so many traps in such a short period: is that intended, or a bug?
> 
> If you configure a resource to fail immediately,
> but in a way that pacemaker thinks can be "recovered" from
> by stopping and restarting, then pacemaker will do so.
> If that results in 20000 "actions" within 193 seconds,
> that's roughly 100 actions per second, which seems "quick",
> but is not a bug per se.
> If every single such action triggers a trap,
> because you configured the system to send traps for every action,
> that's yet a different thing.
> 
> So what now?
> Where exactly is the "big trouble with DRBD"?
> Someone was "almost" following some tutorial, and got in trouble.
> 
> How could we keep that from happening to the next person?
> Any suggestions which component or behavior we should improve, and how?
> 
> -- 
> : Lars Ellenberg
> : LINBIT | Keeping the Digital World Running
> : DRBD -- Heartbeat -- Corosync -- Pacemaker
> : R&D, Integration, Ops, Consulting, Support