Multi-master leader changes
Created by: garloff
In multi-controlnode cluster setups, we see numerous leader changes, leading to roughly a dozen arbitrary CNCF (sonobuoy) test failures. This is due to etcd missing heartbeats, which triggers leader elections (according to the Raft protocol). We have a disabled workaround in the cluster-template.yaml that changes the heartbeat interval from 100ms to 250ms (and the leader election timeout to 2.5s). This reduces the occurrence of missed heartbeats by a factor of two, which is not enough ...
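
For reference, the workaround could look roughly like the fragment below. This is a sketch, not a copy of the actual cluster-template.yaml: it assumes etcd is managed by kubeadm and that extra arguments are passed as a string map under `etcd.local.extraArgs` (the exact nesting and API version in the template may differ). The two keys correspond to etcd's `--heartbeat-interval` and `--election-timeout` flags, both given in milliseconds.

```yaml
# Assumed kubeadm ClusterConfiguration fragment; nesting in the real template may differ.
etcd:
  local:
    extraArgs:
      # etcd --heartbeat-interval in ms (100 before the change, per the text above)
      heartbeat-interval: "250"
      # etcd --election-timeout in ms (2.5s, i.e. 10x the heartbeat interval)
      election-timeout: "2500"
```

All etcd members should run with the same values, so a change like this has to reach every control plane node.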