- Oct 26, 2023
-
-
Kurt Garloff authored
* Add user systemd unit and tmux start scripts, including some documentation. * Improve documentation. * Improve shutdown logic. Signed-off-by:
Kurt Garloff <kurt@garloff.de>
-
- Oct 07, 2023
-
-
Kurt Garloff authored
* Delete keypair by name if creation failed. An already existing keypair by that name is the most common reason for the failure, so cleaning up just in case is a good idea. Of course, there should not have been a left-over key ... * Also look for left-over keypairs in regular cleanups. * Better output for KEYPAIR creation. * Use all AZs for wavestack (default). * 1.92: Use AZ hints for VM networks. This should allow neutron to optimize things a bit. The VMs in a network are all in one AZ, so let neutron create that network in the right AZ. Can be disabled by passing NAZS=" ". Also call the one and only NET_JH and not NET_JH0. Signed-off-by:
Kurt Garloff <kurt@garloff.de>
-
- Jul 04, 2023
-
-
Kurt Garloff authored
Signed-off-by:
Kurt Garloff <kurt@garloff.de>
-
Kurt Garloff authored
* Use timeout binary rather than self-coded mechanism. * Also use diskless on gxscs. * Wrapper mytimeout for timeout. This is needed as myopenstack is a shell function and cannot be called by timeout. * Also fix timeout detection ($? >= 124). timeout returns 124 for a normal timeout. Previously, we expected > 128 (128+SIGNUM). Signed-off-by:
Kurt Garloff <kurt@garloff.de>
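The exit-status change described in this commit can be illustrated with a small sketch. It is written in Python rather than the project's shell, and the function names are hypothetical. It relies only on the documented behaviour of GNU `timeout`: status 124 when the command timed out, and 128+SIGNUM (for example 137) when the command was killed by a signal.

```python
# Hypothetical helper names; only the numeric thresholds come from the commit text.
TIMEOUT_STATUS = 124  # GNU timeout's exit status when the command timed out


def looks_like_timeout(status: int) -> bool:
    """Mirror the new `$? >= 124` check described in the commit."""
    return status >= TIMEOUT_STATUS


def looks_like_timeout_old(status: int) -> bool:
    """Mirror the previous check, which expected 128+SIGNUM only."""
    return status > 128


if __name__ == "__main__":
    for status in (0, 1, 124, 137):
        print(status, looks_like_timeout_old(status), looks_like_timeout(status))
```

Note that the `>= 124` check also treats 125 to 127 (failures of `timeout` itself or of invoking the command) as timeouts, which is a property of the check as quoted, not something added here.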
-
Kurt Garloff authored
* Use diskless flavors in drivers for PCO, GXSCS, WAVE. Signed-off-by:
Kurt Garloff <kurt@garloff.de>
-
- Jul 03, 2023
-
-
Kurt Garloff authored
* Add logic to patch openstackclient to support --block-device. * Determine whether we NEED_BLKDEV and use it. * 1.90. Fix syntax errors. * Translate --block-device args from nova to openstack. * Use nova boot workaround if openstackclient is very old. * Default to Ubuntu 22.04 image and diskless flavors. * Avoid sed script to patch more than a line. Signed-off-by:
Kurt Garloff <kurt@garloff.de>
-
- Jun 29, 2023
-
-
Kurt Garloff authored
* Generate ssh key with ssh-keygen (ed25519). Generation of key pairs in openstack is deprecated; so we do it locally and upload the pubkey. * Use new filename for keypair everywhere. * Make keypair type configurable, default rsa. Signed-off-by:
Kurt Garloff <kurt@garloff.de>
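A minimal sketch of the "generate locally, upload the public key" flow, written in Python for illustration. The function name, key name, and file path are hypothetical, and the actual script may pass different options. The sketch only builds the argument lists and does not run anything.

```python
def keypair_commands(name, key_file, key_type="rsa"):
    """Build the two commands: local key generation, then public key upload.

    key_type defaults to "rsa", following the default named in the commit;
    "ed25519" is the other type it mentions.
    """
    # -N "" requests an empty passphrase, -f sets the private key file, -q is quiet.
    generate = ["ssh-keygen", "-q", "-t", key_type, "-N", "", "-f", key_file]
    # ssh-keygen writes the public key next to the private key with a .pub suffix.
    upload = ["openstack", "keypair", "create",
              "--public-key", key_file + ".pub", name]
    return generate, upload


if __name__ == "__main__":
    gen, up = keypair_commands("demo-keypair", "/tmp/demo-keypair", "ed25519")
    print(" ".join(repr(a) if a == "" else a for a in gen))
    print(" ".join(up))
```

The lists could be handed to `subprocess.run(..., check=True)` in that order.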
-
Ralf Heiringhoff authored
Signed-off-by:
Ralf Heiringhoff <ralf.heiringhoff@plusserver.com>
-
- Mar 14, 2023
-
-
Kurt Garloff authored
* Allow setting a CA certificate or passing other params. export OS_EXTRA_PARAMS="--os-cacert FILE.crt" can be set if needed. (This allows running against a CiaB system.) We only use it when talking to the endpoints directly. Signed-off-by:
Kurt Garloff <kurt@garloff.de>
-
- Mar 07, 2023
-
-
Christian Berendt authored
* replace redundant Containerfile with a symlink * use OPENSTACK_VERSION instead of VERSION for the used OpenStack version * use Zed instead of Xena Signed-off-by:
Christian Berendt <berendt@osism.tech>
-
Kurt Garloff authored
Signed-off-by:
Kurt Garloff <kurt@garloff.de>
-
Kurt Garloff authored
Signed-off-by:
Kurt Garloff <kurt@garloff.de>
-
Kurt Garloff authored
* Allow selecting external net (there may be several). This is done by exporting EXTSEARCH to have a way to select a working external network. * Document existence of EXTSEARCH. * Improve comment as per @berendt's suggestion. Signed-off-by:
Kurt Garloff <kurt@garloff.de>
-
Kurt Garloff authored
In gx-scs currently, the Loadbalancer service is extremely slow. When creating the LB instance (with two amphorae), we time out in the creation API call (which does NOT wait for them to become active). So we end up testing without the LB test. However, the LBs eventually come up; we do not expect this and fail to clean them up. So more and more LBs are hanging around until the 200 iterations are over. As a consequence, we also fail to clean up networks. Change this: In cleanup, if LBs are enabled at all, but there were errors, look for LBs and try to delete them. So we stop collecting them and also avoid failed network and router cleanup. Signed-off-by:
Kurt Garloff <kurt@garloff.de>
-
- Feb 17, 2023
-
-
Kurt Garloff authored
* Support other LB providers (ovn) with -LP. * Allow passing loadbalancer provider via -LP <PROVIDER> * This results in a TCP loadbalancer with algo SOURCE_IP_PORT (which is the only algo that the ovn provider supports) * When configuring members, we actually had omitted the --subnet-id parameter previously, as it was unneeded and led to a failure with token authentication. This needed changing, as the member subnets need to be set explicitly when using the OVN provider. Still: It does not (yet?) work. Maybe having the VIP (for the listener) and the member ports in different networks is not supported? Signed-off-by:
Kurt Garloff <kurt@garloff.de> * LB member subnet is JHSUBNET[0]. Explicitly setting the --subnet-id for amphorae to the backend member actually breaks the LB's connection to the backend members. So we set it to the subnet from the VIP; as we access the LB via the floating IP, the request does indeed originate from the LB's VIP address in the VIP subnet (which we set to the JHSUBNET[0]). When using ovn provider, this makes the LB work IF the requests come from a host in the JHSUBNET, but not from the floating IP. So currently, this breaks. Sidenote: health-monitor is not supported for ovn provider pools. Signed-off-by:
Kurt Garloff <kurt@garloff.de> * SG in VM needs to allow port 80 from 0/0 for OVN. The requests come from the real client IP (which can be the internet, i.e. 0/0) and not the LB's virtual IP. Thus adjust the security group to allow for it. The OVN provider does not support the health monitor, unfortunately. Signed-off-by:
Kurt Garloff <kurt@garloff.de> * Adjust max cycle time if we don't kill LB members. Signed-off-by:
Kurt Garloff <kurt@garloff.de> * Improve warning when LB backend kills are skipped. Signed-off-by:
Kurt Garloff <kurt@garloff.de> * Set the member's subnet to the backend server subnets. This causes the health-monitor to detect members with error operating status. It does correctly take them out of the rotation when accessed via the VIP, unfortunately not via the Floating IP though. https://bugs.launchpad.net/neutron/+bug/1956035 The api_monitor.sh does thus report a few errors on each iteration. Signed-off-by:
Kurt Garloff <kurt@garloff.de> --------- Signed-off-by:
Kurt Garloff <kurt@garloff.de>
-
- Jan 10, 2023
-
-
Kurt Garloff authored
Previously, we used numerous, confusing follow-up commands. Signed-off-by:
Kurt Garloff <kurt@garloff.de> Signed-off-by:
Kurt Garloff <kurt@garloff.de>
-
- Dec 09, 2022
-
-
Kurt Garloff authored
Signed-off-by:
Kurt Garloff <kurt@garloff.de>
-
Kurt Garloff authored
* Use python to calc percentiles. This is a performance issue -- doing math in bash can easily take several seconds when we have API calls from a day. For longer runtimes things get worse ... So pass the stats to a little python script. * Bump version to 1.86. Signed-off-by:
Kurt Garloff <kurt@garloff.de>
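As an illustration of handing the statistics to Python, here is a minimal percentile sketch. It is not the project's actual script; the function name is hypothetical and the linear interpolation between neighbouring ranks is an assumed method.

```python
def percentile(values, pct):
    """Return the pct-th percentile (0..100) using linear interpolation."""
    if not values:
        raise ValueError("need at least one value")
    data = sorted(values)
    # Fractional position of the requested percentile in the sorted list.
    pos = (len(data) - 1) * pct / 100.0
    lower = int(pos)                      # pos is never negative, so this is floor
    upper = min(lower + 1, len(data) - 1)
    frac = pos - lower
    return data[lower] + (data[upper] - data[lower]) * frac


if __name__ == "__main__":
    durations = [0.8, 1.2, 0.5, 3.1, 0.9]  # made-up API call durations in seconds
    for pct in (50, 95):
        print(pct, percentile(durations, pct))
```

Sorting once and indexing is cheap even for a full day of API call timings, which is the performance point the commit makes.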
-
Kurt Garloff authored
No idea why it got lost in commit b971b193 (from PR #107). Signed-off-by:
Kurt Garloff <kurt@garloff.de>
-
Kurt Garloff authored
Cosmetic. This was missing from ed808bba. Signed-off-by:
Kurt Garloff <kurt@garloff.de>
-
Kurt Garloff authored
* Upload files to swift containers without path in name. * Fix APIMonitor log file upload trigger. We had the wrong filenames in mind, as we use a separate $DATADIR now, so the check failed. Also be more verbose when uploading. * Fix double DATADIR. Signed-off-by:
Kurt Garloff <kurt@garloff.de>
-
- Dec 01, 2022
-
-
Kurt Garloff authored
* Fix port cleanup for leftover load-balancers. The grep expression previously did not match, causing port deletion for octavia-lb ports in our subnet to be skipped. Signed-off-by:
Kurt Garloff <kurt@garloff.de> * Also cleanup leftover octavia ports in cleanup. There, we also need to shift the security group deletion until after we have cleaned up load balancer ports, as LB ports could have security groups assigned. Signed-off-by:
Kurt Garloff <kurt@garloff.de> * Hit all octavia ports not just -vrrp. Signed-off-by:
Kurt Garloff <kurt@garloff.de> * Enable LBs in CLEANUP call. Signed-off-by:
Kurt Garloff <kurt@garloff.de> * Do octavia port cleanup unconditionally. Signed-off-by:
Kurt Garloff <kurt@garloff.de> Signed-off-by:
Kurt Garloff <kurt@garloff.de>
-
Kurt Garloff authored
* Medium old OSC reports Networks in a different format.
  So OSC-5.5.0 reports (openstack server list -f json):
    [
      {
        "ID": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
        "Name": "HealthMon-Host",
        "Networks": {
          "kd500924-SCS-healthmonitor-network": [ "192.168.0.17", "78.138.66.252" ]
        }
      }
    ]
  whereas older version OSC-5.3.1 reports
    [
      {
        "ID": "XXXXXXXXXXXXXXXXXXXXXXXXX",
        "Name": "testkg2",
        "Networks": "testkg2=192.168.33.95, 31.172.117.55"
      }
    ]
  Adjust IP parser to handle both ... Signed-off-by:
Kurt Garloff <kurt@garloff.de> * A bit more debugging info (commented out). Signed-off-by:
Kurt Garloff <kurt@garloff.de> Signed-off-by:
Kurt Garloff <kurt@garloff.de>
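The two "Networks" shapes quoted in this commit can be handled by one small parser. The sketch below is in Python and is not the project's implementation; the function name is hypothetical, and the ";" separator between several networks in the older string format is an assumption.

```python
def extract_ips(networks):
    """Return all IP addresses from either form of the "Networks" field."""
    if isinstance(networks, dict):
        # Newer form: {"net-name": ["ip1", "ip2"], ...}
        return [ip for ips in networks.values() for ip in ips]
    ips = []
    # Older form: "net-name=ip1, ip2" (assumed ";"-separated for several networks)
    for entry in networks.split(";"):
        _, _, addrs = entry.partition("=")
        ips.extend(a.strip() for a in addrs.split(",") if a.strip())
    return ips


if __name__ == "__main__":
    new_style = {"kd500924-SCS-healthmonitor-network": ["192.168.0.17", "78.138.66.252"]}
    old_style = "testkg2=192.168.33.95, 31.172.117.55"
    print(extract_ips(new_style))
    print(extract_ips(old_style))
```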
-
Kurt Garloff authored
* Handle neutron not reporting device_id. We currently match ports by looking for the VM UUID in the port's device_id. This no longer seems to work reliably on PCO; the field is not returned by the port list command. So look for the IP address reported from server list and match it with the ports to determine port ID. Sort of ugly, but there is no straightforward way to get port <-> VM relationships without looking at port details or matching both subnet and IP address. Signed-off-by:
Kurt Garloff <kurt@garloff.de> * Simplify old code path. We already know that we only go here if we're using the legacy tooling. Signed-off-by:
Kurt Garloff <kurt@garloff.de> * Assign external net to router before creating VMs. Signed-off-by:
Kurt Garloff <kurt@garloff.de> * Revert "Assign external net to router before creating VMs." This reverts commit b0a1991f. Signed-off-by:
Kurt Garloff <kurt@garloff.de> * Assign external net to router before creating VMs. Signed-off-by:
Kurt Garloff <kurt@garloff.de> * Clean up router when external-net-list fails. Also, ignore when setting external gateway fails. (The reason is that this is not required on OTC and the error is thus harmless.) Signed-off-by:
Kurt Garloff <kurt@garloff.de> * Bump version number to 1.85. The change (assigning the external net early) is significant enough that we want to see from the version number whether or not we have it. Signed-off-by:
Kurt Garloff <kurt@garloff.de> Signed-off-by:
Kurt Garloff <kurt@garloff.de>
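The IP-based port matching described in this series can be sketched as follows. This is a Python illustration, not the script's code; the function name is hypothetical, and the port records are assumed to be shaped like Neutron API port objects (`id`, `fixed_ips` with `ip_address` and `subnet_id`).

```python
def port_id_for_ip(ports, ip_address, subnet_id=None):
    """Find the port that carries ip_address, optionally also matching the subnet."""
    for port in ports:
        for fixed in port.get("fixed_ips", []):
            if fixed.get("ip_address") != ip_address:
                continue
            if subnet_id is not None and fixed.get("subnet_id") != subnet_id:
                continue
            return port["id"]
    return None  # no port carries this address


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    ports = [
        {"id": "port-a", "fixed_ips": [{"ip_address": "10.250.0.5", "subnet_id": "sub-1"}]},
        {"id": "port-b", "fixed_ips": [{"ip_address": "10.250.0.9", "subnet_id": "sub-1"}]},
    ]
    print(port_id_for_ip(ports, "10.250.0.9"))
```

Passing `subnet_id` as well guards against the same private address appearing in two different subnets, which is why the commit mentions matching both subnet and IP address.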
-
- Nov 30, 2022
-
-
Kurt Garloff authored
* Handle neutron not reporting device_id. We currently match ports by looking for the VM UUID in the port's device_id. This no longer seems to work reliably on PCO; the field is not returned by the port list command. So look for the IP address reported from server list and match it with the ports to determine port ID. Sort of ugly, but there is no straightforward way to get port <-> VM relationships without looking at port details or matching both subnet and IP address. * Simplify old code path. We already know that we only go here if we're using the legacy tooling. Signed-off-by:
Kurt Garloff <kurt@garloff.de>
-
Kurt Garloff authored
* Report timing and errors of LB connections to grafana. * 3 digits LBconn time measurement. * Add LBconn to dashboard. * Change range for bench graph to include LBConn vals. * Multiply LB dur by ten, so graphs in grafana align better. * Range for bench chart 0.5--32. We may have iperf measurements below 1 (Gbps), so extend scale a bit downwards. Signed-off-by:
Kurt Garloff <kurt@garloff.de>
-
Kurt Garloff authored
* Only wait for JHPORT b/f assigning FIP if needed. This was required on OTC, and we can still force the waiting by setting the FIPWAITPORTDEVOWNER environment. Signed-off-by:
Kurt Garloff <kurt@garloff.de> * Fix syntax for new FIPWAITPORTDEVOWNER. Set for OTC. Signed-off-by:
Kurt Garloff <kurt@garloff.de>
-
- Nov 28, 2022
-
-
Kurt Garloff authored
* Improve dashboard setup documentation. * Some additional hints for setting up the monitoring VM. Signed-off-by:
Kurt Garloff <kurt@garloff.de>
-
- Nov 23, 2022
-
-
Kurt Garloff authored
My own 4GiB openSUSE image nicely fits the smallest standardized SCS flavor: SCS-1L:1:5. This reduces the resource consumption for the OpenStack health monitor in the PCO environment. Link to the image: https://kfg.images.obs-website.eu-de.otc.t-systems.com/ Signed-off-by:
Kurt Garloff <kurt@garloff.de> Signed-off-by:
Kurt Garloff <kurt@garloff.de>
-
- Nov 20, 2022
-
-
Kurt Garloff authored
Signed-off-by:
Kurt Garloff <kurt@garloff.de>
-
- Nov 16, 2022
-
-
Kurt Garloff authored
This is an attempt to work around a situation where the FIP assignment that happens during the JumpHost VM booting and downloading packages (triggered by cloud-init userdata) does disrupt communication for a while. Signed-off-by:
Kurt Garloff <kurt@garloff.de> Signed-off-by:
Kurt Garloff <kurt@garloff.de>
-
- Oct 24, 2022
-
-
Kurt Garloff authored
* Retry removing LB after waitdel. When JH creation fails, we clean up things in reverse order. The LB is then typically still in PENDING_CREATE, which means we can't delete it yet. Waiting for it to vanish then won't work either. If that happens, we should try to delete it again and wait again. This addresses the issue observed in #96. Signed-off-by:
Kurt Garloff <scs@garloff.de> * Wait shorter b/f retrying LB delete. Also for the implementation to work, we need to unset LBDSTATS before deleting again (otherwise the array grows larger than the list of resources, causing the wait loop to not break once all resources are gone). When retrying, we wait 5s rather than 2s now, increasing the chances that re-deleting actually works. This fixes #96. Signed-off-by:
Kurt Garloff <scs@garloff.de> * Bump version to 1.83. Note to previous commit: Restoring LBAASS from DELLBAASS was key to make second calls to deleteLBs and waitdelLBs actually do something. This belongs to #96. Signed-off-by:
Kurt Garloff <scs@garloff.de> Signed-off-by:
Kurt Garloff <scs@garloff.de> Co-authored-by:
chschilling <c.schilling@gmx.net>
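The delete, wait, and retry pattern described in these commits can be sketched generically. This is a Python illustration with hypothetical names; the callables stand in for the script's delete and wait steps, and only the 5 second wait is taken from the commit text.

```python
import time


def delete_with_retry(delete, is_gone, attempts=2, wait_seconds=5.0, sleep=time.sleep):
    """Call delete(), wait, and check is_gone(); repeat up to `attempts` times.

    Returns True once the resource is gone, False if it survived every attempt.
    """
    for _ in range(attempts):
        delete()             # may be rejected while the resource is still PENDING_CREATE
        sleep(wait_seconds)  # give the service time to settle before checking
        if is_gone():
            return True
    return False


if __name__ == "__main__":
    state = {"deletes": 0}

    def fake_delete():
        state["deletes"] += 1

    def fake_is_gone():
        # Pretend the first delete is rejected and the second one succeeds.
        return state["deletes"] >= 2

    print(delete_with_retry(fake_delete, fake_is_gone, sleep=lambda s: None))
```

Injecting `sleep` keeps the sketch testable without real delays.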
-
Kurt Garloff authored
Only the resources waited for in waitlistResources were properly reported to telegraf/influx/grafana; there was a typo in the ones waited for in waitResources. This caused waitJHPORT not to be visible in grafana. Fixed. Signed-off-by:
Kurt Garloff <kurt@garloff.de> Signed-off-by:
Kurt Garloff <kurt@garloff.de>
-
- Oct 21, 2022
-
-
Kurt Garloff authored
This results in defaults from your cloud being used, which often is a good choice. Use this for the wavestack cloud. Signed-off-by:
Kurt Garloff <kurt@garloff.de> Signed-off-by:
Kurt Garloff <kurt@garloff.de>
-
- Sep 27, 2022
-
-
Kurt Garloff authored
* Use current screenshot in dashboard/README.md Screenshot courtesy of gx-scs environment from PlusServer. * Document how to interpret the dashboards. ... using an example from gx-scs. * Mention no filters. Signed-off-by:
Kurt Garloff <kurt@garloff.de>
-
- Sep 19, 2022
-
-
Kurt Garloff authored
* Add pointer to dashboard dir, update usage output. * Fixup typos. Signed-off-by:
Kurt Garloff <kurt@garloff.de>
-
- Sep 15, 2022
-
-
Kurt Garloff authored
Our logic to filter only for our own subnets was broken. It now works. Signed-off-by:
Kurt Garloff <kurt@garloff.de> Signed-off-by:
Kurt Garloff <kurt@garloff.de>
-
Kurt Garloff authored
We might otherwise overlook things that are significantly wrong ... Signed-off-by:
Kurt Garloff <kurt@garloff.de> Signed-off-by:
Kurt Garloff <kurt@garloff.de>
-
- Sep 09, 2022
-
-
Kurt Garloff authored
Signed-off-by:
Kurt Garloff <kurt@garloff.de>
-
- Sep 05, 2022
-
-
Kurt Garloff authored
Signed-off-by:
Kurt Garloff <kurt@garloff.de>
-