Affected services:

  • Virtual servers DC2

Issues with Proxmox Cluster

Opened on Monday 27th January 2020.

Resolved

This case has been resolved; all VMs have been moved to the new cluster.

Posted by Andreas Haakonsen

Identified

We are continuously migrating servers to the new cluster, a few at a time at selected times during the day to minimize the impact on other VMs. This case will be marked as resolved when all VMs have been migrated.

Customers may experience a 1-10 minute outage while their server is moved, depending on the amount of disk space in use.

How do I know if my server has been migrated? If your node name starts with pve rather than pmx, your server is on the new, improved cluster.
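
For reference, the check is simply a prefix comparison on the node name shown for your VM in ENIGMA. The sketch below is illustrative only: it assumes operator-side access to the Proxmox API, and the host, token, and node names are placeholders rather than confirmed values.

    # Illustrative sketch only: host, token, and node names are assumptions.
    # Customers can simply check the node name shown for their VM in the
    # ENIGMA control panel; the test is a plain prefix comparison.
    import requests

    PVE_HOST = "https://pve.example.net:8006"                 # placeholder API host
    TOKEN = "PVEAPIToken=monitor@pve!readonly=<secret>"       # placeholder API token

    resp = requests.get(
        f"{PVE_HOST}/api2/json/cluster/resources",
        params={"type": "vm"},
        headers={"Authorization": TOKEN},
        verify=False,  # PVE hosts often use self-signed certificates
    )
    resp.raise_for_status()

    for vm in resp.json()["data"]:
        node = vm.get("node", "")
        if node.startswith("pve"):
            location = "new cluster"
        elif node.startswith("pmx"):
            location = "old cluster"
        else:
            location = "unknown"
        print(f"VM {vm['vmid']} ({vm.get('name', '?')}) on {node}: {location}")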

Posted by Andreas Haakonsen

Identified

Migrated servers do not keep their snapshots; unfortunately, it is not possible for us to migrate them.

Posted by Andreas Haakonsen

Identified

More and more servers are being moved to the new, faster, and much more reliable cluster. The Console feature has now been switched to work against the new cluster, so you can access your server after the migration.

Posted by Andreas Haakonsen

Identified

The Proxmox cluster has been stable for the past 12 hours. We are gradually migrating affected virtual servers to a new cluster. If your server performs a quick reboot, this is the reason. Most migrations will be performed between 23:00 and 05:00 to minimize downtime.

We expect much more stable operation on the new cluster, which is set up and configured differently.

For servers migrated to the new cluster, the Console in ENIGMA is not working for the time being. This will be resolved once the majority of VMs have been migrated. You know your server has been migrated if the node name starts with pve*.

Posted by Andreas Haakonsen

Identified

It seems we are not able to stabilize our Proxmox cluster. We are working on a solution to move affected servers to another cluster.

Posted by Andreas Haakonsen

Monitoring

We are upgrading the affected nodes tonight, 28/01, around 01:30 GMT+1. Nodes may require a restart, which can reboot some VMs. We will migrate VMs away before rebooting, so you should be unaffected by this upgrade.
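
For the curious, the sketch below outlines how draining a node before a reboot typically looks on Proxmox: running VMs are live-migrated to another node through the API so they stay online. Host, token, and node names are placeholders, and this is a simplified illustration rather than our exact procedure.

    # Simplified illustration of draining a node before a reboot by
    # live-migrating its running VMs. Host, token, and node names are
    # placeholders, not our actual values.
    import requests

    PVE_HOST = "https://pve.example.net:8006"            # placeholder API host
    TOKEN = "PVEAPIToken=ops@pve!migrate=<secret>"       # placeholder API token
    SOURCE, TARGET = "pmx3dc2", "pve1dc2"                # assumed node names
    AUTH = {"Authorization": TOKEN}

    # List VMs on the node that is about to be rebooted.
    vms = requests.get(
        f"{PVE_HOST}/api2/json/nodes/{SOURCE}/qemu",
        headers=AUTH, verify=False,
    ).json()["data"]

    for vm in vms:
        if vm.get("status") != "running":
            continue
        # Start an online (live) migration task for each running VM.
        task = requests.post(
            f"{PVE_HOST}/api2/json/nodes/{SOURCE}/qemu/{vm['vmid']}/migrate",
            data={"target": TARGET, "online": 1},
            headers=AUTH, verify=False,
        ).json()["data"]
        print(f"Started migration of VM {vm['vmid']} to {TARGET}: {task}")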

Posted by Andreas Haakonsen

Resolved

All VMs have booted and control panel functionality has been restored.

Posted by Andreas Haakonsen

Monitoring

If your VM is not responding, please contact us via a support ticket and we will take a look at it. There are a few servers that have not booted correctly.

Posted by Andreas Haakonsen

Monitoring

Most of the affected VMs have now booted; the rest are currently booting. I/O will be somewhat slow for the first hour while our storage resynchronizes.

Posted by Andreas Haakonsen

Monitoring

Nodes are being brought up one by one. We hope to have the whole cluster up by 16:30 GMT+1.

Posted by Andreas Haakonsen

Monitoring

The cluster is up and affected VMs are booting. We are monitoring the cluster.

Posted by Andreas Haakonsen

Identified

Affected hosts pmx1-16dc2 are currently booting, and we expect the cluster to come back up shortly.

Posted by Andreas Haakonsen

Identified

HA is fencing nodes and we're seeing multiple nodes reboot. We are working to get all affected VMs up as quickly as possible.
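
For context, fencing is the Proxmox HA manager rebooting nodes that have lost quorum so their VMs can be recovered on healthy nodes. A rough sketch of checking quorum and node state through the API is shown below; the host and token are placeholders, and on a node itself this is normally just read with pvecm status.

    # Rough sketch: query cluster and node state via the Proxmox API.
    # Host and token are placeholders; on a node this information is
    # normally read with "pvecm status" or "ha-manager status".
    import requests

    PVE_HOST = "https://pve.example.net:8006"                # placeholder API host
    TOKEN = "PVEAPIToken=monitor@pve!readonly=<secret>"      # placeholder API token

    status = requests.get(
        f"{PVE_HOST}/api2/json/cluster/status",
        headers={"Authorization": TOKEN},
        verify=False,
    ).json()["data"]

    for item in status:
        if item.get("type") == "cluster":
            print(f"Cluster {item.get('name')}: quorate={item.get('quorate')}")
        elif item.get("type") == "node":
            print(f"Node {item.get('name')}: online={item.get('online')}")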

Posted by Andreas Haakonsen

Investigating

We are aware of the issue and are investigating the cause. We will provide further updates as we have them.

Posted by Andreas Haakonsen