Some systems are experiencing issues

Stickied Incidents

Wednesday 16th January 2019

Issues with VPS Cluster

We are investigating an issue with our VPS Cluster.
Update 12:50: We are working to restore a clean state on the storage cluster after the crash.
Update 13:07: Servers are being brought up again. We expect to have all affected VMs up by 13:30.
Update 13:19: Around 70% of servers have now been brought up again.
Update 13:31: All servers have been booted.

Preliminary RFO
There is a watchdog issue on some nodes that causes them to fail fencing and reboot when a problem occurs in the network or cluster. These events disturb the whole cluster, since the storage is distributed over the network across all nodes. The Ceph storage cluster is quite resilient and we run 3/2 replication. This means that all data is stored in 3 copies and stays available as long as at least 2 of them are reachable, so no data is lost even if 2 of the 3 copies go missing, provided the cluster has enough free space left to operate. But when multiple nodes reboot at once, a slowdown occurs. The cluster automatically booting 100 VMs at the same time does not help either!
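To illustrate the 3/2 replication mentioned above, here is a minimal sketch. It assumes "3/2" refers to a Ceph pool with size 3 and min_size 2. The function name and the state labels are hypothetical and only loosely follow Ceph's terminology; this is not how Ceph itself is implemented.

```python
def replica_state(size: int, min_size: int, copies_reachable: int) -> str:
    """Classify a piece of data by how many of its replicas are reachable.

    size     -- number of copies the pool keeps (assumed 3 here)
    min_size -- copies that must be reachable for I/O to continue (assumed 2)
    """
    if not 0 <= copies_reachable <= size:
        raise ValueError("copies_reachable must be between 0 and size")
    if copies_reachable == 0:
        return "unavailable"  # no copy can be reached at all
    if copies_reachable < min_size:
        return "blocked"      # data still exists, but I/O is paused
    if copies_reachable < size:
        return "degraded"     # I/O continues while missing copies are rebuilt
    return "clean"            # all copies present


# Walk through the cases for a 3/2 pool.
for reachable in (3, 2, 1, 0):
    print(reachable, replica_state(3, 2, reachable))
```

Under these assumptions, losing one copy leaves the data usable but degraded, which is why several nodes rebooting at once shows up as a slowdown while the cluster recovers.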

We will look into how we can improve this and avoid reboots in the future. The cluster is now stable. Some customers may experience slower performance for an hour or so while services are being started on the servers.

Past Incidents

Saturday 5th January 2019

No incidents reported

Friday 4th January 2019

Virtual servers outage

We are investigating an issue with our VPS cluster causing an outage.
Update 04.01.2019 19:12: We are continuing to investigate. We are able to reach all nodes in the cluster. Storage seems to be okay.
Update 04.01.2019 19:15: VMs are being powered up now.
Update 04.01.2019 20:50: We are continuing to see instability. Investigating.
Update 04.01.2019 20:57: A large DDoS attack is causing instability on our VPS cluster's management network, causing hosts to lose quorum and reboot. We are trying to mitigate the attack and move our management to avoid it being affected.
Update 04.01.2019 21:10: The attack has been mitigated. We are working to secure the infrastructure so that it can handle an attack without affecting other virtual servers on the same cluster. Servers are currently booting.
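For readers unfamiliar with quorum, the following is a minimal sketch of the majority rule that leads a host to reboot when it loses contact with too many peers. It assumes a simple one-vote-per-node majority quorum; the function names and the 5-node cluster are hypothetical and do not describe our actual cluster size or software.

```python
def has_quorum(total_nodes: int, reachable_nodes: int) -> bool:
    """Return True if the reachable nodes form a strict majority.

    reachable_nodes counts the node itself plus every peer it can reach.
    """
    return reachable_nodes > total_nodes // 2


def node_action(total_nodes: int, reachable_nodes: int) -> str:
    """Decide what a node does based on whether it still sees a majority."""
    if has_quorum(total_nodes, reachable_nodes):
        return "keep running"
    # Without a majority the node cannot know it is safe to keep its VMs
    # running, so it fences itself (here: reboots).
    return "fence and reboot"


# Hypothetical 5-node cluster with a disturbed management network.
for reachable in (5, 3, 2):
    print(reachable, node_action(5, reachable))
```

In this model, an attack that disrupts management traffic can make healthy hosts see fewer than half of their peers, which is enough to trigger a reboot even though the hosts themselves are fine.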

Thursday 3rd January 2019

No incidents reported

Wednesday 2nd January 2019

No incidents reported

Tuesday 1st January 2019

No incidents reported

Monday 31st December 2018

Issue with API

We have an issue with our API causing problems logging into our control panel and ordering services. We are looking into the issue and it will be resolved shortly.
Update 31.12.2018 15:48 GMT+2: Issue has now been resolved.

Sunday 30th December 2018

No incidents reported