This incident is now marked as resolved; however, we will continue monitoring and analyzing the issue internally. We have identified an invalid requeue mechanism that kept reprocessing a corrupted image: processing should have stopped after three attempts, but a transient error in our queue connection appears to have prevented the attempt counter from doing its job. We have mechanisms in place to recover from this automatically by autoscaling according to the queue size, but they did not fire during the event (they do now). We will continue investigating the root cause, but as far as we can tell the system is fully back to normal. A simplified sketch of the bounded-requeue behaviour described here is included below for illustration.
May 1, 09:02 EDT
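For illustration only, here is a minimal Python sketch of the kind of bounded requeue logic described above. It is not our production code: the names (process_image, CorruptedImageError, image_id) and the in-process queues are assumptions made for the example. The key idea is that the attempt counter travels inside the message itself, so a dropped and re-established queue connection cannot silently reset it.

```python
import json
import queue

MAX_ATTEMPTS = 3  # give up on an image after this many tries


class CorruptedImageError(Exception):
    """Raised when an image cannot be decoded."""


def process_image(image_id: str) -> None:
    # Placeholder for the real image-processing step (illustrative only).
    raise CorruptedImageError(image_id)


def handle_message(work_queue: queue.Queue, dead_letter: queue.Queue, raw: str) -> None:
    """Process one upload message, requeueing it at most MAX_ATTEMPTS times."""
    body = json.loads(raw)
    attempts = body.get("attempts", 0)
    try:
        process_image(body["image_id"])
    except CorruptedImageError:
        if attempts + 1 >= MAX_ATTEMPTS:
            # Park the message for manual inspection instead of requeueing
            # it forever and backing up the whole queue.
            dead_letter.put(json.dumps(body))
        else:
            body["attempts"] = attempts + 1
            work_queue.put(json.dumps(body))


if __name__ == "__main__":
    work, dead = queue.Queue(), queue.Queue()
    msg = json.dumps({"image_id": "example", "attempts": 0})
    for _ in range(MAX_ATTEMPTS):
        handle_message(work, dead, msg)
        if not work.empty():
            msg = work.get()
    print("dead-lettered:", dead.get())
```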
An aggressive forced scale operation has now brought upload processing back to normal performance. The root cause has not yet been identified; we will continue investigating to avoid another degradation.
May 1, 06:50 EDT
The uploads queue has been processed, but there are still issues with uploads. We are investigating and will report as soon as we have new information.
May 1, 06:37 EDT
The autoscaling algorithm did not fire correctly under load, resulting in a large processing backlog. We have manually increased capacity to mitigate the problem. Uploads should now work correctly; however, the root cause has not yet been identified.
May 1, 06:33 EDT
Uploads are currently not being processed; we will update as soon as we have new information.
May 1, 06:31 EDT
We have detected slow uploads and are investigating the issue.
May 1, 06:17 EDT