Production server hardware failure (automatic resolution)

Incident Report for Electric Imp

Resolved

At 1857 UTC a production server suffered a hardware failure and became unreachable. By 1859 UTC devices automatically detected this and began migration to other production servers.

All devices on the affected server regained connectivity to other hosts by 1900 UTC; these devices would see an agent restart as their VM. During the affected period (1857-1900 UTC) requests to agents on the affected hosts may have returned 502 errors.

A replacement server was re-added to the pool at 1925 UTC.

Posted Mar 07, 2017 - 19:27 UTC