Summary: Around 01:16 UTC this morning, our reporting systems were unavailable for about 25 minutes. Data collection was not affected and no data was lost.
Details: one of our database cluster systems had a hardware failure related to the network cards. On-call personnel were notified. The system automatically failed over to another database system as per design. However, the nature of the failure caused the cluster to not completely fail over. The on-call team corrected the issue and restored services.
Action items: We are going to address the hardware issue and research ways to prevent the cluster from not completing the fail-over process when this type of event occurs.