Increased Errors in Knowledge Graph
Incident Report for Yext
Postmortem

Summary

On March 26, beginning at 11:41 AM, ET, and lasting until 11:50 AM, the Knowledge Graph experienced elevated error rates and page load failures.

Root Cause

An automated failover had occurred to handle hardware failures in our network system earlier in the day. When we restored the failed component, the system prematurely attempted to route traffic through the new component, resulting in a spike of connection errors. Once the restored component booted up successfully, error rates returned to normal.

Remediation

We have modified our process to ensure that the system does not prematurely route traffic through network components before they are ready.

Posted Apr 08, 2021 - 08:38 EDT

Resolved
This incident has been resolved.
Posted Mar 26, 2021 - 12:39 EDT
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Mar 26, 2021 - 12:03 EDT
Identified
The issue has been identified and a fix is being implemented.
Posted Mar 26, 2021 - 11:59 EDT
Investigating
We are currently investigating telemetry that indicates elevated error rates in the Knowledge Graph
Posted Mar 26, 2021 - 11:54 EDT
This incident affected: Customer Portal and Platform API (Knowledge and Administrative).