Delay to data updates in Answers, Pages, Live API
Incident Report for Yext
Postmortem

Summary

On March 30th at 4:42 p.m. ET, Yext engineers began tracking degradations in our backend events system. These degradations, caused by high loads, delayed the propagation of data updates from the Knowledge Graph to downstream services such as Answers, Pages, and the Live API. Mitigations were implemented to alleviate server load, and the system recovered to an operational state at 8:25 p.m. ET. Once recovered, the system was allowed to process the backlog of data updates.

The following morning, on March 31st, the events system suffered a critical failure at 4:35 a.m. ET, causing outages in the Customer Portal. Engineers restored the system and all services at 5:23 a.m. ET.

Remediation

We are immediately prioritizing previously planned work to upgrade our events system for greater resilience under high loads, and work on improved monitoring is already underway to give us more visibility into the system.

Posted Apr 08, 2021 - 08:53 EDT

Resolved
This incident has been resolved.
Posted Mar 31, 2021 - 10:38 EDT
Monitoring
Pending data updates have been applied and we will monitor overnight for any additional issues.
Posted Mar 31, 2021 - 01:25 EDT
Update
Mitigations have been implemented for our events system, and we are beginning to process the backlog of events. We will monitor closely for any regressions in the system as events are processed.
Posted Mar 30, 2021 - 21:41 EDT
Update
We are continuing to work on mitigations for our backend events system and have also identified additional impact on our Self Serve flows and Configuration as Code systems. We will update as soon as we have more information.
Posted Mar 30, 2021 - 18:44 EDT
Identified
Starting about 15 minutes ago, our backend events system entered a degraded mode of operation due to elevated load. Data updates made to the Yext Knowledge Graph may not have propagated to Answers, Pages, or the Live API. Engineers are working to mitigate the issue and apply all pending data updates.
Posted Mar 30, 2021 - 16:59 EDT
This incident affected: Content (Content API) and Pages (Pages Generation).