On March 30th at 4:42 p.m. ET, Yext engineers began tracking degradations in our backend events system. These degradations, caused by high loads, delayed the propagation of data updates from the Knowledge Graph to downstream services such as Answers, Pages, and the Live API. Mitigations were implemented to alleviate server load, and the system recovered to an operational state at 8:25 p.m. ET. At that point, the system was allowed to process the backlog of data updates.
The following morning, on March 31st, the events system suffered a critical failure at 4:35 a.m. ET, causing outages in the Customer Portal. Engineers restored the system and all services at 5:23 a.m. ET.
We are immediately prioritizing previously planned work to upgrade our events system for greater resilience under high loads. Work to improve monitoring is already underway to give us more visibility into the system.